To assess corrosion growth rates, service providers typically match the results of a recent in-line inspection (ILI) against ILI findings obtained a few years earlier. Various algorithms can be employed to automate this painstaking process. However, matching accuracy tends to degrade as the number of reported anomalies per pipeline joint and their geometric complexity increase. When signal data from both inspections are available, matches can be further improved by manual visual validation. When at least one inspection is available only in spreadsheet form, validation is complicated by the lack of visualization, and the accuracy of the matching algorithm becomes paramount.
This paper investigates the feasibility of using a K-nearest neighbors (KNN) classifier for the anomaly matching task. KNN is one of the simplest forms of supervised machine learning (ML). To classify a new data entry that shares a set of predictor variables with a number of known records, distances between the new entry and the known records are computed from the predictors. The majority class among the K nearest records is then assigned to the new entry. In the context of matching the findings of two in-line inspections, anomalies reported in the recent inspection can be conceptualized as known records and anomalies from the previous ILI as new data entries to be classified. For each anomaly from the previous inspection, the “closest” one from the recent inspection is found to make a match.
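The matching idea described above can be illustrated with a minimal sketch. It assumes that each recent-inspection anomaly is treated as its own class and that K = 1, so the majority vote reduces to picking the single nearest record. The predictor names (`axial_m`, `clock_h`, `length_mm`), the scaling factors, and the sample anomalies are hypothetical and are not taken from the study.

```python
import math

def clock_diff_hours(a, b):
    """Smallest difference between two clock positions, in hours (0 to 6)."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)

def distance(prev, recent, scales):
    """Scaled Euclidean distance between two anomalies over the predictors."""
    d_axial = (prev["axial_m"] - recent["axial_m"]) / scales["axial_m"]
    d_clock = clock_diff_hours(prev["clock_h"], recent["clock_h"]) / scales["clock_h"]
    d_len = (prev["length_mm"] - recent["length_mm"]) / scales["length_mm"]
    return math.sqrt(d_axial ** 2 + d_clock ** 2 + d_len ** 2)

def match_anomalies(previous, recent, scales):
    """Map each previous-ILI anomaly id to the nearest recent-ILI anomaly id (K = 1)."""
    matches = {}
    for p in previous:
        nearest = min(recent, key=lambda r: distance(p, r, scales))
        matches[p["id"]] = nearest["id"]
    return matches

# Hypothetical anomalies: recent inspection = known records,
# previous inspection = new entries to classify.
recent = [
    {"id": "R1", "axial_m": 2.10, "clock_h": 11.5, "length_mm": 30.0},
    {"id": "R2", "axial_m": 2.15, "clock_h": 6.0, "length_mm": 22.0},
    {"id": "R3", "axial_m": 7.80, "clock_h": 3.0, "length_mm": 15.0},
]
previous = [
    {"id": "P1", "axial_m": 2.12, "clock_h": 0.5, "length_mm": 26.0},
    {"id": "P2", "axial_m": 7.75, "clock_h": 3.5, "length_mm": 12.0},
]

# Assumed scaling factors so that no single predictor dominates the distance.
scales = {"axial_m": 0.1, "clock_h": 1.0, "length_mm": 10.0}

matches = match_anomalies(previous, recent, scales)
print(matches)
```

In this sketch the clock position is compared on a wrapped 12-hour scale, so an anomaly at 0.5 h is treated as close to one at 11.5 h. Nothing here enforces one-to-one matching: two previous anomalies may be assigned to the same recent anomaly, which a practical implementation would need to resolve.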
This paper demonstrates how various model configurations and data preparation techniques affect matching accuracy and identifies the most influential predictors. Accuracy is measured against manually validated matches. The main outcomes of the study are a conclusion on the overall feasibility of the method and the anomaly matching accuracies that can reasonably be expected depending on the available data.