Abstract

Defect detection and localization are key to preventing environmentally damaging wellbore leakages in both geothermal and oil/gas applications. In this study, a multistep, machine learning approach is used to localize two types of thermal defects within a wellbore model. This approach includes a comsol heat transfer simulation to generate base data, a neural network to classify defect orientations, and a localization algorithm to synthesize sensor estimations into a predicted location. A small-scale physical wellbore test bed was created to verify the approach using experimental data. The classification and localization results were quantified using these experimental data. The classification predicted all experimental defect orientations correctly. The localization algorithm predicted the defect location with an average root-mean-square error of 1.49 in. The core contributions of this study are as follows: (1) the overall localization architecture, (2) the use of centroid-guided mean-shift clustering for localization, and (3) the experimental validation and quantification of performance.

1 Introduction

The integrity of a geothermal or oil well can degrade throughout its lifetime. Failure of casing or cement can be damaging to the environment, especially in the case of oil and gas. For example, production fluid can leach into the surrounding rock and water sources [1]. As of 2018, there were 982,000 wells used for oil and gas production in the United States. This represents a substantial decrease from the 1,035,000 that were used in 2014 [2]. Of the estimated 3.7 million wells that have been drilled in the United States, this means that approximately two-third wells have been abandoned, many of which were improperly sealed or documented [3]. There are many ways for wells to fail and leak, even after being decommissioned. Examples of different failure modes are shown in Fig. 1.

Fig. 1
A wellbore and some of its possible failure methods. Unbolded labels are the components of the wellbore, while the bolded labels are the failure methods.
Fig. 1
A wellbore and some of its possible failure methods. Unbolded labels are the components of the wellbore, while the bolded labels are the failure methods.
Close modal

The main failure modes demonstrated in this figure are poor bonding between wellbore components, leak paths from overly porous cement, and fracturing of wellbore components. Delamination, or debonding, can occur due to corrosion from well gases, fatigue from internal well pressure, or improper cement composition. Corrosion combined with mechanical stresses can cause casing fractures. Poor cement composition and carbonization can cause corrosion and leak pathways through cement [4]. Breaches of wellbore integrity in geothermal wells can cause leaks of toxic gas such as hydrogen sulfide [5,6]. Leaks in high-temperature and pressure geothermal wells can also cause other safety concerns such as steam eruptions [7,8]. Mitigation measures exist and can reduce risk to both water sources and the surrounding environment. However, rapid and targeted mitigation requires the ability to localize defects. Previous efforts to localize defects utilize defect properties ranging from visual/olfactory effects, acoustic properties, and thermal properties [9]. For exposed pipelines, soap bubble screening can be useful for localizing small leaks. Inspectors can use bubbles from a soap solution, which appear at leak locations, to identify where connections need to be reinforced [10].

Olfactory sensing is performed using trained canines. These animals can provide leak localization in buried pipelines. Canines have sensitive olfactory systems, which enable them to detect trace concentrations of compounds associated with wellbore integrity breaches [9,11].

Acoustic sensing involves applying networks of sensors in and around a well to detect the sounds of leaking. Brodetsky and Savic used the acoustic signal from a simulated pipeline leak to create a pattern recognition system capable of distinguishing the sound of a leak from background noise. Sensor networks communicate with a central module and compare it to a training set using a kNN classifier to locate leaks and their approximate locations [12]. In a non-wellbore setting, the sound energy from an array of microsensors can be used to generate hyperspheres, whose intersections in the absence of noise can localize a signal. By using the ratios of sound energy between sensors, an approximate sensor-to-target distance can be found [13].

Thermal sensing leverages temperature differences between the fluid and the surrounding formation. When fluid of a different temperature from the surrounding formation flows through a defect, a thermal signature can be detected. The flowing fluid causes heat transfer between the formation and the fluid. This means that changes in temperature can be used to detect, localize, and measure such defects. Past research has studied how to use temperature measurements to predict location and magnitude of a thermal defect. Specifically, distributed temperature sensing uses fiber optic cabling and a laser pulse to detect changes in temperature [14]. Fiber optic cable is either fed inside or adjacent to the production tubing of the well. Laser pulse backscatter can be analyzed to determine the distance of a thermal difference, while specific signals can be used to determine the temperature of a defect. This system can be outfitted to new and existing wells and uses the fiber optic cabling as both a sensing and transmission system. However, the high-speed nature of laser pulses results in immense data generation [15]. Other temperature measurement techniques such as thermistors and thermocouples are low cost, robust, and small, and require less data processing. Thermistors and other thermal sensors have been used as probes for upper surface hydrogeologic studies and have been discussed in patents for wellbore integrity [16,17].

The approach discussed in this article combines temperature measurements with machine learning methods to localize thermal defects. The process consists of two steps. First, a neural network-based classifier determines defect geometry based on 30 sensor readings. The 30 sensor readings are then used to estimate the defect location. This cannot be done analytically. Therefore, we use a data-driven approach that compares the sensor reading against a large data set of simulated defects (“base data”). This comparison uses a k-nearest neighbors search and centroid-guided mean-shift clustering to calculate an estimate.

The core contributions of this study are as follows: (1) utilizing a two-stage machine learning architecture to classify and then localize thermal defects, (2) training and developing a centroid-assisted mean-shift clustering algorithm, and (3) experimental quantification of performance on a lab-scale wellbore.

This article begins by describing broad requirements for a wellbore defect detection system. We then illustrate how our method addresses these requirements. Next, we describe our overall approach and our algorithms. We follow with a discussion of our system model, as well as the generation of large amounts of synthetic data using finite element modeling. We conclude by describing a small-scale experimental test bed that illustrates the ability of our proposed approach to localize simple defects.

2 Functional Requirements

We seek a localization method that can determine the location of a range of relevant wellbore defects. The functional requirements that are essential for an effective wellbore defect localization strategy are as follows:

  1. Utilizes sensors that are physically robust.

  2. Does not require rapid sampling or substantial data processing.

  3. Can account for changes in defect magnitude.

  4. Restricts sensors to the casing due to installation issues.

For a defect detection method to be effective in a wellbore environment, it must be capable of being placed in an existing wellbore, be able to effectively localize defects using low bandwidth data, and be robust to different magnitudes of defect. It must be able to be placed in an existing wellbore due to the large number of currently existing wells. The capability to place monitors in nonactive wells is crucial, as this capability could prevent leaks from going unnoticed in situations where regular maintenance is not performed.

Low bandwidth data are ideal for the wellbore environment because it allows for simple communication methods between the sensors and a base station computer. In addition, low bandwidth data does not require rapid sampling and reduces the amount of processing required for a given data set. Data transfers can be done very quickly or in large batches at a time, due to smaller amounts of data necessary to convey the sensor readings.

In addition, a defect detection method should be robust to thermal magnitude. If a method cannot accommodate a large variety of working temperatures, it cannot be reasonably deployed in wellbores whose defects can vary widely.

Thermal sensors were chosen for the experiments described in this article due to their physically robust nature, their low power usage, and their capability to generate small amounts of data that do not require substantial processing. Because temperature signals do not change at high frequencies like acoustic signals do, low sampling rates can still capture the propagation of a thermal defect within the test bed created for this process.

On the basis of these functional requirements, we chose to develop a method that combines thermocouple measurements with a two-stage machine learning algorithm. Thermocouple measurements provide distributed temperature readings. The machine learning methods first classify the defect type and then use clustering methods to estimate the defect location. The clustering methods are needed due to limitations in simpler centroid methods. Our method compares the temperature reading for each sensor with a database of defects and their associated sensor temperatures. We use normalized temperatures for defect localization to handle defects of various magnitudes. Mean-shift clustering and the temperature centroid are used to determine the most common defect location based on all the sensors. Our proposed method can be extended to other discrete temperature sensors such as thermistors.

3 Algorithm Overview

It is difficult to design a single method that accounts for all defect geometries. Therefore, we developed a two-stage algorithm that first classifies the defect type and then uses a type-specific localization algorithm. For the scope of this work, we focused on two types of defects: vertical and horizontal line defects. The classification step is performed using a neural network, and the localization method uses mean-shift clustering across sensor layers. Different layers are used for vertical (layers are parallel to the circular face of the wellbore) and horizontal defects (layers are perpendicular to the circular face of the wellbore). Once the classification algorithm determines the defect geometry, the data are fed into the appropriate localization algorithm. A diagram of the data flow is shown in Fig. 2.

Fig. 2
Overview of the two-step defect localization process
Fig. 2
Overview of the two-step defect localization process
Close modal

3.1 Defect Classification.

The classification process begins with feature reduction. We parameterize each defect using ordinary least squares. Ordinary least squares is used to fit the sensor temperatures to their corresponding Cartesian coordinates. The input vector, rs, has 3 rows and 30 columns, where each column represents a different sensor and each row represents the x, y, and z coordinates, respectively. The output vector, Ts, is a 30 x 1 vector representing the normalized temperatures.

Throughout this work, we use normalized, as opposed to absolute, temperature measurements. This is done to remove the effect of defect magnitude and steady-state environmental conditions. First, we compare the sensor temperature against a baseline. For experimental purposes, this was the ambient temperature of the room, but this value could be the temperature of the surrounding rock formation in wellbore application. If ambient temperatures fluctuate (due to weather for example), then a change detection algorithm is also needed to identify when a defect has occurred. However, at depths greater than 20 m, the change in temperature due to weather is insignificant compared to the temperature gradient that naturally occurs with increasing depth [16].

Next we normalize the change in temperature by dividing by the sum of ΔT across all sensors. This calculation is shown in Eq. (1).
(1)
The output vector, Ts, is constructed from the normalized temperature for all n = 30 sensors.
The rs matrix and normalized Ts determine the values for θ, the coefficients of the ordinary least squares fit, using Eq. (2).
(2)

These θ values are used as the features for a three-layer multiclass pattern recognition neural network. A diagram of the defect classifier network is shown in Fig. 3.

Fig. 3
Overview of the two-step defect localization process
Fig. 3
Overview of the two-step defect localization process
Close modal

After passing through the seven hidden layer nodes, the neural net resolves the classification into two output nodes. The two output nodes, generated as a 1 x 2 vector, correspond to whether the neural network classifies the defect as horizontal or vertical. An output of [10] corresponds to a horizontal defect, while an output of [01] corresponds to a vertical defect. When this vector is t[10] or [01], the column with a magnitude above 0.75 is assumed to be the defect orientation. In application, if there is no column that meets this threshold, the orientation is considered inconclusive and should be observed further as the defect develops.

We used the matlab nprtool GUI to generate our neural network. This is a feed forward neural network using a sigmoid activation function for the hidden layer and a softmax activation function for the output layer, trained using a scaled conjugate gradient algorithm. In our case, we trained the neural network for both horizontal and vertical orientations using data generated from a series of finite element simulations at different locations and temperatures. The data generation is described in the following section.

After using simulated data outside of the training set to verify the classifier’s operation, each experimental defect was classified using the neural network. Three horizontal and five vertical defects from the test bed were fed into the neural network, which yielded the results shown in Table 1. These results demonstrate that the neural network operates properly for these experimental parameters. In the future work, sufficient training and the addition of more output nodes could increase the number of defect geometries and conditions.

Table 1

Neural classifier performance on experimental data

DefectOutput 1Output 2
11.000.00
20.870.13
31.000.00
40.010.99
50.001.00
60.001.00
70.001.00
80.001.00
DefectOutput 1Output 2
11.000.00
20.870.13
31.000.00
40.010.99
50.001.00
60.001.00
70.001.00
80.001.00

3.2 Data-Driven Defect Localization.

After the classification process, a cluster-based localization algorithm calculates the estimated defect location for each sensor grouping. The first step of this algorithm is to normalize the sensor readings.

In this case, we normalize the sensor readings relative to their sensor layer. We group the sensors into layers to reconstruct the defect shape from a set of points. For vertical defects, the sensor layers are parallel to the XY plane. For horizontal defects, the sensor layers are parallel to the XZ plane. These groupings are shown in Fig. 4. The equation used for this process is shown in Eq. (1), where n is equal to the number of sensors in the grouping. Horizontal sensor groupings contain ten points each, while vertical sensor groupings contain six points each.

Fig. 4
Sensor groupings for horizontal defects (left) and vertical defects (right)
Fig. 4
Sensor groupings for horizontal defects (left) and vertical defects (right)
Close modal

The next step is to use the layer-normalized data to infer the defect location. Due to the three-dimensional and nonlinear geometry, there is no analytical solution. Therefore, we chose to localize the defect numerically. Specifically, we used a mean-shift clustering approach to compare the normalized sensor readings with a database of readings. This database contains the sensor readings associated with a wide range of defect locations.

To generate the points for the mean-shift clustering, we began by comparing each normalized sensor value with the database values associated with the same sensor. Note that several defects could give the same sensor value. By using a specified number of neighbors, nneighbor, we use a k-nearest neighbors search to find the nneighbor closest temperature matches between the actual readings with the database. Here, nneighbor refers to the k value of the k-nearest neighbors search. The results of the clustering can fluctuate depending on the value of nneighbor, as a greater number of neighbors generates more densely packed temperature matches. This value is tuned in Sec. 4. We compiled these matches for each sensor and then generated planes of potential defect locations for each sensor grouping.

We improve the performance of the clustering method by incorporating physical intuition. Specifically, the temperature change centroid can be used to predict the vector from the origin to the defect. We compute the temperature change centroid using the expression in Eq. (3).
(3)

In this equation, Ts is composed of the ΔT from ambient for each sensor in the sensor grouping, while rs is the coordinates of each sensor in the sensor grouping, either in the XY plane for vertical defects or the XZ plane for horizontal defects. By using a four quadrant inverse tangent with the vector between the centroid and the center of the grouping, we determine the angle of the centroid for each sensor grouping. The centroid angle can then be used to reduce the number of temperature matches for the clustering. In this work, we use all points lying within ±45 deg of the centroid angle.

Once the potential defect locations are compiled, we perform clustering. In this case, we seek to identify geographic groupings of points. The center of the biggest cluster represents our best guess for the defect location. We use mean-shift clustering with a Gaussian kernel of radius r. This finds clusters of adjacent locations [18,19]. In Ref. [19], each iteration is initialized to a random point within the remaining data set. By restricting the initialization to the first point in the set, the algorithm behaves in the same manner every time.

The algorithm declares an initial cluster at the selected starting point and then calculates the square of the Euclidean distance between the starting point and the other active points in the data set. It then marks all points within this cluster as inactive and calculates the squared distance of all points within the cluster to compare them to the initial distance.

If the difference is above a specified error threshold, the process repeats. Once below the threshold, the cluster will either remain as is or merge with other clusters nearby, depending on the distance between its centers and other cluster centers. Then, a new cluster starts using the first index of any data points that are not marked as inactive. The results of this algorithm are shown in Fig. 5.

Fig. 5
Results of mean-shift clustering. Blue points represent the potential defect locations. Circles represent the clusters resulting from the method. The cluster with the most points is shown in red. The center of the red circle is the estimated localization result.
Fig. 5
Results of mean-shift clustering. Blue points represent the potential defect locations. Circles represent the clusters resulting from the method. The cluster with the most points is shown in red. The center of the red circle is the estimated localization result.
Close modal

The result from the sensor grouping localization is the center of the output cluster with the highest number of data points. In the event of ties, the algorithm selects the cluster with the lowest calculated root-mean-square (RMS) between the cluster center and its contents. This breaks the tie in favor of the cluster with the most centralized contents. Using the results from each sensor grouping in the test bed, the algorithm determines the shape of the defect and its approximate location.

With this clustering approach, the number of neighbors and the cluster radius are free variables. These are tuned as hyperparameters. We select the cluster radius and number of neighbors using the base data set (Sec. 4).

4 Base Data and Simulated Performance

Our localization method compares the sensor readings to a database of known defects. Generating a large set of candidate defects and temperature readings is challenging. Therefore, we leverage finite element simulations to create a large amount of synthetic data. Figure 6 shows a flowchart of the simulation generation and localization processes.

Fig. 6
Simulation generation and localization process. After the geometry is built in comsol, we generated a set of baseline data. After collecting the experimental data, the baseline data were used within the localization process to determine the points for clustering.
Fig. 6
Simulation generation and localization process. After the geometry is built in comsol, we generated a set of baseline data. After collecting the experimental data, the baseline data were used within the localization process to determine the points for clustering.
Close modal

4.1 comsol Simulation.

In this study, we performed simulations using comsol’s “Heat Transfer through Solids and Fluids” module. The geometry of the simulation is shown in Fig. 7.

Fig. 7
All defects as they appear in the test bed and simulation. Defects 1–3 are horizontal, while defects 4–8 are vertical. In simulation, only one defect is included at a time. The test bed is oriented with the YZ plane parallel to the ground.
Fig. 7
All defects as they appear in the test bed and simulation. Defects 1–3 are horizontal, while defects 4–8 are vertical. In simulation, only one defect is included at a time. The test bed is oriented with the YZ plane parallel to the ground.
Close modal

These simulated models can be designed to match the application-specific geometry. In this study, we focus on a lab-scale test bed. The actual test bed is described in Sec. 5. The simulated model must match the actual system as closely as possible to ensure good localization performance. Therefore, the key to the model design is the measurement of geometry and the identification of relevant physical properties. These include density, thermal conductivty, specific heat capacity, and convection coefficients for external faces.

Our simulated model consists of concentric annuli composed of AISI 1026 steel, concrete, shale, and cardboard from the interior outwards. We identified the thermal properties for the steel casing as those of AISI 1026 steel [20], while the properties of cardboard were an average of hard and soft wood, as found in Ref. [21]. The shale properties were determined with laboratory testing, while the properties of the cement were selected from Ref. [22].

The thermal properties of the shale were measured at Sandia National Labs’ Geomechanics facility. Using a Hot Disk sensor with Kapton film, the lab determined the shale’s conductivity, diffusivity, and volumetric specific heat both parallel and perpendicular to the layers of the shale. Data were collected for these qualities on a curve starting at 50°F, peaking at 100°F, and ending back at 50°F. To obtain the heat capacity values for the substance, we averaged the volumetric specific heat values taken at 75°F in each direction. We then divided these values by the density of the material to yield the heat capacity. In simulation, we used the resulting parallel heat capacity in the X and Y directions, and we used the perpendicular data in the Z direction.

We approximate defects using vertical and horizontal tubes containing heated water. The defect properties used in the simulation are for a cylinder of 54.4 °C water with a diameter of 1/8 in., matching the OD of the tubing used for the defects in the test bed. The velocity of the water is approximately 0.632 m/s using this diameter and the system’s volumetric flowrate of 0.3 LPM. We chose 25 °C as the ambient temperature to match the ambient conditions in our laboratory.

We determined the convection around the flat faces of the simulation model using the average transfer coefficient for forced convection past a plate. We used the outer diameter of the test bed as the characteristic length and set the fluid velocity around the plate to 0.25 m/s, which is within the range of mean air velocity recommended for thermal comfort according to table A.5 of ISO 7730, Ergonomics of the Thermal Environment [23]. In addition, we assigned the convection coefficients for the internal and external cylindrical faces of the simulation geometry to a single value, which was determined by the optimization of parameters from Sec. 4.2.

In the comsol model, the entire diameter of the defect in the simulation is water, as opposed to a copper annulus with water inside. To determine that this assumption was valid, we calculated the exterior temperature of a copper tube that contains water of the same thermal properties as the simulation. Figure 8 shows a diagram of the system.

Fig. 8
The system used for copper temperature calculations. The copper is Copper-122 Alloy, the same variety as the tubing for the test bed.
Fig. 8
The system used for copper temperature calculations. The copper is Copper-122 Alloy, the same variety as the tubing for the test bed.
Close modal
To determine the physical properties of the water flowing through the tubing, we used the engineering equation solver (ees) software to calculate the properties of water at the test bed heater temperature (327.55 K) and 1 atm. The velocity for the flow was calculated to be 2.65 m/s, calculated from the test bed’s volumetric flowrate of 0.3 LPM and the inner diameter of the copper tubing used in the test bed. The Reynold’s number for the system was then calculated using Eq. (4), with D being the inner diameter of the tubing. The Dittus-Boelter correlation (5) determines the Nusselt number of the flow, which we used to calculate the convection coefficient hw [24]. The hw was determined from the Nusselt number using Eq. (6). The resulting Nusselt number for this system is 48.96, resulting in an hw value of 20395 W/(m2K). Table 2 shows the calculated properties for the system.
(4)
(5)
(6)
Table 2

System properties for Fig. 8 

PropertyValueUnit
ID0.0015494m
OD0.003175m
TW327.55K
Text298.15K
hext5W/(m2 K)
ρ986kg/m3
μ5.088 × 10−4Pa s
k0.6454W/(m K)
Pr3.298
V2.65m/s
ReD7962.4
NuD48.96
hw20395W/(m2 K)
PropertyValueUnit
ID0.0015494m
OD0.003175m
TW327.55K
Text298.15K
hext5W/(m2 K)
ρ986kg/m3
μ5.088 × 10−4Pa s
k0.6454W/(m K)
Pr3.298
V2.65m/s
ReD7962.4
NuD48.96
hw20395W/(m2 K)

The thermal resistance method for radial systems, found in Ref. [21], was then used to calculate an external temperature. We used an initial external h of 5 W/(m2K) (assumed using [25]) to calculate the temperature difference between the interior water flow and the exterior of the copper, with a result of 0.0153 K. The temperature difference remained below 0.1 K for external h values as high as 32 W/(m2K). This was determined to be an acceptable temperature difference for this experiment, and so the model was simplified.

Simulations were all performed in steady state, to be compared to the steady-state time reading for each experimental defect. In application, the localization algorithm could also be applied to the transient case. However, focusing on steady-state reduces the training data required to generate accurate simulation results and yields results unaffected by any unmodeled transient effects.

4.2 Simulation Parameter Calculation.

To calculate the convection coefficients for the interior and exterior cylindrical surfaces, we assumed mixed free and forced convection with an ambient temperature of 25°C, constant surface temperature, and air speed of 0.25 m/s. For the free convection components of each surface, Eq. (7) was used to obtain the Rayleigh number, evaluated at the center of the test bed. In this equation, β is the expansion coefficient, evaluated as the inverse of gas temperature, while α is the thermal diffusivity, measured in (m2)/s. The variable g here is the acceleration due to gravity. Equations (8) and (9) display the formulas used to determine the Nusselt numbers for free and forced convection on the exterior of the test bed, while Eqs. (10)(12) were used to determine the Nusselt numbers for the interior of the test bed. Equation (13) combined the free and forced convection Nusselt numbers into a single value, which was then converted into the respective interior and exterior convection coefficients via Eq. (6). The final convection coefficients used in the simulation were hinterior = 4.85 W/(m2K) and hexterior = 2.93 W/(m2K).
(7)
(8)
(9)
(10)
(11)
(12)
(13)

After determining all simulation parameters, we began generating a large set of simulated defects for our database. These base data are used to compare against the measured defects for localization. We performed a parametric sweep for both the horizontal and vertical cases. In the horizontal case, the X values ranged from –4.5 in. and −1.75 in., and then 1.75 in. to 4.5 in., while Z values were between 1” and 18”. A coarse interval of 0.25 in. in the X direction and 1 in. in the Z direction was used in the sweep. In the vertical case (swept using cylindrical coordinates), the radius varied from 1.8 in. to 4.7 in. at an interval of 0.1 in., while the angle ranged from 0 to 360 deg at an interval of 10 deg.

The data set described above is still relatively coarse. Finer simulations could be performed but would be very time consuming. Coarse data can be a common problem, especially with experimental training data. Therefore, we chose to increase the resolution of our base data set using interpolation. The interpolation process starts by generating a finer mesh with the desired resolution. Then, we use create a scattered interpolant composed of the coarse data’s defect coordinates (XZ for horizontal, polar coordinates converted into XY for vertical), as well as normalized sensor temperature. This generates a mapping between location and sensor temperature for each sensor, which will generate sensor temperatures for any locations between the coarse locations. The fine mesh of defect locations is fed through this interpolant to generate a fine mesh of normalized sensor temperatures. This process is repeated for all 30 simulated sensors to generate the high-resolution normalized base data.

4.3 Localization Algorithm Hyperparameter Tuning.

Once the final baseline data set is generated, it can be used to tune the hyperparameters for the localization algorithm. To ensure the most robust localization results, the cluster radius and number of neighbors are optimized for the most consistent localization error. To accomplish this, we generate 24 simulated data samples for each defect orientation and run them through the appropriate normalization and localization process generating an error value for each sensor grouping (RMS). The median of these errors yields a single error score across all 24 samples. This error is computed over a range of number of neighbors and radii. The range for neighbors is 50 – 400. The range for radii is 0.025 in. to 1.0 in. The resulting data are shown in Figs. 9 and 10. The best choice of hyperparameters varies according to the designer’s preferences. One option is to choose the setting with the lowest error. However, to increase robustness, we chose cases where several different parameter combinations converge to similar values. In the radius location where the most number of neighbors converge, we selected the value for number of neighbors that was in the middle of the set of converged values. By using this method, the parameters selected for the horizontal were nneighbors = 250, radius = 0.15 in., and the vertical parameters were nneighbors = 100, radius = 0.25 in. These choices are noted with a bold circle in Figs. 9 and 10.

Fig. 9
Horizontal parameter tuning results. The parameters chosen are 250 neighbors with a radius of 0.15 in.
Fig. 9
Horizontal parameter tuning results. The parameters chosen are 250 neighbors with a radius of 0.15 in.
Close modal
Fig. 10
Vertical parameter tuning results. Parameters chosen are 100 neighbors with a radius of 0.25 in.
Fig. 10
Vertical parameter tuning results. Parameters chosen are 100 neighbors with a radius of 0.25 in.
Close modal

5 Experimental Setup

We tested the defect localization process outlined earlier using a small-scale wellbore model fabricated at Sandia National Laboratories. The test bed is composed of a 1026 steel casing, placed within a hollow cylinder of Mancos shale and embedded in Portland cement. The test bed defects are 1/8 in. OD copper tubing. The OD of the test bed is 12.5 in., the ID is 3 in., and the total length is 18 in. Figure 7 shows the test bed and all eight defects as they are placed the test bed. The coordinates of each defect are displayed in the results tables in Sec. 6.

To place the horizontal defects, the defect tubing was placed within holes drilled radially into the shale. The defect tubing was then potted in place using OmegaBond101 conductive epoxy. The vertical defects were placed using a 3D printed component to hold the tubing in the correct locations. After appropriate placement of the defects, Portland cement was poured into the space between the casing and the shale. The fabrication process from start to finish is displayed in Fig. 11.

Fig. 11
Test bed assembly process: (a) shale interior with horizontal defects potted in place, (b) pouring of portland cement, (c) patching of cracks with conductive epoxy, and (d) addition of defect fittings and external shell
Fig. 11
Test bed assembly process: (a) shale interior with horizontal defects potted in place, (b) pouring of portland cement, (c) patching of cracks with conductive epoxy, and (d) addition of defect fittings and external shell
Close modal

The fluid system of the test bed is heated using a Polyscience Professional Creative Series sous vide machine, which was selected due to its small footprint and precision of ±0.7°C. The heated water is pumped into an aluminum manifold, where it is distributed to specific defects using ball valves. The pump, a Masterflex peristalstic pump system from Cole Parmer, moves heated water through the defects and returns it to the basin, where it is recycled through the system. The system detects the flowrate at the entrance to the manifold using a Seametrics SPT-038 low flow flowmeter and displays the determined flowrate on a Seametrics flowmeter display for the operator to read. Figure 12 shows a diagram of the fluid system.

Fig. 12
Fluid diagram showing components that feed heated water through the test bed
Fig. 12
Fluid diagram showing components that feed heated water through the test bed
Close modal

The data acquisition system is housed in a NI 9178 8-module chassis. The modules included are three NI 9214 thermocouple modules to read the sensor data and one NI 9269 analog output module to control the peristaltic pump. To acquire experimental data, 30 SA1-T-72 T-type thermocouples were placed inside the casing, and four were placed between the casing and the shale. The thermocouples inside the casing were placed at the same locations as the sensors in the comsol simulations. To collect data from these sensors, we designed a custom labview VI, which operates the peristaltic pump in real time and collects thermocouple data at 10 min intervals. This time interval was selected so we could observe the propagation of each defect over time while remaining within the low bandwidth functional requirement.

Factors within the test bed and the surrounding environment that generate uncertainty in the system include the room conditions, the thermocouples, the heater, the flowmeter, and the placement of the thermocouples. The room where the test bed was placed was temperature controlled and thus was regulated. In addition, the shale of the test bed was insulated with cardboard, which is modeled in the simulation. The thermocouples, as type T, were calibrated before purchase and had an accuracy of ±1.0 °C. The heater had a precision of ±0.7 °C, while the flowmeter had an accuracy of ±1% of the max scale, or 0.189 LPM. The largest source of uncertainty was from the thermocouple placement. Since the sensors were placed within the casing by hand, ensuring proper placement was challenging, with an angular accuracy of ±5 deg, and an axial accuracy of ±0.125 in. Despite these sources of uncertainty, the localization algorithm performed accurately, maintaining an average RMS error of 1.49 in. across the eight defects.

The test procedure for each experiment is as follows:

  1. Turn on the power to the flowmeter display.

  2. Record the ambient temperature.

  3. Heat the water to 54.4°C.

  4. Open the desired valve to enable heated flow through the defect.

  5. Begin data collection by starting the data collection VI.

  6. Turn on the pump, adjust until flowmeter reads desired value (0.27–0.33 liters per minute).

  7. Let the system run for 24 h.

  8. Stop data collection by setting pump output to 0, then selecting the “STOP” button on the data collection VI.

  9. Verify successful data save.

  10. Turn off all equipment.

This data collection process was repeated for all eight defects in the test bed.

During an example experiment, the embedded sensors measured a maximum temperature difference of 4.92 K within just the first 10 min of flow, with a maximum temperature of 309.91 K at the end of the run time. A thermal camera was used to show the behavior of vertical defect 4. The image in Fig. 13 shows how the heated fluid affects the temperature in the test bed.

Fig. 13
Top: Cross section thermal image of the test bed under ambient conditions. Bottom: Cross section thermal image of the test bed after 260 min of heated flow.
Fig. 13
Top: Cross section thermal image of the test bed under ambient conditions. Bottom: Cross section thermal image of the test bed after 260 min of heated flow.
Close modal

Currently, all localization is performed offline using a matlab script. The script takes an average of 2.7 s to run a single-defect localization from a set of sensor values. The majority of this processing time comes from the scattered interpolant, as well as the mean-shift clustering. However, the code used for the these processes has a very small file size overall, as the localization functions are each less than 15 kb in size, and the localization database files from comsol take up 1.31 MB in total. Due to these small file sizes, the localization process could easily be ported into labview or into a Raspberry Pi for future test beds capable of running matlab. Localization speed can be potentially sped up through the use of more optimized coding and compilation methods.

6 Experimental Results

We quantify our overall algorithm performance using a set of eight experimental defects. The sensor readings are monitored over time, allowed to reach steady state, and then fed into the two-step algorithm. The outputs of the algorithm are the predicted defect locations for each sensor layer. We quantify localization accuracy using RMS error and standard deviation.

After classification (results in Table 1), each defect is localized according to the output of the neural network. For all eight defects, the distance between each sensor grouping’s guess and the experimental defect location is displayed in Tables 3 and 4. The graphical representation of the localization process is shown in Figs. 14 and 15 for defects 3 and 5, respectively, to provide both horizontal and vertical examples.

Fig. 14
Results for horizontal defect 3. The diamonds are plane guesses, and the line segment is the location of the copper tubing in the test bed.
Fig. 14
Results for horizontal defect 3. The diamonds are plane guesses, and the line segment is the location of the copper tubing in the test bed.
Close modal
Fig. 15
Results for vertical defect 5. The diamonds are plane guesses, and the line segment is the location of the copper tubing in the test bed.
Fig. 15
Results for vertical defect 5. The diamonds are plane guesses, and the line segment is the location of the copper tubing in the test bed.
Close modal
Table 3

Horizontal localization results

DefectExperimental Coord. (X,Z)L1 Coord.L1 Dist. (in.)L2 Coord.L2 Dist. (in.)L3 Coord.L3 Dist. (in.)
1(2.25, 13.63)(−2.16, 13.53)4.44(2.02, 14.06)0.49(1.92, 12.44)1.23
2(−1.88, 9)(1.79, 6.63)4.37(−1.81, 7.53)1.47(−2.11, 7.08)1.93
3(−2.375, 4.75)(3.65, 4.58)6.03(−3.08, 4.44)0.77(−1.84, 6.00)1.36
DefectExperimental Coord. (X,Z)L1 Coord.L1 Dist. (in.)L2 Coord.L2 Dist. (in.)L3 Coord.L3 Dist. (in.)
1(2.25, 13.63)(−2.16, 13.53)4.44(2.02, 14.06)0.49(1.92, 12.44)1.23
2(−1.88, 9)(1.79, 6.63)4.37(−1.81, 7.53)1.47(−2.11, 7.08)1.93
3(−2.375, 4.75)(3.65, 4.58)6.03(−3.08, 4.44)0.77(−1.84, 6.00)1.36
Table 4

Vertical localization results

DefectExperimental Coord. (X,Y)L1 Coord.L1 Dist. (in.)L2 Coord.L2 Dist. (in.)L3 Coord.L3 Dist. (in.)
4(0.94, −1.62)(0.89, −2.35)0.73(1.08, −1.65)0.14(1.02, −1.68)0.10
5(−0.52, −1.93)(−0.73, −2.01)0.22(−0.71, −1.84)0.22(−1.29, −1.66)0.82
6(1.38, 2.38)(2.76, 3.58)1.83(1.12, 1.81)0.63(1.22, 1.70)0.70
7(0.19, 2.12)(0.11, 2.38)0.27(0.60, 1.88)0.48(0.35, 1.97)0.22
8(−0.62, 2.29)(−1.55, 2.08)0.96(−0.05, 2.33)0.57(−0.34, 2.21)0.29
DefectExperimental Coord. (X,Y)L4 Coord.L4 Dist. (in.)L5 Coord.L5 Dist. (in.)
4(0.94, −1.62)(0.95, −1.73)0.11(0.89, −1.75)0.14
5(−0.52, −1.93)(−0.30, −2.00)0.23(−0.41, −1.98)0.12
6(1.38, 2.38)(1.02, 1.67)0.79(1.28, 1.93)0.46
7(0.19, 2.12)(0.38, 2.04)0.21(0.57, 1.96)0.41
8(−0.62, 2.29)(−0.47, 2.02)0.32(0.21, 2.27)0.83
DefectExperimental Coord. (X,Y)L1 Coord.L1 Dist. (in.)L2 Coord.L2 Dist. (in.)L3 Coord.L3 Dist. (in.)
4(0.94, −1.62)(0.89, −2.35)0.73(1.08, −1.65)0.14(1.02, −1.68)0.10
5(−0.52, −1.93)(−0.73, −2.01)0.22(−0.71, −1.84)0.22(−1.29, −1.66)0.82
6(1.38, 2.38)(2.76, 3.58)1.83(1.12, 1.81)0.63(1.22, 1.70)0.70
7(0.19, 2.12)(0.11, 2.38)0.27(0.60, 1.88)0.48(0.35, 1.97)0.22
8(−0.62, 2.29)(−1.55, 2.08)0.96(−0.05, 2.33)0.57(−0.34, 2.21)0.29
DefectExperimental Coord. (X,Y)L4 Coord.L4 Dist. (in.)L5 Coord.L5 Dist. (in.)
4(0.94, −1.62)(0.95, −1.73)0.11(0.89, −1.75)0.14
5(−0.52, −1.93)(−0.30, −2.00)0.23(−0.41, −1.98)0.12
6(1.38, 2.38)(1.02, 1.67)0.79(1.28, 1.93)0.46
7(0.19, 2.12)(0.38, 2.04)0.21(0.57, 1.96)0.41
8(−0.62, 2.29)(−0.47, 2.02)0.32(0.21, 2.27)0.83

To analyze the results of the defect localization, two different metrics are considered. The first metric is the RMS error of the layers from the experimental location. The second metric is the standard deviation of the points. We considered these metrics because defect localization is desired to be both close to the actual defect and closely clustered between sensor groupings. All eight defect RMS errors and standard deviations are plotted in Fig. 16. We also show the results for the case where the centroid method had not been introduced (Fig. 17). These results illustrate how the centroid information can substantially improve localization performance.

Fig. 16
Compiled results with clustering and centroid
Fig. 16
Compiled results with clustering and centroid
Close modal
Fig. 17
Compiled results for the horizontal and vertical defects in the data set with only clustering (no centroid used)
Fig. 17
Compiled results for the horizontal and vertical defects in the data set with only clustering (no centroid used)
Close modal

Overall, the vertical localization performed better than the horizontal in both RMS error and standard deviation. Utilizing the centroid technique did not alter results for five out of the 8 of the defects. However, the centroid reduced standard deviation for the three defects that it did alter. The centroid technique also reduced RMS error in two of these three defects.

With the combined centroid and clustering method, seven of eight defects maintained RMS localization error less than 3 in., and four of eight defects maintained standard deviation less than 0.5 in. These values indicate localization guesses that are precise, as the standard deviation of these values is less than 4% of the outer diameter of the test bed. The only defect with an RMS error greater than 3 in. is defect 3. However, all of the horizontal defects had much higher errors and standard deviations than the vertical defects. The RMS error across the three horizontal defects averaged to 3.05 in., or 24.4% of the outer diameter of the test bed, while the five vertical defects had an average RMS value of 0.55 in., or 4.4% of the outer diameter of the test bed. In addition, the standard deviation for horizontal defects averaged to 2.18 in., while the vertical averaged to 0.3 in. Higher error and standard deviation values for these horizontal defects were most likely due to the first layer localization process yielding clustering results on the incorrect side of the bed. Further development is needed to improve these results. These results could likely be improved via accommodating for heat loss, using more advanced convection parameters in the simulation, and modeling any artifacts resulting from the test bed construction.

7 Conclusion

The core contributions of this study are the design of a two-stage machine learning architecture to classify and then localize thermal defects, the training and development of a centroid-assisted mean-shift clustering algorithm, and the experimental validation of algorithm performance via a lab-scale wellbore. When fed into the classifier, all eight experimental defects were classified into the correct orientations. The defects were successfully located using the localization algorithm to an average RMS error of 1.49 in. and average standard deviation of 1.01 in.

Nevertheless, the localization was not perfect. This likely stems from mismatches between the base data and the experimental system. Specifically, the test bed had cracks during fabrication that were repaired with epoxy. This likely caused inconsistency in heat transfer properties. Other issues could stem from insufficient insulation on the brass components leading to loss of heat to the environment, as well as the table underneath the test bed not being modeled in simulation. Measurement uncertainty, as discussed in Sec. 5, could have also contributed to the localization errors.

This work has shown a promising architecture for defect localization. We envision extensions to other types of defects and to transient conditions. We believe our focus on localizing defects within each layer enables reconstruction of complex defects such as angled lines or helices. Additional defect types can also be tested by creating new layer structures and adding them to the classifier. The methods outlined in this article currently rely on steady-state conditions. This can delay defect localizations (especially in large systems). We believe that this approach can be extended to transient cases, but will require a more extensive data base that includes passage of time. This is an area for further exploration.

Acknowledgment

This work was supported by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) Program. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This article describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the US Department of Energy or the United States Government. SAND2021-15197 J.

Conflict of Interest

There are no conflicts of interest.

References

1.
Kang
,
M.
,
Dong
,
Y.
,
Liu
,
Y.
,
Williams
,
J. P.
,
Douglas
,
P. M. J.
, and
McKenzie
,
J. M.
,
2019
, “
Potential Increase in Oil and Gas Well Leakage Due to Earthquakes
,”
Environ. Res. Commun.
,
1
(
12
), p.
121004
.
Publisher: IOP Publishing
.
2.
EIA
,
2020
, “
Gas Wells by Production Rate–US Energy Information Administration (EIA)
,” https://www.eia.gov/petroleum/wells/annual/archive/2020/index.php
3.
Allison
,
E.
, and
Mandler
,
B.
,
2018
, “
Petroleum and the Environment: An Intriduction
,”
The American Geosciences Institute
, https://www.americangeosciences.org/geoscience-currents/petroleum-and-environment-introduction.
4.
Kiran
,
R.
,
Teodoriu
,
C.
,
Dadmohammadi
,
Y.
,
Nygaard
,
R.
,
Wood
,
D.
,
Mokhtari
,
M.
, and
Salehi
,
S.
,
2017
, “
Identification and Evaluation of Well Integrity and Causes of Failure of Well Integrity Barriers (a Review)
,”
J. Nat. Gas Sci. Eng.
,
45
(
9
), pp.
511
526
.
5.
Meder
,
E.
,
2013
, “
Hydrogen Sulfide Emissions of Geothermal Development in Hawaii
,” Bachelor's Thesis,
University of Hawaii at Manoa
,
Honolulu, HI
.
6.
Kerr
,
B.
,
2018
, “
Volcanic Activity Threatens Hawaii Geothermal Plant Long at Center of Resident Concerns
,”
Washington Post
, https://www.washingtonpost.com/national/volcanic-activity-threatens-hawaii-geothermal-plant-long-at-center-of-resident-concerns/2018/05/11/61c55c0a-5533-11e8-abd8-265bd07a9859_story.html
7.
Strickland
,
E.
,
2009
, “
Geothermal Explosion Highlights a Downside of a Leading Alt-Energy Source
,”
Discover Magazine
.
8.
Mills
,
H.
, and
Humphreys
,
B.
,
2013
, “
Habanero Pilot Project—Australia’s First EGS Power Plant
,”
35th New Zealand Geothermal Workshop: 2013 Proceedings
,
Rotorua, New Zealand
,
Nov. 17–20
. https://www.geothermal-energy.org/pdf/IGAstandard/NZGW/2013/Mills_Final.pdf
9.
Murvay
,
P.-S.
, and
Silea
,
I.
,
2012
, “
A Survey on Gas Leak Detection and Localization Techniques
,”
J. Loss Prevent. Process Indus.
,
25
(
6
), pp.
966
973
.
10.
EPA
,
2003
, “
Directed Inspection and Maintenance at Gas Processing Plants and Booster Stations
,” EPA430-B-03-018, https://www.epa.gov/sites/default/files/2016-06/documents/ll_dimgasproc.pdf
12.
Brodetsky
,
I.
, and
Savic
,
M.
,
1993
, “
Leak Monitoring System for Gas Pipelines
,”
1993 IEEE International Conference on Acoustics Speech, and Signal Processing
,
Minneapolis, MN
,
Apr. 27–30
, pp.
17
20
. https://ieeexplore.ieee.org/document/319424
13.
Li
,
D.
, and
Hu
,
Y. H.
,
2003
, “
Energy-Based Collaborative Source Localization Using Acoustic Microsensor Array
,”
EURASIP Journal on Advances in Signal Processing
.
14.
Smolen
,
J.
, and
van der Spek
,
A.
,
2003
, “
Distributed Temperature Sensing: A DTS Primer for Oil & Gas Production
,” Technical Report No. EP 2003-7100, Shell Oil Company, Houston, TX.
15.
Feo
,
G.
,
Sharma
,
J.
, and
Cunningham
,
S.
,
2020
, “
Integrating Fiber Optic Data in Numerical Reservoir Simulation Using Intelligent Optimization Workflow
,”
Sensors
,
20
(
11
), p.
3075
.
16.
Bense
,
V.
,
Read
,
T.
,
Bour
,
O.
,
Le Borgne
,
T.
,
Coleman
,
T.
,
Krause
,
S.
,
Chalari
,
A.
,
Mondanos
,
M.
,
Ciocca
,
F.
, and
Selker
,
J.
,
2016
, “
Distributed Temperature Sensing as a Downhole Tool in Hydrogeology
,”
Water. Resour. Res.
,
52
(
12
), pp.
9259
9273
.
17.
Cooke Jr
,
C. E.
,
1996
, “
Temperature Logging for Flow Outside Casing of Wells
,
Apr. 23, US Patent 5,509,474
.
18.
Nedrich
,
M.
,
2015
, “
Mean Shift Clustering
,” https://spin.atomicobject.com/2015/05/26/mean-shift-clustering/
19.
Gong
,
H.
,
2015
, “
Github Repository: Meanshift_matlab
,” https://github.com/hangong/meanshift_matlab
21.
Bergman
,
T.
, and
Lavine
,
A.
,
2017
,
Fundamentals of Heat and Mass Transfer
, 8 ed.,
John Wiley & Sons Inc
,
Hoboken, NJ
.
23.
International Organization for Standardization
,
2005
, “
Ergonomics of the Thermal Environment–Analytical Determination and Interpretation of Thermal Comfort Using Calculation of the PMV and PPD Indices and Local Thermal Comfort Criteria
,” ISO 7730:2005.
24.
Ghiaasiaan
,
S. M.
,
2018
,
Convective Heat and Mass Transfer
, 2nd ed.,
CRC Press
,
Boca Raton, FL
.
25.
Awbi
,
H.
,
1998
, “
Calculation of Convective Heat Transfer Coefficients of Room Surfaces for Natural Convection
,”
Energy Buildings
,
28
(
10
), pp.
219
227
.