Abstract

Prognostics and health management (PHM) of bearings is crucial for reducing the risk of failure and the cost of maintenance of rotating machinery. Model-based prognostic methods develop closed-form mathematical models based on underlying physics. However, the physics of complex bearing failures under varying operating conditions is not yet well understood. To complement model-based prognostics, data-driven methods have been increasingly used to predict the remaining useful life (RUL) of bearings. As opposed to other machine learning methods, ensemble learning methods can achieve higher prediction accuracy by combining multiple learning algorithms of different types. The rationale behind ensemble learning is that higher performance can be achieved by combining base learners that overestimate and underestimate the RUL of bearings. However, building an effective ensemble remains a challenge. To address this issue, the impact of diversity in base learners and extracted features in different degradation stages on the performance of ensemble learning is investigated. The degradation process of bearings is classified into three stages, including normal condition, smooth wear, and severe wear, based on the root-mean-square (RMS) of vibration signals. To evaluate the impact of diversity on prediction performance, vibration data collected from rolling element bearings were used to train predictive models. Experimental results show that the performance of the proposed ensemble learning method is significantly improved by selecting diverse features and base learners in different degradation stages.

1 Introduction

Bearing faults constitute up to 44% of the total faults in large induction motors [1]. Common causes of bearing failure include inappropriate lubrication, misalignment, load imbalance, fatigue, corrosion, vibrations, and excessive temperature [2]. Prognostics and health management (PHM) of bearings is crucial for reducing unplanned machine downtime of rotating machinery as well as for improving system safety and reliability [3–8]. PHM techniques for bearings can be classified into two categories: model-based and data-driven methods [5,9,10]. Model-based methods such as Kalman filters [11,12] and particle filters [2] develop closed-form models based on underlying physics. In contrast, data-driven PHM methods such as artificial neural networks [3], deep convolution learning [13], relevance vector machines [14], and principal component analysis [15] make predictions based on hidden patterns and inference without explicit mathematical models. Because the physics of complex bearing failures under varying operating conditions is not yet well understood, data-driven methods based on machine learning have been increasingly used to predict the remaining useful life (RUL) of bearings.

Various machine learning algorithms have been demonstrated for RUL prediction of bearings. However, little research has been conducted to develop ensemble learning-based PHM approaches to predicting the RUL of bearings [16,17]. As one of the most effective classes of machine learning algorithms, ensemble learning methods fuse machine learning algorithms of different types (also known as base learners) to achieve better prediction accuracy than the individual machine learning algorithms. The objective of this study is to develop an enhanced ensemble learning algorithm by selecting diverse base learners and features at varying degradation stages of a bearing. In particular, the impact of diversity in base learners and features on RUL prediction accuracy is investigated. We hypothesize that the accuracy and robustness of a predictive model can be improved by combining multiple weak learners as well as selecting varying features at varying degradation stages. The remainder of this paper is organized as follows: Sec. 2 presents a literature review on PHM for bearings. Section 3 presents a computational framework based on an enhanced ensemble learning algorithm. This framework consists of classification of degradation stages, dynamic base learner selection, and dynamic feature selection. Section 4 presents a case study to demonstrate the effectiveness of the proposed framework. Section 5 presents conclusions and future work.

2 Related Work

This section provides an overview of model-based and data-driven approaches to predicting the RUL of bearings.

Li et al. [18] proposed an improved exponential model for predicting the RUL of rolling element bearings. Particle filtering was used to reduce the random error of the stochastic bearing degradation process. The proposed method was demonstrated on the FEMTO bearing data set. Li and Liang [19] introduced an approach based on an improved rescaled range (R/S) statistic and fractional Brownian motion to predict bearing degradation trends. Classical R/S methods are sensitive to heteroscedasticity and short-term dependence. To address this issue, an improved R/S statistic model with an auto-covariance estimator was introduced. The FEMTO bearing data set was used to demonstrate the proposed method. Li et al. [20] developed a stochastic defect-propagation model for predicting the RUL of rolling element bearings. An augmented stochastic differential equation system was developed by taking into account uncertainties in parameter estimation. The proposed method was demonstrated using both numerical simulations and vibration signals collected from experiments. Qian and Yan [2] developed an enhanced particle filter-based approach for predicting the RUL of rolling element bearings. Particles were used to determine an adaptive importance density function and a backpropagation neural network in each recursive step. Experimental results have shown that the proposed method outperformed traditional particle filters and support vector regression. Boskoski et al. [21] developed an approach to RUL prediction of bearings based on Rényi entropy-based features and a Gaussian process model. The FEMTO bearing data set was used to evaluate the proposed approach. Experimental results have shown that the proposed approach was capable of predicting the RUL of bearings. Singleton et al. [22] introduced an extended Kalman filter-based method for predicting the RUL of bearings. The FEMTO bearing data set was used to demonstrate that the proposed method achieved high prediction accuracy. Lei et al. [23] proposed a model-based method for bearing RUL prediction. A health indicator was introduced by fusing, from multiple features, the information that correlates with the degradation process. The proposed health indicator and the model parameters, initialized by maximum likelihood estimation, were then used to predict the RUL of bearings via particle filtering.

Dong and Luo [15] proposed a data-driven approach to tracking the degradation process of bearings. Principal component analysis was used to fuse the features as well as to reduce data dimensionality. A least-squares support vector machine (LS-SVM) was used to predict the degradation process using the fused features. The proposed method was demonstrated on a run-to-failure bearing data set. Ben Ali et al. [3] introduced a data-driven approach to RUL prediction of bearings by combining a simplified fuzzy adaptive resonance theory map neural network and the Weibull distribution. A new feature called the root-mean-square entropy estimator was introduced to track bearing degradation processes. Condition-monitoring data collected from double row bearings were used to validate the proposed method. Experimental results have shown that the proposed method achieved a high classification rate of bearing failures. Gebraeel et al. [24] proposed a neural network-based approach to RUL prediction of bearings. Experimental data were collected from a group of identical thrust bearings running at specified conditions. The results demonstrated the high accuracy of the proposed method. Guo et al. [25] defined a health indicator based on a recurrent neural network (RNN) to predict the RUL of bearings. The most sensitive features were extracted based on correlation and monotonicity. Liao et al. [26] introduced a data-driven approach based on a restricted Boltzmann machine (RBM) and an unsupervised self-organizing map. The proposed method was validated using experimental data collected from a spindle testbed. Experimental results have shown that the RBM was capable of predicting the RUL of bearings with high accuracy. Huang et al. [27] proposed a data-driven approach to RUL prediction of bearings by combining self-organizing maps (SOM) and back propagation neural networks. The SOM was used to determine the minimum quantization error indicator using vibration features. The back propagation neural network was trained using the indicators. A bearing run-to-failure experiment was conducted to demonstrate the effectiveness of the proposed method. Li et al. [17] proposed an ensemble-based approach to RUL prediction by combining different algorithms with degradation-dependent weights. A degradation-dependent weight vector was determined by minimizing the cross-validation error. A simulation data set on bearing degradation was used to demonstrate the proposed method. The results have shown that the proposed method was accurate for RUL prediction.

In summary, little research has been reported on predicting the RUL of bearings using ensemble learning by taking into account bearing degradation stages as well as the impact of diversity in base learners and features at varying degradation stages on prediction accuracy. To fill this research gap, the impact of diverse base learners and features in different degradation stages on predicting the RUL of bearings is investigated.

3 Methodology

Ensemble learning combines diverse base learners to achieve better prediction performance than any of its constituent base learners. The underlying rationale is that feeding diverse features into diverse base learners in varying degradation stages can maximize prediction accuracy. In the context of PHM, the systematic overestimation or underestimation of base learners of one type can be offset by the strengths of base learners of other types. Several methods can be used for combining base learners, including randomization to reduce predictive variance [28] and boosting to reduce predictive bias [29]. In this study, the base learners were combined by solving a least squares problem under non-negative constraints, also known as a non-negative least squares (NNLS) problem, as shown in Eq. (1) [30]
$\min_{x \geq 0} \|Ax - b\|_2^2$ (1)
where $A$ is an $m \times n$ matrix of base learner predictions, $b \in \mathbb{R}^m$ is a column vector of response variables, $x_i \geq 0$ is the weight assigned to each base learner, and $\|\cdot\|_2$ denotes the Euclidean norm.
By minimizing the objective function, the base learners are assigned optimal weights. The features fed into different base learners are also selected to minimize the objective function. To measure the performance of an estimator, the root-mean-square error (RMSE) shown in Eq. (2) was selected as the error metric. The RMSE, the square root of the mean squared error (MSE), is more sensitive to outliers than the mean absolute error because the effect of each error on the RMSE is proportional to the size of the squared error [31]
$\text{RMSE} = \sqrt{\dfrac{1}{n} \sum_{t=1}^{n} \left(\hat{y}_t - y_t\right)^2}$ (2)
where $\hat{y}_t$ is the predicted value of the regression-dependent variable $y_t$ at time $t$, and $n$ is the number of predictions.
An algorithm could either overestimate or underestimate the true values. The RMSE incorporates both the variance and the bias of each base learner; the ensemble can thus reduce the error through a weighted summation of overestimations and underestimations. The relationship between the variance and the bias is shown in Eqs. (3) and (4) [31]
$\text{MSE}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \text{Var}(\hat{\theta}) + \left[\text{Bias}(\hat{\theta})\right]^2$ (3)
$\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$ (4)
where $\hat{\theta}$ is the estimator of an unknown parameter $\theta$.
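As an illustration of how Eqs. (1) and (2) fit together, the following sketch combines three synthetic base learners with SciPy's NNLS solver. The data, the learners, and their error characteristics are hypothetical stand-ins for illustration, not the models trained in this study.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
b = rng.uniform(0, 100, size=200)  # synthetic "true RUL" response vector

# Columns of A hold base learner predictions: one learner overestimates,
# one underestimates, and one is merely noisy (all hypothetical).
A = np.column_stack([
    b + 5 + rng.normal(0, 3, 200),   # overestimating learner
    b - 4 + rng.normal(0, 3, 200),   # underestimating learner
    b + rng.normal(0, 8, 200),       # noisy, unbiased learner
])

weights, _ = nnls(A, b)  # Eq. (1): min ||Ax - b||_2 subject to x >= 0
ensemble = A @ weights
rmse = np.sqrt(np.mean((ensemble - b) ** 2))  # Eq. (2)
print(weights, rmse)
```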

The performance of the ensemble learning method is highly dependent on the selection of base learners [32]. While previous research has demonstrated that improving diversity in an ensemble can improve prediction performance [33], we conducted a systematic study on the impact of diversity in base learners and features in different degradation stages on the accuracy of RUL prediction as shown in Fig. 1.

The computational framework of the proposed ensemble learning approach is illustrated in Fig. 2. The input of the framework is health-monitoring data such as vibration signals. The diversified ensemble approach introduces a new step where the degradation process of a bearing is classified into multiple stages. The base learners and features are selected based on the degradation stages. In this study, base learners of three different types, including decision tree-based, instance-based, and linear model-based, are selected. Section 3.1 presents the classification of degradation stages of bearings. Sections 3.2 and 3.3 introduce dynamic base learner selection and dynamic feature selection.

3.1 Classification of Degradation Stages.

Bearing performance deteriorates over time due to extreme operating conditions such as inappropriate lubrication, misalignment, fatigue, vibrations, and excessive temperature. Because condition-monitoring signals exhibit varying patterns in different degradation stages [34], classifying degradation stages can significantly improve prediction accuracy. In the context of PHM, degradation stages are determined by detecting abrupt variations in health-monitoring data. In statistics, change point detection identifies whether an anomalous behavior in time series data has occurred. Degradation stages are determined using the change point detection method as shown in Eq. (5) [35]
$J(K) = \sum_{r=0}^{K-1} \sum_{i=k_r}^{k_{r+1}-1} \Delta\left(x_i;\, \chi\left([x_{k_r} \ldots x_{k_{r+1}-1}]\right)\right) + \beta K$ (5)
where $K$ is the number of change points to be detected from the input data and $k_r$ denotes the $(r+1)$th change point, with $k_0$ being the first sample of the signal. $\beta$ is a proportionality constant that penalizes the number of change points. $\chi$ is the empirical estimate (here, the mean) of a segment, and $\Delta$ measures the deviation of each sample from that estimate. The deviation function $\sum_{i=m}^{n} \Delta\left(x_i; \chi([x_m \ldots x_n])\right)$ is given by Eq. (6)
$\sum_{i=m}^{n} \Delta\left(x_i;\, \chi([x_m \ldots x_n])\right) = \sum_{i=m}^{n} \left(x_i - \operatorname{mean}([x_m \ldots x_n])\right)^2$ (6)
where $x_m \ldots x_n$ are all the samples between the $m$th and the $n$th samples.

After K change points are detected, the input data will be divided into K + 1 segments. For each change point, the mean values of two adjacent segments near the change point will be compared. If the mean value of the segment after the change point is more than twice the mean value of the previous segment, the change point will be considered as an anomaly point and the two segments will be classified into two stages.

Traditional statistical features such as root-mean-square (RMS), kurtosis, peak-to-peak value, and skewness have been widely used to classify degradation stages. Feature fusion methods such as Gaussian mixture models [36] and fuzzy c-means clustering [37] have also been used to characterize the degradation stages of bearings. According to the literature [38], RMS is one of the most effective and efficient features for classifying degradation stages. RMS is proportional to the area under the curve of the vibration signal, which can be calculated by Eq. (7)
$\text{RMS} = \sqrt{\dfrac{1}{n} \sum_{r=1}^{n} x_r^2}$ (7)
where $x_1, \ldots, x_r, \ldots, x_n$ are all the samples.
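To make the procedure concrete, the sketch below computes the RMS of Eq. (7) for a vibration snapshot, detects change points in a synthetic RMS sequence with the open-source ruptures package (whose penalized "l2" contrast follows the same mean-shift formulation as Eqs. (5) and (6)), and applies the twice-the-mean rule described above. The signal and the penalty value are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
import ruptures as rpt  # third-party change point detection package

def rms(snapshot):
    """Eq. (7): root-mean-square of one vibration snapshot
    (shown for completeness; the demo synthesizes the RMS sequence)."""
    return np.sqrt(np.mean(np.square(snapshot)))

# Synthetic per-snapshot RMS sequence with an abrupt degradation onset.
rng = np.random.default_rng(1)
rms_seq = np.concatenate([rng.normal(1.0, 0.05, 600),   # normal condition
                          rng.normal(2.5, 0.20, 200)])  # severe wear

# Penalized mean-shift detection in the spirit of Eqs. (5) and (6);
# the penalty argument plays the role of the beta*K term.
algo = rpt.Pelt(model="l2").fit(rms_seq)
breakpoints = algo.predict(pen=10)  # indices ending each segment

# Twice-the-mean rule: a change point starts a new degradation stage only
# if the mean of the following segment exceeds twice that of the previous.
segments = np.split(rms_seq, breakpoints[:-1])
for prev, curr, k in zip(segments, segments[1:], breakpoints):
    if curr.mean() > 2 * prev.mean():
        print(f"stage boundary at snapshot {k}")
```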

3.2 Diversity in Base Learners.

Because the performance of base learners varies in different degradation stages of bearings, different base learners will be used to construct the ensemble in each degradation stage. Some base learners might overestimate the RUL of bearings; others might underestimate it. The ensemble with the best performance should combine base learners that overestimate and underestimate the RUL. The weights assigned to the selected base learners in different stages will be determined by minimizing the cross-validation error using NNLS. In this study, 16 candidate base learners from three different categories, including decision tree-based, instance-based, and linear model-based algorithms, were tested. Five of the tested algorithms were selected as base learners to achieve the minimum cross-validation error. More details about the selected base learners are given in Secs. 3.2.1–3.2.5.

3.2.1 Extra Trees.

Extra trees (ET) randomly select the cut points of a given numerical attribute and use the whole learning sample to grow decision and regression trees [39]. Extra trees differ from other randomization methods in that they aim to improve accuracy by building random trees at higher levels of randomization. Extra trees build a set of unpruned decision or regression trees in a top-down process. Three key parameters of extra trees are the number of attributes randomly selected at each node ($K$), the minimum sample size for splitting a node ($n_{\min}$), and the number of trees ($M$). $K$ determines the strength of the attribute selection process, $n_{\min}$ determines the strength of averaging of the output noise, and $M$ determines the strength of variance reduction of the ensemble aggregation. The final prediction is obtained by aggregating the predictions of the trees: the average of the tree predictions for regression and a majority vote for classification. The typical form of the approximation produced by extra trees is shown in Eq. (8) [39]
$\hat{y}(x) = \sum_{(i_1, \ldots, i_n)} \lambda_{(i_1, \ldots, i_n)}\, I_{(i_1, \ldots, i_n)}(x)$ (8)
where the sum runs over the hyper-intervals induced by the $N$ learning samples, $I_{(i_1,\ldots,i_n)}(x)$ is the characteristic function of the hyper-interval, and the real-valued parameters $\lambda_{(i_1,\ldots,i_n)}$ depend on the inputs $x_j$ and outputs $y_j$ of the learning sample.
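In practice, extra trees are available off the shelf; the sketch below maps the three parameters above onto scikit-learn's ExtraTreesRegressor ($M$ → n_estimators, $K$ → max_features, $n_{\min}$ → min_samples_split). The synthetic data and parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)

# M -> n_estimators, K -> max_features, n_min -> min_samples_split.
et = ExtraTreesRegressor(n_estimators=100,
                         max_features="sqrt",
                         min_samples_split=5,
                         random_state=0)
et.fit(X[:400], y[:400])
print(et.predict(X[400:])[:5])  # regression output: average over the trees
```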

3.2.2 Random Forests.

Random forests (RF) combine independent decision trees, each grown using a random vector sampled from the same distribution [40]. The trees in the forest are grown and split using the training data and their own random vectors. When the number of trees in the forest is large enough, the generalization error converges. Features are selected randomly at each splitting node, which makes the method more robust to noise. In this study, the RF method is used for regression; thus, one third of the variables were considered for splitting at each node. Each split is chosen to minimize the objective function in Eq. (9) [40]
$\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$ (9)
where $j$ represents a splitting variable and $s$ is the cut point, $R_1(j, s) = \{X \mid X_j \leq s\}$, $R_2(j, s) = \{X \mid X_j > s\}$, $c_1 = \operatorname{ave}(y_i \mid x_i \in R_1(j, s))$, and $c_2 = \operatorname{ave}(y_i \mid x_i \in R_2(j, s))$.

This splitting process repeats until a stopping criterion is satisfied. After all the decision trees in the forest have reached the threshold and stopped splitting, the final prediction is made by averaging the predictions of the regression trees.
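A corresponding scikit-learn sketch is shown below; max_features=1/3 encodes the one-third-of-the-variables rule mentioned above, while the synthetic data and remaining settings are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=21, noise=5.0, random_state=0)

# One third of the variables are considered at each split, as noted above.
rf = RandomForestRegressor(n_estimators=200, max_features=1 / 3,
                           random_state=0)
rf.fit(X[:400], y[:400])
print(rf.predict(X[400:])[:5])  # average of the regression trees
```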

3.2.3 XGBoost.

XGBoost is a scalable tree boosting algorithm that exploits cache access patterns, data compression, and data sharding to solve large-scale problems with minimal resources. Data are split into horizontal partitions (shards) held on separate server instances to spread the load. A sparsity-aware algorithm handles sparse data, and approximate split finding is performed with a theoretically justified weighted quantile sketch. The objective function shown in Eq. (10) [41] can be optimized using a second-order approximation
$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$ (10)
where $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i^{(t-1)}$ and the target $y_i$, and the second term $\Omega$ penalizes the complexity of the model. The first- and second-order gradients of the loss function are $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial_{\hat{y}^{(t-1)}}^2 l(y_i, \hat{y}^{(t-1)})$.

The efficiency of this method is guaranteed by parallel and distributed computing. The method is demonstrated to be both faster and more accurate than most classical tree bagging algorithms [41].
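The gradients $g_i$ and $h_i$ of Eq. (10) surface directly in XGBoost's custom-objective interface; the sketch below supplies them for the squared error loss on synthetic data. The data and hyperparameters are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] + rng.normal(size=500)

def squared_error(preds, dtrain):
    """Return g_i and h_i of Eq. (10) for the squared error loss."""
    labels = dtrain.get_label()
    grad = preds - labels        # first-order gradient g_i
    hess = np.ones_like(preds)   # second-order gradient h_i
    return grad, hess

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=squared_error)
print(booster.predict(dtrain)[:5])
```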

3.2.4 Support Vector Machines.

SVM minimizes the upper bound of the generalization error by maximizing the margin between the separating hyperplane and the data [42]. Each class of the data lies on a different side of the decision boundary, a hyperplane in the feature space. The performance of SVM depends mainly on the selection of a good kernel function [43]. SVM can also select the model automatically by obtaining the optimal number and locations of the basis functions during training [44]. In our experiments, SVM yielded lower error rates than other instance-based methods. The proposed method applied the polynomial kernel $K(x, x_i) = (1 + x \cdot x_i)^d$ and the radial basis (exponential) kernel $K(x, x_i) = \exp(-\gamma \|x - x_i\|^2)$.
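Both kernels are available in scikit-learn's SVR, as sketched below; coef0=1 supplies the "+1" of the polynomial kernel (scikit-learn additionally scales the inner product by gamma). The data and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Polynomial kernel, approximately K(x, x_i) = (1 + x . x_i)^d with coef0=1.
svr_poly = SVR(kernel="poly", degree=3, coef0=1.0).fit(X, y)

# Radial basis kernel K(x, x_i) = exp(-gamma * ||x - x_i||^2).
svr_rbf = SVR(kernel="rbf", gamma=0.1).fit(X, y)

print(svr_poly.predict(X[:3]), svr_rbf.predict(X[:3]))
```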

3.2.5 Generalized Additive Models.

A generalized additive model (GAM) is a generalized linear model that involves a summation of non-parametric smooth functions (also known as additive predictors) [45]. The smooth functions are estimated by applying a scatterplot smoother in an iterative procedure. The model can be viewed as an empirical method that minimizes the Kullback–Leibler distance to the true model or, equivalently, maximizes the expected log-likelihood. The expected value of the response $Y$, which follows an exponential-family distribution, is related to the predictor variables $x_i$ through the link function shown in Eq. (11) [45]
$\mathbb{E}[Y \mid x_1, \ldots, x_n] = \Phi\left(\sum_{p=1}^{n} \phi_p(x_p)\right)$ (11)
where $\Phi$ is a smooth monotonic function (the inverse of the link function) and $\phi_p$ are the non-parametric smooth functions of the predictors.
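A GAM of the form of Eq. (11) can be fitted with the third-party pygam package, as in the hedged sketch below, where one smooth term $\phi_p$ is declared per predictor; the data are synthetic.

```python
import numpy as np
from pygam import LinearGAM, s  # third-party package "pygam"

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

# One smooth term phi_p per predictor, estimated iteratively as in Eq. (11).
gam = LinearGAM(s(0) + s(1)).fit(X, y)
print(gam.predict(X[:5]))
```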

3.3 Diversity in Features.

To select the optimal features in different stages, each extracted feature will be scored using the selected criteria. A threshold is determined for each criterion in a degradation stage. To determine the threshold of a criterion, the proposed method trains ensemble models using features with different scores assigned by the criterion and compares the RMSE of each model. The score with the smallest cross-validation RMSE is set as the threshold. Features with scores greater than all thresholds are selected for a stage.
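One way to realize this threshold search is sketched below: for each candidate threshold, an ensemble surrogate is cross-validated on the features that score at or above it, and the threshold with the smallest RMSE wins. The helper, its surrogate model, and the 5-fold setting are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def pick_threshold(X, y, scores, candidates):
    """Return the candidate threshold whose surviving feature subset
    yields the smallest cross-validation RMSE (hypothetical helper)."""
    best_t, best_rmse = None, np.inf
    for t in candidates:
        cols = scores >= t  # keep features scoring at or above t
        if not cols.any():
            continue
        mse = -cross_val_score(RandomForestRegressor(random_state=0),
                               X[:, cols], y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        if np.sqrt(mse) < best_rmse:
            best_t, best_rmse = t, np.sqrt(mse)
    return best_t
```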

To understand the impact of feature diversity on prediction accuracy, we tested three types of features: 13 time-domain features, 16 frequency-domain features, and 8 time-frequency domain features. The significance of each feature was evaluated, and the cross-validation error of each feature combination was compared. Different features were selected for different base learner selections, and the selected features also differed across degradation stages. The results show that feature diversity can improve the performance of the proposed method.

In theory, feature selection for an ensemble is not necessarily the same as feature selection for a single base learner because overall ensemble performance imposes different criteria on each base learner. As stated earlier, an ensemble leverages the strengths of base learners in different regions along the degradation trajectory, and this behavior can be achieved in part by selecting different features. In this study, the three most popular measures, including trendability, monotonicity, and prognosability, are used to evaluate the significance of the features.

Trendability: Trendability measures the similarity among trajectories of a feature over time [46]. Constant features have zero correlation with time, and therefore zero trendability, whereas features that are linear functions of time have strong correlations with time and thus large trendability. Features with good trendability track the state and degradation of the system over the time series. The expression for trendability is shown in Eq. (12)
$\text{Trendability} = \left| \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[ n \sum x_i^2 - \left(\sum x_i\right)^2 \right] \left[ n \sum y_i^2 - \left(\sum y_i\right)^2 \right]}} \right|$ (12)
where x is a vector of observations of the feature, y is the time index of the feature, and n is the number of observations.
Monotonicity: Monotonicity measures the consistent increase or decrease of a feature. It is measured by the absolute difference between the numbers of positive and negative derivatives of the feature [46]. The expression for monotonicity is shown in Eq. (13)
$\text{Monotonicity} = \dfrac{\left| \#\left(\frac{d}{dx} > 0\right) - \#\left(\frac{d}{dx} < 0\right) \right|}{n - 1}$ (13)
where $n$ is the number of observations $x$. The value of Eq. (13) ranges from 0 to 1, where 0 means the feature is non-monotonic and 1 means the feature is monotonically increasing or decreasing.
Prognosability: Prognosability measures the variance of the critical failure value across a population of systems [47]. It is computed from the standard deviation of the final failure values of the degradation paths, normalized by the mean range of the paths. The expression for prognosability is shown in Eq. (14)
$\text{Prognosability} = \exp\left( - \dfrac{\operatorname{std}_j\left( x_j(N_j) \right)}{\operatorname{mean}_j \left| x_j(1) - x_j(N_j) \right|} \right), \quad j = 1, \ldots, M$ (14)
where $x_j$ denotes the measurements of the feature on the $j$th system, $M$ is the number of monitored systems, and $N_j$ is the number of measurements on the $j$th system.
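Assuming the standard definitions of these measures from Refs. [46,47], the three criteria can be computed as in the sketch below; the vectorized forms assume each trajectory is sampled on a uniform time grid.

```python
import numpy as np

def trendability(x):
    """Eq. (12): magnitude of the Pearson correlation between x and time."""
    t = np.arange(len(x))
    return abs(np.corrcoef(x, t)[0, 1])

def monotonicity(x):
    """Eq. (13): |# positive - # negative| first differences over n - 1."""
    d = np.diff(x)
    return abs(int((d > 0).sum()) - int((d < 0).sum())) / (len(x) - 1)

def prognosability(paths):
    """Eq. (14): spread of final failure values over the mean feature range;
    `paths` is a list of per-bearing feature trajectories."""
    finals = np.array([p[-1] for p in paths])
    ranges = np.array([abs(p[0] - p[-1]) for p in paths])
    return np.exp(-finals.std() / ranges.mean())
```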

4 Case Study

4.1 Experimental Setup.

The experimental data used in this case study were collected using the PRONOSTIA platform designed by the FEMTO-ST Institute [48]. This data set was also used in the IEEE PHM 2012 challenge. The PRONOSTIA testbed can accelerate the degradation process of bearings such that critical failure will occur within several hours under constant or varying operating conditions. This testbed consists of a rotational component, a load generation component, and a measurement component. A synchronous motor with a gearbox and a speed controller is used to control the rotational speed of the bearings. A pneumatic jack and a digital electro-pneumatic pressure regulator are used to control the load up to 4000 N. More details about the PRONOSTIA testbed are shown in Figs. 3 and 4.

4.2 Data Description.

The IEEE PHM 2012 challenge data sets were collected under three different operating conditions. One of the data sets was used to demonstrate the proposed method. The raw data were collected under a rotating speed of 1800 rpm and a load force of 4000 N. Seven sub-data sets, Bearing1_1 to Bearing1_7, were used for training and validating the predictive model. Each sub-data set contains vibration signals in both horizontal and vertical directions (Fig. 5) that were collected using a set of high-frequency accelerometers. The sampling frequency of the vibration signal was 25.6 kHz, and 2560 samples were recorded during the first 0.1 s of every 10-s interval. To avoid damage to the testbed, run-to-failure tests were terminated when the amplitude of the vibration signal exceeded 20 g. Table 1 shows the monitored useful life of each bearing.

4.3 Feature Extraction.

Thirty-seven (37) features, including thirteen (13) time-domain features and twenty-four (24) frequency-domain features, were extracted from each of the horizontal and vertical vibration signals. In total, seventy-four (74) features were extracted. The time-domain features include the maximum, minimum, standard deviation, root-mean-square, kurtosis, skewness, mean, peak-to-peak value, variance, upper bound, entropy, standard deviation of the inverse sine, and standard deviation of the inverse tangent [49]. The frequency-domain features were extracted using a fast Fourier transform for each sampling period. The frequency-domain features include the maximum amplitude, the frequency of the maximum amplitude, bandwidth, energy, and entropy. These frequency-domain features were applied to both the frequency spectrum and the power density spectrum.
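A hedged sketch of this per-snapshot extraction is given below for an illustrative subset of the features named above; the snapshot length and sampling rate follow the data description in Sec. 4.2, and the random input stands in for a real vibration snapshot.

```python
import numpy as np
from scipy import stats

def snapshot_features(x, fs=25600):
    """Compute a subset of the time- and frequency-domain features for one
    2560-sample vibration snapshot (illustrative, not the full set of 37)."""
    feats = {
        "rms": np.sqrt(np.mean(x ** 2)),
        "kurtosis": stats.kurtosis(x),
        "skewness": stats.skew(x),
        "peak_to_peak": np.ptp(x),
        "std_arctan": np.std(np.arctan(x)),  # trigonometric feature [49]
    }
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    feats["max_amplitude"] = spectrum.max()
    feats["freq_of_max"] = freqs[np.argmax(spectrum)]
    feats["spectral_energy"] = np.sum(spectrum ** 2)
    return feats

print(snapshot_features(np.random.default_rng(0).normal(size=2560)))
```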

4.4 Degradation Stages.

A bearing may experience varying degradation patterns/stages during its in-service life. Detecting change points and degradation stages can improve RUL prediction accuracy. As shown in Fig. 5, the bearing degradation stages are correlated with statistical features of the raw data. Thirty-seven features were investigated, and Fig. 6 shows three example features. RMS was the most effective feature for detecting change points.

The proposed method was able to detect three different stages, including (1) normal condition, (2) smooth wear condition, and (3) severe wear condition, before failure occurs [38]. Table 2 shows two different cases where three and two degradation stages were detected. For example, Bearing1_1 and Bearing1_3 have three degradation stages, including normal condition, smooth wear, and severe wear. Other bearing data sets have two degradation stages, including normal condition and severe wear. Figure 7 shows three degradation stages detected in Bearing1_1 and two degradation stages detected in Bearing1_7 data sets.

To demonstrate that classifying degradation stages can improve prediction performance, the prediction accuracy of the ensemble learning algorithm with and without classifying degradation stages was compared as shown in Fig. 8. Bearing1_2 to Bearing1_7 were used for training; Bearing1_1 was used for testing. When training a predictive model without classifying degradation stages, only one predictive model was built using all the training data. This predictive model was not able to track the changes in degradation patterns at varying degradation stages as shown in Fig. 8(a). In contrast, a predictive model was trained for each degradation stage after degradation stages were detected. The predictive model trained for each degradation stage was able to track the degradation pattern at each stage with higher accuracy as shown in Fig. 8(b). Table 3 shows more details about prediction performance in terms of relative error and R2.

Figure 9 shows a comparison of prediction performance at stage 3 for the Bearing1_1 and Bearing1_3 data sets, where three degradation stages were observed. Classifying degradation stages improves prediction performance significantly. For the Bearing1_1 data set, the relative error improved from 343.47% to 163.19%, and the R2 value improved from 0.3344 to 0.9482. For the Bearing1_3 data set, the relative error improved from 156.8% to 67.5%, while the R2 value remained nearly unchanged (0.723 to 0.721). The results also showed that classifying degradation stages reduced overestimation.

Figure 10 shows a comparison of prediction performance at stage 3 for the Bearing1_2 and Bearing1_7 data sets, where two degradation stages were observed. The results have shown that prediction performance at stage 3 for both data sets was improved significantly in terms of both R2 and relative error by classifying degradation stages. For Bearing1_2, the relative error improved from 969.6% to 186.3%, and the R2 value improved from 0.1773 to 0.8350. For Bearing1_7, the relative error improved from 1297% to 91.47%, and the R2 value improved from 0.3726 to 0.7526. Similar to the results shown in Fig. 9, classifying degradation stages significantly reduced overestimation.

Figure 11 shows a comparison of prediction performance at stage 3 for Bearing1_1 to Bearing1_7 data sets. Six of the seven data sets were used for training; the remaining data set was used for testing. The results have shown that a significant performance improvement was achieved for Bearing1_1, Bearing1_2, Bearing1_6, and Bearing1_7 data sets in terms of both relative error and R2 by classifying degradation stages. A minor performance improvement was achieved for Bearing1_3, Bearing1_4, and Bearing1_5 data sets.

4.5 Impact of Diversity in Base Learners.

To further improve prediction accuracy by increasing base learner diversity, different base learners were selected in different degradation stages using the method described in Sec. 3.2. The hypothesis is that the performance of different base learners varies across degradation stages. In this case study, five base learners were selected from three different categories, including decision tree-based, instance-based, and linear model-based algorithms. Different weights were assigned to the selected base learners in different degradation stages. Table 4 lists the optimal weights that minimized the cross-validation error in each degradation stage. The results have shown that only three methods, all tree-based algorithms, were selected for the fixed model trained without stage classification; the proposed method is thus more diverse than the fixed model in base learner selection. Table 5 shows a comparison between fixed base learner selection and dynamic base learner selection. The results have shown that increasing diversity in base learner selection improved the performance of the prediction model. For the Bearing1_1 data set, the relative error improved from 26.25% to 25.16%, and the R2 value improved from 0.9062 to 0.9482.

4.6 Impact of Diversity in Features.

To further improve prediction accuracy by increasing feature diversity, different features were selected in different degradation stages using the method described in Sec. 3.3. The hypothesis is that feature selection for an ensemble is not necessarily the same as feature selection for a single base learner; an ensemble model leverages the strengths of base learners in different degradation stages. As mentioned in Sec. 3.3, the three most popular measures, including trendability, monotonicity, and prognosability, were used to evaluate the significance of the extracted features. Models with different thresholds were trained with data from Bearing1_2 to Bearing1_7. The cross-validation errors of all the models were compared, and the threshold of each criterion was determined to minimize the cross-validation error in the training domain. Features satisfying the thresholds of all criteria were selected. To evaluate dynamic feature selection, the performance of the model using the selected features was tested on each of the seven bearing data sets (leave-one-out training and testing). The results have shown that dynamic feature selection can improve prediction accuracy and reduce training time. The dynamic feature selection method consists of two steps:

  • Step 1: Remove linearly dependent features for stages 1, 2, and 3. The linear dependency of the 74 extracted features was evaluated using a linear regression model in S [50]; a rank-based sketch of this step is given after this list. Twenty (20) features were removed for stages 1 and 3; twenty-one (21) features were removed for stage 2. Because these features are linear combinations of the remaining features, removing them did not affect the prediction accuracy of the predictive model.

  • Step 2: Evaluate the importance of the remaining features for stages 1, 2, and 3 based on three criteria, including monotonicity, trendability, and prognosability. A threshold for each criterion is determined based on the RMSE of the predictive model.
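The linear dependency check of Step 1 can be approximated with a greedy rank test, as in the sketch below; this is a stand-in for the S-language procedure cited above, and the tolerance is an assumed value.

```python
import numpy as np

def drop_dependent_features(X, tol=1e-10):
    """Return indices of columns of X that are not linear combinations of
    the columns kept so far (greedy rank-revealing check)."""
    keep = []
    for j in range(X.shape[1]):
        candidate = X[:, keep + [j]]
        if np.linalg.matrix_rank(candidate, tol=tol) == len(keep) + 1:
            keep.append(j)  # column j adds new information
    return keep
```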

For stage 1, the thresholds for monotonicity, trendability, and prognosability are 0.01, 0.02, and 0.08, respectively, because the smallest RMSEs were achieved at these thresholds, as shown in Fig. 12. Twenty (20) features were selected for stage 1.

For stage 2, the thresholds for monotonicity, trendability, and prognosability are 0.16, 0.02, and 0.12, respectively, because the smallest RMSEs were achieved at these thresholds, as shown in Fig. 13. Forty (40) features were selected for stage 2.

For stage 3, the thresholds for monotonicity, trendability, and prognosability are 0.1, 0.24, and 0.15, respectively, because the smallest RMSEs were achieved at these thresholds, as shown in Fig. 14. Forty-five (45) features were selected for stage 3.

Figure 15 shows a comparison between fixed and dynamic feature selection for stage 1. Six (6) data sets were used for training; the remaining one was used for testing. As shown in Fig. 15, by using dynamic feature selection, the accuracy of the predictive model for stage 1 slightly improved in terms of relative error and R2.

Figure 16 shows a comparison between fixed and dynamic feature selection for stage 3. Six (6) data sets were used for training; the remaining one was used for testing. As shown in Fig. 16, the accuracy of the predictive model for stage 3 did not improve using dynamic feature selection in terms of relative error and R2. However, removing redundant features can increase computational efficiency.

Table 6 shows a comparison of overall performance for Bearing1_1 between fixed and dynamic feature selection. With dynamic feature selection, the relative error improved from 25.16% to 21.35%, the R2 value improved from 0.9482 to 0.9647, and the average training time was reduced from 137 min to 45 min. The results have shown that prediction accuracy can be improved by increasing the diversity in features.

4.7 Performance Comparison.

Three performance metrics, including relative error, R2, and a score function, were used to evaluate the prediction accuracy of the machine learning algorithms. Relative error and R2 have been widely used to evaluate prediction accuracy. The score function is another model evaluation metric in which different penalties are allocated for underestimates and overestimates of the RUL [48]. A smaller penalty is assigned to underestimates, while a greater penalty is assigned to overestimates; in other words, to ensure system safety, underestimates are more desirable than overestimates. The score ranges between 0 and 1, and greater scores indicate better prediction performance. The score function is defined in Eq. (15)
$A_i = \begin{cases} \exp\left(-\ln(0.5)\, Er_i / 5\right), & Er_i \leq 0 \\ \exp\left(+\ln(0.5)\, Er_i / 20\right), & Er_i > 0 \end{cases}$ (15)
where $A_i$ is the accuracy score for the $i$th bearing test data set and $Er_i = 100 \times (\text{ActRUL}_i - \widehat{\text{RUL}}_i)/\text{ActRUL}_i$ is the percent relative error [48], so that overestimates yield $Er_i \leq 0$ and are penalized more heavily. The final score is defined in Eq. (16)
$\text{Score} = \dfrac{1}{N} \sum_{i=1}^{N} A_i$ (16)

where $N$ is the number of bearing test data sets.
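A direct transcription of Eqs. (15) and (16) is given below; the relative errors passed in are the Guo et al. [25] values quoted in the next paragraph, used purely to illustrate the call.

```python
import numpy as np

def phm2012_score(relative_errors_pct):
    """Eqs. (15) and (16): asymmetric accuracy score of the IEEE PHM 2012
    challenge [48]; Er_i is the percent relative error for data set i."""
    er = np.asarray(relative_errors_pct, dtype=float)
    a = np.where(er <= 0,
                 np.exp(-np.log(0.5) * er / 5),   # overestimates: harsher
                 np.exp(np.log(0.5) * er / 20))   # underestimates: milder
    return a.mean()

print(phm2012_score([43.28, 67.55, -22.98, 21.23, 17.83]))
```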

The performance of the proposed method was compared with that of two deep learning algorithms reported in the literature (Table 7). Guo et al. [25] developed a predictive model using an RNN trained on Bearing1_1 and Bearing1_2. The predictive model was validated on the Bearing1_3 to Bearing1_7 data sets. The relative errors for Bearing1_3, Bearing1_4, Bearing1_5, Bearing1_6, and Bearing1_7 were 43.28%, 67.55%, −22.98%, 21.23%, and 17.83%, respectively, with an average relative error of 32.48%. Our method achieved an average relative error of 25.73% and a score of 0.95 using the same training and test data sets. In addition, Liao et al. [26] trained a predictive model using an RBM on Bearing1_1 to Bearing1_5. The predictive model was validated on the Bearing1_6 and Bearing1_7 data sets, achieving a score of 0.57. Our method achieved an average relative error of 26.63% and a score of 0.96 using the same training and test data sets.

5 Conclusions and Future Work

A novel ensemble learning-based approach to PHM was developed by selecting diverse base learners and features in different degradation stages. To demonstrate the proposed method, the IEEE PHM 2012 challenge data were used to predict the RUL of rolling element bearings. The degradation process of the bearings was classified into three stages, including normal, smooth wear, and severe wear conditions, based on the variation in the RMS of the vibration signals. A predictive model was built for each degradation stage. The base learners of the ensemble learning algorithm were dynamically selected from machine learning algorithms of three different types, including decision tree-based, instance-based, and generalized linear models. To increase the diversity in features, the features fed into the proposed method were also dynamically selected for each degradation stage. The experimental results have shown that dynamic feature selection and dynamic base learner selection in different degradation stages can increase the diversity in features and base learners, thereby improving the performance of ensemble learning.

The proposed method with increased diversity in base learners and features was capable of estimating the RUL of bearings with higher accuracy than that of two deep learning algorithms (i.e., RNN and RBM). In the future, we will feed different features to different base learners at varying degradation stages in order to further improve prediction accuracy. In addition, more advanced change point detection techniques such as deep learning will be tested. Moreover, uncertainty quantification methods will be used to provide quantile predictions.

Acknowledgment

The research reported in this paper is partially supported by the NASA Ames Research Center (Grant No. 80NSSC18M108). Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of NASA Ames Research Center.

References

1. Zhang, P. J., Du, Y., Habetler, T. G., and Lu, B., 2011, "A Survey of Condition Monitoring and Protection Methods for Medium-Voltage Induction Motors," IEEE Trans. Ind. Appl., 47(1), pp. 34–46. 10.1109/TIA.2010.2090839
2. Qian, Y., and Yan, R., 2015, "Remaining Useful Life Prediction of Rolling Bearings Using an Enhanced Particle Filter," IEEE Trans. Instrum. Meas., 64(10), pp. 2696–2707. 10.1109/TIM.2015.2427891
3. Ben Ali, J., Chebel-Morello, B., Saidi, L., Malinowski, S., and Fnaiech, F., 2015, "Accurate Bearing Remaining Useful Life Prediction Based on Weibull Distribution and Artificial Neural Network," Mech. Syst. Sig. Process., 56–57, pp. 150–172. 10.1016/j.ymssp.2014.10.014
4. Cui, L. R., Loh, H. T., and Xie, M., 2004, "Sequential Inspection Strategy for Multiple Systems Under Availability Requirement," Eur. J. Oper. Res., 155(1), pp. 170–177. 10.1016/S0377-2217(02)00822-6
5. Lee, J., Ni, J., Djurdjanovic, D., Qiu, H., and Liao, H. T., 2006, "Intelligent Prognostics Tools and e-Maintenance," Comput. Ind., 57(6), pp. 476–489. 10.1016/j.compind.2006.02.014
6. Si, X. S., Wang, W. B., Hu, C. H., and Zhou, D. H., 2011, "Remaining Useful Life Estimation—A Review on the Statistical Data Driven Approaches," Eur. J. Oper. Res., 213(1), pp. 1–14. 10.1016/j.ejor.2010.11.018
7. Pecht, M., and Jaai, R., 2010, "A Prognostics and Health Management Roadmap for Information and Electronics-Rich Systems," Microelectron. Reliab., 50(3), pp. 317–323. 10.1016/j.microrel.2010.01.006
8. Sadoughi, M., and Hu, C., 2019, "Physics-Based Convolutional Neural Network for Fault Diagnosis of Rolling Element Bearings," IEEE Sensors J., 19(11), pp. 4181–4192.
9. Lee, J., Wu, F., Zhao, W., Ghaffari, M., Liao, L., and Siegel, D., 2014, "Prognostics and Health Management Design for Rotary Machinery Systems—Reviews, Methodology and Applications," Mech. Syst. Sig. Process., 42(1–2), pp. 314–334. 10.1016/j.ymssp.2013.06.004
10. Xia, T., Dong, Y., Xiao, L., Du, S., Pan, E., and Xi, L., 2018, "Recent Advances in Prognostics and Health Management for Advanced Manufacturing Paradigms," Reliab. Eng. Syst. Saf., 178, pp. 255–268.
11. Jin, X. H., Sun, Y., Que, Z. J., Wang, Y., and Chow, T. W. S., 2016, "Anomaly Detection and Fault Prognosis for Bearings," IEEE Trans. Instrum. Meas., 65(9), pp. 2046–2054. 10.1109/TIM.2016.2570398
12. Qian, Y., Yan, R., and Hu, S., 2014, "Bearing Degradation Evaluation Using Recurrence Quantification Analysis and Kalman Filter," IEEE Trans. Instrum. Meas., 63(11), pp. 2599–2610. 10.1109/TIM.2014.2313034
13. Guo, L., Lei, Y. G., Li, N. P., and Xing, S. B., 2017, "Deep Convolution Feature Learning for Health Indicator Construction of Bearings," 2017 Prognostics and System Health Management Conference (PHM-Harbin), Harbin, China, July 9, pp. 318–323.
14. Di Maio, F., Tsui, K. L., and Zio, E., 2012, "Combining Relevance Vector Machines and Exponential Regression for Bearing Residual Life Estimation," Mech. Syst. Sig. Process., 31, pp. 405–427. 10.1016/j.ymssp.2012.03.011
15. Dong, S. J., and Luo, T. H., 2013, "Bearing Degradation Process Prediction Based on the PCA and Optimized LS-SVM Model," Measurement, 46(9), pp. 3143–3152. 10.1016/j.measurement.2013.06.038
16. Li, Z., Goebel, K., and Wu, D., 2019, "Degradation Modeling and Remaining Useful Life Prediction of Aircraft Engines Using Ensemble Learning," ASME J. Eng. Gas Turbines Power, 141(4), p. 041008. 10.1115/1.4041674
17. Li, Z., Wu, D., Hu, C., and Terpenny, J., 2017, "An Ensemble Learning-Based Prognostic Approach With Degradation-Dependent Weights for Remaining Useful Life Prediction," Reliab. Eng. Syst. Saf., 184, pp. 110–122. 10.1016/j.ress.2017.12.016
18. Li, N. P., Lei, Y. G., Lin, J., and Ding, S. X., 2015, "An Improved Exponential Model for Predicting Remaining Useful Life of Rolling Element Bearings," IEEE Trans. Ind. Electron., 62(12), pp. 7762–7773. 10.1109/TIE.2015.2455055
19. Li, Q., and Liang, S. Y., 2018, "Degradation Trend Prognostics for Rolling Bearing Using Improved R/S Statistic Model and Fractional Brownian Motion Approach," IEEE Access, 6, pp. 21103–21114. 10.1109/ACCESS.2017.2779453
20. Li, Y., Kurfess, T. R., and Liang, S. Y., 2000, "Stochastic Prognostics for Rolling Element Bearings," Mech. Syst. Sig. Process., 14(5), pp. 747–762. 10.1006/mssp.2000.1301
21. Boškoski, P., Gašperin, M., Petelin, D., and Juričić, D., 2015, "Bearing Fault Prognostics Using Rényi Entropy Based Features and Gaussian Process Models," Mech. Syst. Sig. Process., 52–53, pp. 327–337. 10.1016/j.ymssp.2014.07.011
22. Singleton, R. K., Strangas, E. G., and Aviyente, S., 2015, "Extended Kalman Filtering for Remaining-Useful-Life Estimation of Bearings," IEEE Trans. Ind. Electron., 62(3), pp. 1781–1790. 10.1109/TIE.2014.2336616
23. Lei, Y. G., Li, N. P., Gontarz, S., Lin, J., Radkowski, S., and Dybala, J., 2016, "A Model-Based Method for Remaining Useful Life Prediction of Machinery," IEEE Trans. Reliab., 65(3), pp. 1314–1326. 10.1109/TR.2016.2570568
24. Gebraeel, N., Lawley, M., Liu, R., and Parmeshwaran, V., 2004, "Residual Life Predictions From Vibration-Based Degradation Signals: A Neural Network Approach," IEEE Trans. Ind. Electron., 51(3), pp. 694–700. 10.1109/TIE.2004.824875
25. Guo, L., Li, N. P., Jia, F., Lei, Y. G., and Lin, J., 2017, "A Recurrent Neural Network Based Health Indicator for Remaining Useful Life Prediction of Bearings," Neurocomputing, 240, pp. 98–109. 10.1016/j.neucom.2017.02.045
26. Liao, L. X., Jin, W. J., and Pavel, R., 2016, "Enhanced Restricted Boltzmann Machine With Prognosability Regularization for Prognostics and Health Assessment," IEEE Trans. Ind. Electron., 63(11), pp. 7076–7083. 10.1109/TIE.2016.2586442
27. Huang, R. Q., Xi, L. F., Li, X. L., Liu, C. R., Qiu, H., and Lee, J., 2007, "Residual Life Predictions for Ball Bearings Based on Self-Organizing Map and Back Propagation Neural Network Methods," Mech. Syst. Sig. Process., 21(1), pp. 193–207. 10.1016/j.ymssp.2005.11.008
28. Evans, C., Paul, E., Dornfeld, D., Lucca, D., Byrne, G., Tricard, M., Klocke, F., Dambon, O., and Mullany, B., 2003, "Material Removal Mechanisms in Lapping and Polishing," CIRP Ann., 52(2), pp. 611–633. 10.1016/S0007-8506(07)60207-8
29. Friedman, J. H., 2001, "Greedy Function Approximation: A Gradient Boosting Machine," Ann. Stat., 29(5), pp. 1189–1232. 10.1214/aos/1013203451
30. Lawson, C. L., and Hanson, R. J., 1995, Solving Least Squares Problems, SIAM, Philadelphia, PA.
31. Wackerly, D., Mendenhall, W., and Scheaffer, R. L., 2014, Mathematical Statistics with Applications, Cengage Learning, Independence, KY.
32. Melville, P., and Mooney, R. J., 2004, "Diverse Ensembles for Active Learning," Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada, July 4, ACM, p. 74.
33. Melville, P., and Mooney, R. J., 2005, "Creating Diversity in Ensembles Using Artificial Data," Inform. Fusion, 6(1), pp. 99–111. 10.1016/j.inffus.2004.04.001
34. Wang, D., and Tsui, K. L., 2017, "Statistical Modeling of Bearing Degradation Signals," IEEE Trans. Reliab., 66(4), pp. 1331–1344. 10.1109/TR.2017.2739126
35. Lavielle, M., 2005, "Using Penalized Contrasts for the Change-Point Problem," Signal Process., 85(8), pp. 1501–1510. 10.1016/j.sigpro.2005.01.012
36. Yu, J., 2011, "Bearing Performance Degradation Assessment Using Locality Preserving Projections and Gaussian Mixture Models," Mech. Syst. Sig. Process., 25(7), pp. 2573–2588. 10.1016/j.ymssp.2011.02.006
37. Pan, Y., Chen, J., and Li, X., 2010, "Bearing Performance Degradation Assessment Based on Lifting Wavelet Packet Decomposition and Fuzzy c-Means," Mech. Syst. Sig. Process., 24(2), pp. 559–566. 10.1016/j.ymssp.2009.07.012
38. Hu, J., Zhang, L., and Liang, W., 2013, "Dynamic Degradation Observer for Bearing Fault by MTS–SOM System," Mech. Syst. Sig. Process., 36(2), pp. 385–400. 10.1016/j.ymssp.2012.10.006
39. Geurts, P., Ernst, D., and Wehenkel, L., 2006, "Extremely Randomized Trees," Mach. Learn., 63(1), pp. 3–42. 10.1007/s10994-006-6226-1
40. Breiman, L., 2001, "Random Forests," Mach. Learn., 45(1), pp. 5–32. 10.1023/A:1010933404324
41. Chen, T., and Guestrin, C., 2016, "XGBoost: A Scalable Tree Boosting System," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, Aug. 13, ACM, pp. 785–794.
42. Cortes, C., and Vapnik, V., 1995, "Support-Vector Networks," Mach. Learn., 20(3), pp. 273–297.
43. Smola, A. J., Schölkopf, B., and Müller, K. R., 1998, "The Connection Between Regularization Operators and Support Vector Kernels," Neural Networks, 11(4), pp. 637–649. 10.1016/S0893-6080(98)00032-X
44. Schölkopf, B., Sung, K. K., Burges, C. J. C., Girosi, F., Niyogi, P., Poggio, T., and Vapnik, V., 1997, "Comparing Support Vector Machines With Gaussian Kernels to Radial Basis Function Classifiers," IEEE Trans. Sig. Process., 45(11), pp. 2758–2765. 10.1109/78.650102
45. Hastie, T. J., and Chambers, J. M., 2017, Statistical Models in S, Routledge, UK, pp. 249–307.
46. Javed, K., Gouriveau, R., Zerhouni, N., and Nectoux, P., 2013, "A Feature Extraction Procedure Based on Trigonometric Functions and Cumulative Descriptors to Enhance Prognostics Modeling," 2013 IEEE International Conference on Prognostics and Health Management, Gaithersburg, MD, June 24.
47. Coble, J. B., and Hines, J. W., 2009, "Identifying Optimal Prognostic Parameters From Data: A Genetic Algorithms Approach," Annual Conference of the Prognostics and Health Management Society 2009, San Diego, CA, Sept. 27.
48. Nectoux, P., Gouriveau, R., Medjaher, K., Ramasso, E., Morello, B., Zerhouni, N., and Varnier, C., 2012, "PRONOSTIA: An Experimental Platform for Bearings Accelerated Life Test," 2012 IEEE International Conference on Prognostics and Health Management, Denver, CO, June 18.
49. Javed, K., Gouriveau, R., Zerhouni, N., and Nectoux, P., 2013, "A Feature Extraction Procedure Based on Trigonometric Functions and Cumulative Descriptors to Enhance Prognostics Modeling," 2013 IEEE Conference on Prognostics and Health Management (PHM), Gaithersburg, MD, June 24, IEEE, pp. 1–7.
50. Chambers, J. M., and Hastie, T. J., 1992, Statistical Models in S, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.