Max Jakob Award Paper

Artificial Neural Networks (ANNs): A New Paradigm for Thermal Science and Engineering

Kwang-Tzu Yang

Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN 46556, e-mail: kwang-tzu.yang.1@nd.edu

J. Heat Transfer 130(9), 093001 (Jul 09, 2008) (19 pages) doi:10.1115/1.2944238 History: Received August 21, 2007; Revised October 10, 2007; Published July 09, 2008

The use of artificial neural networks (ANNs), one of the artificial intelligence methodologies, in a variety of real-world applications has been around for some time. However, the application of ANNs to thermal science and engineering is still relatively new, although it is receiving ever-increasing attention in the recently published literature. Such attention is due essentially to the special requirements and needs of the field of thermal science and engineering in terms of its increasing complexity, and to the recognition that it is not always feasible to deal with many critical problems in this field by the use of traditional analysis. The purpose of the present review is to point out the recent advances in ANNs and their successes in dealing with a variety of important thermal problems. Some current ANN shortcomings, recent advances in ANN-based hybrid analysis, and future prospects will also be indicated.

Over nearly the past three decades, we have been witnessing an overwhelming groundswell in the development of computer-based algorithms in a group known as soft computing. This development has been driven by increasingly broad applications that are difficult to deal with by conventional approaches, particularly those in engineering. Since such algorithms in soft computing are mostly based on simplistic models of human intelligence and evolutionary experience, they are also broadly known as artificial intelligence (AI) methodologies. They generally have the characteristics of very simple computational steps, often accompanied by a very large number of repeated computational cycles. This is very much in contrast to hard computing, which generally deals with numerical solutions to differential equations based on hard science, such as conservation laws and the like. Examples of soft-computing methodologies include artificial neural networks (ANNs), optima search algorithms such as the genetic algorithm (GA) and genetic programming (GP), fuzzy-logic control, expert systems, data mining, and others. It is interesting to note that all these methodologies are based on very different natural human-related phenomena. Each of them has received in-depth development in the recent past in specific targeted applications. One important area of application for soft computing is in problems of thermal science and engineering. Up to the very recent past, these problems have largely been treated by traditional hard-computing approaches, along with experiments carried out for the purpose of validating the analysis or for performance correlations. However, there is a clear indication that thermal problems are becoming increasingly complex and that the need for modeling single steady phenomena is rapidly migrating to the need for dealing with dynamics, system performance, optimization, and control. Unfortunately, the traditional approaches are simply not robust enough to handle such increased complexity, and new methodologies are definitely needed for this purpose.

As given in a recent review (1), several of the AI methodologies have shown very promising results in dealing with just the type of complexity mentioned above. However, their applications to thermal problems are still rather tentative. Among the various soft-computing methodologies, only the ANN analysis and fuzzy-logic control have seen some sustained interest in recent years. Even then, studies in fuzzy-logic control have almost all concentrated on HVAC applications, primarily due to the fuzzy constraint related to human comfort. ANNs, on the other hand, have addressed a much wider range of thermal applications to date and are specifically addressed in the present review. The purpose of this paper is to present the basic ANN methodology, its attributes and shortcomings, and implementation issues, followed by different groupings of thermal problems that have been treated by the ANN analysis and their corresponding results. Also discussed are ANN-based hybrid methodologies that combine the ANN with other AI analyses to achieve more promising results not possible with the ANN analysis alone. Finally, some future prospects of ANN applications related to emerging critical thermal problems will also be presented. It is hoped that the present review of the ANN methodology and applications will enable us to appropriately call the ANN methodology a new paradigm for thermal problem studies and will thus encourage many more thermal engineers and practitioners to seriously consider ANNs for treating future critical thermal problems that are difficult to treat by traditional means.

Despite the apparent popularity of fuzzy-logic control in the relatively narrow HVAC applications (see examples in Refs. 2-4), ANNs are now unquestionably the leading soft-computing methodology for general thermal problems. There are several significant reasons for this. First of all, the ANN has a powerful ability to recognize accurately the inherent relationship between any set of input and output without a physical model, and yet the ANN results do account for all the physics relating the output to the input. This ability is essentially independent of the complexity of the underlying relation, such as nonlinearity, multiple variables and parameters, and noisy and uncertain input and output data. This essential ability is known as pattern recognition as the result of learning. Secondly, the methodology is inherently fault tolerant, due to the large number of processing units in the network undergoing massive parallel data processing. Thirdly, its learning ability also gives the methodology the ability to adapt to changes in the parameters. This enables the ANN to deal with time-dependent dynamic modeling and adaptive control by means of neurocontrollers, and thus enables thermal engineers to delve into system analysis and control, a complexity that simply cannot be treated by any traditional analysis at the present time. Finally, the basic ANN methodology has considerable flexibility to incorporate elements of other soft-computing methodologies, such as fuzzy logic and GA, to further improve its capability to deal with additional complexity in thermal problems.

On the other hand, despite these capabilities of the basic ANN methodology, it must be pointed out that input-output data sets must be available in the learning process to train the neural networks. Even though this requirement seems to be a serious shortcoming of the ANN analysis, it is, however, not really the case. The reality is the existence and availability of a large amount of experimental data sets for various thermal phenomena and device performances accumulated over a long time. They are mostly in the form of heat-transfer correlations. Such available data sets are the perfect vehicle for use with the ANN analysis. Furthermore, experimental data for thermal problems will always be available, in general, as they normally are required to validate theoretical models and analysis. In addition, experimental data obtained under specific dynamic conditions can also be used to train dynamic ANNs. Furthermore, the neural network can be trained in real time when the experimental data are being obtained at the same time, a feature useful in the development of dynamic adaptive-control schemes. Here again, the complexity of the problem under consideration is not an issue. In the next section, the ANN analysis in its basic methodology will be described, along with the discussion of the various issues of implementation. Examples of thermal problems with increasing complexity that have been treated by the ANN analysis with promising results are then shown and discussed.

The description that follows is essentially that of Schalkoff (5). The structure and function of the ANN attempt to mimic those of the biological neural network. The most popular fully interconnected ANN consists of a large number of processing units known as nodes or artificial neurons, organized in layers. There are, in general, three groups of node layers, namely, the input layer, one or more hidden layers, and an output layer, each of which is occupied by a number of nodes. All the nodes of each hidden layer are connected to all nodes of the previous and following layers by means of internode synaptic connectors or simply connectors. Each of the connectors, which mimic the biological neural synapse, is characterized by a synaptic weight. The nodes of the input layer are used to designate the parameter space for the problem under consideration, while the output-layer nodes correspond to the unknowns of the problem under consideration. The parameters in the input layer need not all be independent, and this is also true in the output layer.

At each hidden-layer node, the node input consists of a sum of all the node outputs from the nodes in the previous layer, modified by the individual interconnector weights, and a local node bias, which represents the propensity of the combined input to trigger a response at the node. It is thus clear that the weights are simply weighting functions that determine the relative importance of the signals from all the nodes in the previous layer. At each hidden node, the node output is determined by an activation function, which determines whether the particular node is to activate (“fire”) or not. It is thus seen that by the connector and node operations, information, which starts at the input layer, moves forward toward the output layer. Such a network is known as a fully connected feed-forward network.

When the information reaches the output layer, the calculated feed-forward data are compared with the experimental output data to determine the error at each of the output nodes. These errors are then used to adjust all the node biases and connector weights in the entire network to minimize the errors by means of a learning or training procedure. The most popular training procedure for fully connected feed-forward networks is known as the supervised backpropagation learning scheme (5-6), where the weights and biases are adjusted layer by layer from the output layer toward the input layer. The whole process of feeding forward with backward learning is then repeated until a satisfactory error level is reached or the error becomes stationary.

The step-by-step ANN methodology just qualitatively described is now presented. Figure 1 shows the structure or configuration of such a network, consisting of the input, hidden, and output layers with node and layer designations, where $i$ refers to the layer number with $i=1$ for the input layer and $i=I$ for the output layer, $I$ being the total number of layers. Similarly, $j$ is the node number in any layer, counting from the top in Fig. 1. Since the node numbers are likely to vary from layer to layer, the maximum $j$ value is designated by $J_i$, depending on the layer number, and $J_I$ is thus the number of unknowns in the output layer. As a result, each node is designated by $(i,j)$. A somewhat different designation is used for all the connectors, since there are two nodes involved. The node on the left is designated by subscripts, while the right node in the forward direction is designated by superscripts. For instance, the synaptic weight $w_{1,1}^{2,3}$ refers to the connector from Node (1,1) to Node (2,3), and so on. In addition, symbols are also needed to deal with the nodal input and output at each node. Firstly, the nodal input to Node $(i,j)$ is written as

$$x_{i,j} = \theta_{i,j} + \sum_{k=1}^{J_{i-1}} w_{i-1,k}^{i,j}\, y_{i-1,k} \tag{1}$$

where $\theta_{i,j}$ is the nodal bias at $(i,j)$, and $y_{i-1,k}$ is the nodal output at $(i-1,k)$ in the previous layer. This equation clearly indicates that each signal coming from the previous layer is tempered by the weight of its connector before the signals are added, and that the sum is finally modified by the local node bias to form the input to the local node $(i,j)$. It is thus seen that the weights and biases perform significant roles in influencing the node operation, and that the information to be processed represents the combined influence of all nodes from the previous layer. The node output, $y_{i,j}$, is then driven by the input, $x_{i,j}$, through the activation function, sometimes known as the threshold function,

$$y_{i,j} = \phi_{i,j}(x_{i,j}) \tag{2}$$

which plays the same role as the biological neuron in deciding whether it should fire or not on the basis of the strength of the input signal. When the input signal is weak, the artificial neuron simply produces a small output. On the other hand, when the input signal exceeds a certain threshold, the artificial neuron fires and then sends a strong signal to all the connectors and then to all the nodes in the next layer. Several relevant activation functions have been proposed in the past (5-6), including the step function, the logistic sigmoid function, the hyperbolic tangent, the Gaussian, the wavelet, and others. One interesting practice is that the activation function may be changed from one hidden layer to another. However, the one that is the most popular and preferred is the continuous version of the step function, known as the logistic sigmoid function, which possesses continuous derivatives to avoid computational difficulties. It is also highly nonlinear, a behavior that could prove to be beneficial in dealing with highly nonlinear input-output relations. It is generally written as

$$\phi(\xi) = \frac{1}{1 + e^{-\xi/c}} \tag{3}$$

where the constant $c$ determines the steepness of the function. Finally, it is noted that the node output $y_{i,j}$ represented by the sigmoid function always lies between 0 and 1 for all $x_{i,j}$. Therefore, from a computational point of view, it is desirable to normalize the network input and output data with the largest and smallest of each of the data sets used in the ANN analysis.
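
The feed-forward calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sigmoid is assumed to take the form 1/(1 + exp(-x/c)), the nested-list layout of `weights` and `biases` is a choice made here for readability, and the 2-2-1 network with its numerical values is hypothetical.

```python
import math


def sigmoid(x, c=1.0):
    """Logistic sigmoid; c sets the steepness (assumed form 1/(1+exp(-x/c)))."""
    return 1.0 / (1.0 + math.exp(-x / c))


def feed_forward(inputs, weights, biases, c=1.0):
    """Return the node outputs of every layer for one set of inputs.

    weights[l][j][k] is the weight of the connector from node k of layer l
    to node j of layer l+1; biases[l][j] is the bias of node j in layer l+1.
    (This storage layout is an assumption of the sketch.)
    """
    outputs = [list(inputs)]
    for layer_w, layer_b in zip(weights, biases):
        prev = outputs[-1]
        layer_out = []
        for w_row, theta in zip(layer_w, layer_b):
            # Node input: bias plus weighted sum of previous-layer outputs.
            x = theta + sum(w * y for w, y in zip(w_row, prev))
            # Node output: activation function applied to the node input.
            layer_out.append(sigmoid(x, c))
        outputs.append(layer_out)
    return outputs


# Hypothetical 2-2-1 network with hand-picked weights and biases.
weights = [[[0.5, -0.3], [0.2, 0.8]], [[1.0, -1.0]]]
biases = [[0.1, -0.1], [0.0]]
layers = feed_forward([0.4, 0.6], weights, biases)
```

Here `layers[-1]` holds the network output, and every hidden and output value lies strictly between 0 and 1, which is why the training targets are normalized before use.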

As already mentioned, the most critical step in the ANN methodology is the learning or training process, in which the errors determined at the output layer are successively reduced by systematically adjusting the weights and biases throughout the network, eventually to a level low enough to satisfy the user. Among the several available training algorithms, the most commonly used in the multilayer fully connected ANN is the feed-forward backpropagation training procedure based on a steepest-gradient error-correction process. The reader is referred to Haykin (6) for other available routines. In the usual thermal problems treated by the ANN analysis, the training data are based on experiments with matched input-output data sets. The inputs represent the problem parameters, while the outputs are the desired unknowns. Each single experiment with the corresponding input-output data is called a run. For a given chosen network architecture of layers and nodes, the very first step in the training process is to assign initial values to all the synaptic weights and biases in the network. The values may be either positive or negative and, in general practice, are taken to be less than unity in absolute value. The second step is to complete all the node input and output calculations based on Eqs. (1) to (3) for all the layers. When $i=I$, the values of $y_{I,j}$ are then the network output data based on the given input data from that run. The backpropagation procedure then starts with an error function quantified by

$$\delta_{I,j} = (t_{I,j} - y_{I,j})\, y_{I,j}\, (1 - y_{I,j}) \tag{4}$$

where $t_{I,j}$ is the normalized output target for the $j$th node of the output layer; this equation is simply a finite-difference approximation of the derivative of the sigmoid function. Once all $\delta_{I,j}$ are calculated, the computation then moves back to the previous layer $I-1$. Since the target outputs for this layer do not exist, a surrogate error is utilized and calculated instead for the hidden layer $I-1$, as given by

$$\delta_{I-1,k} = y_{I-1,k}\, (1 - y_{I-1,k}) \sum_{j=1}^{J_I} \delta_{I,j}\, w_{I-1,k}^{I,j} \tag{5}$$

Similar calculations are then continued from layer to layer in the backward direction until Layer 2. After all the errors $\delta_{i,j}$ are known, the changes in the weights and biases can then be determined by the generalized delta rule (7)

$$\Delta w_{i,k}^{i+1,j} = \lambda\, \delta_{i+1,j}\, y_{i,k} \tag{6}$$

$$\Delta \theta_{i+1,j} = \lambda\, \delta_{i+1,j} \tag{7}$$

for all $i<I$, from which all the adjustments in the weights and biases can be determined. The quantity $\lambda$ is known as the learning rate and is used to scale down the degree of change made to the connectors and nodes. The larger the learning (or training) rate, the faster the network will learn, but there is a chance that the ANN may not reach the desired outcome due to oscillatory error behavior. Its value is normally determined by numerical experimentation, and a commonly arrived-at value is in the range 0.4–0.5. In some practice, to further modulate the error-correction rates, a momentum term is added to Eqs. (6) and (7), characterized by a momentum rate based on the old weight and bias changes in the previous learning iteration (7).
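
The backward error-correction step for a single run can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: it takes the standard backpropagation form with a logistic sigmoid of unit steepness, an output error of (target minus output) times y(1 - y), and weight and bias changes proportional to the learning rate `lam`; the run error is assumed to be half the sum of squared output errors. The function names and the 2-2-1 example values are hypothetical, and no momentum term is included.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """Node outputs of every layer; weights[l][j][k] links node k of layer l to node j of layer l+1."""
    outputs = [list(inputs)]
    for layer_w, layer_b in zip(weights, biases):
        prev = outputs[-1]
        outputs.append([sigmoid(b + sum(w * y for w, y in zip(row, prev)))
                        for row, b in zip(layer_w, layer_b)])
    return outputs


def backprop_step(inputs, targets, weights, biases, lam=0.4):
    """Apply one delta-rule correction in place; return the run error before the update."""
    outputs = forward(inputs, weights, biases)
    y_out = outputs[-1]
    # Output-layer error: (target - output) times the sigmoid slope y(1 - y).
    deltas = [[(t - y) * y * (1.0 - y) for t, y in zip(targets, y_out)]]
    # Surrogate errors for the hidden layers, moving backward toward Layer 2.
    for l in range(len(weights) - 1, 0, -1):
        nxt = deltas[0]
        y_hid = outputs[l]
        deltas.insert(0, [
            y_hid[k] * (1.0 - y_hid[k])
            * sum(nxt[j] * weights[l][j][k] for j in range(len(nxt)))
            for k in range(len(y_hid))
        ])
    # Weight and bias corrections, scaled by the learning rate.
    for l, layer_delta in enumerate(deltas):
        for j, d in enumerate(layer_delta):
            biases[l][j] += lam * d
            for k, y_prev in enumerate(outputs[l]):
                weights[l][j][k] += lam * d * y_prev
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, y_out))


# Hypothetical 2-2-1 network trained repeatedly on one normalized run.
weights = [[[0.5, -0.3], [0.2, 0.8]], [[1.0, -1.0]]]
biases = [[0.1, -0.1], [0.0]]
errors = [backprop_step([0.4, 0.6], [0.8], weights, biases) for _ in range(200)]
```

All errors are computed from the old weights before any weight is changed, so the corrections within one step are mutually consistent. The list `errors` records how the run error falls as the same run is presented repeatedly.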

A cycle of training consists of computing a new set of weights and biases successively for all the experimental runs in the training data. The calculations are then repeated over many cycles while recording an overall error quantity for a specific run within each cycle, given by

$$E = \frac{1}{2} \sum_{j=1}^{J_I} (t_{I,j} - y_{I,j})^2 \tag{8}$$

After a cycle of the experimental runs is completed, a maximum or average cycle error can be determined. It is important to note that the weights and biases are continually updated throughout the runs and cycles. The training is terminated when the last cycle error falls below a prescribed threshold or becomes stationary. The final sets of weights and biases can then be used for prediction purposes, and the corresponding ANN becomes a model of the input-output relation of the thermal problem. However, it is noted that there is also another completely different unsupervised training procedure (6) based on a stochastic searching methodology, also known as self-organizing maps. In most cases, this methodology is used to deal with problems with a large number of outputs, generally not suitable for thermal problems.
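
The cycle bookkeeping and the two stopping criteria can be sketched as below. The sketch assumes hypothetical thresholds (`tol`, `stall`) and, to stay short and self-contained, uses a stand-in learner consisting of a single sigmoid node rather than a full multilayer network; the three normalized runs are made-up numbers chosen only to exercise the loop.

```python
import math


def run_cycles(runs, train_on_run, tol=1e-4, stall=1e-12, max_cycles=100000):
    """Repeat training cycles until the cycle error falls below tol or becomes stationary."""
    history = []
    for cycle in range(1, max_cycles + 1):
        # Weights and biases are updated after every run, not once per cycle.
        run_errors = [train_on_run(x, t) for x, t in runs]
        avg_err = sum(run_errors) / len(run_errors)
        max_err = max(run_errors)
        history.append((avg_err, max_err))
        if max_err < tol:
            return cycle, history, "converged"
        if len(history) > 1 and abs(history[-2][1] - max_err) < stall:
            return cycle, history, "stationary"
    return max_cycles, history, "cycle limit"


# Stand-in learner (assumption of this sketch): one sigmoid node, one weight, one bias.
params = {"w": 0.2, "theta": -0.1}


def train_on_run(x, t, lam=0.5):
    y = 1.0 / (1.0 + math.exp(-(params["theta"] + params["w"] * x)))
    delta = (t - y) * y * (1.0 - y)
    params["w"] += lam * delta * x
    params["theta"] += lam * delta
    return 0.5 * (t - y) ** 2  # run error before the update


runs = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.7)]  # hypothetical normalized (input, target) runs
cycles, history, reason = run_cycles(runs, train_on_run)
```

Each entry of `history` holds the average and maximum run error of one cycle, so either quantity can be plotted against the cycle number to monitor training.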

It is now clear that the overall ANN analysis involves just a few deterministic and algebraic steps repeated many times on the computer, while keeping tabs on the propagation of cycle errors in the training process. On the other hand, the methodology does involve a relatively large number of free parameters and choices. They include the number of hidden layers, the number of nodes in each layer, the initial weights and biases, the learning rate, the minimum number of training data sets, and sometimes also the choice of input parameters. All these do have a material effect on the ANN results. While the literature does provide some semirational suggestions and recommendations (5,8-9), past experience and numerical trials and experimentation still represent the best guides. As pointed out by Yang (1) and Zeng (9), studies are being pursued currently to provide a more rational basis for some of the choices. Several guidelines may be of some interest. Despite the simple computational steps, overall effort is still an important issue and does depend on the total number of nodes in the network, as a large number of nodes also tends to slow down the training process. The general idea is to seek a number of hidden layers and a number of nodes in each hidden layer that are as low as possible but still permit efficient flow of information from the input layer to the output layer. In a new thermal problem, there are likely more input (parameter) nodes than output nodes. One reasonable practice is that the first hidden layer should have the same number of nodes as the input layer, with this number decreasing toward the output layer, while the number of hidden layers depends on problem complexity. One flexibility in the ANN methodology is that both the number of hidden layers and the number of nodes in each of them can be increased at will from training cycle to training cycle, if the cycle errors do not decrease as expected. 
On the other hand, it must be cautioned that too many nodes may suffer the same fate as using polynomial curve-fitting schemes by collocation at specific data points, thus creating large errors in attempting interpolation between successive data points. One interesting strategy regarding the node-number issue is suggested by Kramin (10): first train a large network, which can then be reduced in size by removing those nodes that do not significantly affect the training result. There is another suggestion based on a reversed strategy of adding nodes systematically as training proceeds (11). These practices suggest that the network architecture, relative to the node number, can be freely modified throughout a single training cycle. A more rational procedure of optimizing the ANN architecture based on evolutionary programming is also available (12), as will be discussed separately in a later section. The issue of assigning initial weights and biases is always difficult in a new application. Without past information or data, the current practice is simply to generate a set of initial data from a random number generator of bounded numbers. A more rational, but complex, method is that suggested by Lehtokangas et al. (13), based on an orthogonal least-squares algorithm. The choice of the training rate $\lambda$ has not been rationally studied so far, and at present the only guide is numerical experimentation. As mentioned previously, a value somewhere between 0.4 and 0.5 can be used as a starting point, and reasonable results can be expected. Finally, the sigmoid activation function, as already pointed out, is a surrogate for a step function with continuous derivatives, thus avoiding possible computational difficulties. However, it also possesses asymptotic limits of [0,1], and may cause difficulties when these limits are approached. 
Therefore, the usual practice is to normalize all physical variables in an arbitrarily restricted range such as [0.15,0.85] to limit the computational efforts. Finally, there is another common and recommended practice in the way that the experimental data sets are utilized for training. Since the ANN is to be trained to recognize the input-output relations, which are generally somewhat noisy and do contain experimental uncertainties, it is desirable to include as many training data sets as possible. However, it is also important to set aside about one-quarter of the entire data sets to serve as testing data sets to evaluate the accuracy of the ANN results.
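
The two data-handling practices just described, normalizing each physical variable into a restricted range and setting aside about one-quarter of the data sets for testing, can be sketched as follows. The helper names, the random seed, and the flow-rate values are hypothetical and serve only as an illustration.

```python
import random


def normalize(values, lo=0.15, hi=0.85):
    """Map values linearly so that the smallest becomes lo and the largest becomes hi."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("all values are identical; the range is undefined")
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


def denormalize(scaled, v_min, v_max, lo=0.15, hi=0.85):
    """Invert normalize() to recover physical values from network outputs."""
    return [v_min + (s - lo) * (v_max - v_min) / (hi - lo) for s in scaled]


def split_runs(runs, test_fraction=0.25, seed=0):
    """Shuffle the runs and set aside a fraction of them as testing data."""
    shuffled = list(runs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


flow_rates = [0.10, 0.25, 0.40, 0.55]  # hypothetical values in arbitrary units
scaled = normalize(flow_rates)
restored = denormalize(scaled, min(flow_rates), max(flow_rates))
train, test = split_runs(range(259))   # run indices only, for illustration
```

Keeping the extreme values of each variable is necessary so that predictions made in the normalized range can be converted back to physical units with `denormalize`.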

The application of ANN analysis to thermal problems has not had a long history. It is fair to say that such applications so far have only been largely exploratory and tentative, despite the fact that results have been uniformly promising in all cases. Only more recently, there is increasing interest in the use of ANN analysis for thermal- and energy-related applications, for three reasons. One reason is common to all fields of engineering, in that the underlying technical knowledge starts to lag behind what is needed in the ever-increasing complexity of the field of application. There is a constant push for analysis based on new paradigms to meet the demand. This is certainly the case in thermal engineering problems. The ANN analysis as a new paradigm represents an excellent candidate for this purpose. The second reason is the very nature of thermal problems, which involve a multitude of fundamental disciplines, their interactions, and likely complex geometry. The traditional approach and associated numerical analysis are so far only capable of treating a small segment of problems required in current critical applications. It is therefore understandable that experiments have played such an important role in the development of thermal science and engineering. Much of the raw experimental data still exist today. On the other hand, these experimental data have been correlated with dimensionless groups and are treated as physical models for performance prediction and design. As will be discussed later, there is a fundamental inadequacy in these correlated results, which has only been realized recently. On the other hand, such existing database and other experimental data to follow can be strategically used to develop excellent ANN-based thermal models. The third and last reason for the increased interest in the ANN analysis is the significant recent advances in the development of the ANN methodology itself. 
Such advances and further continued demonstration of the attainable excellent results are attracting an increasing number of thermal engineers to apply the ANN analysis to critical and challenging thermal problems.

Three broad categories of thermal problems have been successfully treated by the new ANN paradigm, despite the tentative nature of the applications in all these categories. For the reasons just cited, many more such problems with increasing complexity are expected to be similarly treated in the near future. The first category deals with steady behaviors of complex thermal phenomena and performance of complex thermal devices. In these cases, there is high complexity in interacting fluid-flow and heat-transfer processes along with the effect of complex geometry. Much of the current knowledge is based on heat-transfer correlations and unsatisfactory approximate theories. The ANN analysis can be applied to obtain ANN-based models, which are significantly more accurate than the traditional correlated models. The second category deals with time-dependent dynamic thermal phenomena and the corresponding thermal devices. Unfortunately, traditional approaches are almost powerless in developing viable dynamic models for such thermal problems, mainly due to the effect of added thermal masses for applications under dynamic conditions. The development of accurate ANN-based dynamic models requires, in addition to appropriate dynamic experimental data, only simple extensions of the basic ANN methodology. An additional advantage is that, as will be shown in more detail later, it is possible to use recurrent networks to train the network adaptively in real time, so that the dynamic model is being established as the physical dynamic process is evolving. The last category deals with even more complex dynamic thermal systems, which require robust control to ensure the proper dynamic performance of the systems. In such systems, it is also necessary to have the ability for real-time adaptive control to meet all the performance requirements when parameters undergo unknown changes. 
It will be shown later that this control capability can indeed be developed and experimentally verified by using the ANN analysis with real-time online adaptive training of a neurocontroller.

In the following sections, specific examples of ANN-analyzed thermal problems in the above-mentioned application categories will be given, along with some details of the associated ANNs and the results. These examples will also be supplemented with additional cited references.

Of all the steady thermal problems, one of the most important critical applications deals with heat exchangers and their performance with a great variety of geometry and operational conditions. Even for the simplest heat exchangers, only tentative large-scale CFD-based analysis has been attempted to predict their performance. The common practice, however, is to develop approximate theoretical models based on the use of overall coefficients of heat transfer made of individual heat-transfer coefficients for each fluid obtained from correlations with experimental data in terms of dimensionless numbers (14). Simplifying assumptions are made, in general, as a part of the approximate model, such as, among others, constancy of the correlated coefficients and thermophysical properties, and greatly simplified geometrical parameters. As a result, it is not uncommon to find that the resulting heat-transfer rates do not predict well the actual heat-transfer performance of the heat exchangers under consideration. Despite the availability of large-scale computing analysis, there is still no real viable alternative in recent times, if not for the new paradigm of the ANN-based analysis. As will be shown in the following few examples, the ANN analysis provides a very accurate paradigm for modeling the heat exchanger performance, which does not use any simplifying and artificial assumptions, but still captures all the physical effects that relate the input physical parameters to the heat exchanger performance (1).

Among the very first applications of the ANN methodology to thermal problems are, understandably, the analysis of experimental convective heat-transfer data (15-16) and the performance and design of fin-tube heat exchangers (17-18). In the studies of Refs. 15-17, the attention is given to the use of the standard coefficients of heat transfer, while the ANN applied in Ref. 18 is directed toward the experimental heat-transfer rates directly. As will be pointed out later, the use of the coefficient of heat transfer is also subject to uncertainty, including the physical effects not accounted for in the correlations, such as fluid-flow nonuniformity and maldistribution, variable properties and the use of reference temperature, and nonuniform temperature differences. The study in Ref. 18 essentially bypasses these uncertainties by going directly to the all-important heat-transfer rate so that all effects are included in the input (parameter space)–output (heat-transfer rate) relation, leaving only the uncertainties associated with the experimental measurements. This issue is of critical importance for thermal engineering and applications. If it can be demonstrated that the ANN analysis produces much more accurate models relating any input-output data sets, even just for the steady thermal problems, that would suggest that all past correlations could be upgraded for much improved performance prediction and better design of the associated thermal devices. The net result would be a very large economic benefit to the industry involved. On the other hand, thermal engineers doing the ANN analysis would still need the physical insight to determine the proper input parameters to use before the ANN analysis can be successful.

The early ANN applications in thermal problems were mostly exploratory, tentative, and with a narrow focus. In order to explore the full potential of the ANN paradigm, there is a need to carry out systematic studies to assess the viability of the ANN analysis in thermal problems with increasing complexity. For this purpose, we have undertaken a series of ANN studies in our own laboratory in the recent past in all three application groups including carrying out the needed detailed experiments to provide the training and testing data. Several such studies under steady-state conditions will now be shown, along with the ANN results. Whenever possible, pertinent additional literature will be cited to give a broader picture of the new ANN-based paradigm.

We have chosen the fin-tube compact heat exchangers (1,19-20) as the thermal devices for our ANN studies. They are very commonly used in many diverse thermal applications and detailed raw experimental data for some of them are also available for training and testing purposes. In addition, it is simple to carry out our own experiments under well-controlled conditions in our own laboratory, when needed, to support our overall ANN development studies. Another important reason for taking the fin-tube compact heat exchangers as our thermal devices is that they are geometrically complex so that their specific effects could be properly accounted for in the ANN analysis.

Our first set of ANN studies deals with three different heat exchangers used in very different applications with increasing geometrical complexity, which are shown in Fig. 2, all operating under steady conditions. The ANN modeling results can then be compared with those of the traditional least-squares power-law correlations and those directly from the experiments. The first heat exchanger in Fig. 2 is a simple single-row air-water finned coil for air-heating applications. Careful steady-state experiments were carried out in an open wind tunnel to measure the air-side and water-side terminal temperatures and the two flow rates with error bands within 0.7% in the measured heat-transfer rates from detailed uncertainty analysis (18). The final 259 sets of test data were correlated by the traditional dimensionless numbers using the least-squares regression analysis (19-21). Such correlations are known not to be unique due to the uncertainty caused by the unknown tube-wall temperatures, which were not measured in the experiments. The predicted rate of heat transfer by this correlation for the set of test data has an error band of ±10%, generally considered quite good. However, this measured heat-transfer error band obviously cannot all be attributed to the measurement errors alone. In fact, it can be traced to the deficiencies associated with the specific correlations. The physics in the flow and heat-transfer phenomena are known to be very complex, and as already noted before, some of the physics, such as flow mal-distribution and geometric complexity, are definitely not accounted for in the correlations, and hence some accuracy is lost.

Now we will see if the ANN analysis will provide a better model for the same data sets. This ANN analysis is based on the methodology detailed in the last section. Of the total 259 tests, 197 data sets were used for training the ANN, while the rest were used to test the ANN predictions. For the ANN analysis, there were four input nodes in the input layer, corresponding to the normalized air and water flow rates, and the air and water inlet temperatures, and a single output node for the physical heat-transfer rate in the output layer. It is important to note that all inputs to the ANN are real physical quantities and, in particular, no property values are needed. The network configuration designation is usually given by numerals referring to the nodes in the successive layers. Some details of the ANN analysis are now given. For the case of the 4-5-2-1-1 configuration, for example, training was carried out to 200,000 cycles, each covering one complete sequence of feed-forward and backward propagation for all the training data sets. The mean-square errors in the maximum and average errors within each cycle could then be calculated, as shown in Fig. 3. It is seen that the maximum error asymptoted at about 150,000 cycles after undergoing a local minimum, while the corresponding average error reached its minimum at about 100,000 cycles. In either case, the error levels were quite small. In another configuration of 4-2-1, a similar training based on the same training data sets was carried out. The predicted heat-transfer rate from the ANN analysis was based on the final set of adjusted weights and biases, as shown in Table 1. These final values illustrate quite typically those in usual ANN applications.
In the current study, 14 different ANN configurations, as shown in Table 2, were utilized to show their relative error and standard deviation sensitivities for the 62 test data sets, where the quantity R is the mean value of the ratios of the experimental heat-transfer rate to that of the ANN predictions, indicating the average accuracy of the predictions, and σ is the standard deviation of the heat-transfer ratios from their mean, which is an indication of the degree of scatter of the ANN predictions. It is seen that the network configuration with R closest to unity is 4-1-1-1, while the network 4-5-5-1 is the one with the smallest σ. It does seem that the criterion on σ is a more important one, since the 4-5-5-1 network has errors still confined to a range less than ±3.7% for all testing data sets, even though most of the scatter still lies in the range less than ±0.7%. Of particular interest is that the error band in the test heat-transfer rate prediction from the ANN analysis is now in the range of measurement uncertainties of the experimental data, and thus signifies that most of the significant physics in the heat-transfer process of the test heat exchanger is accounted for by the ANN results, thus attesting to the excellent functional pattern-recognition ability of the ANN methodology. A more striking demonstration of the ANN results for the 4-5-5-1 network is the parity plot of the comparison between the ANN results and that from the traditional least-squares correlation analysis, as clearly shown in Fig. 4.
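
The two statistics used to rank the configurations in Table 2 can be computed directly from paired experimental and predicted heat-transfer rates. The following Python sketch assumes the population form of the standard deviation, since the text does not say which form was used. The function name and the sample values are hypothetical and are not data from the study.

```python
import math


def ratio_statistics(q_experiment, q_ann):
    """Return (R, sigma) for paired heat-transfer rates.

    R is the mean of the ratios q_experiment / q_ann, and sigma is the
    standard deviation of those ratios about R (population form, which
    is an assumption).
    """
    ratios = [qe / qa for qe, qa in zip(q_experiment, q_ann)]
    r_mean = sum(ratios) / len(ratios)
    variance = sum((r - r_mean) ** 2 for r in ratios) / len(ratios)
    return r_mean, math.sqrt(variance)


# Hypothetical values in arbitrary units: a perfect prediction.
R, sigma = ratio_statistics([100.0, 200.0, 300.0], [100.0, 200.0, 300.0])
```

A value of R near unity with a large σ would indicate predictions that are right on average but widely scattered, which is why the text treats σ as the more important criterion.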

If the ANN approach is to be a viable one to deal with steady thermal problems with different degrees of complexity, it should also be capable of treating heat exchangers with greater geometrical complexity and also with operating conditions that involve more difficult physics. The compact multirow, multicolumn, fin-tube heat exchanger shown in Fig. 2 for air-cooling purposes utilizing chilled water flowing inside the tubes seems to satisfy these conditions. This heat exchanger was studied in great detail by extensive careful experimental measurements and correlations in terms of the Colburn j-factors by McQuiston (22-23). Since the chilled water temperature could cause the air temperature to fall below its dew point, condensation would occur on the fin surfaces, and the data collected contained all three different fin-surface conditions, namely, dry surface, surface with dropwise condensation, and surface with film condensation. Dropwise and film condensation cases were differentiated by purely subjective visualization. Also because of the condensation phenomena, the fin spacing became an added important parameter, and was also treated as an input parameter. In addition, on the water side only high Reynolds-number turbulent-flow conditions were considered so that the two coefficients of heat transfer could be conveniently decoupled. The experimental data were correlated for the air-side heat- and mass-transfer coefficients in the form of the Colburn j-factors under different surface conditions. The dry surface data were later recorrelated by Gray and Webb (24) to improve the correlation accuracy. Also, these correlations were more recently further significantly improved, based on the same functional form for the j-factors as those used originally by McQuiston (23), by seeking global minimum error conditions (25). In Ref. 25, an ANN analysis was also carried out to assess its accuracy as a model to compare with that of the correlations.
Several network configurations were tried with the best results given by a fully connected feed-forward network of 5-5-3-3 and using the backpropagation learning algorithm given in the last section, as shown in Fig. 5. It is seen that the five input nodes correspond to the air-inlet, dry-bulb and wet-bulb temperatures, the chilled water inlet temperature, the airflow Reynolds number, and the fin spacing. The three output nodes correspond to those of js for the sensible heat transfer, jt, for the total heat transfer, and Q for the total heat-transfer rate. The j-factors were used only so that the results could be directly compared to that of the original correlation results of McQuiston (23). Of the total 327 experimental data sets reported, 91 sets were associated with the dry-surface conditions, while 117 and 119 sets were related to dropwise and film condensations, respectively. These data sets were utilized to train separate ANNs. Also, the entire 327 data sets were also used to train another single ANN. The purpose here was to determine whether the ANNs trained with separate data sets involving different physics would perform better than the ANN trained with the complete data set. Every training process was based conservatively on 800,000 training cycles. The results in the rms percentage deviations of the four ANNs from those of the experiments are shown in Table 3, including those of the original correlations of McQuiston (23) and the improved dry-surface correlation of Gray and Webb (24). It is of interest to note that in the total heat transfer, being a physical quantity, the ANN recognized its correct relation with the physical input data, and that such a low level of error in the total heat transfer is again close to the expected experimental uncertainties. Similar results can also be seen for the other two surface conditions. 
In addition, the ANNs give better predictions for dry surfaces than those for wet surfaces, as the physics involved in the latter surfaces is certainly more complex. On the other hand, when the ANNs are trained with the entire data sets by disregarding the surface conditions, all deviations tend to increase, as the ANN attempted to negotiate, with more difficulty, with the different physics involved. However, even in this case, the predicted total heat-transfer rate, which is the ultimate unknown in practice, had deviation bounds only of the order of ±2.7%. The parity plots of all heat-transfer rates from the ANN results and the respective experimental data are shown in Figs. 6-8, and the accuracy of the ANN results is seen to be truly remarkable (25). Before discussing the study of the heat exchanger in Fig. 2 in another ANN application, it is of interest to mention here another significant use of the ANN methodology for the discovery of new knowledge. The analysis of the humid air just described provides an example of this capability in combination with an AI clustering algorithm (1,26). By prescribing three clusters, the algorithm divided the entire data set into three clusters, which, interestingly, correspond to the data for the three surface conditions. Therefore, subjective visual information is really not needed for the prediction of detailed performance of this heat exchanger. Since this is an example where the ANN analysis is used in conjunction with another AI algorithm, it will be revisited later in a section on hybrid algorithms involving ANN.
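
The idea of prescribing three clusters and letting an algorithm partition the data can be illustrated with a small Python sketch. The text does not name the clustering algorithm used in Refs. 1 and 26, so plain k-means is swapped in here purely as a familiar stand-in. The function name, the deterministic choice of initial centers, and the sample points are all hypothetical.

```python
def kmeans(points, k, iterations=50):
    """Plain k-means on a list of equal-length tuples.

    Initial centers are k points taken at even intervals through the
    list, a simple deterministic choice made for this illustration.
    """
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical two-feature points forming three well-separated groups.
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (10.0, 10.1), (10.1, 10.0)]
labels, centers = kmeans(points, k=3)
```

In the application described above, each point would carry the measured operating parameters of one test, and the resulting labels would be compared with the visually assigned surface conditions.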

Now we mention another example involving a very complex, multirow, multicolumn, fin-tube heat exchanger used as an evaporator in a refrigeration application, again under steady operating conditions. The heat exchanger geometry is shown in Fig. 2. Refrigerant R-22 flowed inside the tubes, while air was again flowing through the air passages. The extreme geometric complexity was in the finned passages with decreasing finned spacing of airflow to limit the possibility of air-passage blockage due to frost formation. In addition, the refrigerant underwent boiling evaporation inside the tubes (27). This is one instance in which only 38 sets of data were experimentally obtained covering a large number of parameters. An ANN analysis was attempted to see how far the training with very limited data could be pursued to a reasonable conclusion. To accommodate all the free parameters involved, an ANN was chosen with a fully connected network configuration of 11-11-6-1, as shown in Fig. 9. There were seven geometrical ratios and four operating parameters in the input layer and one single output node for the total heat-transfer rate. Understandably, all 38 data sets were used in the training. The ANN prediction of the total rate of heat transfer against the available data is shown in Fig. 1 in the form of a parity plot. It is easily noted that the accuracy of the results is remarkable. The rms error of the percentage difference between the predictions and measurements is less than ±1.5%, again of the same order as the estimated experimental uncertainties. However, it is unrealistic to expect that this one example would have general validity. In fact, it is known and expected that errors from the ANN analysis would increase as the number of training data sets decreases, and also that the ANN analysis would be expected to perform poorly if it tries to predict results outside the domain of the training data set (28).
However, if the empty domain is small, then the ANN predictions would not suffer much. This may indeed be the case for the refrigeration coil just considered since in ANN applications for dealing with real-world complex thermal problems, there is always a tendency to limit experimental data sets suitable for training. Therefore, the limited data issue discussed here is an important one. Fortunately, ANN-based error-estimate methodologies are available to determine the relative importance of each data point in the limited data set. A good example is the one based on statistical cross validation to determine the domain where additional measurements are needed (27-28).
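
The cross-validation idea can be sketched in a few lines of Python. The text says only that statistical cross validation is used (27-28); the leave-one-out form shown here is one common variant and is an assumption. A nearest-neighbor predictor stands in for a trained ANN so that the example stays self-contained, and all names and sample values are hypothetical.

```python
def leave_one_out_errors(samples, fit):
    """For each (x, y) sample, fit on the others and return |prediction - y|."""
    errors = []
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        model = fit(rest)
        errors.append(abs(model(x) - y))
    return errors


def fit_nearest_neighbor(samples):
    """Stand-in for ANN training: predict the output of the closest input."""
    def predict(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return predict


# Hypothetical one-input data with one isolated point at x = 10.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
errors = leave_one_out_errors(data, fit_nearest_neighbor)
```

The isolated point produces by far the largest held-out error, which marks the sparsely covered part of the domain where additional measurements would be most useful.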

While compact fin-tube heat exchangers and their performance are representative examples of complex thermal devices and phenomena for ANN analysis, there are other steady thermal problems that have also received recent attention in applying the ANN analysis. A brief review of such applications will be given here. First of all, it would be appropriate to cite recent reviews of specific areas of steady-state ANN applications. Specific examples with either significant thermal applications or extended ANN analysis will then be shown and discussed. The broad-based early review of Sen and Yang (8) has already been noted earlier. There are also two additional reviews dealing with energy systems (29) and multiphase flows with and without heat transfer (30).

The review of Kalogirou (29) includes a brief background of ANN and a brief account of the basic ANN methodology, and describes several classes of energy-related problems, which were successfully treated by applying the described basic ANN methodology. They include, among others, solar energy systems for modeling the heating-up response and design of a steam-generating plant, for the estimation of a parabolic trough-collector intercept factor and local concentration ratio, and the determination of hourly solar irradiance dependent on astronomical, and meteorological-climatic data. Also described are ANN applications in HVAC systems for determining building thermal loads and their prediction, for controlling the temperature in operating buses based on the ambient temperature, number of passengers, and time of day, and also applications in combustion phenomena including turbulent combustion modeling and waste incineration processes. Several additional applications mentioned include forecasting and prediction in energy management practices and prediction of frost buildup in evaporator coils for designing efficient defrosting practices. While in all these energy-related applications using the ANN analysis, excellent results were the norm (29), the success of each application clearly still depends on the choice of the input parameters, the availability of training data sets, and computational experimentation to determine the optimum configuration of the network and other free parameters in the basic methodology.

Another review of significant ANN applications to thermal problems has been more recently given by Sen and Yang (30) in the specific area of multiphase system with and without heat transfer. The difficulty of theoretically modeling such problems by standard traditional approaches is well known, and available approximate models have been based only on detailed experiments and their data correlations. It is generally recognized that despite the availability of large accumulation of good experimental data, the correlations have been rather tentative and uncertain, and in many cases led to predictions with substantial errors compared to the experimental results. It is therefore not surprising to find the rising development of ANN-based models of such multiphase systems, in view of its essential capability of recognition of complex patterns and the availability of good experimental databases. The review in Ref. 30 identified seven specific areas of ANN applications, which will now be briefly discussed in terms of their application significance and appropriate modifications of the basic ANN methodology to improve the model development. At the outset, it is pertinent to mention that because of the complexity in multiphase systems and the flexibility in the ANN methodology, it is natural to combine it with additional algorithms to effect better or more optimum results. More discussions on this issue will be made in a later section on ANN-based hybrid algorithms.

The first area mentioned in Ref. 30 deals with the phenomena of two-phase flow in pipes. Two-phase gas-liquid flows are among the most complex and difficult phenomena in heat transfer, characterized by interfacial interactions and relative movement between the phases. The case with a single component is even more complex. In view of applications such as thermohydraulics in nuclear reactors, petroleum processing, and biomedical processes, predictive models based on traditional correlations are only valid for distinct flow regimes such as bubbly, slug, churn, annular, and stratified flows. Therefore, such correlation models cannot be used unless the specific flow regime is first identified. For two-component, two-phase flows (air-water, for instance) in pipes, the nonintrusive impedance measurements of the area-averaged void fractions give different signal characteristics for different flow regimes. These data are then used to train the ANN. In one study (31), because the relationship between the impedance signal and the specific flow regime is not crisp, the ANN output was placed in series with a fuzzy-logic classifier to determine the likely flow regime. The three-layer 2-12-6 ANN adopted is also of some general interest. The two inputs are the diagonal and neighboring impedances, and the six-node output layer consists of two nodes identical to those in the input layer for identification purposes, one each providing the standard deviations of the diagonal and neighboring impedances, and the last two nodes are related to the media of the inputs. The training data consisted of 200 data points for each of the four flow regimes. Additional data points were used for testing the ANN predictions. In a subsequent study of the same problem (32), two freestanding ANNs were used for flow-regime identification. One is again a three-layer connected configuration, but with only one single output node, which is the flow-regime indicator.
The input layer had four nodes consisting of the mean and standard deviations of the diagonal and neighboring impedance data, and the hidden layer had 12 nodes. The second ANN is based on a self-organized (unsupervised) neural network with two layers only to cluster the two input data into four categories corresponding to the four flow regimes. Both studies (31-32) produced good results in the flow-regime identification. These examples also show the flexibility and possible modification of the basic ANN methodology. The same review also mentioned another ANN flow-regime identification study dealing with oil-gas-water multiphase flow in a horizontal pipe (33). The instantaneous differential-pressure signals from the flow were measured with a piezoresistance pressure transducer with very fast response. The signals were preprocessed and denoised by a wavelet methodology to give characteristic vectors of various flow regimes, which were then used as inputs to the ANN to classify the flow regimes. The analysis was used to identify just three flow regimes, namely, stratified flow, intermittent flow, and annular flow. The input layer had nine nodes characterizing the preprocessed signals, and the output had three nodes with (1,0,0) for stratified flow, (0,1,0) for intermittent flow, and (0,0,1) for annular flow. The lone hidden layer had five nodes. A nonlinear least-squares algorithm was used to improve the learning speed and overall training efficiency. A total of 200 data sets were used for training, while an additional 95 sets were used for testing. The results show that the flow-identification accuracy was 95% for the stratified flow and 92% for both intermittent and annular flows.
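
The three-node output coding used in Ref. 33 can be made concrete with a short Python sketch. The target vectors follow the text. Choosing the regime with the largest output activation is an assumption, since the text does not say how the network outputs were converted into a class, and the function names are hypothetical.

```python
REGIMES = ("stratified", "intermittent", "annular")


def encode(regime):
    """Return the target vector, e.g. (1, 0, 0) for stratified flow."""
    return tuple(1 if name == regime else 0 for name in REGIMES)


def decode(outputs):
    """Pick the regime whose output node has the largest activation (assumed rule)."""
    best = max(range(len(REGIMES)), key=lambda i: outputs[i])
    return REGIMES[best]


target = encode("intermittent")       # (0, 1, 0)
guess = decode((0.1, 0.2, 0.9))       # "annular"
```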

As reviewed in Ref. 30, the second specific area of multiphase thermal systems treated by the ANN analysis deals with two-phase flow with heat transfer, as characterized by the added complexity due to temperature differences. The ANN application to predict the critical heat flux in round vertical-tube flow of water under low pressure and oscillating flow conditions for either natural or forced circulations was carried out in a more recent study by Su et al. (34). A fully connected 7-10-1 ANN with the standard feed-forward algorithm was chosen, but using a hyperbolic-tangent activation function, representing a slight variation of the basic methodology. The training process was aided by the use of both optimized learning and momentum rates. The inputs included pressure, mean mass flow rate, relative amplitude, inlet subcooling, oscillation period, and geometrical ratio of the heated length to tube diameter. One additional input node was a numeral unity, providing a threshold to nodes in the next layer. This example thus illustrates another slight modification to the standard algorithm. The ten-node hidden layer also included a unity node for the same purpose. The single-node output layer was a dimensionless ratio of the critical heat flux with oscillation to that without oscillation given by the test data. The study utilized two separately trained networks: one trained with the natural-circulation data sets and the other with the forced-circulation data sets. It was demonstrated that the average parity ratios of the training sets were well within 10%, while the average error of the testing data was on the order of 1.0%, again comparable to the experimental uncertainties.
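
The role of the unity nodes in the 7-10-1 network can be seen in a minimal Python sketch of one feed-forward pass. It assumes that the six physical inputs are already normalized and that the hyperbolic tangent is applied at both the hidden and the output layer; the text does not specify either point. The function name and weight layout are hypothetical.

```python
import math


def forward(physical_inputs, w_hidden, w_output):
    """One feed-forward pass with unity nodes acting as thresholds.

    physical_inputs: 6 normalized values.
    w_hidden: 9 rows of 7 weights (6 inputs + unity node).
    w_output: 10 weights (9 computed hidden nodes + unity node).
    """
    x = list(physical_inputs) + [1.0]   # 6 inputs + unity node = 7 nodes
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    hidden.append(1.0)                  # 9 computed nodes + unity node = 10 nodes
    return math.tanh(sum(w * h for w, h in zip(w_output, hidden)))


# Hypothetical weights: everything zero except the output's unity-node weight.
w_hidden = [[0.0] * 7 for _ in range(9)]
w_output = [0.0] * 9 + [0.5]
ratio = forward([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], w_hidden, w_output)
```

Because the unity node always outputs 1, its outgoing weights play the part of the biases in the standard formulation, so the two descriptions are equivalent.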

The third topic area reviewed in Ref. 30 deals with even more complex phenomena of multiphase flow and heat transfer in such specialized systems as bubble columns, packed towers, fluidized beds, and heat transfer and flow in fluid-particle two-phase systems. In each of the problem areas, ANNs were utilized to develop predictive models for heat and mass transfer because of the nonlinear mapping capabilities of the ANNs. Examples of these applications will concentrate on the different algorithmic aspects of the ANNs used, as deviated from the standard methodology. The first example treats the correlation of heat-transfer rate fluctuations in 3D bubble columns with quantitative dynamic behavior of bubble and liquid motions (35). The ANN was trained with a three-layer configuration by the time-series data of local heat transfer from hot-wire measurements. The ANN analysis was that of the standard methodology. Usually, the choice of inputs depends on the physical insights. However, when the phenomena are very complex as in bubble columns, the input parameters might not be obvious. Here in numerical experimentation, the number of input-layer nodes was varied from 2 to 8, and the middle hidden layer from 5 to 50. The trials, together with the single output node for the local time-averaged heat-transfer rate of the liquid phase during a given time period, were used in the training process to determine the optimum number of nodes of the first two layers. These trials resulted in an optimum 6-10-1 configuration. About one-half of the hot-wire data in the time series in a 20 cm column at gas velocities of 2.3 cm/s, 6.2 cm/s, and 9.0 cm/s were selected as the ANN training data sets, while the remaining data of the 20 cm, 40 cm, and 80 cm columns, measured at the same position over the whole range of gas velocities, were used for testing.
The ANN prediction results for all three column sizes showed that the parity errors between the predicted and measured coefficients of heat transfer were within 20%, while the average error was less than 3%. One interesting result is that none of the measured data for the larger column sizes were used in the training data set, indicating that the relation between the local heat transfer and dynamic liquid and bubble motions was independent of the column size. The result clearly demonstrates that the ANN analysis in this instance is capable of generating new knowledge.

A last example in this third topic area deals with the infiltration of nonaqueous phase liquid (NAPL) through a vertical, homogeneous, soil column initially saturated with water, as studied by Morshed and Powers (36). An ANN analysis was utilized to develop an appropriate model for the elevation and volume of NAPL. One novelty in this study was that the training and testing data sets were generated by a theoretical model. These data sets were preprocessed by dimensional analysis to identify dimensionless terms associated with the input-output relations. Numerical experiments were carried out to determine the optimum number of nodes in the middle hidden layer in a 4-5-10 configuration. A hyperbolic-tangent activation function was used for the hidden layer and a linear activation function was used for the output layer. One-half of the total 410 data sets were used for training, while the rest of the data sets were used for testing. The results showed that for the training data set, the regression coefficient had an average of 0.989 (1.0 for perfect match). For the testing set, the same coefficient ranged from 0.962 to 1.0, which indicated a great success of the ANN model. In the parity comparison, 195 of the 205 data points fell within the range of 10% of the experimental values.
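
A parity comparison of this kind reduces to counting how many predictions fall within a relative tolerance of their reference values; 195 of 205 points corresponds to about 95%. The Python sketch below shows the count under the assumption that the tolerance is taken relative to the reference value. The function name and sample numbers are hypothetical.

```python
def fraction_within(predicted, reference, tolerance=0.10):
    """Fraction of points with |predicted - reference| <= tolerance * |reference|."""
    hits = sum(
        1
        for p, r in zip(predicted, reference)
        if abs(p - r) <= tolerance * abs(r)
    )
    return hits / len(reference)


# Hypothetical values: three of the four predictions lie within 10%.
share = fraction_within([1.05, 1.2, 0.95, 2.0], [1.0, 1.0, 1.0, 2.0])
```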

The fourth topic reviewed in Ref. 30 treats heat transfer in evaporators and boiling heat transfer in liquid mixtures. One example has already been given earlier in Ref. 27 dealing with very limited data for a Freon evaporator, and hence will not be repeated here. The more recent study of Kelleher et al. (37) attempted to predict nucleate pool boiling heat transfer from a vertical tube bank with and without fins on the tubes in Refrigerant R-114 with various amounts of oil present by utilizing the ANN analysis. In practice, such oil is present due to carryover of lubricant oil from the compressor. Different sets of test data served as training and testing data. The input layer of the three-layer fully connected ANN configuration consisted of four nodes, representing the temperature over saturation, number of active tubes, mass percent of oil in the test refrigerant, and whether the tubes were finned or staggered. The output layer had a single node related to the boiling heat flux. Entirely satisfactory ANN predictions were realized, with parity errors again approaching the uncertainties of the experimental data. Another is the study of Liu et al. (38), which deals with boiling heat-transfer enhancement using organic additives. The primary objective was to evaluate and predict the boiling heat flux as a function of the additive molecular structure. The ANN model was trained with 30 additives and tested with an additional 11 additives. The molecular structure of any additive was represented by four parameters in the ANN input, and the single output was the relative increase in the boiling heat flux at the same temperature difference with the additive. There were two hidden layers, for each of which the number of nodes was allowed to vary between 1 and 14. Numerical experimentation revealed that the optimized numbers of nodes for the first and second hidden layers were 3 and 6, respectively.
Another deviation from the common practice was that the hyperbolic-tangent activation function was applied to the nodes in the first hidden layer, the sigmoid function for nodes in the second hidden layer, and a linear function for nodes in the output layer. Good results were obtained, showing 100% parity accuracy for the training data sets and over 90% for the testing cases. Analysis of the results showed that the molecular weights of the additives and the polar groups had the greatest effects on the boiling heat-transfer enhancement. This gives another example of the knowledge discovery ability of the ANN analysis.

Furthermore, industrial thermal processes are generally very complex and involve the interactions of different components. The performance of the entire process is difficult to model, yet such a model is needed for performance prediction and control. The only possible information must necessarily come from direct experiments. This is indeed a natural situation for the ANN analysis. Since some individual processes also involve multiphase phenomena, their ANN analysis and results were also covered in the recent review (30), which included an example of the simulation of heat transfer during spray cooling in conjunction with predicting cooling times in metal processing (39). The heat-transfer data were obtained experimentally for a range of pressure ratios for surface temperatures up to 800°C. In this study, a commercial neural network code was used, but it also included a GA with a statistical estimator to assure the relevance of the inputs to the ANN. Two separate models were created from the code: a spray characterization model and a heat-transfer model. In the characterization model, the ANN had five input nodes corresponding to air pressure, water pressure, air and water flow rates, and nozzle height. The lone output node is denoted water mass flux. In the heat-transfer model, there were three inputs of time, surface temperature, and the water mass flux together with a coefficient of heat transfer as the only output. While the first ANN yielded good results compared to the experimental data, the second ANN produced unsatisfactory results, suggesting that the relevant experimental data should definitely be improved. This case shows the feasibility of using strategically more than one ANN in series to simplify the network structures and also suggests the possibility of the presence of faulty or inaccurate experiments. Basically, the same physical problem was treated with the ANN analysis by Ward et al.
(40), using a single four-layer network with air and water pressures and the surface temperatures as the inputs, and the coefficient of heat transfer as the lone output, and there were two hidden layers, each with eight nodes to start with. One novelty here was that the errors in the output layer were used to adjust the network configuration and the activation functions until the overall root-mean-square error became acceptably low. This is another example where the network parameters, including the configuration, are allowed to systematically vary until satisfactory results are obtained. This same reference also discussed several other ANN-based analyses of industrial processes such as prediction of pollutant emissions from pulverized coal combustion, modeling of chemical vapor deposition (41), and performance prediction of a stoker-fired boiler (42).

The very last area of multiphase processes reviewed in Ref. 30 is that involved in the thermal processing of food. It is known that first-principles thermal-processing models for food are difficult to develop and essentially nonexistent, and the primary reason is the lack of adequate constitutive relations and the corresponding thermophysical properties. However, the critical need for such models for process optimization and control, process design, and scaling-up processes is quite self-evident. Experimental data, however, are amply available to provide the only processing information for food production, in general. Therefore, it is only natural to inquire if good modeling results can also be realized by using the ANN analysis. The study carried out by Mittal and Zhang (43) should give an indication of this prospect. It dealt with the prediction of freezing times for food products, and treated food products of arbitrary shape. Therefore, it is not really surprising to find that the training data were derived from a theoretical model such as the one developed by Pham (44). The ANN adopted was the commercially available code known as WARDNETS, and had ten input nodes of product thickness, width, length, surface convective coefficient of heat transfer, thermal conductivity of frozen product, product density, specific heat of unfrozen product, moisture content of product, initial product temperature, and the ambient temperature. The network configuration is that of 10-40-40-40-1, where the lone output was the freezing time. Different activation functions such as Gaussian, Gaussian complement, and hyperbolic tangent were all tried to determine the best choice. Similarly, the learning and momentum rates were also arbitrarily chosen at first to determine the best learning performance. A total of 44,351 data sets were generated from the theoretical model, from which 60% of the data sets, randomly chosen, were used for training and the rest for testing.
The ANN results showed that for more than 83% of the test data sets, the parity error was less than 5%, while the average parity error was only 3.54%, but the maximum error was still as high as 86%. Also, an attempt was made to compare the ANN prediction with some 150 data points of experiment, and it was found that in 87% of these cases, the parity error was less than 10%.
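The evaluation procedure just described (a random 60/40 split followed by summary statistics of the parity error) can be illustrated by a minimal sketch. It is not the procedure of Ref. 43: the definition of the parity error as the relative deviation |predicted − target|/|target|, the function names, and the toy freezing times are all assumptions made here for illustration.

```python
import random

def split_data(samples, train_fraction=0.6, seed=0):
    """Randomly split samples into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def parity_error_summary(predicted, target, threshold=0.05):
    """Return (fraction of errors below threshold, mean error, maximum error).

    Assumes parity error = |predicted - target| / |target| (an assumption).
    """
    errors = [abs(p - t) / abs(t) for p, t in zip(predicted, target)]
    within = sum(e < threshold for e in errors) / len(errors)
    return within, sum(errors) / len(errors), max(errors)

# Hypothetical freezing times in hours, not data from the cited study.
target = [2.0, 4.0, 5.0, 10.0]
predicted = [2.04, 4.4, 5.0, 9.8]
within, mean_err, max_err = parity_error_summary(predicted, target)

# 60% of the samples go to training, the rest to testing.
train, test = split_data(range(10))
```

Reporting the fraction within a tolerance together with the mean and the maximum, as in the text, guards against a low average hiding a few very large errors.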

For steady-state thermal problems, there are still many other individual efforts exploring the application of the basic ANN analysis for the purpose of model development and functional identification. For the interested reader, the following references may offer a glimpse of additional thermal problems that have been successfully treated by the ANN methodology: thermal placement in power electronics (45), thermodynamic properties (46), thermal imaging processing (47), delay-time determination in HVAC systems (48), inverse radiation heat transfer for design and control (49), and subcooled water critical heat flux (50).

In thermal engineering, one of the most neglected areas of research is the modeling of the dynamic behavior of thermal devices and systems. All such devices in real applications operate under dynamic conditions, responding to changes in the operating parameters and boundary conditions. In addition, the performance of one device is also affected by other devices and components that are directly or indirectly connected to it, due to the thermal masses involved. Therefore, we do need dynamic models for their performance under dynamic conditions. Furthermore, there is an even more critical role for such dynamic models. As applications become more complex, especially those involving thermal systems and processes with a multitude of interconnected thermal devices, it becomes amply clear that such systems and processes must meet certain performance requirements. One viable solution is to develop robust and adaptive control schemes and implement them to satisfy such requirements. The popular proportional-integral-derivative (PID) controllers are simply not robust enough, since in typical complex thermal problems nonlinear behaviors are the norm. Many known control schemes that are robust, stable, adaptive, and optimal all require dynamic plant models for their implementation. Here lies, then, another reason for the need for dynamic models. It has been known that traditional analysis cannot be depended on to develop dynamic models even for simple thermal devices. On the other hand, it now appears that the ANN-based paradigm, with the dynamic training data obtained either online or offline, offers a real viable solution (12).

The central scheme in the dynamic modeling by ANN is the addition of time as a variable for both training and prediction. One equivalent, but more efficient method of training is to provide the variables at time t as inputs, and the values of the same variables at t+Δt as the outputs, and also keep the incremental variables small so that simple ANN configurations can be used. The training is successively carried out to cover the entire time period of a single experimental run. Such a network where the input and output change their places is generally known as a recurrent network (51-54). As an illustration, a series of tests dealing with the dynamic heating and cooling of air by the same single-row fin-tube heat exchanger shown in Fig. 2, used previously under static conditions, was carried out to obtain the dynamic data on the outlet temperatures of both air and water flows, by varying the water inlet temperatures in small incremental steps while keeping the other variables such as the flow rates of both fluids and the inlet air temperature all constant (8). More specifically, the water inlet temperature was varied in increments of 5.56°C from 32.2°C up to 65.6°C. These data were then used to train the ANNs for dynamic applications, using a simple 3-5-2 network. The three input nodes corresponded to the inlet and outlet water temperatures and the air outlet temperature at time t, and the two output nodes corresponded to the air and water outlet temperatures at the next time instant. For testing the ANN predictions, three additional experiments were performed. In the first testing experiments, the system was first brought up to a steady temperature of 60°C, and the heater was then set at 37.8°C. The resulting sudden decrease of the air-water outlet temperatures is shown in Fig. 1. The ANN predictions were excellent and the slight oscillations in the measured water outlet temperatures were due to local flow turbulence. 
In the second testing experiments, the water inlet temperature was ramped manually and the results are shown in Fig. 1. The ANN result was equally excellent. In the third testing experiments, we designed a system to see if the ANN analysis could predict the system behavior when an input variable was different from the one in the training data. In this case, the changes in the air and water outlet temperatures were measured when the airflow rate was first increased to a value greater than the one used in the training data, and then decreased to a lower value. It may be of interest to note that this collective deviation from that of the training data could also be interpreted as a source of noise that could indeed exist in real applications. The results are shown in Fig. 1, and the predictions are not as good. However, it is still remarkable that the ANN analysis still predicts the correct overall trends with rather small errors. It is noted, however, that all these three examples specifically deal with training and testing of first-order problems, in that the training only requires one single time step between the input and output. In general problems, it is conceivable that a higher-order training may be needed due to system complexity. However, as demonstrated by Diaz et al. (52), it is rarely necessary to train the network with data from two previous time steps, as long as the chosen time step is reasonably small. Also in the examples shown in this section, the training is carried out offline, meaning that the training of the network is carried out independently from the testing cases. However, in many thermal applications, operating conditions are either uncertain or unknown. In such cases, the training must be carried out adaptively in that the training is done online, and continuing as the new operating data become available. 
This obviously important feature of the ANN analysis is critical in the development of desirable control schemes for complex thermal systems and processes, as will be demonstrated in the next section.
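The first-order training scheme described above, in which the variables at time t are the inputs and the outlet temperatures at t+Δt are the outputs, can be sketched as follows. The sketch only shows how the training pairs are assembled and how a trained one-step model is applied recursively over a run. The function names are hypothetical, and `toy_model` is an arbitrary stand-in for a trained 3-5-2 network, not the network of Ref. 8.

```python
def make_one_step_pairs(t_w_in, t_w_out, t_a_out):
    """Pair the state at time t with the outlet temperatures at t + dt."""
    pairs = []
    for k in range(len(t_w_in) - 1):
        inputs = (t_w_in[k], t_w_out[k], t_a_out[k])   # three input nodes
        outputs = (t_a_out[k + 1], t_w_out[k + 1])     # two output nodes
        pairs.append((inputs, outputs))
    return pairs

def rollout(model, t_w_in, t_w_out0, t_a_out0):
    """Predict a whole run by feeding each prediction back as the next input."""
    t_w_out, t_a_out = [t_w_out0], [t_a_out0]
    for k in range(len(t_w_in) - 1):
        a_next, w_next = model((t_w_in[k], t_w_out[-1], t_a_out[-1]))
        t_a_out.append(a_next)
        t_w_out.append(w_next)
    return t_a_out, t_w_out

def toy_model(inputs):
    """Stand-in for a trained network: simple relaxation (an assumption)."""
    t_in, w_out, a_out = inputs
    return a_out + 0.5 * (w_out - a_out), w_out + 0.5 * (t_in - w_out)

pairs = make_one_step_pairs([1, 2, 3], [4, 5, 6], [7, 8, 9])
air, water = rollout(toy_model, [60.0, 40.0, 40.0], 60.0, 30.0)
```

Only the water inlet temperature is taken from the measured record during the rollout; the outlet temperatures are always the model's own previous predictions, which is what makes the small time step important.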

As already mentioned before, dynamic thermal control represents the key to achieve optimum performance of complex thermal systems. As the thermal systems become even more complex in time, as universally expected in real applications to come, the need for optimal dynamic control also becomes increasingly more critical. A common practice is to use feedback control involving a system (plant) model, in conjunction with a standard PID controller. Such control, unfortunately, requires constant manual supervision to achieve, if at all, optimality. Many dynamic control strategies (8) have been proposed to alleviate the shortcomings with the PID controller, with or without a plant model. On the other hand, there are many advantages to use plant models based on ANNs, and among them is the fact that an inverse ANN model, which is nothing but the model with the input and output switched, can serve as a controller or a neurocontroller (55-58). This ANN characteristic opens up many possible ways to develop innovative control strategies for complex thermal system applications. To quote just a few examples, we find them in the use of recurrent networks (59-61), control of heat exchanger performance (51-52,62), HVAC control (59-60,63-64), and thermal process control (40,65-67). The specific purpose of these ANN control strategies is to ensure the system performance to satisfy certain prescribed requirements known as performance targets, such as stabilization after occurrence of disturbance, prescribed temperature bounds, minimum system energy consumption, and the like. The ultimate goals are that the desired control strategies and their implementation should be robust and generic, adaptive for automatic accommodation to changes in operational parameters, optimal for satisfying performance requirement under all expected operating conditions, and in real time for quick implementation of the control strategies. Another basic requirement is to ensure controllability at all times. 
It should be noted at the outset that the current development of dynamic adaptive control strategies based on ANN for most complex thermal systems is still in the exploratory stage, and the emphasis is largely placed on developing desirable controllers for simple thermal systems such as individual thermal devices to ensure that they meet their performance requirements. As will be shown, such strategies are essentially available at the present time. However, as the number of controlled variables increases, the design of the corresponding control strategies also becomes increasingly complex, and this is the area where research studies would likely be concentrated in the future (67-68).

For the purpose of the present review, it is appropriate to illustrate the present state of the development of neurocontroller strategies and their performance by means of examples, based on studies in our own laboratory, along with some commentary on some of the ANN applications. In general, even in a simple single input–single output (SISO) control system, there are two basic issues in implementing an ANN-based control strategy. One is the choice of the basic control scheme, nonadaptive (conventional) or adaptive, and the other is to determine how the ANN, as a neurocontroller, is trained. The difference between the conventional and adaptive control schemes is that the former reacts to disturbances acting upon the controlled variables, while the latter reacts to disturbances acting upon the parameters of the process (69). Adaptive control consists of automatically adjusting in real time the parameters of the controller, the weights and biases of the neurocontroller in this case, so that a desired level of performance of the control system is achieved when the process parameters being controlled are unknown or vary with time. Training of the neurocontroller can be either offline or online. The type of training shown previously to develop the three dynamic models in Figs. 11, 12, and 13 is a typical example of offline training. Online training instead uses a neurocontroller that adapts to the changing conditions of the system (63). Thermal devices and their networks are systems that are subject to variations in their behavior in time. An example is the problem of fouling, which changes the operational behavior of thermal systems. Consequently, a neurocontroller designed and built for such systems will have to adapt itself to the new operational conditions, or otherwise the control action will be biased. Many operating thermal devices and systems also have inherent unique complexity and nonlinearity. 
For instance, the system to be controlled not only includes the thermal device itself, but also its associated hardware such as piping, pumps or fans, heaters or coolers, and also instrumental hardware. Furthermore, there is always unavoidable delay between what happens at the thermal device and at the measurement station at some distance away, where the control signal is generated. Such delay may also vary with the changing operating conditions. Fortunately, the neurocontrollers are very well suited for these difficult tasks, as they can be taught by adaptive training to learn the responses of the system at any given time instant.
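To make the notion of online adaptation concrete, the following minimal sketch adjusts the weight and bias of a single linear node by one gradient step per incoming sample, so that it tracks a plant whose gain drops partway through the run (as might happen with fouling). This is a deliberately simplified stand-in for a neurocontroller: the single node, the learning rate, the normalized input sequence, and the gain values are all assumptions chosen for illustration.

```python
def online_update(weight, bias, x, target, learning_rate=0.1):
    """One gradient-descent step on the squared error of a single linear node."""
    error = (weight * x + bias) - target
    return weight - learning_rate * error * x, bias - learning_rate * error

weight, bias = 0.0, 0.0
for k in range(2000):
    gain = 2.0 if k < 1000 else 1.5      # hypothetical plant gain; drops at k = 1000
    x = ((k % 5) - 2) / 2                # normalized input cycling over -1 ... 1
    weight, bias = online_update(weight, bias, x, gain * x)
```

Because every new sample triggers an update, the node first settles on the original gain and then follows the change without any separate retraining phase, which is the essential difference from offline training.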

The example to be used here to demonstrate a viable neurocontroller-based control strategy and its verification by means of experiments again deals with the single-row water-to-air heat exchanger used in our earlier ANN studies, as shown in Fig. 2. In the experiments, the terminal air and water temperatures were measured by thermocouples. The airflow in the open wind tunnel was controlled by a variable-speed drive that could be operated either manually or automatically from a PC. The airspeed could be controlled within a certain range and was measured by a pitot tube located upstream of the heat exchanger. The calibration of the airflow measurements was performed using average air velocities based on ASHRAE test codes. A single point temperature measurement was used upstream of the heat exchanger and five thermocouples connected in parallel were used to obtain the outlet air temperature. Information about the four terminal temperatures of air and water flows, the water mass flow, and the airspeed, and the time instant were sent to the PC that also served as a controller. The inlet water temperature was varied by a heater with a PID-controlled electric resistance, and the water mass flow rate was changed by an electronic valve so the percentage of valve opening could be controlled as desired from the PC. The data acquisition board used could obtain up to 16 different channels of data simultaneously. LABVIEW was used to acquire and send data to the experimental system and a program written in C interfaced with it to perform the desired control function. Other details of the experiments and data acquisition can be found in Refs. 52,69. Here the thermal system was confined to a SISO system to control the air outlet temperature, and for ease of experimentation the airflow rate was taken as the control variable with all other variables fixed.

There are several control schemes that use ANN as the dynamic controller of a physical system (55). It is noted that in an earlier study, an attempt was made to implement a neurocontroller-based control scheme in which the training of the controller was carried out completely offline (52,70). However, since such a scheme does not include adaptive capability, it has only limited applications, and the reader is referred to Refs. 55,69 for details. The emphasis here is placed on the adaptive-training capability, while the ANN performs its control functions such as minimizing the target error, maintaining the stability of the changing neurocontroller, and satisfying some other optimal conditions that may be imposed on the system control. In addition, the adaptive neurocontroller scheme is developed within the framework of the internal model control (IMC) (55,61,71), because of its good characteristics of adaptation, robustness, and stability. The idea behind the IMC, shown schematically in Fig. 1, is to have an ANN plant model of the heat exchanger system, designated as ANN1, in parallel with the real system. The difference between the output of the real system and that of the model is used as the feedback for the neurocontroller, ANN2, located in the forward path of the control scheme. ANN1 is first trained to learn the dynamics of the plant. ANN2 is then trained to learn the inverse dynamics so that it acts as a nonlinear controller. In the adaptive-control experiments, the two neuronets were trained with information on the exit air temperature and the airspeed, while the inlet air and water temperatures and the water flow rate were kept constant. The data are normally obtained by making measurements of the system subjected to small increments in the set point. The filter F and the integral controller I help the system to reach the actual set-point temperature even with the noise embedded in the measurements (52). 
Adaptation, as described earlier, consists of automatically adjusting in real time the parameters of the neurocontroller so that a desired level of control performance can be achieved, when the parameters being controlled were unknown or changing with time (55,69). The adaptation is done by carrying out single training cycles until the performance criteria are satisfied (72). The effects of different time constants of various components in the system do require a backup control function that can usually be provided by a PID controller until the ANNs adapt to any new operating condition of the plant.
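The structure of the IMC loop can be illustrated with a minimal discrete-time sketch. Here a first-order linear plant and its steady-state inverse are used as stand-ins for the plant network ANN1 and the neurocontroller ANN2; the plant coefficients, the constant disturbance, and the filter constant `alpha` are assumptions, and the integral controller I and the adaptation step are omitted. The sketch is meant only to show why feeding back the plant-model difference removes the steady-state offset.

```python
def simulate_imc(setpoint, steps=200, a=0.8, b=0.5, disturbance=1.0, alpha=0.3):
    """Internal model control of the plant y[k+1] = a*y[k] + b*u[k] + d.

    The linear model and its steady-state inverse stand in for the
    plant network (ANN1) and the neurocontroller (ANN2); both are assumptions.
    """
    y = y_model = filtered = 0.0
    history = []
    for _ in range(steps):
        mismatch = y - y_model                      # feedback: plant output minus model output
        filtered += alpha * (mismatch - filtered)   # first-order low-pass filter F
        u = (1.0 - a) / b * (setpoint - filtered)   # inverse of the model steady-state gain
        y = a * y + b * u + disturbance             # real plant; disturbance unknown to the model
        y_model = a * y_model + b * u               # plant model running in parallel
        history.append(y)
    return history

trace = simulate_imc(setpoint=34.0)
```

Although the model knows nothing about the disturbance, the filtered mismatch shifts the controller's effective target so that the plant output settles at the set point.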

It is also important to note that under the adaptive control mode, when the neurocontroller is trained online with new incoming data, one objective is to minimize the target error between the output and the prediction. In doing so, the neurocontroller being trained may produce dynamically unstable behavior, and therefore the stability of the controller must be continually monitored during training. Such a process to maintain stability has also been developed, as given in Ref. 70. At the same time, when the ANN parameters obtained with respect to the target error also ensure stability, it is possible to simultaneously consider other optimal criteria, such as minimizing the plant energy consumption, and to drive the system to an operating point that satisfies all three desired criteria. Other functions that need to be optimized can be treated in the same way.

Five sets of experiments were carried out on the same single-row fin-coil heat exchanger to verify the validity of the IMC-based neurocontroller strategy. These are individually discussed in the following and it will be shown that satisfactory performance of the neurocontroller was reached in all cases (69). The first experiment corresponded to a sudden change in the set point of the outlet air temperature, with the result shown in Fig. 1. The top curve shows the time variation of the outlet air temperature, and the bottom curve shows the needed time-dependent control variable in the airspeed. The experiment consisted of turning on the controller at an outlet temperature close to 34°C, while maintaining a constant water flow rate of 2.71×10⁻⁴ m³/s. When the adaptation criteria, i.e., stability and target error, were not matched, the controller started the adaptation process and let a PID controller keep the physical plant as close as possible to the set-point temperature until the adaptation criteria were satisfied. It is seen in Fig. 1 that, during the first 30 s of the test, the controller was indeed adapting, and then it stabilized the plant at the desired set-point temperature. At t=70 s, the set point was suddenly changed to 33°C. The controller detected an abrupt change in the target error and started another adaptation process. During this period, the PID controller took over again and tried to keep the system close to the new set point. At about t=90 s, the neurocontroller regained control of the system and stabilized it at the new set point. It is interesting to observe that the airspeed increased by about 50%.
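The handoff between the PID controller and the neurocontroller described in this experiment can be summarized by a small supervisory rule. The sketch below is only a schematic reading of the text: the function name, the error tolerance of 0.5°C, and the logged values are hypothetical, and the actual stability test of Ref. 70 is replaced by a Boolean flag.

```python
def choose_controller(target_error, stable, tolerance=0.5):
    """Return which controller acts on the plant at this time step.

    The neurocontroller acts only while both adaptation criteria hold;
    otherwise the PID holds the plant while the network is retrained.
    """
    return "neuro" if stable and abs(target_error) <= tolerance else "pid"

# Hypothetical log of (target error in deg C, stability flag) per sample.
log = [(3.0, True), (0.8, False), (0.2, True), (0.1, True), (1.0, True), (0.3, True)]
schedule = [choose_controller(err, ok) for err, ok in log]
```

In this toy log, the jump in target error at the fifth sample plays the role of the sudden set-point change: control passes back to the PID until the criteria are met again.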

The second test dealt with the case of a water-side disturbance, caused by shutting off the water flow for a specified period of time. As shown by the experimental result in Fig. 1, the neurocontroller was first turned on and adapted until the adaptation criteria were satisfied. The initial oscillations were mainly due to the action of the PID controller while the neurocontroller adapted. It then kept the system close to the same 34°C for the air outlet temperature. At t=100 s, the disturbance was applied for a period of 30 s. The neurocontroller worked until t=110 s, at which point it handed the control action to the PID controller while it underwent a second adaptation. At t=130 s, the water flow resumed. In the meantime, the PID controller tried to keep up with the reference temperature by reducing the airspeed to a minimum, but was unable to maintain the outlet air temperature without the water flow. The adaptation of the neurocontroller was complete around t=170 s, after which it took over the control action. Figure 1 also shows the water outlet temperature during the same period. Between 100 s and 130 s, there was no water flow, so the water outlet temperature remained the same. When the water flow resumed, the cold water that remained in the heat exchanger flowed past the thermocouple, and was followed by hot water that had been stagnant in the heater. The resulting blip in the water outlet temperature can be seen in the figure. The temperature oscillations were due to portions of cold and hot water repeatedly passing the thermocouple while circulating within the closed loop. It is also seen that the airspeed has a similar oscillatory behavior.

The third experiment was the most critical for the neurocontroller: the air inlet area of the open wind tunnel was reduced, thus representing a structural change in the heat exchanger thermal system. Two separate subexperiments were performed, one reducing the cross-sectional area gradually and the other suddenly. Figure 1 shows the measured results for the first case with gradual air-inlet area reduction. In the first 30 s, the system was under PID control, and the neurocontroller gained control at the 30 s mark. From 100 s to 220 s, the air inlet area was blocked gradually to only one-half of the original area. During this period, the neurocontroller increased the airspeed to maintain the system at 34°C. There was a point at about 190 s when the ANN model was not able to characterize the system and a new adaptation began, and continued until about 260 s. After it had learned the new relation between the output air temperature and the airspeed, the neurocontroller took over the control action to stabilize the system. It is observed in Fig. 1 that there were some oscillations of the temperature between the 330 s and 350 s marks, but the outlet air temperature finally settled down to the original set point.

An even more critical test is the same as the first air-side disturbance test, except with one-half of the inlet air area suddenly blocked. The results are shown in Fig. 1. At first, the outlet air temperature was close to the original set point. For the first 50 s, the controller adapted until it learned the system behavior and kept the temperature stable at the original value. At 150 s, half of the air inlet area was blocked suddenly, and the controller adapted until it learned the characteristics of the new system. It is seen in Fig. 1 that the airspeed increased approximately 50%. Finally, at about the 240 s mark, the neurocontroller regained its control of the plant and stabilized the system at the original set point.

The next and last experiment is not a demonstration of the feasibility of the IMC-based neurocontroller, but an illustration of an important new capability of this controller. In addition to achieving the accuracy, adaptation, and stability of this control strategy, this neurocontroller is also able to handle any optimal condition or conditions that may be imposed on the control by the system operation. In this last experiment, it was also deemed desirable to minimize the energy consumption in the thermal system, though any other optimal condition can be similarly treated. Exploratory experiments first showed that the electric heater was the thermal component that consumed the most energy, and that lower water and airflow rates would lead to lower use of thermal energy. From these exploratory experiments, the energy for each sampled measurement would give the direction of decreasing energy use. The controller would then drive the system in this direction. However, if the controller senses that the system is behaving in an opposite way, it will then adapt to the new desirable characteristics of the system. The implementation of this capability is done as follows. In addition to the two adaptation criteria for the weights and biases of the ANNs, i.e., low target error and stable operation, a third is added representing the minimization of energy consumption. When the two basic adaptation criteria are satisfied, the training based on the third adaptation criterion is then turned on. The result of this experiment is given in Fig. 1, showing the lower airspeed after the last training is turned on, thus implying lower water flow rates, while the set-point temperature is still maintained.

While the IMC-based neurocontroller strategy is seen to satisfy all nonlinear dynamic control requirements as demonstrated by the above experimental tests, this demonstration has been limited to SISO systems. In principle, this is not a limitation of the adaptive neurocontroller strategy. However, it is true that for general multiple-input and multiple-output (MIMO) systems, the corresponding control strategy does become very complex, and as already pointed out, it represents a very fruitful area of research for ANN-based applications. Also, it may be important to point out that an even newer research paradigm based on broad-based artificial intelligence (AI) methodologies may be needed to develop new strategies for dealing with very large scale and very complex thermal systems that require versatile scalability to reduce system complexity to manageable levels. A small step in this direction is already emerging, and it is based on the development of integrated hybrid AI methodologies, as discussed in the next section.

As is generally known, the ANN methodology is one of the most successful AI methodologies or soft-computing algorithms that have found applications in thermal problems and systems (1,8). However, there are other AI algorithms that are also very successful in this regard. Good examples include evolutionary algorithms (GA and GP), clustering analysis, fuzzy-logic systems and control, expert systems, data mining, virtual memory, and others. Their applications deal with many tasks such as pattern recognition, decision making, system control, information processing, natural languages, symbolic mathematics, speech recognition, artificial vision, and robotics. One interesting characteristic of these methodologies is that they are mostly complementary rather than competitive, and it is often more advantageous to deploy them in combination than exclusively, to achieve higher levels of utility, efficiency, and performance in applications to thermal problems and systems. As discussed earlier regarding the implementation issues in the basic ANN analysis, there are free parameters and steps in the nodal calculations that need to be determined, as well as steps in the learning algorithm that could be modified or optimized. Even though all of these can be determined in given applications by numerical experiments, this is not a very efficient process. This is where the other soft-computing and AI methodologies can be called upon, as additional subroutines, to improve the basic ANN analysis. Conversely, it is also plausible that the ANN itself can be inserted as a subroutine into the other AI methodologies to improve their own performance.

As all the major AI methodologies, including the ANN, are still under active development, the progress in developing hybrid algorithms understandably has not yet been very robust. However, it is appropriate here to cite just a few examples of these ANN-based hybrids to illustrate where we are now and also suggest areas for further studies. Genetic algorithms (GAs), as evolutionary algorithms (1,73), are rather popular search engines for finding various optima in conjunction with ANN and ANN fuzzy-logic applications. Vonk et al. (74), for instance, proposed a methodology to automatically generate network configurations based on evolutionary computation such as GA. Kaminski et al. (75) studied thermal deterioration processes by combined ANN and GA analysis. Oliveira and Sousa (39) utilized GA with a statistical estimator to determine the relevance of complex input parameters for ANN applications. Tarca et al. (76) adopted an integrated GA-ANN strategy for modeling important multiphase flow characteristics. Clustering, as already demonstrated, involves pattern recognition of subgroups within a large database (25,77-80), and can be used for knowledge generation as well as to determine the relevant input parameters, similar to that in Ref. 39. Also, it is of great interest to note a major emerging hybrid AI methodology known as the neuro-fuzzy or fuzzy-neuro system, which has been shown to be very successful in dealing with pattern recognition of noisy and fuzzy sets of data. Two texts with somewhat different emphases are those of Jany et al. (81), dealing with the computational methodology, and Brown and Harris (82), dealing with adaptive modeling and control. 
These hybrid algorithms have also been used for several individual applications, such as the GA-based neuro-fuzzy system for temperature control (83), development of a fuzzy-neural module for home comfort in buses (84), prediction of critical heat flux using fuzzy theory and ANN (85), and the study using an ANN module in series with a fuzzy-logic module (31) for flow-regime identification in two-phase flows, as already mentioned in the section on ANN-based steady thermal problems.
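As an illustration of how a GA can be coupled to the ANN to choose the network configuration, the following minimal sketch searches over the number of nodes in two hidden layers with a simple elitist genetic algorithm. It is not the method of Ref. 74 or of any other cited study: the selection, crossover, and mutation rules are generic choices, and `toy_error` is a hypothetical stand-in for the testing error that would in practice be obtained by training a network of each candidate configuration.

```python
import random

def evolve_hidden_sizes(fitness, n_layers=2, max_nodes=40, pop_size=12,
                        generations=30, seed=0):
    """Search hidden-layer sizes with a simple elitist genetic algorithm.

    `fitness` maps a tuple of layer sizes to an error to be minimized.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(1, max_nodes) for _ in range(n_layers))
           for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        pop.sort(key=fitness)
        best_per_generation.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]            # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randint(1, n_layers - 1) if n_layers > 1 else 0
            child = list(p[:cut] + q[cut:])      # one-point crossover
            i = rng.randrange(n_layers)          # mutation: nudge one layer size
            child[i] = min(max_nodes, max(1, child[i] + rng.choice((-2, -1, 1, 2))))
            children.append(tuple(child))
        pop = parents + children
    pop.sort(key=fitness)
    return pop[0], best_per_generation

def toy_error(sizes):
    """Hypothetical error surface with its minimum at eight nodes per layer."""
    return sum((n - 8) ** 2 for n in sizes)

best, history = evolve_hidden_sizes(toy_error)
```

Because the better half of each generation is carried over unchanged, the best error found can never increase from one generation to the next, which replaces the trial-and-error numerical experiments by a systematic search.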

ANN-based hybrid algorithms are definitely gaining popularity, as they are capable of dealing with increasingly complex thermal phenomena and problems. It is expected that, as fundamental research on AI methodologies, including ANN, progresses, we will see many fruitful developments of hybrid algorithms to address thermal problems that are difficult to handle at the present time.

From the present brief review, it is seen that ANN methodologies and the corresponding new paradigm represent a promising way to approach and solve difficult thermal problems. Shortcomings such as the need for reliable experimental data and the uncertain choice of free parameters in the basic ANN methodology do exist, but their effects are expected to be reduced in time as results of more basic studies on the methodology become available. Much of that work is ongoing right now. On the other hand, with the ever-present noisy, imperfect, and uncertain information in real-world applications, it is perfectly logical to suspect that ANN alone may not provide the whole answer to complicated problems, and that we must also rely on combined solutions from the AI universe. This movement is already under way, and thermal engineers must be prepared to participate and lead if the thermal science and engineering discipline is to continue to flourish. A good example is that it is now well within reach to treat very complex thermal systems in scalable design and control, such as large thermal plants with highly interconnected subsystems involving a multitude of thermal devices, by means of dynamic agent-based control with the full implementation of all the major AI methodologies (86-88).

The present paper gives a brief review of the current status of applying the new artificial neural network (ANN) paradigm to difficult and complex thermal problems that cannot be readily solved by traditional approaches. In addition to the description of the basic ANN methodology and their possible variations, examples are given in terms of three major areas of ANN applications, namely, steady thermal problems, dynamic thermal modeling, and adaptive thermal control. Attributes of the ANN results are shown in terms of accuracy and flexibility in its use, and also their computational and experimental validations. Also discussed are hybrid algorithms in which the ANN plays a major role, along with other associated AI methodologies, for improved performance in the applications. Finally, some broad future prospects are also indicated.

The author is grateful to the late Mr. D. R. Dorini for supporting many of the studies mentioned in the review in the Hydronics Laboratory. He would particularly like to thank his colleague and collaborator, Professor M. Sen, for his engineering and mathematical insight and his enthusiasm for AI. He would also like to thank his friend and colleague, R. L. McClain, for his support of our undertakings. In addition, he wishes to acknowledge the significant contributions of former students X. W. Zhao, G. Diaz, A. Pacheco-Vega, and W. H. Cai.


steepness factor
tube diameter, m
error function in each run
height of heat exchanger, m
total layer number
layer number
node designation
number of nodes in the output layer
maximum value of j in Layer i
node number in each layer
sensible heat Colburn j-factor
total heat Colburn j-factor
mass rate of flow, kg/s
number of circuits in heat exchanger
number of columns in heat exchanger
number of rows in heat exchanger
total rate of heat transfer
Q based on ANN prediction
Q based on correlation
mean value of the ratios of experimental heat-transfer rate to that of ANN prediction
Reynolds number
temperature, °C
time, s
tube sheet thickness, m
target value at the output layer node
air velocity, m/s
width of the heat exchanger, m
synaptic weight
node input
lateral tube spacing, m
vertical tube spacing, m
node output
fin spacing, m
node error
node bias
learning rate
activation function variable
standard deviation
node activation function
air side
dry bulb
wet bulb
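
The node-level quantities listed above (synaptic weight, node bias, node input, node output, activation function, steepness factor) can be tied together in a short forward-pass sketch. This is a minimal illustration and not the author's implementation: the logistic form `1 / (1 + exp(-x / c))` for the activation function, the random initial weights, the sample input values, and all function names are assumptions made here. The 4-5-2-1-1 layer layout is borrowed from the caption of Figure 3 only to give the example a concrete shape.

```python
import math
import random


def sigmoid(x, c=1.0):
    """Logistic activation; c plays the role of a steepness factor (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x / c))


def init_network(layer_sizes, seed=0):
    """Create random synaptic weights and node biases for a fully connected network.

    layer_sizes lists the node count per layer, input layer first.
    Returns one (weights, biases) pair per layer after the input layer.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs, c=1.0):
    """Propagate inputs through the network.

    For each node: node input = bias + weighted sum of the previous layer's
    outputs; node output = activation(node input).
    """
    outputs = list(inputs)
    for weights, biases in layers:
        outputs = [
            sigmoid(b + sum(w * y for w, y in zip(row, outputs)), c)
            for row, b in zip(weights, biases)
        ]
    return outputs


# Hypothetical usage: four normalized inputs, one output node.
net = init_network([4, 5, 2, 1, 1])
prediction = forward(net, [0.2, 0.4, 0.6, 0.8])
print(prediction)
```

With untrained random weights the printed value has no physical meaning; training would adjust the weights and biases to reduce the error between the output and the target value, a step this sketch deliberately leaves out.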

Copyright © 2008 by American Society of Mechanical Engineers



Figure 1. Schematic of a fully connected multilayer ANN
Figure 2. Fin-tube heat exchangers as thermal systems
Figure 3. Training errors for 4-5-2-1-1 ANN for heat exchanger in Fig. 2
Figure 4. Comparison of ANN and correlation predictions of heat transfer for heat exchanger in Fig. 2 (O: correlations; +: ANN; dotted lines: ±10% deviations)
Figure 5. A 5-5-3-3 neural network for the analysis of heat exchanger of Fig. 2
Figure 6. Experimental versus ANN-predicted total heat transfer for the heat exchanger in Fig. 2 under dry-surface conditions
Figure 7. Experimental versus ANN-predicted total heat transfer for the heat exchanger in Fig. 2 under dropwise condensation conditions
Figure 8. Experimental versus ANN-predicted total heat transfer for the heat exchanger in Fig. 2 under film condensation conditions
Figure 9. Configuration of an 11-11-7-1 neural network for the evaporator heat exchanger in Fig. 2
Figure 10. Experimental versus ANN-predicted total heat transfer for the heat exchanger in Fig. 2 under limited data
Figure 11. Dynamic prediction by ANN for sudden cooling
Figure 12. Dynamic prediction by ANN for ramped heating
Figure 13. Dynamic prediction by ANN for changes in the airflow rates
Figure 14. General IMC structure with integral control
Figure 15. Sudden change of the set point in the air-outlet temperature under IMC-based adaptive neurocontrol
Figure 16. Response to water-side disturbance by water-flow disruption under IMC-based adaptive neurocontrol
Figure 17. Response to air-side disturbance with gradual reduction of air-inlet area under IMC-based adaptive neurocontrol
Figure 18. Response to air-side disturbance with sudden reduction of air-inlet area under IMC-based adaptive neurocontrol
Figure 19. Energy minimization routine to reduce both flow rates while maintaining set temperature as an added ability of the neurocontroller


Table 1. Typical synaptic weights and biases for ANN configuration of 4-2-1 associated with ANN analysis of heat exchanger in Fig.
Table 2. Comparison of relative mean heat-transfer rates and the corresponding standard deviations by different ANN configurations for heat exchanger in Fig.
Table 3. Comparison of percentage errors in predictions between the ANN and standard power-law correlations for ANN analysis of heat exchanger in Fig.

