## Abstract

Objects moving in water or stationary objects in streams create a vortex wake. An underwater robot encountering the wake created by another body experiences disturbance forces and moments. These disturbances can be associated with the disturbance velocity field and the bodies creating them. Essentially, the vortex wakes encode information about the objects and the flow conditions. Underwater robots, which often function with constrained sensing capabilities, can benefit from extracting this information from vortex wakes. Many species of fish do exactly this, by sensing flow features using their lateral lines as part of their multimodal sensing capabilities. Besides the necessary sensing hardware, a more important aspect of sensing is the algorithms needed to extract the relevant information about the flow. This paper advances a framework for such an algorithm using the setting of a pitching hydrofoil in the wake of a thin plate (obstacle). Using time series pressure measurements on the surface of the hydrofoil and the angular velocity of the hydrofoil, a Koopman operator is constructed that propagates the time series forward in time. Multiple approaches are used to extract dynamic information from the Koopman operator to estimate the plate position and are benchmarked against a state-of-the-art convolutional neural network (CNN) applied directly to the time series. We find that using the Koopman operator for feature extraction improves the estimation accuracy compared to the CNN for the same purpose, enabling “blind” sensing using the lateral line.

## 1 Introduction

The locomotion of fish and other aquatic swimmers has many desirable characteristics such as energy efficiency, agility, and stealth [1,2], which have inspired mimicry in bioinspired robots [3,4]. Closely related to and aiding the locomotion is the ability of fish to sense and process the spatiotemporal information in the water around them. Objects moving in water or stationary objects in streams create a vortex wake. An underwater robot encountering the wake created by another body experiences disturbance forces and moments. These disturbances can be associated with the disturbance velocity field and the bodies creating them. Essentially, information about fluid flow and the objects that create these flows is encoded in the spatiotemporal evolution of the vortical structures, whether the bodies creating them are cylinders, hydrofoils, underwater robots, or fish. Underwater robots that often function with constrained sensing capabilities can benefit from extracting this information from vortex wakes. Many species of fish do exactly this, by sensing flow features using their lateral lines as part of their multimodal sensing [1,5,6].

The complexity and high (infinite) dimensionality of fluid flows around a swimmer present significant challenges to emulating fish-like hydrodynamic sensing and extracting the relevant information from sensor data of the flow. This particular challenge is not restricted to bioinspired fish-like swimmers, but has been present in the broad areas of fluid flow estimation, model reduction, and active flow control. Proper orthogonal decomposition (POD) [7,8] and gappy POD [9] have been tools for model reduction in turbulent flows for decades and have also been applied for unsteady flow sensing past a hydrofoil and estimation of surface pressure [10,11]. Model reduction of complex flows using the Koopman operator approach has extended the POD approach to a dynamical systems framework [12,13]. Subsequent developments in the application of machine learning in dynamical systems have created algorithms for learning the dynamic modes or Koopman modes of a dynamical system from often sparse data [14–16]. Similar methods combining machine learning with dynamical systems are increasingly playing an important role in model reduction in fluid mechanics, flow reconstruction, and flow classification, see, for instance, Refs. [17–26]. Flow estimation in the near field of a body by selecting from known fluid modes by the dynamic mode decomposition (DMD) using surface pressure data has been studied in Ref. [27], and incorporating traditional filtering into this approach was recently shown to allow updating the estimation in real-time [28,29].

This paper considers a different but related problem motivated by underwater robots, where on-board sensors such as inertial motion units and pressure sensors can measure only dynamic and kinematic variables of the robot itself or pressure on the surface of the robot, but cannot measure the ambient pressure and velocity fields. We consider the problem of estimating the spatial location of an upstream obstacle in a flow past a pitching hydrofoil. It is assumed that pressure on the surface of the hydrofoil can be measured at a small number of fixed locations on the body. This is related to the problems considered in Refs. [30] and [31], where pressure sensors were used to localize a body and a dipole, respectively, by assuming potential flow, which allows for the application of analytical methods. However, fluid–structure interaction is frequently driven by significant viscous effects, including vortex shedding, which make purely potential flow models inaccurate. Vortex-induced vibrations and coupled fluid–structure models cannot in general be described by simple mathematical models. This motivates data-driven approaches, such as the one introduced in Ref. [32] to localize a source in three dimensions. However, in that work, only a single time snapshot of velocity data is used to perform each estimation, which is possible because the flow was steady except for the movement of the source; for the fluid–structure interaction problem considered in this paper, using a time series of pressure data is necessary due to the unsteady wake generated by both the obstacle and the trailing hydrofoil itself. The field of soft sensing [33] has techniques for regression from time series data, which typically resolve the high-dimensionality of the time series by taking as features its statistical moments (such as mean and variance) or data from a few selected time instants [34].
This approach discards much of the dynamic information from the data, which is often acceptable for systems that are roughly steady, but is a significant drawback when applied to an unsteady fluid system. Based on the success of modal approaches for extracting information about the underlying dynamics of fluid data, we propose a method that extracts information about the system dynamics from the pressure time series, and uses that information as features to identify the location of the obstacle. Using time series pressure measurements on the surface of the hydrofoil, a Koopman operator is constructed that propagates the snapshots of pressure data forward in time, thereby encoding the system dynamics. Multiple approaches are considered to extract the encoded information for use in estimating the position of an upstream obstacle. These include the “direct mode estimation” approach, where the most important modes (eigenvectors) of the operator are input into a dense neural network (DNN), the “spectral image estimation” approach where the spectrum (or the singular vectors) of the operator is extracted and input into a convolutional neural network (CNN), and the “mode-kernel estimation” approach, where the modes constructed from training data are compared to a dictionary of known modes. This is benchmarked against the time CNN [35], a recent black-box CNN architecture designed for classification of multivariate time series, in the “CNN-based estimation” approach.

The remainder of the paper is organized as follows. In Sec. 2, we define the exact fluid-interaction problem considered and discuss its implementation in simulation. In Sec. 3, we review the theory behind standard and exact DMD and their connection to the Koopman operator; their relevance to the estimation problem is explained in Sec. 4. The training speed and accuracy of the estimation methods are investigated in Sec. 5.

## 2 Problem Definition

We consider the problem of flow past a symmetric NACA-0018 hydrofoil of unit chord length, representing a streamlined swimmer, pinned at its leading edge with a linear spring of stiffness $k=6$ N·m/rad and damping coefficient $c=2$ N·m·s/rad. When the spring is at rest, the hydrofoil is horizontal with the leading edge pointing left. The hydrofoil is immersed in a freestream flow with velocity $U_\infty = 10$ m/s. The fluid is of unit density (equal to that of the hydrofoil) and has viscosity $\nu = 0.001$ m$^2$/s. A rectangular bluff body of unit height and width 0.1 m is placed upstream of the hydrofoil and disturbs the incoming freestream flow by shedding an unsteady wake in the fluid. This disturbed flow interacts with the downstream hydrofoil, inducing angular motion and a time-varying pressure profile on the surface. The relative position of the two bodies is defined by the tuple $(b,d)$ as shown in Fig. 1. Along the body, $N_s$ pressure sensors are placed that are evenly spaced in the horizontal direction and record the absolute pressure at a frequency of 40 Hz.

The problem considered in this paper is the estimation of the parameters $(b,d)$ using only measurements of pressure made at the $N_s$ locations on the hydrofoil.

The flow simulation domain *D* is a rectangle of width 30 m and height 16 m. The leading edge of the hydrofoil defines the origin of the rectangular domain and is located 1 m to the left of and vertically collocated with the domain's center. The center of the rectangular obstacle is placed *b* m to the left and *d* m above the origin. At the left boundary of the domain, the horizontal velocity, $u=20$ m/s, and a vertical velocity, $v=0$ m/s, are imposed. To allow unimpeded exit of the flow at the right boundary, a zero gradient condition is imposed on the velocity, so $\nabla u = \nabla v = 0$. At the top and bottom of the domain, a “slip” condition enforces that no flow passes through the boundary ($v=0$) and that there is no shear force at the boundary ($\partial u/\partial y = 0$). On the walls of the plate and hydrofoil, a no-slip condition ensures that the velocity of the flow relative to the bodies is zero at the surface.

The pitching of the hydrofoil about the pin is governed by the equation of motion

$$I_p \ddot{\theta} + c\dot{\theta} + k\theta = M(t)$$

where $\theta$ is the angle of the hydrofoil, $I_p$ is the mass moment of inertia of the hydrofoil about the pin, and $M(t)$ is the fluid moment on the body. Through the forcing term and a surface boundary condition, this equation is coupled with the fluid equations (1) and (2), resulting in complex fluid–body interaction. This interaction is solved by iteratively solving the flowfield and body acceleration until convergence is achieved. These states are then integrated forward using the forward Euler method with an adaptive time-step $\Delta t$ selected such that the Courant number $C < 0.85$ and $\Delta t \le 0.02$. This is repeated until time $t=50$ s. When the hydrofoil rotates, the mesh points in a region within 0.3 m of its surface rotate rigidly with it, while the mesh points greater than 1 m from the surface remain fixed.
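The Courant-limited step selection above can be sketched as follows; this is a minimal sketch, and the inputs `u_max` and `dx` (fastest cell velocity and smallest cell width) as well as the function name are our assumptions, since the solver's actual CFL bookkeeping is not detailed in the text:

```python
def adaptive_dt(u_max, dx, c_max=0.85, dt_cap=0.02):
    """Largest time-step satisfying the Courant condition C = u_max*dt/dx < c_max,
    capped at the maximum step of 0.02 s used in the simulation."""
    return min(c_max * dx / u_max, dt_cap)
```

For a fast flow over a fine mesh the Courant bound is active; for a slow flow the 0.02 s cap takes over.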

The pressure field from a sample simulation is shown in Fig. 3(a). Regions of alternating low pressure to the right of the flat plate show a 2*S* vortex street. As the vortices interact with the hydrofoil, a high-pressure region is generated that, when combined with the low pressure inside the vortex, generates a net moment on the hydrofoil and thus motion. This pattern repeats in time, symmetrically on both sides of the hydrofoil for $d=0$ and asymmetrically for $d \neq 0$. The time-varying pressure measured at four of the $N_s = 10$ sensor locations is shown in Fig. 3(b). The nonzero value of *d* results in an asymmetrical pressure profile in time: pressures measured at the top center and bottom center of the hydrofoil show differences in temporal evolution besides the obvious phase difference.

## 3 Dynamic Mode Decomposition

Consider the flow field data available at $N$ grid points $(x_i, y_i) \in \mathbb{R}^2$ with sample times $t \in \{0, \Delta t, 2\Delta t, \ldots, N_t \Delta t\}$. The numerical solutions in Sec. 2 provide the velocity field $(u,v)$ and the pressure field $p$, and the dynamical system $\dot{x} = u(x, y; b, d)$, $\dot{y} = v(x, y; b, d)$ that depends on the parameters $b$ and $d$. We will consider a time discretization of this dynamical system governed by a flow map $F$.

The scalar-valued observable function of interest in the problem is the pressure field, $p(x,y)$. The pressure at the $N$ mesh points at time $t_n = n\Delta t$ is denoted by $h_f(t_n) = p(x_i, y_i, t_n) \in \mathbb{R}^N$. A subset of $h_f$ measured at the pressure sensors on the hydrofoil (shown in Fig. 1) located at the mesh points $(\xi_i, \eta_i)$ is denoted by $h_s$. Its snapshot at time $t_n$ is $h_s(t_n) = p(\xi_i, \eta_i, t_n) \in \mathbb{R}^{N_s}$. As a result, the snapshots $h_f(t)$ also contain the snapshots $h_s(t)$. We define two operators $K_f$ and $K_s$ such that $h_f(t_{n+1}) = K_f h_f(t_n)$ and $h_s(t_{n+1}) = K_s h_s(t_n)$. The matrices $K_f$ and $K_s$ are the representations of the Koopman operator $K$ composed with orthogonal projections on the spaces spanned by elements of $h_f$ and $h_s$, respectively.

For the surface measurements, exact DMD is applied to $m$ time snapshots of the surface pressure measurements $p(\xi_i, \eta_i)$, which are few in number. We construct two shifted matrices $X_s \in \mathbb{R}^{N_s \times m}$ and $Y_s \in \mathbb{R}^{N_s \times m}$ of these measurements

$$X_s = [\,h_s(t_1)\ \ h_s(t_2)\ \ \cdots\ \ h_s(t_m)\,], \qquad Y_s = [\,h_s(t_2)\ \ h_s(t_3)\ \ \cdots\ \ h_s(t_{m+1})\,] \tag{7}$$

The operator $K_s$ is then computed as the minimizer

$$K_s = \underset{K}{\arg\min}\, \|Y_s - K X_s\|_F^2 \tag{8}$$

This construction is closely related to approximations of the Koopman operator [12,40,41]. Essentially, as the number of sampling points increases, the Koopman operator calculated by DMD or the extended dynamic mode decomposition would converge to the orthogonal projection of the action of the Koopman operator acting on the space of the observables chosen as basis functions; see, for example, Ref. [41] for a discussion. We note that errors other than $\|Y_s - K X_s\|_F^2$ can be used to calculate $K_s$. For example, instead of the average error considered in Eq. (8), one may prioritize having a tight upper bound on the error as in Ref. [42]. When the argument of Eq. (8) is zero, $K_s$ exactly propagates the snapshots $h_s$ forward in time. This problem has a convex solution

$$K_s = Y_s X_s^{+} \tag{9}$$

where $X_s^{+}$ is the Moore–Penrose pseudoinverse of $X_s$. When $X_s$ has full row rank, this solution is unique. The modes of the operator $K_s$ are then calculated as the eigenvectors and eigenvalues of $K_s$

$$K_s \varphi_i = \lambda_i \varphi_i, \qquad i = 1, \ldots, N_s$$

The surface pressure can be reconstructed $m$ time-steps after an initial data snapshot $h_s(t_n)$ by

$$h_s(t_{n+m}) = \sum_{i=1}^{N_s} \alpha_i \lambda_i^{m} \varphi_i$$

where $\alpha$ is a vector of complex numbers defining the relative magnitude of the modes, as well as their relative phases. The magnitude of $\alpha_i$ is roughly equivalent to the concept of “mode energy” in POD, and in the case that most of the “energy” is concentrated in a small number of modes, the flow can be reconstructed with high accuracy using only those few modes. Similarly, neglecting modes with small corresponding values of $\alpha$ causes only minimal reconstruction error. This reconstruction ability is demonstrated in Fig. 4, where an operator is generated using data from time 10 s to 13.75 s and reconstructs the pressure starting at a time of 10 s. Compared to the actual pressure, the reconstructed pressure is qualitatively correct, with noticeable error emerging only in the high frequency component of the dynamics. This reconstruction is nearly the same even for $N_s = 100$, implying that a small number of measurements captures many of the essential features of the pressure distribution.
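The exact-DMD pipeline described above (shifted matrices, least-squares operator, eigendecomposition, and modal reconstruction) can be sketched in a few lines of numpy; the function names are ours, and fitting $\alpha$ to the first snapshot by least squares is one common convention:

```python
import numpy as np

def exact_dmd(H):
    """Exact DMD from a snapshot matrix H of shape (N_s, m+1).

    Builds X_s and Y_s, the least-squares operator K_s = Y_s X_s^+,
    its eigendecomposition, and mode amplitudes alpha fit to the
    first snapshot."""
    X, Y = H[:, :-1], H[:, 1:]
    Ks = Y @ np.linalg.pinv(X)            # minimizer of ||Y - K X||_F^2
    lam, Phi = np.linalg.eig(Ks)          # eigenvalues and modes
    alpha = np.linalg.lstsq(Phi, H[:, 0].astype(complex), rcond=None)[0]
    return Ks, lam, Phi, alpha

def reconstruct(lam, Phi, alpha, n):
    """Snapshot n steps after the initial one: sum_i alpha_i lam_i^n phi_i."""
    return (Phi * lam**n) @ alpha
```

On snapshots generated by a known linear map, `exact_dmd` recovers the map exactly when the data matrix has full row rank, and `reconstruct` reproduces the trajectory.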

While this calculation is fast for the surface measurements because $N_s$ is small, applying this method to the field data would be computationally intractable because of the size of $K_f$ and the resulting complexity of calculating its eigenvalues. For the field measurements, we use “standard DMD,” which introduces a degree of truncation to greatly increase the speed of the calculation. Shifted data matrices $X_f$ and $Y_f$ are first constructed similarly to Eq. (7), and the singular value decomposition

$$X_f = U \Sigma V^{*}$$

is computed, where $U$ and $V$ are the left and right singular matrices, respectively, and $\Sigma$ is a diagonal matrix containing the singular values. To improve computation speed, $\Sigma$ can be truncated starting with its smallest singular values to a smaller $r \times r$ matrix $\Sigma_t$, and the corresponding columns of $U$ and $V$ can be removed, resulting in truncated matrices $U_t$ and $V_t$, respectively. This allows approximating the matrix $K_f$ (in a reduced-dimensional space) as

$$\tilde{K}_f = U_t^{*} Y_f V_t \Sigma_t^{-1}$$

where $\tilde{K}_f$ acts on the reduced coordinates $U_t^{*} h_f(t_n)$, and $h_f(t_n)$ is a snapshot of the flow field at $t = n\Delta t$.
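A minimal sketch of this truncated (standard) DMD, assuming the usual convention of lifting the reduced eigenvectors back to full-space modes through $Y_f V_t \Sigma_t^{-1}$; the function and variable names are ours:

```python
import numpy as np

def standard_dmd(X, Y, r):
    """Rank-r standard DMD: project onto the leading r POD modes of X,
    form the reduced operator, and lift its eigenvectors to full space."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Ut, St_inv, Vt = U[:, :r], np.diag(1.0 / s[:r]), Vh[:r].conj().T
    K_red = Ut.conj().T @ Y @ Vt @ St_inv   # r x r reduced operator
    lam, W = np.linalg.eig(K_red)
    Phi = Y @ Vt @ St_inv @ W               # modes in the full space
    return K_red, lam, Phi
```

When $r$ equals the rank of the data, the reduced operator is a similarity transform of the full one, so its eigenvalues match those of exact DMD.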

### 3.1 Dominant Pressure Modes.

Figure 5(c) shows a plot of $\sigma_s(n)$ and $\sigma_f(n)$, revealing that $\sigma_s(3) > 0.9$ and $\sigma_f(3) > 0.9$, i.e., the first three modes of both the fluid field pressure and the surface pressure contain much of the information about the respective fields. The first three modes $\psi_1, \psi_2$, and $\psi_3$ of the pressure field in the fluid domain are shown in Figs. 6(a)–6(c), which also reveal the physical features of the flow. The first mode $\psi_1$ corresponds to the pressure field of the steady flow, with the next two modes identifying changes in pressure due to the vortex shedding past the obstacle (leading bluff body). The eigenvalues $\Lambda_s$ and $\Lambda_f$ (in Figs. 5(a) and 5(b)) corresponding to the dominant modes have similar phases: one with a phase of zero, one with a phase near 0.45 (corresponding to the vortex shedding frequency), and another with a phase near 0.90 (corresponding to the second harmonic of the vortex shedding). The modes $\varphi_1$, $\varphi_2$, and $\varphi_3$ for the surface pressure on the hydrofoil are also shown in Figs. 6(d)–6(f) for the case where $N_s = 10$.

## 4 Parameter Estimation

We apply four different approaches, all based on supervised learning, to the problem of estimating the parameters $(b,d)$. The first of these uses a convolutional neural network (the “time” CNN [43]) which takes as input the time series pressure measurements at the $N_s$ locations on the trailing hydrofoil. This approach sets a benchmark that uses one of the latest machine learning methods for time series data. The next three approaches are based on combining calculations of DMD with machine learning.

Supervised learning necessitates collecting multiple sets of simulations with nonoverlapping values of $(b,d)$: one set to train the estimator (the “training” set), one set to validate how well trained the estimator is (the “validation” set), and one set to test the accuracy of the resulting estimator (the “testing” set). The parameter values used for each dataset are shown in Fig. 7. The training set has parameters on an even $6 \times 6$ grid ($a=36$ points total) filling the region $4 \le b \le 9$ and $0 \le d \le 1$, which is selected because the wake in that region is well-developed and has not fully dissipated. The even spacing of these training points is designed to sample the entire space. However, in practical applications, the parameters would likely not fall on a grid, but rather be scattered throughout the acceptable parameter space. For this reason, the validation and testing data are selected from a uniform probability distribution in the region $4 \le b \le 9$ and $0 \le d \le 1$, with $c=20$ points for validation and $e=38$ points for testing.

From each simulation, pressure is known at $N=15{,}493$ points in the domain, with 80 of those points on the surface of the body. The pressures are interpolated with a spline function to determine the pressures at ten evenly spaced points, which represent physical pressure sensors and are the only data from the simulation used for parameter estimation, in training, validation, and testing.

The pressure measured at the $i$th sensor for the $k$th pair of parameters $(b_k, d_k)$ at time $t_j$ is denoted by $p_k(\xi_i, \eta_i, t_j)$. Data from the simulations are stored in three different sets

where $T_k$ is a matrix corresponding to the $k$th pair of training parameters, $(b_k, d_k)$, and $T_{ijk} = p_k(\xi_i, \eta_i, t_j)$ is the pressure at the $i$th pressure sensor and $j$th time interval. Similarly, the $k$th validation dataset $V_k$ is the matrix $V_{ijk} = p_{(k+a)}(\xi_i, \eta_i, t_j)$, and the $k$th test (or evaluation) dataset $E_k$ is the matrix $E_{ijk} = p_{(k+a+c)}(\xi_i, \eta_i, t_j)$, respectively. We define a sampling function $S$ which acts on these sets and returns a random window of 150 time-steps of continuous data from a random element, as well as the parameters $(b,d)$ of that element. For instance, $S(T) = (T_{ijk}, b^{(k)}, d^{(k)})$, where $k$ is a random variable uniformly distributed over the set $\{1, 2, \ldots, 36\}$, $i$ is the range of integers $1 \le i \le 10$, and $j$ is a range $\{j_0, j_0 + 1, \ldots, j_0 + 150\}$ such that $j_0$ is a uniformly distributed random integer in the range $t_0 \le j_0 \le t_f - 150$. The initial time-step is chosen as $t_0 = 400$, which allows transient flow conditions to decay, and the final time is selected as $t_f = 1000$. Therefore, each matrix of data in the training, validation, and test sets has 150 time-steps of data, with the length of each time-step being $\Delta t = 0.025$ s.
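The sampling function $S$ can be sketched as below; the array layout (sensors along rows, time-steps along columns) and the function name are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_window(data, params, t0=400, tf=1000, window=150):
    """The sampling function S: pick a random simulation k and a random
    window of consecutive time-steps starting in [t0, tf - window],
    returning the pressure window and that simulation's (b, d)."""
    k = int(rng.integers(len(data)))
    j0 = int(rng.integers(t0, tf - window + 1))
    return data[k][:, j0:j0 + window], params[k]
```

Each call draws a fresh window, so repeated calls on the same simulation yield different (but overlapping) training samples.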

Each estimation approach is the composition of a feature extractor $F$, with parameters $\omega$, and an estimator $D$, with parameters $\theta$. These parameters are trained so that a loss ($L$) in the estimation is minimized, or

$$(\theta, \omega) = \underset{\theta, \omega}{\arg\min}\ L$$

Below, four different architectures for the feature extractor $F$ are introduced, two of which are nonparametric (so $\omega = \emptyset$). After optimizing the parameters for each, the overall estimation error values are investigated to determine which feature extractor is most effective. To isolate the effect of changing $F$, the architecture of $D$ is kept the same for all feature extractors, though $\theta$ is re-optimized for each $F$.

The function $D$ is a DNN, selected due to the general utility of DNNs in estimation problems. It is fully connected and uses rectified linear unit (ReLU) activation, with sequential hidden layers containing 200, 100, 100, 50, and 50 nodes. The output layer has two nodes, representing estimates of *b* and *d*, and has a linear activation function.
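A plain-numpy forward pass of this architecture is sketched below; the weight initialization (He-style scaling) is our assumption, since the paper specifies only the layer widths and activations:

```python
import numpy as np

HIDDEN = [200, 100, 100, 50, 50]   # hidden widths from the text

def init_dnn(n_in, rng=np.random.default_rng(0)):
    """Random weights for the fully connected estimator D.
    He-style initialization is an illustrative assumption."""
    sizes = [n_in] + HIDDEN + [2]  # two linear outputs: estimates of b and d
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / n), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def dnn_forward(weights, x):
    """Forward pass: ReLU hidden layers, linear output layer."""
    for W, b in weights[:-1]:
        x = np.maximum(W @ x + b, 0.0)
    W, b = weights[-1]
    return W @ x + b
```

For the direct mode estimation approach, the input dimension would be the length-36 feature vector $Z$; only the first layer changes with the feature extractor.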

### 4.1 Convolutional Neural Network-Based Estimation.

For time series classification problems, neural network-based parametric approaches have been found to have performance comparable to common nonparametric approaches [43]. The networks best suited to this task are recurrent neural networks, such as long short-term memory networks, and CNNs, which extract features from time series data by repeatedly applying a one-dimensional kernel. We use a CNN as a benchmark feature extractor, using the hyperparameters determined to be optimal for time series classification in Ref. [35], a network architecture known as the “time CNN,” which has been benchmarked against other time series classification algorithms in Ref. [43]. Though we consider an estimation problem instead of classification, estimation can in a sense be viewed as a subset of parameter classification, since simply averaging the classification probability distribution over different “bins” of parameter values yields an estimate. As a result, we expect a network architecture designed for classification to be effective at estimation. In summary, the architecture consists of two layers of alternating one-dimensional convolutions with sigmoid activation and average pooling operations. The filters have length 7, and the pooling operations act on groups of three values, reducing the dimension of the latent space by roughly a factor of 3 with every application. No padding is used on the convolutional steps. The single input time series vector is split into six vectors using six separate filters at the first step, and this is increased to 12 vectors for the second step. After the final pooling operation, the vectors are concatenated into a single feature vector of length 168, which serves as the output of $F$. This procedure is visualized in the top row of Fig. 8.
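The dimension bookkeeping behind the length-168 feature vector can be checked directly from the stated hyperparameters (unpadded length-7 filters, pool size 3, a 150-step input window, and 12 channels after the second convolution):

```python
def conv_len(n, k=7):
    """Output length of a 'valid' (unpadded) 1-D convolution with filter length k."""
    return n - k + 1

def pool_len(n, p=3):
    """Output length of non-overlapping pooling over groups of p."""
    return n // p

n = conv_len(150)     # first convolution (6 filters):   150 -> 144
n = pool_len(n)       # first average pooling:           144 -> 48
n = conv_len(n)       # second convolution (12 filters):  48 -> 42
n = pool_len(n)       # second average pooling:           42 -> 14
features = 12 * n     # concatenating the 12 channels:   168
```

This reproduces the feature length of 168 quoted in the text.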

### 4.2 Direct Mode Estimation.

Though the CNN can extract features from dynamic data, it uses a black-box approach that does not reveal any specific information about what dynamics it has identified. By contrast, the highest magnitude Koopman modes of dynamic data also form a reduced basis of the flow dynamics, which may likewise serve as features for further estimation. Further, an association between the local modes of the surface pressure and the modes of the entire fluid field is visually apparent, as shown in Figs. 6(d)–6(f). The surface pressure modes in Figs. 6(d)–6(f) calculated from $N_s = 100$ observations are shown as continuous modes, and their values at the smaller number $N_s = 10$ of sensor locations are shown as circles. Despite the different quantities of observations, these modes are qualitatively very similar. Because the modes of the larger flow field are clearly strongly affected by $(b,d)$, this motivates the possibility that significant information about $(b,d)$ has been passed from the larger fluid modes to the pressure modes on the surface, which can be evaluated from a sparse set of observations.

Because the dominant three modes have been demonstrated to contain the majority of the information needed to reconstruct the flow, they are selected as features of the data, to be input into the DNN which performs the final estimation. However, to ensure performance, they must first be standardized. This takes two forms: standardizing their order and standardizing their phase. The motivation for standardizing the order is simple: if for one $(b,d)$ pair the first mode input to the DNN corresponds to the zero-phase mode, but for an adjacent $(b,d)$ pair the first mode input to the DNN corresponds to the second harmonic, the large difference between those mode shapes, due to their representing different physical phenomena, conceals the subtle changes in the mode shapes that would be useful in estimating $(b,d)$. More formally, we label modes such that for a mode $\varphi_1^1$ with parameters $(b,d)$ and $\varphi_1^2$ with parameters $(b+\epsilon_1, d+\epsilon_2)$, for small values of $(\epsilon_1, \epsilon_2)$ we expect $\langle \varphi_1^1, \varphi_1^2 \rangle \approx 1$ and $\lambda_1^1 / \lambda_1^2 \approx 1 + 0i$ for the corresponding eigenvalues.

To standardize the order, we first discard the complex-conjugate copies, i.e., the modes with indices $i$ that satisfy $\Im(\lambda_i) < 0$. Modes are then labeled using the procedure

The values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are labeled by the same procedure.

$$Z = \big[\,|\varphi_1|^T\ \ |\varphi_2|^T\ \ |\varphi_3|^T\ \ \Re(\lambda)^T\ \ \Im(\lambda)^T\,\big]^T$$

where $|\varphi|$ denotes the elementwise magnitude of $\varphi$, and as a result each element is a real scalar. The vector $Z$ of length 36 is then the output of the feature extractor function $F$.
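The mode standardization can be sketched as follows. Since part of the labeling procedure is elided in the text, two details here are our assumptions: ordering the retained modes by increasing eigenvalue phase magnitude, and fixing each mode's phase by rotating it so its largest-magnitude entry is real and positive:

```python
import numpy as np

def standardize_modes(lam, Phi, n_modes=3):
    """Discard conjugate modes (Im(lam) < 0), order the rest by eigenvalue
    phase magnitude, and rotate each mode so its largest entry has zero phase.
    The sort key and phase convention are illustrative assumptions."""
    keep = np.imag(lam) >= 0
    lam, Phi = lam[keep], Phi[:, keep]
    order = np.argsort(np.abs(np.angle(lam)))[:n_modes]
    lam, Phi = lam[order], Phi[:, order]
    top = Phi[np.abs(Phi).argmax(axis=0), np.arange(Phi.shape[1])]
    return lam, Phi / np.exp(1j * np.angle(top))
```

With this convention, the steady mode (phase zero) always comes first, followed by the shedding mode and its second harmonic, matching the dominant-mode ordering observed in Sec. 3.1.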

### 4.3 Mode-Kernel Estimation.

Because modes can be interpreted as features of the system that contain a condensed representation of the system dynamics, it is likely that in addition to parametric approaches (like DNNs), nonparametric algorithms which compare the test data directly with the stored training data may also be applicable. Many such algorithms exist, but here we attempt a kernel-based approach. For each parameter pair $k$ in the training dataset, sorted benchmark modes $[\varphi_1\ \varphi_2\ \varphi_3]\big|_k$ are identified using the procedure for exact DMD and the sorting from Sec. 4.2, except using the entire simulation data $T_k$ instead of a sample of $T_k$. These modes are stored in a library $\Omega$ such that $\Omega_k = [\varphi_1\ \varphi_2\ \varphi_3]\big|_k$.

The estimation procedure works by estimating the similarity of the modes $\varphi_1, \varphi_2, \varphi_3$ from the sample $S(T)$ with the dictionary of benchmark modes using a kernel function. In this case, the kernel function is the inner product, so the similarity between modes is defined as $\beta_{ik} = |\langle \varphi_i, (\Omega_k)_i \rangle|$, where a value near 1 indicates a high similarity. This similarity can be used to estimate $(b,d)$ through simple algorithms, for example, the $K$-nearest neighbors algorithm. However, because of the high performance of parametric approaches to time series [43], as well as to keep consistency with the other approaches, we instead flatten $\beta$ into a feature vector of length 108 to be output from $F$ into $D$.
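The kernel evaluation against the mode library can be sketched as below; unit-normalizing the modes before taking the inner product (so that $\beta_{ik} \in [0,1]$) is our assumption. Flattening the resulting $3 \times 36$ matrix gives the length-108 feature vector:

```python
import numpy as np

def mode_similarity(modes, library):
    """beta[i, k] = |<phi_i, (Omega_k)_i>| between a sample's modes
    (columns of `modes`) and each entry of the training library.
    Modes are unit-normalized, so values near 1 mean high similarity."""
    beta = np.empty((modes.shape[1], len(library)))
    for k, Omega in enumerate(library):
        for i in range(modes.shape[1]):
            a, b = modes[:, i], Omega[:, i]
            beta[i, k] = abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return beta
```

A sample whose modes coincide with a library entry scores near 1 against that entry and near 0 against entries with orthogonal modes.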

### 4.4 Spectral Image Estimation.

The feature extractor for this approach begins with

$$K_s = U_K \Sigma_K V_K^{*}$$

the singular value decomposition of $K_s$. The matrix $U_K$ is then passed into a two-dimensional CNN. Because two-dimensional CNNs are most often used for image classification, this can be interpreted as treating $U_K$ as an image. For consistency, the hyperparameters of this network are chosen to be similar to those in Sec. 4.1: the convolutional steps have a filter size of 7, and max pooling is performed with a pool size of 3. However, because the operations are performed on an input matrix of size $10 \times 10$ instead of a vector of size 150, zero padding is required to allow applying two layers of convolution without the input becoming smaller than the filter. For the same reason, only one pooling operation is applied, located after both of the convolutional layers. After the pooling operation, the result is flattened into a feature vector also of length 108, which is then output from $F$.

### 4.5 Training.

Each of the four estimation approaches is trained separately, with unique weights for each. For methods that have parameters $\omega $ associated with the feature extraction, $\omega $ is trained jointly with $\theta $. Because of the risk of weight optimization finding local minima with different final loss values, a batch of ten estimators is trained for each of the estimation approaches, with the training process of each individual estimator termed a “run.” In total, 40 weight optimizations are performed.

The weights are optimized to minimize the error between the predicted and actual values of $(b,d)$ in a mean-squares sense. Instead of performing this summation over the entire set at once, it is efficient to iteratively perform the summation over smaller batches. Here, we chose a batch size of $i_f - i_0 = 50$. This batch-based optimization allows the use of the Adam algorithm, which calculates the parameter updates at every step using a combination of the gradient of the current batch and the “momentum” from gradients calculated on past batches. After the entire dataset has been iterated over (known collectively as an “epoch”), an early-stopping criterion is checked, and the training either stops or continues with a new set of batches.

If at the end of an epoch the validation loss $L_v$ is lower than the lowest value yet recorded for run $i$, then $L_{min,i} := L_v$ and the weights $\sigma_{min,i} = (\theta, \omega)$ are saved before continuing. However, if at any epoch $L_v > 1.3 L_{min,i}$ and at least 200 epochs have passed, it is assessed that the network is overtrained and the training run is terminated. The representative weights of the estimation approach, $\sigma_{opt}$, are then defined as

$$\sigma_{opt} = \sigma_{min, i^{*}}, \qquad i^{*} = \underset{i}{\arg\min}\ L_{min,i}$$

which selects the weights from the run with the lowest loss.
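The early-stopping rule described above can be sketched as a simple scan over the per-epoch validation losses; the function name is ours, and treating the 200-epoch minimum and the 1.3 factor as parameters is an illustrative choice:

```python
def early_stop_epoch(val_losses, factor=1.3, min_epochs=200):
    """Epoch at which a run terminates: the first epoch (after at least
    min_epochs) whose validation loss exceeds `factor` times the best
    loss seen so far; otherwise the final epoch."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses):
        best = min(best, loss)
        if epoch >= min_epochs and loss > factor * best:
            return epoch
    return len(val_losses) - 1
```

A run whose validation loss steadily decreases trains to the final epoch; a run whose loss rebounds by more than 30% above its best value (after the 200-epoch grace period) is cut short.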

## 5 Results

The loss values during training on the validation data for the estimation approaches are shown in Fig. 9. The mode-kernel estimation approach has the highest average loss after epoch 50, though its best run still achieves a lower loss than the end-of-training average of the other three approaches, indicating that the overall differences between the estimation approaches are small. The other three estimation approaches have very similar average losses by the end of their training. However, the direct mode estimation approach has the run with the lowest loss by a small margin. A *t*-test is used to determine whether the mean minimum validation loss is statistically different between the methods. The result is that none of the methods has a statistically significant difference in its mean minimum loss value from the CNN-based estimation approach (using $P=0.05$ as the cutoff for significance).

The optimal weights $\sigma_{opt}$ can be evaluated on the testing data to compare the best estimated values of $(b,d)$ for each approach with the true values. These predicted and real values are shown in Fig. 10. The number of unique $(b,d)$ pairs in the testing data (38) is much less than the number of samples evaluated (760), leading to many different estimates of the same true value distributed along a vertical line. The CNN-based estimation (Figs. 10(a) and 10(b)) and spectral image estimation (Figs. 10(g) and 10(h)) approaches are imprecise in their estimates of *d*, with a large range of predicted values arising from different samples of data associated with the same true value. By contrast, the mode- and kernel-based approaches are precise, with different samples from the same simulation giving similar outputs. This is consistent with the theoretical motivation of the Koopman operator: a given system has exactly one Koopman operator $K$ that maps its dynamics forward by a specific length of time, which is valid both during the transient period and during steady-state. The estimated operator $K_s$ does in practice change slightly depending on which window of data is selected, but is generally much more consistent than the time series itself. The spectral image estimation approach may be inconsistent because the singular value decomposition is not unique. We make it unique (to within a sign) by constraining the diagonal of $\Sigma_K$ to be decreasing from left to right. However, this presents a problem for the use of $U_K$ for estimation: very small changes in $K_s$ can result in a reordering of $\Sigma_K$, which in turn results in a reordering of $U_K$. This reordering is likely responsible for the large range of estimates for *d*.

More quantitatively, differences between the estimation methods can be investigated by considering the loss on the testing dataset. The CNN-based estimation has the highest loss value of 0.0393, followed by the spectral image estimation at 0.0343, the direct mode estimation at 0.0340, and the mode-kernel estimation at 0.0325. The relative accuracy of the estimation methods is inconsistent with that calculated on the validation data; this inconsistency could indicate that different estimation approaches are suited to different regions of the parameter space, which is sampled differently by the testing and validation data due to the random location of samples. This may be particularly true for the kernel method, which may have difficulty with testing $(b,d)$ values that are far from those in the training dataset.

## 6 Conclusion

We have shown that the Koopman operator can extract features amenable to estimation from a dynamic system more accurately than a state-of-the-art black-box convolutional neural network feature extractor. Multiple approaches to extracting features from the Koopman operator are considered; both directly inputting the modes to a DNN and inputting the spectrum of the operator into a CNN are found to be effective in estimating the parameters of the flow, with the approach using the dynamic modes performing slightly better and having shorter training times. This motivates the use of dynamic mode decomposition in the classification and estimation of parameters for multivariate time series generated by dynamical systems. The significant advantage of the Koopman operator-based approaches over machine learning applied directly to the time series is the physical insight they offer into the estimation: training data can be better selected, and the sensitivity of the estimation to changes in parameters can be better understood in light of such physics-informed machine learning. Future work will consider these aspects in more depth, along with the relationship between the measurable (surface) modes and the broader modes of the fluid.

This work opens several avenues of future research toward making the sensing and localization approach described in this paper practically useful in the mobile robotics context. The Koopman operator must be computed, and the obstacle distance classified, from smaller snapshots of data; this is a significant challenge that requires further optimization of the Koopman operator computation and the subsequent machine learning. Beyond this, more complex flows with disturbances, or the presence of more than one obstacle, pose further challenges to the sensing framework described here. We currently envision this framework as one that augments other sensing modalities, and further research on such sensor fusion can be expected to improve flow sensing by underwater robots.

## Funding Data

Office of Naval Research (Grant No. 13204704; Funder ID: 10.13039/100000006).

## Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.