Abstract

We examine how a human–robot interaction (HRI) system may be designed when input–output data from previous experiments are available. Our objective is to learn an optimal impedance in the assistance design for a cooperative manipulation task with a new operator. Due to the variability between individuals, the design parameters that best suit one operator of the robot may not be the best parameters for another. However, by incorporating historical data using a linear autoregressive (AR-1) Gaussian process, the search for a new operator's optimal parameters can be accelerated. We lay out a framework for optimizing human–robot cooperative manipulation that requires only input–output data. We characterize the learning performance using a notion called regret, establish how the AR-1 model improves the bound on the regret, and numerically illustrate this improvement in the context of a human–robot cooperative manipulation task. Furthermore, we show through an additional numerical study how our approach's input–output nature provides robustness against modeling error.

1 Introduction

Recently, there has been a rapid adoption of industrial robots that surpass humans in precision and skill in performing structured, repetitive tasks. However, increasingly complex tasks require increasingly complex robots. Situations often arise in which a robot cannot complete a task on its own. By bringing a human into the loop, human–robot interaction (HRI) leverages a human’s perceptive and decision-making strengths while still benefiting from the robot’s precision.

Using an impedance model, a robotic manipulator's interaction with the environment is often controlled by adjusting its effective mass, stiffness, and damping at its end-effector [1]. The impedance model simplifies control strategies by dynamically relating the manipulator's position and force. Multiple types of impedance control methods have been proposed, including adaptive control [2–4], iterative methods [5], and neural networks [6]. Studies have also analyzed variable impedance models [7] and their stability [8].

Robotic manipulators have found many engineering applications, including exosuits [9] and construction automation [10]. We specifically consider a cooperative manipulation task in which a human operates a manipulator to move a large object along a given trajectory. The manipulator seeks to follow a general trajectory but requires the human to provide an auxiliary force to make task-specific corrections to the end-effector's path. In this context, the human can be modeled using a transfer function specified by a set of gains [11–13]. These gains vary between individuals, requiring specialized tuning for each operator. As a result, a tradeoff is encountered when a new operator must be trained. In a purely robotic setting, the system structure may be found using system identification. However, this process can be time-consuming and tax the operator's patience. Iteratively tuning the system for the new operator would also be time-consuming and would ignore valuable historical data. Meanwhile, relying solely on historical data may result in suboptimal performance. Our goal is to leverage previous operator data to find the ideal tuning parameters for a new operator.

We use Gaussian process (GP) regression, a technique commonly used to optimize unknown and difficult-to-evaluate cost functions [14]. The benefits of GPs include confidence bounds on their predictions and upper bounds on regret, the difference between the cost at an operating point and the unknown optimal cost. Multi-fidelity Gaussian processes (MF-GPs) use multiple correlated inputs to predict an output. Specifically, the autoregressive-1 (AR-1) model relates data across fidelities through a nested linear structure. AR-1 models have been used to incorporate low-fidelity data from a simulation in order to optimize a high-fidelity function of the true system [15,16]. In the context of HRI, GP regression has been used to generate predictions of HRI velocities [17] and state-space models [18]. However, GPs have not yet been used specifically for tuning HRI systems.

The following are our main contributions:

  • Using an impedance controller for the robotic manipulator and a transfer function model for human input, we formulate the optimal assistance design for cooperative manipulation as an input–output problem where the system gains are the inputs and the system performance is the output. Using GPs, we develop a sequential method to find the system's optimal gains that requires only these input–output data.

  • We incorporate previous operators’ input–output data using an MF-GP. By analyzing how multiple fidelities affect the conditional covariance, we provide an upper bound on the regret. Additionally, we relate this bound to the measurement quality and variability across operators to show that an increase in the accuracy of prior data leads to a decrease in regret.

  • We numerically simulate input–output data for a model of human–robot cooperative manipulation and compare the single-fidelity formulations with the multi-fidelity formulation (MFF). We provide an example where both the cumulative and the best instantaneous regret are lower for the multi-fidelity formulation than for the single-fidelity formulations. Furthermore, we simulate a disturbance-impacted model of the human–robot manipulator to demonstrate the robustness of our approach.

2 System Description

In this section, we formulate a model for this cooperative manipulation system. In general, robotic manipulators are nonlinear, but using feedback linearization, we design a control input so that the robot behaves as an impedance model. The impedance model allows the human–robot system to be formulated as a linear time-invariant system, which can then be controlled using state feedback. We refer the interested reader to Ref. [19, Section 9.3] for more details. An overview of this control strategy is displayed in Fig. 1.

Fig. 1: Block diagram of the human–robot manipulator system

2.1 Robot Impedance Model.

Consider an n-link robot manipulator with the joint space dynamical model [19]
(1)    M_q(q) q̈ + C_q(q, q̇) q̇ + F_q q̇ + G_q(q) = τ_q − J^T(q) h_e
where q ∈ R^n denotes the manipulator's joint variables with n degrees of freedom. Here, M_q(q) ∈ R^{n×n} is the symmetric positive definite inertia matrix, C_q(q, q̇) ∈ R^{n×n} is the Coriolis-centrifugal matrix, F_q ∈ R^{n×n} is the matrix of damping coefficients, G_q(q) ∈ R^n is the vector of gravitational forces, τ_q ∈ R^n is the vector of input torques at the joints, h_e ∈ R^n is the vector of contact forces exerted by the manipulator's end-effector, and J(q) ∈ R^{n×n} is the Jacobian relating the end-effector velocity to the joint velocities.^3
Let z and z_d be the position and desired position of the manipulator end-effector, and let e := z − z_d denote the error between them. Assuming the joint positions q and velocities q̇ are known, feedback linearization using the control law
(2)    τ_q = M_q(q) u_q + p(q, q̇) + J^T(q) h_e
where p(q, q̇) = C_q(q, q̇) q̇ + F_q q̇ + G_q(q), reduces (1) to
(3)    q̈ = u_q
Selecting M_m, B_m, and K_m as the desired inertia, damping, and stiffness matrices of the impedance model, respectively, we set the input u_q of (2) to
(4)    u_q = J_A^{-1}(q) [ z̈_d − J̇_A(q, q̇) q̇ + M_m^{-1}( K_h f_h − B_m ė − K_m e ) ]
where f_h is the human control input, K_h ∈ R^{n×n} is a diagonal matrix of gains that scales the human input, and J_A(q) is the analytical Jacobian satisfying ż = J_A(q) q̇.
Combining (3) and (4), we obtain the impedance model
(5)    M_m ë + B_m ė + K_m e = K_h f_h
Define the augmented error vector as ē := [e^T  ė^T]^T ∈ R^{2n}. Then (5) can be rewritten as
(6)    ē̇ = Ā ē + B̄ f_h
and
(7)    Ā = [ 0_n  I_n ;  −M_m^{-1} K_m   −M_m^{-1} B_m ],    B̄ = [ 0_n ;  M_m^{-1} K_h ]

2.2 Human-Impedance Model.

To account for the effect of the human in the HRI system, we model the human operator using a proportional gain and a derivative gain (cf. Refs. [11,12,20]). Assuming the human's reaction is based on the robot error e, we obtain the human-impedance model
(8)    f_h = −(K_p e + K_d ė)
where K_d, K_p ∈ R^{n×n} are diagonal matrices of human gains. These gains are considered to be unknown and may vary between operators. As such, we denote by K_d^i and K_p^i the gain matrices of the ith operator. Using these operator-specific gains, (8) can be rewritten as
(9)
where

2.3 Human–Robot Impedance Model.

With the models for the robot and human, we now write an augmented state model for the system. Define the augmented state of the human–robot system as Z := [ē^T, f_h^T]^T ∈ R^{3n}. Then, the HRI manipulator for the ith operator has the state-space model
(10)
with control gains
(11)
Expressing the human–robot impedance model as (10) allows the tunable impedance parameters and the human input scaling to be interpreted as controller gains in linear state feedback, so that control-theoretic techniques can be used to tune them optimally. Given a set of control gains K, the quadratic cost of cooperative manipulation for the ith operator is
(12)    J_i(K) = ∫_0^∞ [ Z_i(τ)^T Q Z_i(τ) + u_q(τ)^T R u_q(τ) ] dτ
where Q ∈ R^{3n×3n} weights the effect of the tracking error, error rate, and human effort, R ∈ R^{n×n} weights the robot's control effort, and Z_i(τ) is the solution of (10) given an initial condition Z_i(0).
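As a concrete illustration, a cost of the form (12) can be evaluated numerically once the augmented model and a stabilizing gain are available. The sketch below is a minimal example rather than the paper's implementation: it assumes, for illustration, linear state feedback u = −K Z and uses randomly generated placeholder matrices in place of the actual model (10)–(11).

```python
# Minimal sketch: evaluating an infinite-horizon quadratic cost like (12) by solving
# a Lyapunov equation. A, B, K, Q, R, Z0 are illustrative placeholders, not the
# actual augmented model (10)-(11); state feedback u = -K Z is assumed for illustration.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def quadratic_cost(A, B, K, Q, R, Z0):
    """Return int_0^inf Z'QZ + u'Ru dt for Zdot = (A - B K) Z with u = -K Z,
    assuming A - B K is Hurwitz (stable closed loop)."""
    A_cl = A - B @ K
    # The cost equals Z0' P Z0, where P solves A_cl' P + P A_cl = -(Q + K' R K).
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    return float(Z0 @ P @ Z0)

# Toy usage with a randomly generated stable system.
rng = np.random.default_rng(0)
n_state, n_in = 6, 2
A = -2.0 * np.eye(n_state) + 0.1 * rng.standard_normal((n_state, n_state))
B = rng.standard_normal((n_state, n_in))
K = 0.05 * rng.standard_normal((n_in, n_state))
Q, R = np.eye(n_state), np.eye(n_in)
Z0 = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print(quadratic_cost(A, B, K, Q, R, Z0))
```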

2.4 Problem Statement.

Consider an HRI system with the impedance model (10). Let K(x) be a controller depending on design parameters x ∈ X ⊆ R^q. Suppose that the robot has m human operators, with the ith human possessing their own performance metric
(13)    f_i(x) := −J_i(K(x))

As the ith operator tests different design parameters, they obtain data for X_i ⊂ X. Now, suppose a new (m+1)th human operates the same robot. Our goal is to use the previous data (X_i, f_i(X_i)) to find an ideal set of design parameters x* that optimize the new operator's performance f_{m+1}.

3 Multi-fidelity Control Gain Selection

In this section, we first provide an overview of GPs. We then introduce the notion of multi-fidelity and describe how this framework applies to the HRI problem.

3.1 Gaussian Processes.

A GP is a collection of random variables, any finite subset of which has a multivariate Gaussian distribution (cf. Ref. [14]). A GP is defined by its mean function μ(x) and its covariance (kernel) function k(x, x′). For a set of inputs X_t = {x_1, …, x_t}, we can create a covariance matrix k(X_t, X_t) = [k(x_i, x_j)]_{i,j=1}^t. By taking the covariance between a point and a set of points, we obtain a covariance vector k(x) := k(X_t, x) = [k(x_1, x) ⋯ k(x_t, x)]^T. Let Y_t = [y_1, …, y_t]^T be noisy samples of f at X_t, where y_i = f(x_i) + η has independent and identically distributed Gaussian measurement noise η ∼ N(0, ξ^2). Then, the posterior distribution of f is another GP with mean μ_{t+1}, covariance k_{t+1}, and standard deviation σ_{t+1} given by
(14)    μ_{t+1}(x) = k(x)^T [k(X_t, X_t) + ξ^2 I]^{-1} Y_t
(15)    k_{t+1}(x, x′) = k(x, x′) − k(x)^T [k(X_t, X_t) + ξ^2 I]^{-1} k(x′),    σ_{t+1}^2(x) = k_{t+1}(x, x)
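For reference, a minimal implementation of the posterior equations (14)–(15) might look as follows; the squared-exponential kernel and its hyperparameters are illustrative choices, not the ones used in this work.

```python
# Sketch of the GP posterior (14)-(15) with a zero prior mean and a squared-exponential
# kernel; the kernel and its hyperparameters are illustrative assumptions.
import numpy as np

def sq_exp_kernel(X1, X2, variance=1.0, lengthscale=0.2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_query, noise_var=1e-4):
    """Posterior mean and standard deviation of f at X_query given noisy samples."""
    K = sq_exp_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))  # k(X_t, X_t) + xi^2 I
    k_star = sq_exp_kernel(X_train, X_query)                                # k(x) at each query point
    mean = k_star.T @ np.linalg.solve(K, y_train)                           # Eq. (14)
    prior_diag = sq_exp_kernel(X_query, X_query).diagonal()
    var = prior_diag - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)  # Eq. (15)
    return mean, np.sqrt(np.clip(var, 0.0, None))
```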
The upper confidence bound (UCB) algorithm (cf. Ref. [21]) selects points according to
x_t = argmax_{x ∈ X} μ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x)
where β_t is a parameter that controls the algorithm's tendency to explore. This algorithm is formalized in Algorithm 1. One particular appeal of UCB is its theoretical guarantee associated with a metric called regret.

UCB sampling

Algorithm 1

1: Input: GP f with priors μ_0, σ_0, discrete domain X

2: for t = 1, 2, … do

3:   Choose x_t = argmax_{x ∈ X} μ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x)

4:   Sample y_t(x_t) = f(x_t) + η

5:   Predict μ_t(x), σ_t(x) ∀ x ∈ X

6: end for

For an iterative optimization algorithm, the instantaneous regret of an evaluation is given by
(16)    r_t = f(x*) − f(x_t)
where x* = argmax_{x ∈ X} f(x). Regret indicates the gap between the current evaluation and the best possible evaluation. After T rounds, the cumulative regret is given by R_T = ∑_{t=1}^T r_t and the best instantaneous regret is given by r_T* = min_{t ∈ {1,…,T}} r_t.
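To make the sampling loop concrete, the sketch below runs Algorithm 1 on a toy one-dimensional objective and tracks the regret quantities defined above. The objective, kernel, and noise level are illustrative assumptions; the β_t schedule follows the one used later in Theorem 1.

```python
# Sketch of Algorithm 1 (UCB sampling) with regret bookkeeping on a toy objective.
# The objective, kernel, and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 101)[:, None]          # discrete domain
def f_true(x):                                    # unknown objective (toy stand-in)
    return np.sin(6 * x[:, 0]) * np.exp(-x[:, 0])
noise_sd, delta = 1e-2, 0.1

def kernel(A, B, ell=0.1):
    return np.exp(-0.5 * ((A[:, None, 0] - B[None, :, 0]) / ell) ** 2)

X_obs, y_obs, regrets = [], [], []
f_star = f_true(X).max()
for t in range(1, 21):
    if X_obs:                                     # GP posterior over the whole domain
        Xo, yo = np.array(X_obs), np.array(y_obs)
        K = kernel(Xo, Xo) + noise_sd**2 * np.eye(len(Xo))
        ks = kernel(Xo, X)
        mu = ks.T @ np.linalg.solve(K, yo)
        var = 1.0 - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    else:
        mu, var = np.zeros(len(X)), np.ones(len(X))
    beta = 2 * np.log(len(X) * t**2 * np.pi**2 / (6 * delta))   # beta_t schedule
    x_next = X[np.argmax(mu + np.sqrt(beta * np.clip(var, 0, None)))]
    y_next = f_true(x_next[None, :])[0] + noise_sd * rng.standard_normal()
    X_obs.append(x_next); y_obs.append(y_next)
    regrets.append(f_star - f_true(x_next[None, :])[0])         # instantaneous regret (16)

print("cumulative regret:", np.sum(regrets), "best:", np.min(regrets))
```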

3.2 Multi-fidelity Gaussian Processes.

An MF-GP incorporates data from multiple approximations of a model f. One type of MF-GP is the AR-1 model (cf. Ref. [22]). The AR-1 model expresses f as a linear combination of a low-fidelity GP f_L(x) and an error GP δ(x) by
(17)    f(x) = ρ f_L(x) + δ(x)
where ρ is a scaling constant.
Denote the kernels of f_L and δ by k^{(L)} and k^{(δ)}, respectively, and let evaluations of f_L and f be noisy with variances ξ_L^2 and ξ_H^2. Then, for X = [X_L, X_H], an AR-1 model has a covariance matrix of the form
(18)    [ k^{(L)}_{L,L} + ξ_L^2 I    ρ k^{(L)}_{L,H} ;    ρ k^{(L)}_{H,L}    ρ^2 k^{(L)}_{H,H} + k^{(δ)}_{H,H} + ξ_H^2 I ]
where k^{(L)}_{H,L} is shorthand notation for the single-fidelity covariance matrix k^{(L)}(X_H, X_L). Unlike larger GP models, the AR-1 model allows for the iterative updating of each fidelity, thereby maintaining a computational complexity of the same order as a single-fidelity GP. Additionally, its decoupled recursive structure allows for computationally efficient learning of its parameters.
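A minimal sketch of how the AR-1 structure is used for prediction is given below: it assembles the joint covariance of the stacked low- and high-fidelity observations in the spirit of Eq. (18) and conditions on it to obtain the posterior of the high-fidelity function, as needed in the prediction step of Algorithm 2. The kernel, ρ, and noise variances are illustrative assumptions.

```python
# Sketch: AR-1 joint covariance (in the spirit of Eq. (18)) and posterior prediction
# of the high-fidelity function f = rho*f_L + delta. Hyperparameters are illustrative.
import numpy as np

def rbf(A, B, variance=1.0, ell=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / ell**2)

def mf_posterior(X_L, y_L, X_H, y_H, X_query, rho=0.9, vL=1.0, vd=0.1,
                 xi_L2=1e-3, xi_H2=1e-4):
    """Posterior mean/std of f at X_query given low- and high-fidelity observations."""
    # Joint covariance of the stacked observation vector [y_L, y_H].
    K = np.block([
        [rbf(X_L, X_L, vL) + xi_L2 * np.eye(len(X_L)), rho * rbf(X_L, X_H, vL)],
        [rho * rbf(X_H, X_L, vL),
         rho**2 * rbf(X_H, X_H, vL) + rbf(X_H, X_H, vd) + xi_H2 * np.eye(len(X_H))],
    ])
    # Cross-covariance between the observations and f(X_query).
    k_star = np.vstack([rho * rbf(X_L, X_query, vL),
                        rho**2 * rbf(X_H, X_query, vL) + rbf(X_H, X_query, vd)])
    y = np.concatenate([y_L, y_H])
    alpha = np.linalg.solve(K, y)
    mean = k_star.T @ alpha
    prior = rho**2 * rbf(X_query, X_query, vL) + rbf(X_query, X_query, vd)
    var = prior.diagonal() - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))
```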

3.3 Proposed Approach: MF-GPs for Control Design.

Using the AR-1 model, we aim to effectively leverage data from previous operators when tuning for a specific individual. Consider a set of m+1 operators, with the ith operator's performance data (X_i, f_i(X_i)). Let f: X → R be an unknown realization of a GP with AR-1 structure (17). Since the quadratic cost is sufficiently smooth in K, we assume the GP f adequately represents the performance f_{m+1} of the (m+1)th operator. Meanwhile, we treat f_L as a GP with observations of the first m operators. Note that f_L does not specifically represent any single f_i, but rather models the expected performance of the previous m operators. Using UCB, we iteratively select an x_t to test for the (m+1)th operator, thereby obtaining evaluations of f. This MFF is formalized in Algorithm 2.

MFF

Algorithm 2

1: Input: Data (X_i, f_i(X_i)) for i ∈ {1, 2, …, m+1}, discrete domain X

2: Let f_L be a GP with evaluations (X_i, f_i(X_i)) for i ∈ {1, …, m}

3: Let f be a GP with form f(x) = ρ f_L(x) + δ(x) and evaluations (X_{m+1}, f_{m+1}(X_{m+1}))

4: Predict μ_0(x), σ_0(x) ∀ x ∈ X

5: Apply UCB(f, μ_0, σ_0, X)

We compare MFF to two single-fidelity approaches that do not take advantage of the AR-1 structure. In the collective single-fidelity (CSF) formulation of Algorithm 3, pooled data from all operators are treated as a single fidelity. In the limited single-fidelity (LSF) formulation of Algorithm 4, the single-fidelity GP contains only data from the new (m+1)th operator. Essentially, LSF is a naive approach that ignores any previous operator data.

CSF

Algorithm 3

1: Input: Data (X_i, f_i(X_i)) for i ∈ {1, 2, …, m+1}, discrete domain X

2: Let f be a GP with evaluations (X_i, f_i(X_i))

3: Predict μ_0(x), σ_0(x) ∀ x ∈ X

4: Apply UCB(f, μ_0, σ_0, X)

LSF

Algorithm 4

1: Input: Data (X_{m+1}, f_{m+1}(X_{m+1})), discrete domain X

2: Let f be a GP with evaluations (X_{m+1}, f_{m+1}(X_{m+1}))

3: Predict μ_0(x), σ_0(x) ∀ x ∈ X

4: Apply UCB(f, μ_0, σ_0, X)
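The three formulations differ only in how they assemble training data from the operator histories; a small sketch of that bookkeeping is given below, where operator_data is a hypothetical list of (X_i, y_i) arrays for operators 1 through m+1.

```python
# Sketch of how MFF, CSF, and LSF (Algorithms 2-4) assemble their training sets.
# `operator_data` is a hypothetical list of (X_i, y_i) numpy arrays for operators 1..m+1.
import numpy as np

def split_formulations(operator_data):
    prior, new = operator_data[:-1], operator_data[-1]
    # MFF: prior operators feed the low-fidelity GP, the new operator the high-fidelity GP.
    mff = {"low": (np.vstack([X for X, _ in prior]), np.hstack([y for _, y in prior])),
           "high": new}
    # CSF: all data pooled into a single-fidelity GP.
    csf = (np.vstack([X for X, _ in operator_data]),
           np.hstack([y for _, y in operator_data]))
    # LSF: only the new operator's data.
    lsf = new
    return mff, csf, lsf
```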

4 Theoretical Results

In this section, we establish a generic theoretical bound on the regret of the proposed MFF Algorithm 2, and then discuss its implications for the HRI problem from Sec. 2.4. To formally state our main result, we first introduce
(19)
We will later establish that k̃ is an upper bound on the conditional covariance of an AR-1 GP given the low- and high-fidelity data. Using this k̃, we define
(20)
where ∑_{i=1}^{T} m_i = T, h(T) = min{T, |X_H|}, and λ_t(k̃) are the eigenvalues of the matrix k̃. This quantity γ̃_T is an upper bound on the information gain, a metric quantifying the greatest amount of information that can be learned after T points of an AR-1 GP are sampled. Using these quantities, we present the main result.

Regret bound for an AR-1 process

Theorem 1
Let f be a sample function from a linear autoregressive GP (17) over a discrete domain X. Set δ ∈ (0, 1) and β_t = 2 log(|X| t^2 π^2 / (6δ)). Then, {x_1, x_2, …, x_T} obtained from Algorithm 2 satisfy
with probability at least 1 − δ, where the constants v_L := k^{(L)}(x, x) and v_δ := k^{(δ)}(x, x) are the respective kernel variances.

The steps leading to the proof of Theorem 1 are presented in the  Appendix.

Implications for HRI

Remark 1

The regret characterizes the number of suboptimal parameter selections made by a learning algorithm up to time T. Since the quantity γ̃_T directly impacts the upper bound on the regret R_T of Algorithm 2, if the eigenvalues of k̃ are significantly smaller than the eigenvalues of k^{(H)}_{H,H}, then γ̃_T becomes significantly smaller, leading to an improved regret bound. Consequently, the algorithm may require fewer parameter-tuning personalization steps when data from prior operators are available. Furthermore, recall that f_L represents the expected performance of the previous m operators, and ξ_L^2 represents the variance of the evaluations of f_L. In the context of the HRI problem, ξ_L^2 reduces when we have a sufficiently large set of representative operator data. Moreover, when f_L closely matches f, v_δ decreases along with the eigenvalues of k̃. This happens when variations between operators have minimal effect on the HRI performance curve.

Remark 2
If the low-fidelity model is evaluated at all points in X_H, we see that k^{(L)}_{H,L} [k^{(L)}_{L,L}]^{-1} k^{(L)}_{L,H} = k^{(L)}_{H,H}, resulting in a simplification of k̃ to

Furthermore, as the high- and low-fidelity noise terms approach 0, the conditional covariance approaches k^{(δ)}_{H,H}.^4

5 Numerical Simulations

We demonstrate the performance of our algorithms on a planar two-link prismatic-revolute (PR) manipulator model (cf. Fig. 2) through two numerical simulations. The first considers the case in which feedback linearization yields the exact linear time-invariant (LTI) model (10). The second studies the robustness of the approach to perturbations of the LTI system, modeled as a nonlinear disturbance term (simulation parameters are listed in Table 1).

Fig. 2: Schematic of the PR manipulator used in simulations
Table 1: Simulation parameters

Parameter                      Notation   Value
Moment of inertia of link 1    I_1        0.1 kg m^2
Moment of inertia of link 2    I_2        0.1 kg m^2
Mass of link 1                 m_1        1 kg
Mass of link 2                 m_2        1 kg
Length of link 1               l_1        0.5 m
(Fixed) length of link 2       d̄          0.5 m
We set the joint variable vector q := [θ_1  d_2]^T. Let the end-effector location be z = [z_1  z_2]^T = [l_2 cos(θ_1)  l_2 sin(θ_1)]^T, where l_2 := d̄ + d_2. This yields
(21)    θ_1 = atan2(z_2, z_1),    d_2 = √(z_1^2 + z_2^2) − d̄
Now,
(22)    q̇ = J_A^{-1}(q) ż,    where J_A(q) = [ −l_2 sin(θ_1)   cos(θ_1) ;   l_2 cos(θ_1)   sin(θ_1) ]
The manipulator dynamics take the form of Eq. (1), with the dynamic matrices given in Ref. [23, Example 6.6].
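A small sketch of the PR manipulator kinematics described above is given below; it implements the forward map, the analytical Jacobian, and the inverse relations corresponding to (21) and (22), with the fixed length d̄ taken from Table 1.

```python
# Sketch of the PR manipulator kinematics from Sec. 5, written from the definitions
# above; the fixed length of link 2 follows Table 1.
import numpy as np

D_BAR = 0.5  # (fixed) length of link 2, Table 1

def forward_kinematics(q):
    theta1, d2 = q
    l2 = D_BAR + d2
    return np.array([l2 * np.cos(theta1), l2 * np.sin(theta1)])

def analytical_jacobian(q):
    theta1, d2 = q
    l2 = D_BAR + d2
    return np.array([[-l2 * np.sin(theta1), np.cos(theta1)],
                     [ l2 * np.cos(theta1), np.sin(theta1)]])

def inverse_kinematics(z):                     # Eq. (21)
    theta1 = np.arctan2(z[1], z[0])
    d2 = np.hypot(z[0], z[1]) - D_BAR
    return np.array([theta1, d2])

def joint_velocities(q, z_dot):                # Eq. (22)
    return np.linalg.solve(analytical_jacobian(q), z_dot)
```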

5.1 Feedback Linearization Is Exact.

Consider the augmented state model (10). Due to the impedance model, the end-effector’s motion is assumed to be independent in each direction. The desired trajectory for the end effector was chosen to be
Then, the actual trajectory satisfies z(t) = z_d(t) + e(t). Equation (21) yields the joint variables q and Eq. (22) yields their derivatives. For the nominal model, we assume I_1 = I_2 = 0. Setting M_m to the 2 × 2 identity matrix, we assume B_m and K_m will be scalars. Thus, K will have the structure

Henceforth, we use x = (x_1, x_2, x_3) ∈ X as the optimization parameter, where X is an 11 × 11 × 11 grid of points over a hyperrectangle spanning x_1 ∈ [0.25, 0.45], x_2 ∈ [0.85, 0.95], and x_3 ∈ [0.02, 0.22]. These ranges were chosen through trials to ensure that the closed-loop system is stable to begin with.

Next, we generate data for m = 9 previous operators. For the performance functions, we aim to minimize the human effort by setting Q = diag(0.1, 0.1, 0.1, 0.1, 10, 10) and R = I_2. We randomly draw k_d^i ∼ N(10, 5) and k_p^i ∼ N(20, 5), and set K_d^i = k_d^i I_n and K_p^i = k_p^i I_n. An initial condition Z_i(0) = [1  1  0]^T is chosen to model an initial error in position. The performance f_i from (13) is approximated using a finite integral from τ = 0 to τ = 10. Each f_i is evaluated for 20 random sets of x ∈ X with additive Gaussian noise η ∼ N(0, 10^{-4}).
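The data-generation step described above can be sketched as follows. Here, evaluate_cost is a hypothetical placeholder for the finite-horizon approximation of (13), since its implementation depends on the augmented model (10), and the second argument of the normal draws is treated as a standard deviation, which is an assumption about the notation N(10, 5).

```python
# Sketch of the Sec. 5.1 data-generation setup: the 11x11x11 design grid, random
# operator gains, and noisy performance evaluations. `evaluate_cost` is a hypothetical
# placeholder for the finite-horizon approximation of (13).
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

# Discrete domain X: 11 points per axis over the stated ranges.
X = np.array(list(product(np.linspace(0.25, 0.45, 11),
                          np.linspace(0.85, 0.95, 11),
                          np.linspace(0.02, 0.22, 11))))

def sample_operator_gains(n=2):
    """Draw the ith operator's diagonal gain matrices K_d^i, K_p^i."""
    kd = rng.normal(10.0, 5.0)   # second argument taken as a standard deviation (assumption)
    kp = rng.normal(20.0, 5.0)
    return kd * np.eye(n), kp * np.eye(n)

def noisy_evaluations(evaluate_cost, Kd, Kp, n_samples=20, noise_sd=1e-2):
    """Evaluate n_samples random design points for one operator with measurement noise."""
    idx = rng.choice(len(X), size=n_samples, replace=False)
    y = np.array([evaluate_cost(x, Kd, Kp) for x in X[idx]])
    return X[idx], y + noise_sd * rng.standard_normal(n_samples)
```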

We run 20 Monte Carlo simulations involving the random selection of previous data points and operator gains K_d, K_p. Figure 3 displays the averages of the best and cumulative regrets across the simulations. We see that MFF leads to a general improvement in the cumulative regret, especially at higher iteration counts. Between the single-fidelity approaches, LSF has lower regret and tighter variance than CSF.

Fig. 3: Best instantaneous and cumulative regret (averaged across 20 trials) when UCB is used to select control gains for system (10) with no disturbance. Error bars represent one standard deviation across the 20 Monte Carlo trials.

The best instantaneous regret plot shows that MFF typically makes better selections than CSF or LSF in the first few iterations. After around 10 iterations, LSF and MFF have found a selection with very low regret, while CSF fails to find an optimal selection even after 20 iterations.

These results indicate that data from previous operators are beneficial when they are incorporated through a multi-fidelity structure. Incorporating previous data through CSF increases the regret compared to ignoring it, as in LSF.

5.2 Robustness to Model Perturbations.

We now suppose that the mass matrix is perturbed, i.e., M̄_q(q) = M_q(q) + δM_q(q), where a small perturbation is added to the (1,1) entry of M_q(q). Using τ_q from (2), and following similar steps, we obtain
Keeping the same choice of u_q as in (4), the evolution (5) of the system is modified to

In other words, the term M_m J_A(q) (M̄_q^{-1} M_q − I) u_q quantifies the error due to inexact feedback linearization.

Anticipating a higher regret due to model mismatch, we choose a lower control penalty, i.e., R = 0.1 I. We plot the regret from the MFF, CSF, and LSF approaches in Fig. 4. We also show the regret incurred when the optimal controller from the undisturbed system is used on the disturbed system; this regret is negligible in this case.

Fig. 4: Best instantaneous and cumulative regret (averaged across 20 trials) when UCB is used to select control gains for the perturbed system. The dotted line indicates the regret when the optimal controller from the undisturbed system is used on the disturbed system. Error bars represent one standard deviation across the 20 Monte Carlo trials.

The disturbance increases the mean and the spread of the cumulative regret. One reason for this is that the same range of control gains was searched as in the undisturbed case, even in the presence of the disturbance. We also observe that MFF performs better than LSF or CSF in terms of instantaneous regret. Furthermore, on average, all three approaches find controllers whose performance matches that of the optimal undisturbed controller.

5.3 Extension of the Proposed Methodology.

The extension of this methodology to larger models, i.e., manipulators with more joints/degrees of freedom, is straightforward. The unknown control gain submatrices K_m, B_m, and K_h from Eq. (11) are diagonal and have the structure
K_m = x_1 I_n,    B_m = x_2 I_n,    K_h = x_3 I_n
where I_n is the n × n identity matrix. So the number of unknowns x_1, x_2, and x_3 remains unchanged for any n.
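A short sketch of this parameterization is shown below; the pairing of (x_1, x_2, x_3) with (K_m, B_m, K_h) follows the order above and is an illustrative assumption.

```python
# Sketch of the gain parameterization for an n-link manipulator: each submatrix is a
# scalar multiple of the identity, so the number of design parameters stays at three
# regardless of n. The pairing of (x1, x2, x3) with (K_m, B_m, K_h) is an assumption.
import numpy as np

def gain_submatrices(x, n):
    x1, x2, x3 = x
    return x1 * np.eye(n), x2 * np.eye(n), x3 * np.eye(n)   # K_m, B_m, K_h

Km, Bm, Kh = gain_submatrices((0.35, 0.90, 0.12), n=4)       # works for any n
```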

In terms of manipulator models, our methodology applies to any n-link manipulator that satisfies Eq. (1), which makes no assumptions on linearity. If we can design the input torques τ_q (e.g., using feedback linearization) to satisfy Eq. (2), we obtain Eq. (3). This choice of τ_q may either be exact (Sec. 5.1) or may have some terms that do not cancel out (e.g., disturbance/noise as considered in Sec. 5.2). Alternatively, τ_q may be designed using neural networks [12] or high-gain observers [24] that have good disturbance rejection properties.

With respect to the human-impedance model (8), this model allows us to restrict the search for the optimal controller to an affine form. However, our proposed approach is general and applies to any other impedance model for which an appropriate parameterization of the controller can be made.

6 Conclusion

We provide a multi-fidelity framework to find the optimal set of impedance parameters for a human–robot cooperative manipulation system using only input–output data. By treating prior operator data as a low-fidelity model, we are able to further optimize the system’s performance for a new operator. We establish how the AR-1 model improves the regret bound through the conditional covariance and then numerically simulate human–robot cooperative manipulation to demonstrate this improvement in regret.

In the future, we plan to validate this framework through physical experiments with human subjects and a robotic manipulator.

Footnotes

1

Paper presented at the 2024 Modeling, Estimation, and Control Conference (MECC 2024), Chicago, IL, Oct. 28–30, 2024, Paper No. MECC2024-68.

3

For ease of presentation, we assume the joint space and the workspace are n-dimensional. The ideas extend to redundant manipulators using standard modifications [19].

4

This generalizes the simplification found in the proof of Theorem 3.2 in Ref. [16], where XHXL and ξL2=0.

Acknowledgment

This work was supported in part by ARO grant W911NF-18-1-0325 and in part by NSF Award CNS-2134076.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The authors attest that all data for this study are included in the paper.

Appendix: Analysis and the Proof of the Main Result

We start with a basic result that will help us establish an upper bound on the conditional covariance of an AR-1 GP.

Proposition 1
Let Q ≻ 0 be a positive definite matrix and σ ∈ R. Then,

(Q + σ^2 I)^{-1} ⪰ Q^{-1} − σ^2 Q^{-2}
Note that this form is given in Ref. [25, Eq. (191)], which presents it as an approximation but does not state the direction of the inequality.

Proof of Proposition 1
We rewrite
(A1)    (Q + σ^2 I)^{-1} = (I + σ^2 Q^{-1})^{-1} Q^{-1}
Since Q ≻ 0, Q^{-1} ≻ 0 and σ^4 Q^{-2} ⪰ 0. Then,
(I − σ^2 Q^{-1})(I + σ^2 Q^{-1}) = I − σ^4 Q^{-2} ⪯ I
Since I + σ^2 Q^{-1} is invertible for any σ and for any Q^{-1} ≻ 0,
(A2)    (I + σ^2 Q^{-1})^{-1} ⪰ I − σ^2 Q^{-1}

By substituting (A2) into (A1), we complete the proof.
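The matrix inequality in Proposition 1 can also be checked numerically on random positive definite matrices; the following quick sanity check is illustrative only.

```python
# Quick numerical check of the inequality (Q + s^2 I)^{-1} >= Q^{-1} - s^2 Q^{-2}
# (in the positive-semidefinite sense) on randomly generated positive definite matrices.
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100):
    n = rng.integers(2, 6)
    A = rng.standard_normal((n, n))
    Q = A @ A.T + n * np.eye(n)                 # positive definite
    s2 = rng.uniform(0.0, 1.0)
    lhs = np.linalg.inv(Q + s2 * np.eye(n))
    rhs = np.linalg.inv(Q) - s2 * np.linalg.inv(Q) @ np.linalg.inv(Q)
    gap = lhs - rhs
    assert np.min(np.linalg.eigvalsh(gap)) > -1e-10   # gap is PSD up to round-off
print("Inequality held on all random trials.")
```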

Conditional covariance of an AR-1 GP

Lemma 1

The covariance of the high-fidelity data conditioned on the low-fidelity data is upper bounded by k̃ defined in Eq. (19).

Proof of Lemma 1.
The conditional covariance of an AR-1 GP satisfies
where the inequality is obtained from Proposition 1.  □
An upper bound on the conditional covariance allows us to establish an upper bound on the maximum information gain γ_T [21]. Suppose f is sampled at points A ⊆ X, resulting in a vector of noisy evaluations y_A and a vector of true values f_A. Then, denoting the entropy of a vector by H(·), the information gain is defined as I(y_A; f_A) := H(y_A) − H(y_A | f), and the maximum information gain is
(A3)    γ_T := max_{A ⊆ X: |A| = T} I(y_A; f_A)

Info. gain bound for an AR-1 GP

Lemma 2

Let ξ_H^2 and ξ_L^2 be the variances of the high- and low-fidelity measurement noise of a linear autoregressive GP. Then the maximum information gain γ_T is upper bounded by γ̃_T defined in Eq. (20).

The proof of this result is similar to that of Theorem 4 from Ref. [21]. Finally, the proof of Theorem 1 closely follows the proof of Theorem 1 from Ref. [21]. We omit both of these proofs for brevity.

References

1. Hogan, N., 1985, "Impedance Control: An Approach to Manipulation: Part II—Implementation," ASME J. Dyn. Syst. Meas. Control, 107(1), pp. 8–16.
2. Lu, W.-S., and Meng, Q.-H., 1991, "Impedance Control With Adaptation for Robotic Manipulations," IEEE Trans. Rob. Autom., 7(3), pp. 408–415.
3. Huo, Y., Li, P., Chen, D., Liu, Y.-H., and Li, X., 2021, "Model-Free Adaptive Impedance Control for Autonomous Robotic Sanding," IEEE Trans. Autom. Sci. Eng., 19(4), pp. 3601–3611.
4. Sun, T., Yang, J., Pan, Y., and Yu, H., 2023, "Repetitive Impedance Learning-Based Physically Human–Robot Interactive Control," IEEE Trans. Neural Netw. Learn. Syst., 35(8), pp. 1–10.
5. Li, X., Liu, Y.-H., and Yu, H., 2018, "Iterative Learning Impedance Control for Rehabilitation Robots Driven by Series Elastic Actuators," Automatica, 90, pp. 1–7.
6. Yang, C., Peng, G., Li, Y., Cui, R., Cheng, L., and Li, Z., 2018, "Neural Networks Enhanced Adaptive Admittance Control of Optimized Robot–Environment Interaction," IEEE Trans. Cybern., 49(7), pp. 2568–2579.
7. Ficuciello, F., Villani, L., and Siciliano, B., 2015, "Variable Impedance Control of Redundant Manipulators for Intuitive Human–Robot Physical Interaction," IEEE Trans. Rob., 31(4), pp. 850–863.
8. Sun, T., Peng, L., Cheng, L., Hou, Z.-G., and Pan, Y., 2019, "Stability-Guaranteed Variable Impedance Control of Robots Based on Approximate Dynamic Inversion," IEEE Trans. Syst. Man Cybern. Syst., 51(7), pp. 4193–4200.
9. Li, Z., Li, X., Li, Q., Su, H., Kan, Z., and He, W., 2022, "Human-in-the-Loop Control of Soft Exosuits Using Impedance Learning on Different Terrains," IEEE Trans. Rob., 38(5), pp. 2979–2993.
10. Bock, T., and Linner, T., 2016, Construction Robots: Volume 3: Elementary Technologies and Single-Task Construction Robots, Cambridge University Press, New York.
11. Yang, Y., Ding, Z., Wang, R., Modares, H., and Wunsch, D. C., 2021, "Data-Driven Human–Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning," IEEE/CAA J. Autom. Sin., 9(1), pp. 47–63.
12. Modares, H., Ranatunga, I., Lewis, F. L., and Popa, D. O., 2015, "Optimized Assistive Human–Robot Interaction Using Reinforcement Learning," IEEE Trans. Cybern., 46(3), pp. 655–667.
13. Li, Z., Liu, J., Huang, Z., Peng, Y., Pu, H., and Ding, L., 2017, "Adaptive Impedance Control of Human–Robot Cooperation Using Reinforcement Learning," IEEE Trans. Ind. Electron., 64(10), pp. 8013–8022.
14. Williams, C. K., and Rasmussen, C. E., 2006, Gaussian Processes for Machine Learning, Vol. 2, MIT Press, Cambridge, MA.
15. Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., and Trimpe, S., 2017, "Virtual Vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning With Bayesian Optimization," IEEE International Conference on Robotics and Automation, Singapore, May 29–June 3, pp. 1557–1563.
16. Lau, E., Srivastava, V., and Bopardikar, S. D., 2023, "A Multi-fidelity Bayesian Approach to Safe Controller Design," IEEE Control Syst. Lett., 7, pp. 2904–2909.
17. Jin, Z., Liu, A., Zhang, W.-A., Yu, L., and Su, C.-Y., 2023, "A Learning Based Hierarchical Control Framework for Human–Robot Collaboration," IEEE Trans. Autom. Sci. Eng., 20(1), pp. 506–517.
18. Pöhler, L., Umlauft, J., and Hirche, S., 2019, "Uncertainty-Based Human Motion Tracking With Stable Gaussian Process State Space Models," IFAC-PapersOnLine, 51(34), pp. 8–14.
19. Siciliano, B., Sciavicco, L., Villani, L., and Oriolo, G., 2010, Robotics: Modelling, Planning, and Control, Springer, London.
20. McRuer, D. T., and Jex, H. R., 1967, "A Review of Quasi-linear Pilot Models," IEEE Trans. Hum. Factors Electron., 8(3), pp. 231–249.
21. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W., 2012, "Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting," IEEE Trans. Inform. Theory, 58(5), pp. 3250–3265.
22. Kennedy, M. C., and O'Hagan, A., 2000, "Predicting the Output From a Complex Computer Code When Fast Approximations Are Available," Biometrika, 87(1), pp. 1–13.
23. Craig, J. J., 2004, Introduction to Robotics: Mechanics and Control, 3rd ed., Pearson Education, London, UK.
24. Wei, L., and Chen, G., 2020, "Extended High-Gain Observer Based Output Feedback Linearization of Robot Manipulator," IEEE Conference on Cyber Technology in Automation, Control, and Intelligent Systems, Xi'an, China, October, pp. 73–79.
25. Petersen, K. B., and Pedersen, M. S., 2008, "The Matrix Cookbook," Tech. Univ. Denmark, 7(15), p. 510.