Abstract
We examine how a human–robot interaction (HRI) system may be designed when input–output data from previous experiments are available. Our objective is to learn an optimal impedance in the assistance design for a cooperative manipulation task with a new operator. Due to the variability between individuals, the design parameters that best suit one operator of the robot may not be the best parameters for another. However, by incorporating historical data using a linear autoregressive (AR-1) Gaussian process, the search for a new operator’s optimal parameters can be accelerated. We lay out a framework for optimizing human–robot cooperative manipulation that requires only input–output data. We characterize the learning performance using a notion called regret, establish how the AR-1 model improves the bound on the regret, and numerically illustrate this improvement in the context of a human–robot cooperative manipulation task. Furthermore, through an additional numerical study, we show how the input–output nature of our approach provides robustness against modeling error.
1 Introduction
Recently, there has been a rapid adoption of industrial robots that surpass humans in precision and skill in performing structured, repetitive tasks. However, increasingly complex tasks require increasingly complex robots. Situations often arise in which a robot cannot complete a task on its own. By bringing a human into the loop, human–robot interaction (HRI) leverages a human’s perceptive and decision-making strengths while still benefiting from the robot’s precision.
Using an impedance model, a robotic manipulator’s interaction with the environment is often controlled by adjusting its effective mass, stiffness, and damping at its end-effector [1]. The impedance model simplifies control strategies by dynamically relating the manipulator’s position and force. Multiple types of impedance control methods have been proposed, including adaptive control [2–4], iterative methods [5], and neural networks [6]. Studies have also analyzed variable impedance models [7] and their stability [8].
Robotic manipulators have found many engineering applications, including exosuits [9] and construction automation [10]. We specifically consider a cooperative manipulation task in which a human operates a manipulator to move a large object along a given trajectory. The manipulator seeks to follow a general trajectory but requires the human to provide an auxiliary force to make task-specific corrections to the end-effector’s path. In this context, the human can be modeled using a transfer function specified by a set of gains [11–13]. These gains vary between individuals, requiring specialized tuning for each operator. As a result, a tradeoff is encountered when a new operator must be trained. In a purely robotic setting, the system structure may be found using system identification. With a human in the loop, however, this process can become time-consuming and may exhaust the operator’s patience. Iteratively tuning the system for the new operator would also be time-consuming and would ignore valuable historical data. Meanwhile, relying solely on historical data may result in suboptimal performance. Our goal is to leverage previous operator data to find the ideal tuning parameters for a new operator.
We use Gaussian process (GP) regression, a technique commonly used to optimize unknown and difficult-to-evaluate cost functions [14]. The benefits of GPs include confidence bounds on their predictions and upper bounds on regret—the difference between the cost at an operating point and the unknown optimal cost. Multi-fidelity Gaussian processes (MF-GPs) use multiple correlated inputs to predict an output. Specifically, the autoregressive-1 (AR-1) model relates data across various inputs through a nested linear structure. AR-1 models have been used to incorporate low-fidelity data from a simulation in order to optimize a high-fidelity function of the true system [15,16]. In the context of HRI, GP regression has been used to generate predictions of HRI velocities [17] and state-space models [18]. However, GPs have yet to be used specifically for tuning HRI systems.
The following are our main contributions:
Using an impedance controller for the robotic manipulator and a transfer function model for human input, we formulate the optimal assistance design for cooperative manipulation as an input–output problem where the system gains are the inputs and the system performance is the output. Using GPs, we develop a sequential method to find the system’s optimal gains that requires only these input–output data.
We incorporate previous operators’ input–output data using an MF-GP. By analyzing how multiple fidelities affect the conditional covariance, we provide an upper bound on the regret. Additionally, we relate this bound to the measurement quality and variability across operators to show that an increase in the accuracy of prior data leads to a decrease in regret.
We numerically simulate input–output data for a model of human–robot cooperative manipulation and compare the single-fidelity formulations with the multi-fidelity formulation (MFF). We provide an example where both the cumulative and the best instantaneous regret are lower for the multi-fidelity formulation than for the single-fidelity formulations. Furthermore, we simulate a disturbance-impacted model of the human–robot manipulator to demonstrate the robustness of our approach.
2 System Description
In this section, we formulate a model for this cooperative manipulation system. In general, robotic manipulators are nonlinear, but using feedback linearization, we design a control input so that the robot behaves as an impedance model. The impedance model allows the human–robot system to be formulated as a linear time-invariant system, which can then be controlled using state feedback. We refer the interested reader to Ref. [19, Section 9.3] for more details. An overview of this control strategy is displayed in Fig. 1.
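For illustration, a standard target impedance at the end-effector takes the following form; the symbols here are generic stand-ins (the specific model used in this work is given in Sec. 2.1):

$$ M_d\,\ddot{\tilde{x}} + D_d\,\dot{\tilde{x}} + K_d\,\tilde{x} = f_h, $$

where $\tilde{x}$ is the end-effector tracking error, $f_h$ is the human-applied force, and $M_d$, $D_d$, and $K_d$ are desired inertia, damping, and stiffness matrices. Feedback linearization is used to make the nonlinear manipulator behave according to such a model.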
2.1 Robot Impedance Model.
2.2 Human-Impedance Model.
2.3 Human–Robot Impedance Model.
2.4 Problem Statement.
As the $i$th operator tests different design parameters $x$, they obtain data $\{(x, J_i(x))\}$ for $i = 1, \ldots, N$. Now, suppose a new, $(N+1)$th human operates the same robot. Our goal is to use the previous data to find an ideal set of design parameters that optimizes the new operator’s performance $J_{N+1}$.
3 Multi-fidelity Control Gain Selection
In this section, we first provide an overview of GPs. We then introduce the notion of multi-fidelity and describe how this framework applies to the HRI problem.
3.1 Gaussian Processes.
UCB sampling
1: Input: GP with priors $\mu_0$, $\sigma_0$, discrete domain $D$
2: for $t = 1, \ldots, T$ do
3: Choose $x_t = \arg\min_{x \in D}\, \mu_{t-1}(x) - \beta_t^{1/2}\,\sigma_{t-1}(x)$
4: Sample $y_t = f(x_t) + \epsilon_t$
5: Predict $\mu_t$, $\sigma_t$
6: end for
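For concreteness, a minimal sketch of this sampling loop is given below, assuming a minimization objective and using scikit-learn’s `GaussianProcessRegressor` as a stand-in for the GP update; the cost `f`, the confidence weight `beta`, and the noise level are illustrative placeholders rather than the paper’s exact choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ucb_sampling(f, domain, T=20, beta=2.0, noise_std=0.01, rng=None):
    """Lower-confidence-bound sampling over a discrete domain (minimization).

    f       : noisy black-box cost, f(x) -> float
    domain  : (n, d) array of candidate parameter vectors
    beta    : confidence weight (a fixed constant here; the theory
              prescribes a schedule beta_t)
    """
    rng = rng or np.random.default_rng()
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=noise_std**2)
    X, y = [], []
    for t in range(T):
        if X:
            gp.fit(np.asarray(X), np.asarray(y))
            mu, sigma = gp.predict(domain, return_std=True)
        else:  # prior: flat mean, unit standard deviation
            mu, sigma = np.zeros(len(domain)), np.ones(len(domain))
        # optimistic choice for a minimization problem
        x_t = domain[np.argmin(mu - np.sqrt(beta) * sigma)]
        X.append(x_t)
        y.append(f(x_t) + noise_std * rng.standard_normal())
    return np.asarray(X), np.asarray(y)
```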
3.2 Multi-fidelity Gaussian Processes.
3.3 Proposed Approach: MF-GPs for Control Design.
Using the AR-1 model, we aim to effectively leverage data from the previous operators for a specific individual. Consider a set of $N$ operators, with the $i$th operator’s performance data $\{(x, J_i(x))\}$. Let $f_H$ be an unknown realization of a GP with the AR-1 structure (17). Since the quadratic cost is sufficiently smooth in $x$, we assume the GP adequately represents the performance of the $(N+1)$th operator. Meanwhile, we treat $f_L$ as a GP with observations from the first $N$ operators. Note, $f_L$ does not specifically represent any single $J_i$, but rather models the expected performance of the previous operators. Using UCB, we iteratively select an $x$ to test for the $(N+1)$th operator, thereby obtaining evaluations of $f_H$. This MFF is formalized in Algorithm 2.
MFF
1: Input: Data $\{(x, J_i(x))\}$ for $i = 1, \ldots, N$, discrete domain $D$
2: Let $f_L$ be a GP with evaluations $J_i(x)$ for $i = 1, \ldots, N$
3: Let $f_H$ be a GP with form $f_H(x) = \rho\, f_L(x) + \delta(x)$ and evaluations $J_{N+1}(x)$
4: Predict $\mu_0$, $\sigma_0$
5: Apply UCB($f_H$, $\mu_0$, $\sigma_0$, $D$)
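A sketch of the AR-1 prediction step appears below, assuming the common recursive (Kennedy–O’Hagan-style) formulation in which the high-fidelity GP is fit to the residual $J_{N+1}(x) - \rho\,\hat{f}_L(x)$; the scaling factor `rho` is treated as a fixed hyperparameter here, although in practice it is estimated from data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ar1_predict(X_lo, y_lo, X_hi, y_hi, domain, rho=1.0):
    """AR-1 (linear autoregressive) multi-fidelity prediction:
        f_H(x) = rho * f_L(x) + delta(x).

    X_lo, y_lo : pooled data from the previous N operators (low fidelity)
    X_hi, y_hi : data from the new operator (high fidelity)
    Returns the posterior mean and std of f_H over `domain`.
    """
    # Low-fidelity GP on the previous operators' data.
    gp_lo = GaussianProcessRegressor(kernel=RBF(), alpha=1e-4)
    gp_lo.fit(X_lo, y_lo)

    # Discrepancy GP on the high-fidelity residuals.
    gp_delta = GaussianProcessRegressor(kernel=RBF(), alpha=1e-4)
    gp_delta.fit(X_hi, y_hi - rho * gp_lo.predict(X_hi))

    mu_lo, sd_lo = gp_lo.predict(domain, return_std=True)
    mu_d, sd_d = gp_delta.predict(domain, return_std=True)
    # Independence of f_L and delta lets the variances add.
    return rho * mu_lo + mu_d, np.sqrt((rho * sd_lo) ** 2 + sd_d ** 2)
```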
We compare MFF to two single-fidelity approaches that do not take advantage of the AR-1 structure. In the collective single-fidelity (CSF) formulation of Algorithm 3, pooled data from all operators are treated as a single fidelity. In the limited single-fidelity (LSF) formulation of Algorithm 4, the single-fidelity GP contains only data from the new $(N+1)$th operator. Essentially, LSF is a naive approach that ignores all previous operator data.
CSF
1: Input: Data $\{(x, J_i(x))\}$ for $i = 1, \ldots, N$, discrete domain $D$
2: Let $f$ be a GP with evaluations $\{J_i(x)\}_{i=1}^{N}$
3: Predict $\mu_0$, $\sigma_0$
4: Apply UCB($f$, $\mu_0$, $\sigma_0$, $D$)
LSF
1: Input: Data $\{(x, J_{N+1}(x))\}$, discrete domain $D$
2: Let $f$ be a GP with evaluations $J_{N+1}(x)$
3: Predict $\mu_0$, $\sigma_0$
4: Apply UCB($f$, $\mu_0$, $\sigma_0$, $D$)
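The two baselines differ only in which data they hand to a single GP; a minimal sketch of that data handling (function names hypothetical) is:

```python
import numpy as np

def csf_data(X_lo, y_lo, X_hi, y_hi):
    """CSF: pool all operators' data into one single-fidelity dataset."""
    return np.vstack([X_lo, X_hi]), np.concatenate([y_lo, y_hi])

def lsf_data(X_lo, y_lo, X_hi, y_hi):
    """LSF: keep only the new operator's data, discarding history."""
    return X_hi, y_hi
```

Either dataset is then fit with a single `GaussianProcessRegressor` and driven by the same UCB loop as above.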
4 Theoretical Results
Regret bound for an AR-1 process
The steps leading to the proof of Theorem 1 are presented in the Appendix.
Implications on HRI
The regret characterizes the number of suboptimal parameter selections made by a learning algorithm until time $T$. Since the maximum information gain directly impacts the upper bound on the regret of Algorithm 2, if the eigenvalues of the conditional covariance $\Sigma_{H \mid L}$ are significantly smaller than the eigenvalues of the prior covariance $\Sigma_H$, then the information gain becomes significantly smaller, leading to an improved regret bound. Consequently, the algorithm may require fewer parameter-tuning personalization steps when data from prior operators are available. Furthermore, recall that $f_L$ represents the expected performance of the previous operators, and $\sigma_L^2$ represents the variance of the evaluations of $f_L$. In the context of the HRI problem, $\sigma_L^2$ reduces when we have a sufficiently large set of representative operator data. Furthermore, when $f_L$ closely matches $f_H$, the discrepancy term $\delta$ decreases along with the eigenvalues of $\Sigma_{H \mid L}$. This happens when variations between operators have minimal effect on the HRI performance curve.
Furthermore, as the high- and low-fidelity noise terms approach zero, the conditional covariance approaches the covariance of the discrepancy term $\delta$.
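To make the structure behind these statements concrete, the following is a sketch in generic notation (the precise forms are given by Eqs. (17)–(20)), under the simplifying assumption that the low-fidelity data are observed at the same inputs at which the conditional covariance is evaluated:

$$ f_H(x) = \rho\, f_L(x) + \delta(x), \qquad \delta \perp f_L, $$

so that, over any finite set of inputs with low-fidelity noise variance $\sigma_L^2$,

$$ \Sigma_{H \mid L} \;=\; \rho^2 \left( \Sigma_L^{-1} + \sigma_L^{-2} I \right)^{-1} + \Sigma_\delta \;\longrightarrow\; \Sigma_\delta \quad \text{as } \sigma_L \to 0. $$

Smaller operator-to-operator variation shrinks $\Sigma_\delta$, while more (or cleaner) prior data shrink the first term; both effects tighten the regret bound.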
5 Numerical Simulations
We demonstrate the performance of our algorithms on a planar two-link prismatic-revolute (PR) manipulator model (cf. Fig. 2; parameters in Table 1) through two numerical simulations. In the first, the feedback linearization leads to an exact linear time-invariant (LTI) model (10). The second simulation studies the robustness of the model to perturbations of the LTI system, modeled as a nonlinear disturbance term.
Simulation parameters
Parameter | Notation | Value
---|---|---
Moment of inertia of link 1 | $I_1$ | 0.1 kg·m²
Moment of inertia of link 2 | $I_2$ | 0.1 kg·m²
Mass of link 1 | $m_1$ | 1 kg
Mass of link 2 | $m_2$ | 1 kg
Length of link 1 | $\ell_1$ | 0.5 m
(Fixed) length of link 2 | $\ell_2$ | 0.5 m
5.1 Feedback Linearization Is Exact.
Henceforth, we use the vector of control gains $x$ as the optimization parameter, where the domain $D$ is a hyperrectangle whose spans along the three gain components were chosen through trials to ensure that the closed-loop system is indeed stable to begin with.
Next, we generate data for the previous operators. For the performance functions, we aim to minimize the human effort through the choice of the weights in the quadratic cost. We randomly draw each operator’s gains and fix the remaining model parameters. An initial condition is chosen to model an initial error in position. The performance (13) is approximated using a finite integral over the simulation horizon. Each $J_i$ is evaluated for random sets of control gains with additive Gaussian noise.
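A sketch of this evaluation step is shown below, assuming a quadratic tracking/effort cost approximated over a finite horizon; the state-space matrices `A`, `B`, the weights `Q`, `R`, the horizon, and the noise scale are illustrative placeholders for the specific values used in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

def performance(A, B, K, z0, Q, R, t_final=10.0, noise_std=0.01, rng=None):
    """Noisy evaluation of a quadratic performance index over a finite horizon,
        J ~ integral_0^{t_f} ( z'Qz + u'Ru ) dt,  with state feedback u = -K z,
    for the closed-loop LTI system  dz/dt = (A - B K) z  from initial error z0.
    """
    rng = rng or np.random.default_rng()
    A_cl = A - B @ K
    sol = solve_ivp(lambda t, z: A_cl @ z, (0.0, t_final), z0, dense_output=True)
    ts = np.linspace(0.0, t_final, 500)
    zs = sol.sol(ts)                      # states, shape (n, len(ts))
    us = -K @ zs                          # control inputs, shape (m, len(ts))
    integrand = np.einsum("it,ij,jt->t", zs, Q, zs) \
              + np.einsum("it,ij,jt->t", us, R, us)
    return trapezoid(integrand, ts) + noise_std * rng.standard_normal()
```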
We run 20 Monte Carlo simulations, each involving a random selection of previous data points and operator gains. Figure 3 displays the best instantaneous and cumulative regrets averaged across the simulations. We see that MFF leads to a general improvement in the cumulative regret, especially at higher iteration counts. Between the single-fidelity approaches, LSF has a lower regret and tighter variance than CSF.
Fig. 3: Best instantaneous and cumulative regret (averaged across 20 trials) when UCB is used to select control gains for system (10) with no disturbance. Error bars represent one standard deviation across 20 Monte Carlo trials.
The best instantaneous regret plot shows that MFF typically makes better selections than CSF or LSF in the first few iterations. After around 10 iterations, LSF and MFF have found a selection with very low regret, while CSF fails to find an optimal selection even after 20 iterations.
These results indicate that data from previous operators are beneficial when they are incorporated through a multi-fidelity structure. Incorporating previous data through CSF increases the regret compared to ignoring it, as in LSF.
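For reference, the two regret notions plotted in Fig. 3 can be computed from the sequence of sampled costs as follows; in this small helper, `J_opt` stands for the cost of the best parameter in the discrete domain, a quantity available in simulation where the true cost surface is known.

```python
import numpy as np

def regrets(J_samples, J_opt):
    """Instantaneous regret r_t = J(x_t) - J*, its running best,
    and the cumulative regret R_T = sum_{t<=T} r_t."""
    r = np.asarray(J_samples) - J_opt
    return np.minimum.accumulate(r), np.cumsum(r)
```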
5.2 Robustness to Model Perturbations.
In other words, the disturbance term quantifies the error due to inexact feedback linearization.
Anticipating a higher regret due to model mismatch, we choose a lower control penalty. We plot the regret from the MFF, CSF, and LSF approaches in Fig. 4. We also show the regret incurred when the optimal controller from the undisturbed system is used on the disturbed system, which is negligible in this case.
Fig. 4: Best instantaneous and cumulative regret (averaged across 20 trials) when UCB is used to select control gains for the perturbed system. The dotted line indicates the regret when the optimal controller from the undisturbed system is used on the disturbed system. Error bars represent one standard deviation across 20 Monte Carlo trials.
The disturbance increases the mean and the spread of the cumulative regret. One reason is that the same range of control gains as in the undisturbed case was used for the search in the presence of the disturbance. We also observe that MFF performs better than LSF or CSF in terms of instantaneous regret. Furthermore, on average, all three approaches find controllers whose performance matches that of the optimal undisturbed controller.
5.3 Extension of the Proposed Methodology.
In terms of manipulator models, our methodology applies to any $n$-link manipulator that satisfies Eq. (1), which makes no assumptions on linearity. If we can design the input torques (e.g., using feedback linearization) to satisfy Eq. (2), we obtain Eq. (21). This choice of input torque may either be exact (Sec. 5.1) or may leave some terms uncancelled (e.g., the disturbance/noise considered in Sec. 5.2). Alternatively, the torques may be designed using neural networks [12] or high-gain observers [24] that have good disturbance-rejection properties.
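For reference, the standard rigid-body form alluded to here is, in generic symbols (Eqs. (1)–(2) are not reproduced in this section),

$$ M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau + J^\top(q)\, f_h, $$

and an input torque of the form $\tau = M(q)v + C(q,\dot{q})\dot{q} + g(q)$, with $v$ designed to impose the target impedance, cancels the nonlinear terms when the model is exact; any uncancelled remainder acts as the disturbance studied in Sec. 5.2.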
With respect to the human-impedance model (8), the model allows us to restrict the search for the optimal controller to an affine form. However, our proposed approach is general and applies to any other impedance model for which an appropriate parameterization of the controller can be made.
6 Conclusion
We provide a multi-fidelity framework to find the optimal set of impedance parameters for a human–robot cooperative manipulation system using only input–output data. By treating prior operator data as a low-fidelity model, we are able to further optimize the system’s performance for a new operator. We establish how the AR-1 model improves the regret bound through the conditional covariance and then numerically simulate human–robot cooperative manipulation to demonstrate this improvement in regret.
In the future, we plan to validate this framework through physical experiments with human subjects and a robotic manipulator.
Footnotes
Paper presented at the 2024 Modeling, Estimation, and Control Conference (MECC 2024), Chicago, IL, Oct. 28–30, 2024, Paper No. MECC2024-68.
For ease of presentation, we assume the joint space and the workspace are $n$-dimensional. The ideas extend to redundant manipulators using standard modifications [19].
This generalizes the simplification found in the proof of Theorem 3.2 in Ref. [16].
Acknowledgment
This work was supported in part by ARO grant W911NF-18-1-0325 and in part by NSF Award CNS-2134076.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The authors attest that all data for this study are included in the paper.
Appendix: Analysis and the Proof of the Main Result
We start with a basic result that will help us establish an upper bound on the conditional covariance of an AR-1 GP.
Note that this form is given in Ref. [25, Eq. (191)], which presents it as an approximation but does not state the direction of the inequality.
Conditional covariance of an AR-1 GP
The covariance of the high-fidelity data conditioned on the low-fidelity data is upper bounded by the quantity defined in Eq. (19).
Info. gain bound for an AR-1 GP
Let $\sigma_H^2$ and $\sigma_L^2$ be the variances of the high- and low-fidelity measurement noise of a linear autoregressive GP. Then the maximum information gain $\gamma_T$ is upper bounded by the quantity defined in Eq. (20).