
Learning Virtual Impedance for Control of a Human-Coupled Lower Exoskeleton


Journal of University of Electronic Science and Technology of China, 2018, Issue 3

HUANG Rui, CHENG Hong, and GUO Hong-liang

(School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731)

Many lower exoskeletons have been developed for strength augmentation and walking assistance scenarios over the past few decades[1-5]. For strength augmentation applications, lower exoskeletons are designed to track the pilot's motion with little interaction force between the exoskeleton and the pilot[6-8]. The controllers of these lower exoskeletons can be roughly divided into two categories, namely sensor-based controllers and model-based controllers.

For sensor-based controllers, extra sensors are always employed to measure the pilot's information and/or the interaction force between the pilot and the exoskeleton[9-11]. With the measured sensory information, many variants of control strategies can be employed to control the lower exoskeleton, e.g., impedance control strategies. For example, the hybrid assistive limb (HAL) exoskeleton system adopts an impedance control strategy proposed by Y. Sankai based on measured electromyographic (EMG) signals of the pilot[12]. In the impedance control strategy of the HAL system, EMG signals are utilized to calculate reference patterns of the pilot, which aims at estimating the human-exoskeleton interaction (HEI) between the pilot and the exoskeleton[13]. Furthermore, based on measuring the pilot's motion with acceleration sensors, the active-impedance control strategy[14] and the fuzzy-based impedance control strategy[15] have been proposed to adapt to the changing interaction dynamics among different pilots. However, sensor-based controllers rely heavily on complex sensory systems, which can be unreliable and are usually costly. Hence, sensor-based controllers are, to some extent, limited in most strength augmentation scenarios.

On the other hand, model-based controllers are designed to simplify the sensory system of the exoskeleton and rely only on information from the exoskeleton itself. Sensitivity amplification control (SAC) is a model-based controller proposed for the Berkeley lower extremity exoskeleton (BLEEX)[16]. With a sensitivity factor in the model-based controller, SAC estimates the output joint torques from the current states (joint angle, angular velocity and angular acceleration) of the lower exoskeleton. The SAC strategy is able to reduce the interaction force between the pilot and the exoskeleton without measuring it directly, which also reduces the complexity of the exoskeleton sensory system. However, the SAC strategy requires accurate dynamic models of the lower exoskeleton (it is sensitive to model imperfections and to different pilots), which makes the system identification process quite complicated[17].

In this paper, we propose a novel variable virtual impedance control (VVIC) strategy which inherits the advantages of both sensor-based and model-based controllers. On the one hand, it is a model-based control strategy, which reduces the complexity of the exoskeleton sensory system. On the other hand, we apply a reinforcement learning method based on policy improvement with path integrals (PI2) to learn the parameters of the virtual impedance model, which circumvents the complicated system identification process. The main contributions of this paper can be summarized as follows:

1) A novel VVIC strategy with a model-based controller, named the virtual impedance controller (VIC), is proposed, which reduces the requirements on the exoskeleton sensory system;

2) To avoid the complicated system identification process, a reinforcement learning method is utilized to learn/optimize the parameters of the virtual impedance controller in the VVIC strategy;

3) The proposed VVIC strategy is verified on both a single DOF platform and the HUALEX system.

The proposed VVIC strategy is first validated on a single DOF exoskeleton platform and then tested on the HUALEX system. Experimental results show that, compared with canonical model-based control strategies, the proposed VVIC strategy is able to adapt to the different HEI of different pilots.

1 Virtual Impedance Control Strategy

This section presents the proposed virtual impedance control strategy. We first introduce the design details of the virtual impedance controller in Subsection 1.1. Then, in Subsection 1.2, we analyze the stability of the proposed model-based controller.

1.1 Virtual Impedance Controller

For the control of lower exoskeletons in strength augmentation applications, the pilot always plays the master role in the human-coupled system, which means that the exoskeleton should follow/track the pilot's motion. A general control system block diagram with the model-based controller for the single degree of freedom (DOF) case is depicted in Fig. 1, where G represents the transfer function of the lower exoskeleton, C is the designed model-based controller of the lower exoskeleton, K_hm is the impedance between the pilot and the exoskeleton, q_e and q_h indicate the joint angles of the lower exoskeleton and the pilot, respectively, T_hm is the resulting interaction torque applied by the pilot, and T_act is the output torque applied by the lower exoskeleton actuator.

Fig. 1 A general control system diagram with the model-based controller for the single DOF case

As shown in Fig. 1, the input torque of the lower exoskeleton is the combination of the actuator output torque T_act and the pilot's resulting interaction torque T_hm. The design goal of the lower exoskeleton controller is to reduce the interaction torque, which also means that the exoskeleton should track the pilot's motion as closely and as quickly as possible.

The traditional impedance controller is usually designed as in Eq. (1), in which the pilot's joint angles are taken as inputs to the controller; there, Ĝ denotes the estimated dynamics of the lower exoskeleton, and k and d are the stiffness and damping parameters of the designed impedance model, respectively. However, for the design of model-based controllers, we do not measure sensory information from the pilot. In the exoskeleton control of strength augmentation scenarios, the exoskeleton would only receive the pilot's joint states after several control cycles. Therefore, a virtual impedance model is presented for the model-based exoskeleton controller:

where k_h and d_h are positive parameters of the virtual impedance model. Hence, the proposed virtual impedance controller can be represented as Eq. (4), which is a model-based controller based only on the states of the lower exoskeleton:

where K_h = k·k_h and D_h = d·d_h are the virtual impedance factors of the proposed virtual impedance controller.
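Because Eqs. (1)-(4) are not reproduced in this text, the short Python sketch below only illustrates one plausible reading of the control law: the estimated exoskeleton dynamics are compensated, and a virtual impedance term built from the exoskeleton's own velocity and acceleration (consistent with V_h = D_h·s² + K_h·s in Subsection 1.2) is subtracted. The function name, its arguments, and the sign convention are assumptions for illustration, not the paper's exact Eq. (4).

```python
def virtual_impedance_torque(qd_e, qdd_e, J_hat, B_hat, K_h, D_h):
    """Hypothetical sketch of a virtual impedance control law (cf. Eq. (4)).

    qd_e, qdd_e : joint velocity and acceleration of the exoskeleton
    J_hat, B_hat: estimated inertial moment and viscous friction (gravity omitted)
    K_h, D_h    : virtual impedance factors learned by the VVIC strategy
    """
    # Compensation of the estimated exoskeleton dynamics (cf. Eqs. (10)-(11)).
    model_torque = J_hat * qdd_e + B_hat * qd_e
    # Virtual impedance term V_h q_e = D_h*qdd_e + K_h*qd_e, assumed to be
    # subtracted; smaller K_h, D_h then approach full model compensation.
    return model_torque - (D_h * qdd_e + K_h * qd_e)
```

Under this reading, only the exoskeleton's own joint states appear in the controller, which is the defining property of the proposed model-based VIC.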

1.2 Stability Analysis

Since the design goal of the lower exoskeleton controller is to reduce the interaction torque, T_hm approaches zero, and the stability of the system can be guaranteed by the stability of q_e/q_h[18].

From Fig. 1, the open loop system equation can be represented as:

The proposed virtual impedance controller can be rewritten as:

where V_h = D_h·s² + K_h·s. Combining the model-based controller in Eq. (6) with the system equation described in Eq. (5), we have:

then the expression for q_e/q_h can be obtained as:

If the dynamics of the lower exoskeleton are estimated accurately (Ĝ = G), then Eq. (8) can be simplified to:

Since the virtual impedance parameters D_h and K_h and the impedance K_hm all have positive values, the control system is always stable when the dynamics of the lower exoskeleton are accurately estimated.
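Since Eqs. (5)-(9) are not reproduced here, the following LaTeX sketch shows one closed-loop derivation that is consistent with the stability statement above; it assumes T_hm = K_hm(q_h − q_e) and a controller of the form T_act = (Ĝ⁻¹ − V_h)q_e, which are reconstructions rather than the paper's verbatim equations.

```latex
% Reconstruction under the stated assumptions, not the paper's verbatim Eqs. (5)-(9):
%   q_e = G\,(T_{act} + T_{hm}),\quad T_{hm} = K_{hm}(q_h - q_e),\quad
%   T_{act} = (\hat{G}^{-1} - V_h)\,q_e .
\begin{align}
  q_e\left(1 - G\hat{G}^{-1} + G V_h + G K_{hm}\right) &= G K_{hm}\, q_h ,\\
  \left.\frac{q_e}{q_h}\right|_{\hat{G}=G} &= \frac{K_{hm}}{D_h s^{2} + K_h s + K_{hm}} .
\end{align}
% All coefficients of the second-order characteristic polynomial are positive,
% so the closed loop is stable whenever the exoskeleton dynamics are estimated exactly.
```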

The other situation is that an accurate dynamic model of the lower exoskeleton is not available. In this case, we consider a single DOF exoskeleton with second-order dynamics and ignore the gravity term, which indicates that:

where J and B represent the inertial moment and viscous friction of the lower exoskeleton, respectively. The estimated exoskeleton dynamics Ĝ can be represented as:

where Ĵ and B̂ represent the estimated inertial moment and viscous friction parameters. From Eq. (10) and Eq. (11), the expression for q_e/q_h can be represented as:

According to Eq. (12), if the virtual impedance parameters D_h and K_h are small enough (and always positive), the system will still be stable as long as the dynamic model of the lower exoskeleton is not over-estimated (Ĵ < J, B̂ < B). Hence, the system is always stable when the dynamic model of the lower exoskeleton is not over-estimated.

2 Virtual Impedance Adaptation through Reinforcement Learning

For the implementation of traditional model-based controllers, e.g., SAC in the BLEEX system, a system identification process is often employed to obtain the system dynamics and the human-related parameters of the designed controller (the sensitivity factors in SAC)[19]. However, the lower exoskeleton is a human-coupled system, and the controller therefore needs to be recalibrated for different pilots.

In this paper, a model-free reinforcement learning method is employed to learn the optimal virtual impedance parameters of the VIC, which aims at adapting to the different HEI of different pilots. By combining the learning process with the model-based controller, forming what we call the VVIC strategy, we can reduce the sensor requirements of the system and avoid the system identification process. In the reinforcement learning process, a model-free method named policy improvement with path integrals (PI2)[20-21] is employed to learn the parameters K_h and D_h of the VIC.

The parameterized policy of PI2 is defined as:

where Θ is the vector of virtual impedance parameters [K_h, D_h]^T and ϵ_t indicates the exploration noise. W_t^T is the basis function with Gaussian kernels ω:

Eq. (14) calculates the j-th average weight, where n is the number of parameters to be learned (n = 2 in our case).

In the implementation of the VVIC strategy, we define the immediate cost based on the measured sensory information of the pilot. For the single DOF case, the immediate cost function is defined as follows:

where α1 and α2 are positive scale factors. In order to obtain the pilot's motion information during the learning process, inclinometers are utilized to measure the pilot's joint angle q_h and angular velocity q̇_h.
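As a concrete illustration of the parameterized policy and the immediate cost, the sketch below perturbs the current parameter vector Θ = [K_h, D_h]^T with Gaussian exploration noise and evaluates a quadratic tracking cost on the pilot-exoskeleton error. The quadratic form and the constant (time-independent) basis are simplifying assumptions, since Eqs. (13)-(15) are not reproduced here; the default scale factors reuse the value 1 500 reported later for the simulation experiments.

```python
import numpy as np

def explore_parameters(theta, noise_std, rng):
    """Eq. (13), simplified: return stochastic parameters Theta + eps_t.

    The Gaussian-kernel basis W_t of Eq. (14) is replaced by a constant
    basis in this sketch, so the same noise is applied at every time step.
    """
    theta = np.asarray(theta, dtype=float)        # theta = [K_h, D_h]
    eps = rng.normal(0.0, noise_std, size=theta.shape)
    return theta + eps, eps

def immediate_cost(q_h, qd_h, q_e, qd_e, alpha1=1500.0, alpha2=1500.0):
    """Assumed quadratic form of Eq. (15): penalize the tracking error between
    the pilot (inclinometer measurements) and the exoskeleton joint."""
    return alpha1 * (q_h - q_e) ** 2 + alpha2 * (qd_h - qd_e) ** 2

# Example: rng = np.random.default_rng(0); explore_parameters([20.0, 1.0], 0.5, rng)
```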

With the defined policy and cost function, the learning process of the virtual impedance parameters based on PI2 for the single DOF case is described as follows (a simplified code sketch is given after the listing):

1) Initialize the virtual impedance parameter vector Θ.

2) Initialize the basis function W_ti according to Eq. (14).

3) Repeat.

4) Run K gait cycles of the exoskeleton using the stochastic parameters Θ + ϵ_t at every time step.

5) For all gait cycles k ∈ [1, K]:

6) Compute the projection matrix M through Eq. (16).

7) Compute the stochastic cost S through Eq. (17).

8) Compute the probability P through Eq. (18).

9) For all time steps i ∈ [1, T]:

10) Compute ΔΘ_ti for each time step through Eq. (19).

11) Normalize ΔΘ according to Eq. (20).

12) Update Θ ← Θ + ΔΘ.

13) Run one noiseless gait cycle to compute the trajectory cost R through Eq. (21).

14) Until the trajectory cost R has converged.
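The listing relies on Eqs. (16)-(21), which are not reproduced in this text, so the sketch below follows the generic PI2 update of [20-21]: rollout costs are exponentiated into probabilities and the exploration noise is averaged accordingly. The projection matrix M and the per-time-step temporal averaging of Eqs. (16), (19) and (20) are omitted, and the function name and data layout are illustrative only.

```python
import numpy as np

def pi2_update(theta, rollouts, lam=0.5):
    """One simplified PI2 parameter update (cf. steps 5)-12) of the listing).

    theta    : current parameter vector [K_h, D_h]
    rollouts : list of (eps, cost) pairs, one per exploratory gait cycle, where
               eps is the exploration noise applied in that cycle and cost is
               its accumulated immediate cost (a scalar stand-in for S)
    lam      : scale factor lambda, chosen within (0, 1]
    """
    theta = np.asarray(theta, dtype=float)
    eps = np.array([e for e, _ in rollouts])        # K x n exploration samples
    costs = np.array([c for _, c in rollouts])      # K rollout costs
    # Soft-min weighting (cf. Eq. (18)): lower cost -> higher probability.
    spread = max(costs.max() - costs.min(), 1e-12)
    prob = np.exp(-(costs - costs.min()) / (lam * spread))
    prob /= prob.sum()
    # Probability-weighted average of the exploration noise (cf. Eqs. (19)-(20)).
    return theta + prob @ eps
```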

As shown in the listing above, the virtual impedance parameters of the VIC are updated every K+1 gait cycles. The updating rule is given by Eq. (16) to Eq. (20).

The matrix H in Eq. (16) and Eq. (17) is a positive semi-definite weight matrix. The scale factor λ in Eq. (18) is set within (0, 1]. With the updated parameter vector Θ, a noiseless gait cycle (without exploration noise ϵ_t) is executed to determine whether the learning process should be terminated, by calculating the trajectory cost R:

where ρ = 1/dt (dt indicates the time duration of the gait cycle) is a normalization factor, since the duration of each gait cycle is always different in real-time applications.
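A minimal reading of this stopping criterion is sketched below, assuming R is the accumulated immediate cost of the noiseless cycle scaled by ρ = 1/dt; the exact per-step weighting of Eq. (21) is not reproduced above, so this is only a sketch.

```python
def trajectory_cost(step_costs, cycle_duration):
    """Assumed form of Eq. (21): cost of one noiseless gait cycle, scaled by
    rho = 1/dt (dt = duration of that cycle) so that gait cycles of different
    length remain comparable."""
    rho = 1.0 / cycle_duration
    return rho * sum(step_costs)
```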

In the implementation of the VVIC strategy, the reinforcement learning process needs to be carried out for each pilot, which learns the optimal virtual impedance parameters to adapt to the different HEI of different pilots. Afterwards, with the learned optimal model-based controller, the lower exoskeleton is able to track the pilot's motion closely based only on the joint information of the lower exoskeleton.

3 Experiments and Discussions

In this section, the proposed VVIC strategy is validated both on a single DOF case in a simulation environment and on the HUALEX system. Experimental results and discussions are presented in the next two subsections.

3.1 Single DOF Case in Simulation Environment

3.1.1 Introduction to the Single DOF Exoskeleton Platform

Fig. 2 illustrates the model of the single DOF exoskeleton coupled with the pilot at the knee joint. As a human-coupled system, the resultant torque at the exoskeleton knee joint is the combination of two parts: one is T_act, which is provided by the exoskeleton actuator, and the other is T_hm, which is provided by the pilot through the compliant connection between the exoskeleton and the pilot.

Fig. 2 Model of the single DOF exoskeleton coupled with the pilot at the knee joint

In the simulation environment, the dynamics of the single DOF exoskeleton including the pilot are defined as Eq. (22):

where the last term mgl·sin q_e is the gravity term. Hence, according to the control law of the proposed VIC in Eq. (4), the controller of the single DOF exoskeleton is designed as follows:

where Ĵ and B̂ are the estimated inertial moment and viscous friction parameters, respectively. K_h and D_h are the virtual impedance parameters of the proposed VVIC strategy, which should be learned to adapt to the different HEI of different pilots.

3.1.2 Experiments of Simulated Single DOF Exoskeleton

In the experiments on the simulated single DOF exoskeleton, different values of the impedance K_hm (described in Fig. 1) are used to simulate the different HEI of different pilots. Here we choose three different values of K_hm. The estimated dynamic parameters of the model-based controller are set to suitable values, Ĵ = 0.9J and B̂ = 0.9B. The pilot's motion angles are set as periodic sine waves with different frequencies and amplitudes in the simulation experiments.
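To make the simulation setup easier to follow, the sketch below integrates a single DOF exoskeleton with the dynamics of Eq. (22), couples it to a sinusoidal pilot trajectory through a spring-damper HEI model (the "spring-damping model" mentioned in the results), and drives it with the assumed VIC law including gravity compensation (cf. Eq. (23)). All numerical values, the explicit spring-damper form of T_hm, and the controller sign convention are illustrative assumptions; the paper does not list its simulation constants.

```python
import numpy as np

# Illustrative constants only; the paper does not state its simulation values.
J, B, m, g, l = 0.8, 0.3, 2.0, 9.81, 0.35   # true dynamics of Eq. (22)
J_hat, B_hat = 0.9 * J, 0.9 * B             # under-estimated model, as in the text
K_h, D_h = 5.0, 0.5                         # virtual impedance (placeholder values)
K_hm, D_hm = 300.0, 5.0                     # assumed spring-damper HEI coupling
amp, omega = 0.5, np.pi                     # pilot motion: 0.5 rad sine at 0.5 Hz
dt, steps = 0.001, 4000

q_e = qd_e = qdd_e = 0.0
for k in range(steps):
    t = k * dt
    q_h, qd_h = amp * np.sin(omega * t), amp * omega * np.cos(omega * t)
    # Assumed HEI torque transmitted through the compliant connection.
    T_hm = K_hm * (q_h - q_e) + D_hm * (qd_h - qd_e)
    # Assumed VIC law (cf. Eq. (23)): model compensation including gravity,
    # minus the virtual impedance term; the previous-step acceleration is used.
    T_act = (J_hat * qdd_e + B_hat * qd_e + 0.9 * m * g * l * np.sin(q_e)
             - (D_h * qdd_e + K_h * qd_e))
    # Coupled dynamics of Eq. (22): J*qdd_e + B*qd_e + m*g*l*sin(q_e) = T_act + T_hm.
    qdd_e = (T_act + T_hm - B * qd_e - m * g * l * np.sin(q_e)) / J
    qd_e += qdd_e * dt
    q_e += qd_e * dt
# q_e now approximately tracks q_h; the gap is what the learned K_h, D_h should minimize.
```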

Fig. 3 Learning curves of the reinforcement learning process for the different simulated pilots

In the learning process of the proposed VVIC strategy, the exoskeleton takes several gait cycles to obtain the optimal virtual impedance parameters of the controller. The exoskeleton updates the parameters every 4 (K = 4) gait cycles and spends one gait cycle calculating the trajectory cost R, so the parameters Θ are updated every 5 gait cycles. The weight parameters α1 and α2 of the immediate cost function (described in Eq. (15)) are both chosen as 1 500. Fig. 3 illustrates the learning curves of the reinforcement learning process for the different simulated pilots (the relationship between the values of the impedance K_hm is C > B > A). As shown in Fig. 3, the learning process takes almost 120 gait cycles (24 updates) to obtain the optimal virtual impedance parameters, i.e., until the trajectory cost R converges.

After obtaining the optimal parameters of the VVIC strategy, comparative experiments are carried out between the proposed VVIC strategy and the traditional SAC algorithm. Fig. 4 shows the control performance of the proposed VVIC strategy and the SAC algorithm with pilot A. In the comparison experiments, we choose 11 gait cycles (out of 50 gait cycles in total) with different motion patterns to compare the control performance of the VVIC strategy and the SAC algorithm. The black curves in Fig. 4 represent the interaction force between the pilot and the exoskeleton, which is calculated by a spring-damping model in the simulator. As shown in Fig. 4, the experimental results show that the proposed VVIC strategy achieves better performance (less interaction force) than the traditional SAC algorithm.

Fig. 4 Control performances of the proposed VVIC strategy and SAC algorithm

Tab. 1 shows the normalized mean square error (nMSE) of the VVIC strategy and the SAC algorithm over a total of 50 gait cycles with the three different simulated pilots. The results show that the proposed VVIC strategy achieves better performance when dealing with the different HEI of different pilots; e.g., with simulated pilot C, the nMSE of the SAC algorithm is almost three times that of the proposed VVIC strategy (0.124 rad compared with 0.038 rad).

Table 1 Comparison of the VVIC strategy and the SAC algorithm for three different simulated pilots in the single DOF case

3.2 Experiments on the HUALEX System

3.2.1 Introduction to the HUALEX System

The HUALEX system is designed for strength augmentation applications. Fig. 5 shows the complete HUALEX system with a pilot. In Fig. 5: 1) the pilot; 2) the load backpack with the power unit and main controller (rigidly connected to the HUALEX spine); 3) semi-rigid connections between HUALEX and the pilot (at the waist, thighs, shanks and feet); 4) active joints with DC servo motors (hip joints and knee joints); 5) node controllers for the active joints; 6) smart shoes with plantar sensors.

Fig. 5 HUALEX with the pilot

As shown in Fig. 5, four active joints (hips and knees) are designed to provide active torques for strength augmentation. The ankle joints of the HUALEX system are energy-storage mechanisms, which store energy in the stance phase and release it in the swing phase. Compliant connections are utilized to connect the pilot and the HUALEX system in a semi-rigid way.

The control system of HUALEX consists of one main controller and four node controllers, one for each active joint. The control algorithm runs on the main controller, while the node controllers collect sensory information and execute control commands. In the HUALEX system, three kinds of sensors are utilized in the sensory system: 1) encoders embedded in each active joint to measure the motion information of HUALEX; 2) IMU sensors to measure the motion information of the pilot when necessary; 3) plantar sensors in the smart shoes to detect the walking phases of HUALEX.

3.2.2 Experimental Setup

In the experiments on the HUALEX system, three different pilots (A: 172 cm/76 kg, B: 176 cm/80 kg, C: 180 cm/96 kg) operate the HUALEX system in sequence, which means that, during the learning process of the VVIC strategy, the learned optimal parameters of the VVIC with pilot A are used as the initial values of the VVIC with pilot B (note that the VVIC parameters of each joint of the HUALEX system are learned independently). During the learning process, IMU sensors are utilized to measure the pilot's motion information for obtaining the optimal virtual impedance parameters. Besides the virtual impedance parameters of the VVIC, the parameters of the HUALEX dynamics are identified using SolidWorks software. After the optimal parameters of the VVIC are obtained, the IMU sensors are retained to capture the pilot's motion information (not used for control) in order to validate the control performance of the proposed VVIC strategy.

3.2.3 Results and Discussions

Fig. 6 shows the learning curves of the VVIC in the HUALEX system with the different pilots (for the left hip and knee joints). As discussed in the experimental setup section, pilot A operates the HUALEX system first, so the learning process of the VVIC strategy needs more training gait cycles (almost 140 gait cycles). With the better initial values obtained from pilot A, the learning processes for pilots B and C are reduced to almost 80 gait cycles.

After obtaining the optimal virtual impedance parameters of the VVIC strategy through the reinforcement learning process, we validate the control performance of the proposed VVIC strategy in comparison with the traditional SAC algorithm. The results show that the proposed VVIC strategy achieves good control performance. Moreover, Tab. 2 gives the comparison of the VVIC strategy and the SAC algorithm with the different pilots (100 gait cycles for each pilot). As shown in Tab. 2, the proposed VVIC strategy achieves better performance in the experiments on the HUALEX system with different pilots; e.g., at the right knee joint of pilot C, the nMSE of the SAC algorithm is almost three times that of the VVIC strategy (0.094 rad compared with 0.032 rad).

Fig. 6 Learning curves of the VVIC in the HUALEX system with different pilots at the left hip and knee joints

Table 2 Comparison of the SAC algorithm and the VVIC strategy in HUALEX with different pilots over a total of 100 gait cycles

4 Conclusions and Future Work

This paper has proposed a novel VVIC strategy for the control of the HUALEX system, which aims at adapting to the different HEI of different pilots. The proposed VVIC strategy is based on a novel VIC, which is a model-based controller with a virtual impedance model. In order to adapt to the different HEI of different pilots, the PI2 reinforcement learning algorithm is employed to obtain the optimal parameters of the virtual impedance in the VIC. The control performance of the proposed VVIC strategy is validated in a single DOF exoskeleton simulation environment as well as on the HUALEX system. Experimental results indicate that the proposed VVIC strategy achieves better performance than the traditional SAC algorithm and can deal with the varying HEI of different pilots.

In the future, we will investigate methods that can learn/update the parameters of the VVIC online. In this case, the HUALEX will be able to "get used to" the pilot during operation. Moreover, accurate estimation of the dynamic models of HUALEX is also important, since accurate dynamic models always lead to better performance of model-based controllers in strength augmentation lower exoskeletons.

[1]KAZEROONI H, CHU A, STEGER R. That which does not stabilize, will only make us stronger[J]. International Journal of Robotics Research, 2007, 26(1): 75-89.

[2]SANKAI Y. HAL: Hybrid assistive limb based on cybernics[J]. Robotics Research, 2010: 25-34.

[3]WALSH C J, PALUSKA D, PASCH K, et al. Development of a lightweight, under-actuated exoskeleton for load-carrying augmentation[C]//IEEE International Conference on Robotics and Automation (ICRA). Florida, USA: IEEE, 2006: 3485-3491.

[4]STRAUSSER K A, KAZEROONI H. The development and testing of a human machine interface for a mobile medical exoskeleton[C]//IEEE International Conference on Intelligent Robots and Systems (IROS). California, USA: IEEE, 2011: 4911-4916.

[5]ESQUENAZI A, TALATY M, PACKEL A, et al. The ReWalk powered exoskeleton to restore ambulatory function to individuals with thoracic-level motor-complete spinal cord injury[J]. American Journal of Physical Medicine and Rehabilitation, 2012, 91(11): 911-921.

[6]HUANG R, CHENG H, CHEN Q, et al. Interactive learning for sensitivity factors of a human-powered augmentation lower exoskeleton[C]//IEEE International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE, 2015: 6409-6415.

[7]WALSH C J, PASCH K, HERR H. An autonomous, under-actuated exoskeleton for load-carrying augmentation[C]//IEEE International Conference on Intelligent Robots and Systems (IROS). Beijing, China: IEEE, 2006: 1410-1415.

[8]ZOSS A, KAZEROONI H, CHU A. On the mechanical design of the Berkeley lower extremity exoskeleton (BLEEX)[C]//IEEE International Conference on Intelligent Robots and Systems (IROS). Edmonton, Canada: IEEE, 2005: 3132-3139.

[9]TRAN H T, CHENG H, LIN X, et al. The relationship between physical human-exoskeleton interaction and dynamic factors: using a learning approach for control applications[J]. Science China Information Sciences, 2014, 57(12): 1-13.

[10]KAZEROONI H, STEGER R, HUANG L, et al. Hybrid control of the Berkeley lower extremity exoskeleton (BLEEX)[J]. International Journal of Robotics Research, 2006, 25(6): 561-573.

[11]KAWAMOTO H, SANKAI Y. Power assist method based on phase sequence and muscle force condition for HAL[J]. Advanced Robotics, 2005, 19(7): 717-734.

[12]LEE S, SANKAI Y. Power assist control for walking aid with HAL-3 based on EMG and impedance adjustment around knee joint[C]//International Conference on Intelligent Robots and Systems (IROS). Lausanne, Switzerland: [s.n.], 2002: 1499-1504.

[13]HAYASHI T, KAWAMOTO H, SANKAI Y. Control method of robot suit HAL working as operator's muscle using biological and dynamical information[C]//IEEE International Conference on Intelligent Robots and Systems (IROS). Edmonton, Canada: IEEE, 2005: 3063-3068.

[14]AGUIRRE-OLLINGER G, COLGATE J E, PESHKIN M A, et al. Active-impedance control of a lower-limb assistive exoskeleton[C]//IEEE International Conference on Rehabilitation Robotics. Noordwijk, Netherlands: IEEE, 2007: 188-195.

[15]TRAN H T, CHENG H, DUONG M K, et al. Fuzzy-based impedance regulation for control of the coupled human-exoskeleton system[C]//IEEE International Conference on Robotics and Biomimetics. Bali, Indonesia: IEEE, 2014: 986-992.

[16]KAZEROONI H, RACINE J L, HUANG L, et al. On the control of the Berkeley lower extremity exoskeleton (BLEEX)[C]//International Conference on Robotics and Automation (ICRA). Barcelona, Spain: [s.n.], 2005: 4353-4360.

[17]GHAN J, STEGER R, KAZEROONI H. Control and system identification for the Berkeley lower extremity exoskeleton[J]. Advanced Robotics, 2006, 20(9): 989-1014.

[18]RACINE J L. Control of a lower extremity exoskeleton for human performance amplification[D]. California, USA: University of California, Berkeley, 2003.

[19]GHAN J, KAZEROONI H. System identification for the Berkeley lower extremity exoskeleton (BLEEX)[C]//International Conference on Robotics and Automation (ICRA). Florida, USA: [s.n.], 2006: 3477-3484.

[20]THEODOROU E A, BUCHLI J, SCHAAL S. A generalized path integral control approach to reinforcement learning[J]. Journal of Machine Learning Research, 2010, 11: 3137-3181.

[21]BUCHLI J, STULP F, THEODOROU E A, et al. Learning variable impedance control[J]. International Journal of Robotics Research, 2011, 30(7): 820-833.