Adaptive Optimal Discrete-Time Output-Feedback Using an Internal Model Principle and Adaptive Dynamic Programming
Zhongyang Wang, Youqing Wang, and Zdzisław Kowalczuk
Abstract—To address the output feedback problem for linear discrete-time systems, this work proposes a novel adaptive dynamic programming (ADP) technique based on the internal model principle (IMP). The proposed method, termed IMP-ADP, does not require complete state feedback, merely the measurement of input and output data. More specifically, based on the IMP, the output regulation problem is first converted into a stabilization problem. We then design an observer to reproduce the full state of the system from the measured inputs and outputs. Moreover, the technique includes both a policy iteration algorithm and a value iteration algorithm to determine the optimal feedback gain without using a dynamic system model. Importantly, with this approach one does not need to solve the regulator equation. Finally, the control method was tested on a grid-connected LCL inverter system to demonstrate that the proposed method provides the desired performance in terms of both tracking and disturbance rejection.
I. INTRODUCTION
A significant concern in control theory and its implementation is output regulation. The goal of the output regulation problem is to develop a feedback controller that tracks a set of reference inputs, rejects a set of disturbances, and maintains the uniform ultimate boundedness of all signals in the closed-loop system [1]. Generally, a systematic and accurate mathematical model is required to solve the output regulation problem [1], [2]. The study of output regulation problems has become a challenging research area in control because of the unpredictability, complexity, and diversity of the controlled systems [3], [4].
To address the control challenge of uncertain systems, several data-driven and adaptive control methods have been proposed. In [5], a PID-like adaptive fuzzy controller was proposed to enhance the performance of the closed-loop system. In [6], adaptive sliding mode control was applied to the power system. In [7], the authors suggested data-driven iterative learning control. Traditional adaptive control techniques offer the desired stability of a closed-loop system without requiring precise knowledge of the system, but they cannot guarantee optimal closed-loop performance. In addition, learning-based methods are also valuable approaches to the control problem. For example, the authors of [8] developed a learning-based method for stable servo control using a complex learning system to control microrobots.
The dynamic programming principle is a basic tool for analyzing optimal control problems. Dynamic programming mainly amounts to solving the Hamilton-Jacobi-Bellman (HJB) equation [9]. However, the HJB equation is generally challenging to solve numerically. For linear systems, the dynamic programming problem is equivalent to the linear quadratic regulator (LQR) problem [10], and the HJB equation reduces to the algebraic Riccati equation (ARE). Still, the ARE is a nonlinear equation, and a correct system model is needed to solve it. Currently, policy iteration (PI) and value iteration (VI) are the two implementation strategies available for addressing the ARE. The PI process consists of policy evaluation and policy improvement, iterated repeatedly until the policy converges. The VI process consists of value function updates and policy improvement. The key difference between the two algorithms is that the PI algorithm requires an initial stabilizing controller, whereas the VI algorithm does not rely on one during learning. However, an accurate dynamic model is still required to solve the LQR problem.
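As a point of reference for the model-based case discussed above, the following is a minimal sketch of how the DT ARE yields the LQR gain when the model is known; the system matrices here are hypothetical placeholders, not taken from this paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical plant and weights (placeholders, not from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)   # state weight, Q = Q^T >= 0
R = np.eye(1)   # input weight, R = R^T > 0

# Solve the DT algebraic Riccati equation and form the optimal gain.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u(k) = -K x(k)
```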
Adaptive dynamic programming (ADP) is considered a useful data-driven technique for solving optimal control problems without a dynamic model [11]–[17]. ADP is a new approximate optimization technique in optimal control that extends reinforcement learning and has been applied to power systems and turbofan engines [11], [18]–[20].
Two categories of approaches, the feedforward-feedback (FF-FB) principle and the internal model principle (IMP), are used to address the output regulation problem. With the FF-FB principle, the regulator equation must be solved for the optimal feedforward gain, and the controller parameters must be readjusted whenever the reference signal changes. In [21], output regulation theory and ADP technology were combined in a novel way to solve the adaptive optimal output regulation problem and achieve disturbance rejection and asymptotic tracking. Recently, the approach of [21] has been extended with great interest to continuous-time (CT) [22]–[24] and discrete-time (DT) systems [25]–[29].
The output regulation methods mentioned above rely on full state feedback. In most practical situations, it is very difficult or even impossible to measure the complete state of the system, which usually requires a large number of sensors. This makes state feedback more expensive to implement than output feedback (due to the additional hardware) [30]. Based on measurement feedback, an optimal output-feedback method for linear discrete systems was proposed in [29]. However, this method requires adjusting the controller when the reference signal changes. Therefore, the study of output feedback is of practical significance.
Here, a novel ADP technique is provided to address the output feedback problem of linear DT systems with external disturbances. The output regulation problem is transformed into the problem of stabilizing an augmented system according to the IMP [31]. Then, optimal model-free IMP-ADP controllers based on the PI and VI algorithms are developed. Note that the proposed output-feedback control does not require the system's full state information. The convergence of the proposed learning algorithms to their optimal values is then established.
To the best of the authors' knowledge, the main contributions of this paper, compared with prior work, are as follows:
1) In [31], the authors proposed an adaptive optimal control method based on the IMP for controlling multi-agent systems. However, this method requires access to all system state information.
2) In [27], the authors proposed an adaptive optimal output feedback for linear discrete systems, which guarantees trajectory tracking. Although their method uses inputs and outputs to design the controller, the regulator equation must be solved to obtain the feedforward gain. In addition, when the reference signal or external disturbance changes, the feedforward gain must be recalculated to keep the system stable.
3) In [29], the authors proposed an adaptive optimal control with output feedback for linear discrete systems. Unlike optimal tracking control [27], their method directly determines the control gain without the need to solve the regulator equation. However, when the reference input or external disturbance changes, the robustness of the system deteriorates due to the use of more delayed data.
4) In [28], the authors proposed an optimal control with fixed-point tracking for linear discrete systems. In contrast, our method can track more general signals, such as sine or cosine signals with varying amplitudes, rather than being limited to tracking fixed points.
To close this introduction, the contributions of this study can be briefly summarized as follows. First, the proposed controller can be implemented using only input and output measurements. Second, our method avoids solving the regulator equation. Third, the method does not require readjusting the regulator in the event of a change in the reference input or external disturbances.
The remaining sections of this study are structured as follows. Section II discusses the linear optimal output feedback problem, the IMP, and the LQR. Model-free LQR via measurement feedback using the PI and VI algorithms is discussed in Section III to determine the optimal controller for an unknown dynamic model. In Section IV, simulation results are provided to show that the suggested strategy works. Section V presents conclusions and further research.
Definition 1 [21]: Consider a symmetric matrix $Q \in \mathbb{R}^{m \times m}$, an asymmetric matrix $X \in \mathbb{R}^{n \times m}$, arbitrary matrices $A \in \mathbb{R}^{n \times m}$, $B \in \mathbb{R}^{m \times q}$, and $C \in \mathbb{R}^{q \times r}$, and a vector $z(t) \in \mathbb{R}^n$, with the following vectors defined as:
II. OUTPUT FEEDBACK PROBLEM FOR LINEAR DT SYSTEMS
A. Problem Description
Consider the following linear DT system
The exosystem signal v(k) and the reference trajectory y_d(k) can be expressed as
It can be directly shown that under the controller (3)
Also, it is clear that if the system is stable, its inputs and states will, in steady state, exhibit the same dynamics as the external signal.
B. Internal Model Principle
The characteristic equation of S is
Hence, one has
We define combined variables
Then, as a result
Define $\gamma(k) = [e(k-q), e(k-q+1), \ldots, e(k-1)]^T \in \mathbb{R}^q$. Then, we have
Hence
where
Combining (8) and (10), a new linear DT system can be obtained
The optimal feedback controller is obtained in accordance with Problem 1 by transforming the optimal output regulation problem into the optimal stabilization problem based on (11).
Problem 1:
where $Q = Q^T \geq 0$, $R = R^T > 0$, and $(\bar{A}, \sqrt{Q}\,\bar{C})$ is observable.
Using linear optimal control theory, the optimal controller can be stated as follows:
C. Linear Quadratic Regulator
According to linear optimal control theory, Problem 1 corresponds to the LQR problem, whose solution is given by the unique solution of the following algebraic Riccati equation (ARE):
where $P^T = P \geq 0$.
Equation (14) is a nonlinear equation that is difficult to solve directly. If the system matrices are known, a PI algorithm (see Algorithm 1) and a VI algorithm (see Algorithm 2) can be used to solve the LQR problem iteratively.
Algorithm 2 Model-based LQR using the VI algorithm
Step 1: Choose the initial value matrix $P_0 = P_0^T > 0$, let $j = 0$, and set $\tau > 0$.
Step 2: Find the value matrix $P_{j+1}$ by
$P_{j+1} = \bar{A}^T P_j \bar{A} + \bar{C}^T Q \bar{C} - \bar{A}^T P_j \bar{B}(R + \bar{B}^T P_j \bar{B})^{-1} \bar{B}^T P_j \bar{A}$. (17)
Step 3: Repeat Step 2 with $j \leftarrow j+1$ until $\|P_j - P_{j-1}\| < \tau$ for $j \geq 1$.
Step 4: Find the control matrix $\bar{K}_j$ by
$\bar{K}_j = (R + \bar{B}^T P_j \bar{B})^{-1} \bar{B}^T P_j \bar{A}$. (18)
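A minimal numerical sketch of these model-based iterations follows. The VI loop implements (17) and (18) directly. Since the listing of Algorithm 1 is not reproduced above, the PI routine is written in the standard Hewer form, which is an assumption about the paper's Algorithm 1: policy evaluation via a discrete Lyapunov equation, followed by the gain update, starting from a stabilizing gain.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def vi_lqr(Abar, Bbar, Cbar, Q, R, tol=1e-10, max_iter=10000):
    """Model-based VI, following (17)-(18)."""
    P = np.eye(Abar.shape[0])                     # P0 = P0^T > 0
    for _ in range(max_iter):
        G = R + Bbar.T @ P @ Bbar
        P_next = (Abar.T @ P @ Abar + Cbar.T @ Q @ Cbar
                  - Abar.T @ P @ Bbar @ np.linalg.solve(G, Bbar.T @ P @ Abar))
        if np.linalg.norm(P_next - P) < tol:      # ||P_j - P_{j-1}|| < tau
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + Bbar.T @ P @ Bbar, Bbar.T @ P @ Abar)
    return P, K

def pi_lqr(Abar, Bbar, Cbar, Q, R, K0, tol=1e-10, max_iter=500):
    """Model-based PI in the standard Hewer form; K0 must be stabilizing."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Ak = Abar - Bbar @ K
        # Policy evaluation: P = Ak^T P Ak + Cbar^T Q Cbar + K^T R K
        P = solve_discrete_lyapunov(Ak.T, Cbar.T @ Q @ Cbar + K.T @ R @ K)
        # Policy improvement
        K = np.linalg.solve(R + Bbar.T @ P @ Bbar, Bbar.T @ P @ Abar)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K
```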
III. DESIGN OF ADAPTIVE OPTIMAL CONTROLLER BASED ON ADP
Algorithms 1 and 2 require knowledge of the system matrices $\bar{A}$ and $\bar{B}$. In this section, we develop a data-driven adaptive optimal controller for an unknown dynamic model.
A. State Reconstruction
In this subsection, information from measured inputs and outputs is used to reconstruct the full state. The authors of [34] designed an input-output observer to reconstruct the full state of linear CT systems. In this study, we adapt their method to reconstruct the full state of linear DT systems.
Theorem 1: Consider the state as
are constructed as
where
The aforementioned equation can be expanded as follows:
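The explicit formulas of Theorem 1 did not survive extraction above, but the underlying idea is the standard one of expressing x(k) through the last N inputs and outputs of an observable system. The sketch below illustrates that parameterization numerically; all matrices are hypothetical, and the construction follows the classical observability/Toeplitz decomposition rather than the paper's exact notation.

```python
import numpy as np

def io_reconstruction_matrices(A, B, C, N):
    """Return (My, Mu) such that x(k) = My @ y_past + Mu @ u_past,
    where y_past/u_past stack y(k-N..k-1) and u(k-N..k-1)."""
    n, m = B.shape
    p = C.shape[0]
    # Observability matrix V, input-propagation U, and I/O Toeplitz T.
    V = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(N)])
    U = np.hstack([np.linalg.matrix_power(A, N - 1 - j) @ B for j in range(N)])
    T = np.zeros((N * p, N * m))
    for i in range(N):
        for j in range(i):
            T[i*p:(i+1)*p, j*m:(j+1)*m] = C @ np.linalg.matrix_power(A, i-1-j) @ B
    # x(k-N) = pinv(V) (y_past - T u_past) when V has full column rank.
    My = np.linalg.matrix_power(A, N) @ np.linalg.pinv(V)
    Mu = U - My @ T
    return My, Mu

# Quick self-check on a random observable system (hypothetical matrices).
rng = np.random.default_rng(1)
n, m, p, N = 3, 1, 1, 3
A = rng.standard_normal((n, n)) * 0.4
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
My, Mu = io_reconstruction_matrices(A, B, C, N)
x = rng.standard_normal(n)
ys, us = [], []
for k in range(N):
    u = rng.standard_normal(m)
    ys.append(C @ x)
    us.append(u)
    x = A @ x + B @ u
assert np.allclose(x, My @ np.concatenate(ys) + Mu @ np.concatenate(us))
```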
B. Adaptive Optimal Control via Output-Feedback Using PI Algorithm
Taking an arbitrary initial control strategy u′(k) and applying it to system (11)
with $\bar{u}'(k) = u'(k) + a_1 u'(k-1) + \cdots + a_q u'(k-q)$, we have
By (36), (40) can be rewritten as follows:
where $\bar{P}_j = M^T P_j M$, and the matrix $K_{j+1}$ results from
Given a positive integer k_0 and a sufficiently large positive integer s, data are collected online: ψ(k_0) to ψ(k_0+s), and ū′(k_0) to ū′(k_0+s). Then, (41) can be represented in the following matrix linear equation form:
where
The model-free LQR via output-feedback using the PI algorithm is shown as Algorithm 3.
Algorithm 3 Model-free LQR via output-feedback using the PI algorithm
Step 1: Given a stable initial control gain $K_0$, set $j \leftarrow 0$ and $\tau > 0$.
Step 2: Collect the online data: $\Psi$ and $\Phi$.
Step 3: Solve $\bar{P}_j$, $\bar{B}^T P_j \bar{B}$, and $M^T(\bar{A} - \bar{B}\bar{K}_j)^T P_j \bar{B}$ by (43), and $K_{j+1}$ by
$K_{j+1} = (R + \bar{B}^T P_j \bar{B})^{-1} (\bar{B}^T P_j (\bar{A} - \bar{B}\bar{K}_j) M + \bar{B}^T P_j \bar{B} K_j)$. (44)
Step 4: Repeat Step 3 with $j \leftarrow j+1$ until $\|P_j - P_{j-1}\| < \tau$ for $j \geq 1$.
Step 5: Find the control matrix by
$K^* = K_j$. (45)
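Since the regressor constructions of (41)-(43) are not reproduced above, the following only sketches the generic batch least-squares pattern behind Step 3; build_regressor and build_target are hypothetical placeholders for the paper's data stacking.

```python
import numpy as np

def pi_identification_step(samples, build_regressor, build_target):
    """Generic shape of the batch identification step in Algorithm 3.

    samples: iterable of per-step data, e.g. (psi(k), ubar'(k), ...).
    build_regressor / build_target: hypothetical placeholders for the
    row construction prescribed by (41)-(43).
    """
    Psi = np.vstack([build_regressor(s) for s in samples])  # one row per k
    Phi = np.vstack([build_target(s) for s in samples])
    # Unique solution iff Psi has full column rank (cf. Remark 2).
    theta, *_ = np.linalg.lstsq(Psi, Phi, rcond=None)
    return theta  # stacked unknowns: the P_j-related terms in (43)
```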
C. Adaptive Optimal Control via Output-Feedback Using VI Algorithm
The PI algorithm requires an initial stabilizing controller. This subsection uses the VI algorithm to design the controller. Under an initial control u′(k), we have
By (36), (46) can be rewritten as follows:
The matrices $P_{j+1}$ and $K_j$ can be formed as
where
The model-free LQR via output-feedback using the VI algorithm is shown as Algorithm 4.
Algorithm 4 Model-free LQR via output-feedback using the VI algorithm
Step 1: Choose $P_0 = P_0^T > 0$, set $j \leftarrow 0$, and $\tau > 0$.
Step 2: Collect the online data: $\Gamma$ and $\Omega$.
Step 3: Solve $M^T(\bar{A}^T P_j \bar{A} + \bar{C}^T Q \bar{C})M$, $\bar{B}^T P_j \bar{B}$, and $M^T \bar{A}^T P_j \bar{B}$ by (49), and $P_{j+1}$ by
$P_{j+1} = M^T(\bar{A}^T P_j \bar{A} + \bar{C}^T Q \bar{C})M - M^T \bar{A}^T P_j \bar{B}(R + \bar{B}^T P_j \bar{B})^{-1} \bar{B}^T P_j \bar{A} M$. (50)
Step 4: Repeat Step 3 with $j \leftarrow j+1$ until $\|P_j - P_{j-1}\| < \tau$ for $j \geq 1$.
Step 5: Find the control matrix $K_{j+1}$ by
$K_{j+1} = (R + \bar{B}^T P_{j+1} \bar{B})^{-1} \bar{B}^T P_{j+1} \bar{A} M$. (51)
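To make the data-driven VI idea concrete, the following self-contained sketch applies the same value update in the simpler state-feedback setting: it estimates the Q-function matrix from trajectory data by least squares and then applies the Schur-complement update, which coincides with the Riccati step underlying (50). This illustrates the mechanism only; Algorithm 4 proper works with the reconstructed state Mψ(k) and the regressors of (49), and all matrices below are hypothetical.

```python
import numpy as np

np.random.seed(0)
n, m = 3, 1
# Hypothetical plant, unknown to the learner; used only to generate data.
A = np.array([[0.8, 0.1, 0.0], [0.0, 0.7, 0.2], [0.1, 0.0, 0.6]])
B = np.array([[0.0], [0.5], [1.0]])
Qc, R = np.eye(n), np.eye(m)

# One exploratory trajectory with persistently exciting input.
N = 400
x = np.zeros((N + 1, n))
u = np.random.randn(N, m)
x[0] = np.random.randn(n)
for k in range(N):
    x[k + 1] = A @ x[k] + B @ u[k]

def svec_basis(z):
    """Quadratic basis z_i z_j (i <= j), scaled so Z @ h = z^T H z."""
    idx = np.triu_indices(len(z))
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)
    return scale * np.outer(z, z)[idx]

def unsvec(h, d):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = h
    return H + H.T - np.diag(np.diag(H))

P = np.zeros((n, n))
for j in range(100):
    # Regress Q_j(x_k, u_k) = x^T Qc x + u^T R u + x_{k+1}^T P_j x_{k+1}.
    Z = np.array([svec_basis(np.concatenate([x[k], u[k]])) for k in range(N)])
    t = np.array([x[k] @ Qc @ x[k] + u[k] @ R @ u[k] + x[k + 1] @ P @ x[k + 1]
                  for k in range(N)])
    h, *_ = np.linalg.lstsq(Z, t, rcond=None)
    H = unsvec(h, n + m)
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    P_next = Hxx - Hxu @ np.linalg.solve(Huu, Hxu.T)   # VI (Riccati) update
    if np.linalg.norm(P_next - P) < 1e-8:
        P = P_next
        break
    P = P_next

K = np.linalg.solve(Huu, Hxu.T)   # learned near-optimal gain, u = -K x
print("learned K:", K)
```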
Remark 2: The condition under which (43) and (49) have unique solutions is that the matrices Ψ and Γ have full column rank, i.e.,
The ADP control block diagram based on output feedback is shown in Fig. 1.
Fig. 1. ADP control block diagram based on output feedback.
D. Stability and Convergence Analysis of ADP Algorithm
IV. SIMULATION ANALYSIS
We use an LCL grid-connected inverter system as an example to demonstrate the effectiveness of the proposed approach for the linear optimal output feedback problem. The structure of the LCL grid-connected inverter is shown in Fig. 2.
Fig. 2. Topology diagram of the LCL grid-connected inverter.
In this example, the system sampling time is set to $T = 10^{-4}$ s. The system's DT form can be written as [29]
The internal model matrices for the exosystem matrix S are given as
The system matrices obtained by the IMP are
The optimal control gain obtained by solving the LQR problem directly is characterized by
The approximate optimal control gain obtained by the PI algorithm is represented as
The approximate optimal control gain obtained by the VI algorithm is given by
To verify the effectiveness and show the superiority of the proposed IMP-ADP algorithm, a comparative simulation analysis was carried out using other methods as well: Method 1 [31], Method 2 (FF-FB) [21], Method 3 [27], and Method 4 [29], with our approach (IMP-ADP) as Method 5.
Fig. 3. Convergence of the optimal control matrix ||K_j − K*|| and value matrix ||P_j − P*|| using the PI algorithm.
Fig. 4. Convergence of the optimal control matrix ||K_j − K*|| and value matrix ||P_j − P*|| using the VI algorithm.
Fig. 5. Comparative simulation results for the system output and reference trajectory under control Methods 1-5.
The reference trajectory y_d(k) and the output y(k) are shown in Fig. 5. For the IMP-ADP method, the designed signal u(k) properly controls the output y(k) to asymptotically track the reference trajectory y_d(k), which changes from 10 sin(100πt + π/6) to 30 sin(100πt + π/6) at t = 0.04 s. Despite changing the amplitude of the mains voltage from 150 V to 220 V at t = 0.08 s, the circuit output y(k) can still quickly follow the intended trajectory y_d(k) without adjusting the regulator. This verifies the effectiveness of the proposed algorithm.
Comparing Method 5 with Method 1, which is a state-feedback method based on the IMP, the results indicate that the designed output-feedback regulator is equivalent to a state-feedback regulator.
Consider now Methods 2 and 3, which involve solving the regulator equation. When the reference input or grid voltage changes, the system output cannot match the reference input without readjusting the controller accordingly. This is a limitation of these two methods.
Comparing Methods 5 and 4, it can be seen that although Method 4 can also track the reference signal, Method 5 (IMP-ADP) is more robust to changes in the reference signal or grid voltage.
V. CONCLUSIONS
In this study, we addressed the linear optimal output-feedback problem based on measured input and output information, using the IMP concept and ADP. To mitigate the impact of multiple observers on the system's dynamic performance, this study adopted the output-feedback technique. In addition, we considered both the PI and VI algorithms to determine the optimal controller without using any system model data. The simulation example demonstrated the effectiveness of the proposed algorithm under standard assumptions. Future research will focus on optimal output feedback for nonlinear systems.