Nearly all modern control systems are implemented digitally, which makes sampled-data systems an important research topic. In industrial process control, it is common for different plant inputs to be updated with different periods, and traditional as well as advanced control methods for single-rate sampled-data systems do not directly apply to such multirate systems. Researchers noticed this problem in the 1950s, and Kranc first applied the switch decomposition method to solve it. Kalman and Bertram, Friedland, and Meyer also contributed to the development of multirate systems. In 1990, the lifting technique was introduced to simplify multirate problems by converting these systems into equivalent discrete-time systems, and the topic has remained active ever since.
Based on the lifting method, standard control techniques can be applied to multirate problems, and with the development of advanced control theory, a growing body of research has been reported.
However, all the controllers mentioned above are designed from a model of the system dynamics. When the system structure is unknown or the system parameters are uncertain, such controllers no longer meet the requirements. In this paper, the authors aim to design a controller that optimizes itself using only input and output data; we refer to such a controller as a model-free controller.
Reinforcement learning (RL) is an important branch of machine learning. Well-known research groups use RL to solve artificial intelligence problems and to teach robots to play games. Through interactions with the environment, a cognitive agent obtains rewards for its actions. Using the value function, which is computed from these rewards, the agent applies an RL algorithm to optimize its policy. A similar idea, adaptive dynamic programming (ADP), was introduced in control theory by Bertsekas and Tsitsiklis in 1995. In the past decades, this method has been applied to output regulation problems, switched systems, nonlinear systems, sliding mode control, and so on. Both ADP and RL are built on the Bellman equation, and researchers have combined the two approaches and applied them to control problems. RL algorithms have been used for controller design, and the optimal regulation problem was solved by Kamalapurkar et al. An RL algorithm can optimize the policy using only input and output data, which removes the need for a system dynamics model. Such model-free algorithms have been applied to discrete-time systems and heterogeneous systems. RL-based controller design has developed in many directions: Madady et al. and Li et al. proposed RL-based control structures to train neural-network controllers for a helicopter, and similar methods can also be applied to unmanned aerial vehicles (UAVs). Other learning-based control methods have been used in servo control systems and traffic systems. In this paper, the authors aim to design a model-free optimal controller for multirate systems through similar schemes.
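To make the policy evaluation/policy improvement cycle discussed above concrete, the following is a minimal model-based sketch for a standard single-rate discrete-time LQR problem (not the paper's multirate algorithm; the matrices A, B, Q, R and the loop sizes are illustrative assumptions). Evaluation solves the Lyapunov equation P = Q + K'RK + (A - BK)'P(A - BK) for the current gain K, and improvement sets K = (R + B'PB)^{-1} B'PA.

```python
import numpy as np

def policy_iteration_lqr(A, B, Q, R, K0, iters=50):
    """Model-based policy iteration for discrete-time LQR (illustrative)."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Acl = A - B @ K                    # closed loop under current policy
        Qk = Q + K.T @ R @ K               # per-step cost of current policy
        # Policy evaluation: P = Qk + Acl' P Acl, solved via vectorization
        # vec(P) = (I - Acl' (x) Acl')^{-1} vec(Qk)
        P = np.linalg.solve(np.eye(n * n) - np.kron(Acl.T, Acl.T),
                            Qk.reshape(-1)).reshape(n, n)
        # Policy improvement: K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Illustrative stable plant, so the initial gain K0 = 0 is stabilizing.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K, P = policy_iteration_lqr(A, B, Q, R, K0=np.zeros((1, 2)))

# At convergence P satisfies the discrete algebraic Riccati equation.
riccati = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A)
print(np.allclose(P, riccati))  # True at convergence
```

Note that this sketch still requires the model (A, B); the model-free reformulation replaces the Lyapunov solve with estimates built from measured data.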
In this paper, a model-free algorithm based on RL is developed to design an optimal controller for multirate systems. We assume that the sampling periods of the state variables differ from the periods of the system inputs. Instead of the lifting method, a different technique is used to convert the multirate system into an equivalent discrete-time system. Through matrix transformations, we put forward an algorithm to design a linear quadratic regulator (LQR) for multirate systems. We then define the behavior policy and the target policy, and provide an off-policy algorithm based on RL. Using the least squares (LS) method, we reformulate the off-policy algorithm into a model-free RL algorithm, which allows the controller to be optimized in an uncertain environment. Finally, an example is presented to illustrate the applicability and efficiency of the proposed methods.
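The off-policy, least-squares idea can be sketched as follows for a single-rate LQR problem; this is an illustrative Q-learning scheme under assumed dynamics, not the paper's multirate algorithm (A and B are used only to simulate measurements, never by the learner). Data are generated by a behavior policy (target gain plus exploration noise), the quadratic Q-function z'Hz of the target policy is identified by least squares from the Bellman equation, and the gain is improved greedily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulation model: used only to generate (x, u, x', r) data.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n, m = 2, 1

K = np.zeros((m, n))                # initial stabilizing target policy
for _ in range(10):                 # policy iteration on measured data only
    rows, rhs = [], []
    x = rng.standard_normal(n)
    for _ in range(60):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # behavior = target + noise
        xn = A @ x + B @ u                          # measured next state
        r = x @ Q @ x + u @ R @ u                   # measured stage cost
        z = np.concatenate([x, u])                  # (state, behavior action)
        y = np.concatenate([xn, -K @ xn])           # (next state, target action)
        # Bellman equation z'Hz = r + y'Hy, linear in the entries of H:
        rows.append(np.kron(z, z) - np.kron(y, y))
        rhs.append(r)
        x = xn
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    H = h.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                             # keep the symmetric part
    # Greedy improvement: u = -K x with K = H_uu^{-1} H_ux.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print(K)
```

Because the transitions are deterministic, each Bellman equation holds exactly, and with sufficiently exciting noise the least-squares step recovers H, so the gain converges to the LQR optimum without the learner ever using A or B.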
The paper is organized as follows. A multirate system model with a state feedback controller is provided in Section 2. Section 3 proposes a controller design method and three controller optimization methods. Finally, Section 4 gives an industrial example to illustrate the applicability of the methods mentioned above.
Controller Optimization for Multirate Systems Based on Reinforcement Learning
Zhan Li, Sheng-Ri Xue, Xing-Hu Yu, Hui-Jun Gao
The goal of this paper is to design a model-free optimal controller for multirate systems based on reinforcement learning. Sampled-data control systems are widely used in industrial production processes, and multirate sampling has attracted much attention in the study of sampled-data control theory. In this paper, we assume that the sampling periods for the state variables are different from the periods for the system inputs. Under this condition, we can obtain an equivalent discrete-time system using the lifting technique. Then, we provide an algorithm to solve the linear quadratic regulator (LQR) control problem of multirate systems with the utilization of matrix substitutions. Based on a reinforcement learning method, we use online policy iteration and off-policy algorithms to optimize the controller for multirate systems. By using the least squares method, we convert the off-policy algorithm into a model-free reinforcement learning algorithm, which requires only the input and output data of the system. Finally, we use an example to illustrate the applicability and efficiency of the model-free algorithm mentioned above.
Multirate system, reinforcement learning, policy iteration, optimal control, controller optimization.