Zhan Li, Sheng-Ri Xue, Xing-Hu Yu, Hui-Jun Gao. Controller Optimization for Multirate Systems Based on Reinforcement Learning. International Journal of Automation and Computing, vol. 17, no. 3, pp.417-427, 2020. https://doi.org/10.1007/s11633-020-1229-0

Controller Optimization for Multirate Systems Based on Reinforcement Learning

doi: 10.1007/s11633-020-1229-0
  • Author Bio:

    Zhan Li received the Ph. D. degree in control science and engineering from Harbin Institute of Technology, Harbin, China, in 2015. He is currently an associate professor with the Research Institute of Intelligent Control and Systems, School of Astronautics, Harbin Institute of Technology, China. His research interests include motion control, industrial robot control, robust control of small unmanned aerial vehicles (UAVs), and cooperative control of multivehicle systems. E-mail: zhanli@hit.edu.cn ORCID iD: 0000-0002-7601-4332

    Sheng-Ri Xue received the B. Sc. degree in automation engineering from Harbin Institute of Technology, China, in 2015, where he is currently pursuing the Ph. D. degree with the Research Institute of Intelligent Control and Systems. His research interests include H-infinity control, controller optimization, reinforcement learning, and their applications to sampled-data control systems design. E-mail: srxue2015@126.com

    Xing-Hu Yu received the M. M. degree in osteopathic medicine from Jinzhou Medical University, China, in 2016. He is currently a Ph. D. candidate in control science and engineering at Harbin Institute of Technology, China. His research interests include intelligent control and biomedical image processing. E-mail: yuxinghu1012@126.com

    Hui-Jun Gao received the Ph. D. degree in control science and engineering from Harbin Institute of Technology, China, in 2005. From 2005 to 2007, he carried out his postdoctoral research with the Department of Electrical and Computer Engineering, University of Alberta, Canada. Since 2004, he has been with Harbin Institute of Technology, where he is currently a full professor, the Director of the Inter-discipline Science Research Center, and the Director of the Research Institute of Intelligent Control and Systems. He is an IEEE Industrial Electronics Society Administration Committee Member and a council member of IFAC. He is the Co-Editor-in-Chief of IEEE Transactions on Industrial Electronics and an Associate Editor for Automatica, IEEE Transactions on Control Systems Technology, IEEE Transactions on Cybernetics, and IEEE/ASME Transactions on Mechatronics. His research interests include intelligent and robust control, robotics, mechatronics, and their engineering applications. E-mail: hjgao@hit.edu.cn (Corresponding author) ORCID iD: 0000-0001-5554-5452

  • Received Date: 2019-12-21
  • Accepted Date: 2020-02-21
  • Publish Online: 2020-04-14
  • Publish Date: 2020-06-04
  • The goal of this paper is to design a model-free optimal controller for multirate systems based on reinforcement learning. Sampled-data control systems are widely used in industrial production processes, and multirate sampling has attracted much attention in sampled-data control theory. In this paper, we assume that the sampling periods of the state variables differ from those of the system inputs. Under this condition, an equivalent discrete-time system can be obtained using the lifting technique. We then provide an algorithm that solves the linear quadratic regulator (LQR) control problem for multirate systems by means of matrix substitutions. Building on reinforcement learning, we use online policy iteration and off-policy algorithms to optimize the controller for multirate systems. Using the least squares method, we convert the off-policy algorithm into a model-free reinforcement learning algorithm that requires only the input and output data of the system. Finally, an example illustrates the applicability and efficiency of the proposed model-free algorithm.
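The pipeline in the abstract lifts the multirate model and then learns the LQR gain from data by off-policy least squares. It can be illustrated with a minimal Python/NumPy sketch. This is not the authors' algorithm, and it rests on several assumptions.

- It assumes a fast-rate discrete model x[j+1] = A x[j] + B u[j] with input period h, with the state sampled every N·h.
- It uses the standard lifting x[(k+1)N] = A^N x[kN] + [A^(N-1)B, ..., AB, B] ū_k.
- It penalizes the state only at frame instants. The paper's matrix substitutions are not reproduced.
- It swaps in a generic least-squares Q-function policy iteration for discrete-time LQR.

The model matrices, N, the weights, and the function names (`lift`, `off_policy_q_pi`, `riccati_gain`) are all hypothetical.

```python
import numpy as np


def lift(A, B, N):
    """Lift x[j+1] = A x[j] + B u[j] (input period h) over one frame of N inputs.

    Returns (A_l, B_l) with x[(k+1)N] = A_l x[kN] + B_l @ [u[kN]; ...; u[kN+N-1]].
    """
    A_l = np.linalg.matrix_power(A, N)
    # Block i multiplies u[kN+i] and is propagated through the remaining N-1-i steps.
    B_l = np.hstack([np.linalg.matrix_power(A, N - 1 - i) @ B for i in range(N)])
    return A_l, B_l


def quad_features(z):
    """phi(z) such that z^T H z = phi(z) @ theta, where theta is the upper triangle of H."""
    iu = np.triu_indices(z.size)
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # each off-diagonal entry appears twice
    return np.outer(z, z)[iu] * scale


def theta_to_H(theta, p):
    """Rebuild the symmetric p x p matrix H from its upper-triangular entries."""
    H = np.zeros((p, p))
    H[np.triu_indices(p)] = theta
    return H + H.T - np.diag(np.diag(H))


def off_policy_q_pi(X, U, Xn, Qx, Ru, K0, iters=30):
    """Least-squares Q-function policy iteration from (x, u, x_next) samples only.

    The samples come from an arbitrary exploratory input; every iteration reuses them
    to evaluate the current target policy u = -K x, which makes the scheme off-policy.
    K0 must stabilize the lifted system.
    """
    n, m = X.shape[1], U.shape[1]
    r = np.einsum("ki,ij,kj->k", X, Qx, X) + np.einsum("ki,ij,kj->k", U, Ru, U)
    K = K0.copy()
    for _ in range(iters):
        # Policy evaluation: z^T H z - z'^T H z' = r with z = [x; u], z' = [x'; -K x'].
        Phi = np.array([
            quad_features(np.concatenate([x, u]))
            - quad_features(np.concatenate([xn, -K @ xn]))
            for x, u, xn in zip(X, U, Xn)
        ])
        theta, *_ = np.linalg.lstsq(Phi, r, rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement: argmin_u z^T H z gives u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K


def riccati_gain(A, B, Qx, Ru, max_iter=100_000):
    """Model-based reference gain from Riccati iteration, used only as a check."""
    P = np.zeros_like(A)
    for _ in range(max_iter):
        P_next = Qx + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(Ru + B.T @ P @ B, B.T @ P @ A)
        done = np.max(np.abs(P_next - P)) < 1e-12 * (1.0 + np.max(np.abs(P_next)))
        P = P_next
        if done:
            break
    return np.linalg.solve(Ru + B.T @ P @ B, B.T @ P @ A)


# Hypothetical example: stable fast-rate model, state sampled every N = 2 input periods.
A = np.array([[1.0, 0.1], [-0.05, 0.95]])
B = np.array([[0.0], [0.1]])
N = 2
A_l, B_l = lift(A, B, N)
Qx, Ru = np.eye(2), 0.1 * np.eye(N)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
U = rng.standard_normal((200, N))    # exploratory lifted inputs
Xn = X @ A_l.T + U @ B_l.T           # simulator only; the learner never reads A_l, B_l

K = off_policy_q_pi(X, U, Xn, Qx, Ru, K0=np.zeros((N, 2)))  # K0 = 0 works since A_l is stable
print(np.round(K, 4))
print(np.allclose(K, riccati_gain(A_l, B_l, Qx, Ru), atol=1e-6))
```

The same data set is reused in every iteration, so any sufficiently exciting input could generate it. The i.i.d. Gaussian samples are a convenience assumption; a real experiment would record (x, u, x_next) along a trajectory. The learner uses only these samples, while `riccati_gain` uses the model and serves only to check the learned gain.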

     
