Research on Transfer Learning of Vision-based Gesture Recognition

Bi-Xiao Wu Chen-Guang Yang Jun-Pei Zhong

Citation: B. X. Wu, C. G. Yang, J. P. Zhong. Research on transfer learning of vision-based gesture recognition. International Journal of Automation and Computing. DOI: 10.1007/s11633-020-1273-9.

doi: 10.1007/s11633-020-1273-9


Author Biographies:

    Bi-Xiao Wu received the B. Eng. degree in electrical engineering from Soochow University, China in 2019. She is currently a master's student in control engineering at South China University of Technology, China. Her research interests include human-robot interaction, gesture recognition and transfer learning. E-mail: wubixiao1997@163.com ORCID iD: 0000-0003-0894-2521

    Chen-Guang Yang received the B. Eng. degree in measurement and control from Northwestern Polytechnical University, China in 2005, the Ph. D. degree in control engineering from the National University of Singapore, Singapore in 2010, and received postdoctoral training at Imperial College London, UK. He has received Best Paper Awards from the IEEE Transactions on Robotics and over 10 international conferences. His research interests include robotics and automation. E-mail: cyang@ieee.org (Corresponding author) ORCID iD: 0000-0001-5255-5559

    Jun-Pei Zhong received the B. Eng. degree in control science and computer science from South China University of Technology, China in 2006, the M. Phil. degree in electrical engineering from Hong Kong Polytechnic University, China in 2010, and the Ph. D. degree in computer science from the University of Hamburg, Germany in 2015. He was awarded a Marie Curie Fellowship for his doctoral study from 2010 to 2013. From 2014 to 2016, he participated in European Union and Japanese funded projects at the University of Hertfordshire, UK, Plymouth University, UK, and Waseda University, Japan. His research interests include machine learning, computational intelligence and cognitive robotics. E-mail: jonizhong@scut.edu.cn

  • Footnote 1: https://ai.baidu.com/tech/body/hand


  • [1] S. S. Rautaray, A. Agrawal. Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review, vol. 43, no. 1, pp. 1–54, 2015. DOI:  10.1007/s10462-012-9356-9.
    [2] J. P. Wachs, M. Kölsch, H. Stern, Y. Edan. Vision-based hand-gesture applications. Communications of the ACM, vol. 54, no. 2, pp. 60–71, 2011. DOI:  10.1145/1897816.1897838.
    [3] F. Weichert, D. Bachmann, B. Rudak, D. Fisseler. Analysis of the accuracy and robustness of the Leap Motion controller. Sensors, vol. 13, no. 5, pp. 6380–6393, 2013. DOI:  10.3390/s130506380.
    [4] C. L. P. Chen, Z. L. Liu. Broad learning system: An effective and efficient incremental learning system without the need for deep architecture. IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 1, pp. 10–24, 2018. DOI:  10.1109/TNNLS.2017.2716952.
    [5] L. Yang, S. J. Song, C. L. P. Chen. Transductive transfer learning based on broad learning system. In Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Miyazaki, Japan, pp. 912−917, 2018.
    [6] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096–2030, 2016. DOI:  10.5555/2946645.2946704.
    [7] D. M. Roy, L. P. Kaelbling. Efficient Bayesian task-level transfer learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI, Hyderabad, India, pp. 2599−2604, 2007.
    [8] S. J. Pan, Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010. DOI:  10.1109/TKDE.2009.191.
    [9] P. Rashidi, D. J. Cook. Activity knowledge transfer in smart environments. Pervasive and Mobile Computing, vol. 7, no. 3, pp. 331–343, 2011. DOI:  10.1016/j.pmcj.2011.02.007.
    [10] X. Zhang, Q. Yang. Transfer hierarchical attention network for generative dialog system. International Journal of Automation and Computing, vol. 16, no. 6, pp. 720–736, 2019. DOI:  10.1007/s11633-019-1200-0.
    [11] S. Ruder, M. E. Peters, S. Swayamdipta, T. Wolf. Transfer learning in natural language processing. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, Association for Computational Linguistics, Minneapolis, Minnesota, pp. 15−18, 2019.
    [12] Z. Chen, T. Y. Qian. Transfer capsule network for aspect level sentiment classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, pp. 547−556, 2019.
    [13] G. Domeniconi, G. Moro, A. Pagliarani, R. Pasolini. Markov chain based method for in-domain and cross-domain sentiment classification. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IEEE, Lisbon, Portugal, pp. 127−137, 2015.
    [14] P. H. C. Guerra, A. Veloso, W. Meira, V. Almeida. From bias to opinion: A transfer-learning approach to real-time sentiment analysis. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Diego, USA, pp. 150−158, 2011.
    [15] X. Yin, X. Yu, K. Sohn, X. M. Liu, M. Chandraker. Feature transfer learning for face recognition with under-represented data. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 5697−5706, 2019.
    [16] I. D. Apostolopoulos, T. A. Mpesiana. Covid-19: Automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, vol. 43, no. 2, pp. 635–640, 2020. DOI:  10.1007/s13246-020-00865-4.
    [17] K. Aukkapinyo, S. Sawangwong, P. Pooyoi, W. Kusakunniran. Localization and classification of rice-grain images using region proposals-based convolutional neural network. International Journal of Automation and Computing, vol. 17, no. 2, pp. 233–246, 2020. DOI:  10.1007/s11633-019-1207-6.
    [18] Z. W. He, L. Zhang, F. Y. Liu. Discostyle: Multi-level logistic ranking for personalized image style preference inference. International Journal of Automation and Computing, vol. 17, no. 5, pp. 637–651, 2020. DOI:  10.1007/s11633-020-1244-1.
    [19] B. Kulis, K. Saenko, T. Darrell. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Providence, USA, pp. 1785−1792, 2011.
    [20] M. Raghu, C. Y. Zhang, J. Kleinberg, S. Bengio. Transfusion: Understanding transfer learning for medical imaging. In Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada, pp. 3342– 3352, 2019.
    [21] M. Oquab, L. Bottou, I. Laptev, J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, pp. 1717−1724, 2014.
    [22] J. E. Liu, M. Shah, B. Kuipers, S. Savarese. Cross-view action recognition via view knowledge transfer. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Providence, USA, pp. 3209− 3216, 2011.
    [23] V. W. Zheng, S. J. Pan, Q. Yang, J. J. Pan. Transferring multi-device localization models using latent multi-task learning. In Proceedings of the 23rd National Conference on Artificial Intelligence, Chicago, USA, pp. 1427− 1432, 2008.
    [24] D. H. Hu, Q. Yang. Transfer learning for activity recognition via sensor mapping. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain, pp. 1962−1967, 2011.
    [25] S. J. Pan, I. W. Tsang, J. T. Kwok, Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011. DOI:  10.1109/TNN.2010.2091281.
    [26] M. S. Long, J. M. Wang, G. G. Ding, J. G. Sun, P. S. Yu. Transfer joint matching for unsupervised domain adaptation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, pp. 1410−1417, 2014.
    [27] L. X. Duan, I. W. Tsang, D. Xu. Domain transfer multiple kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 465–479, 2012. DOI:  10.1109/TPAMI.2011.114.
    [28] M. Kurz, G. Hölzl, A. Ferscha, A. Calatroni, D. Roggen, G. Tröster. Real-time transfer and evaluation of activity recognition capabilities in an opportunistic system. Machine Learning, vol. 1, no. 7, pp. 73–78, 2011.
    [29] D. Roggen, K. Förster, A. Calatroni, G. Tröster. The adARC pattern analysis architecture for adaptive human activity recognition systems. Journal of Ambient Intelligence and Humanized Computing, vol. 4, no. 2, pp. 169–186, 2013. DOI:  10.1007/s12652-011-0064-0.
    [30] G. Marin, F. Dominio, P. Zanuttigh. Hand gesture recognition with jointly calibrated Leap Motion and depth sensor. Multimedia Tools and Applications, vol. 75, no. 22, pp. 14991–15015, 2016. DOI:  10.1007/s11042-015-2451-6.
    [31] M. S. Long, J. M. Wang, G. G. Ding, J. G. Sun, P. S. Yu. Transfer feature learning with joint distribution adaptation. In Proceedings of IEEE International Conference on Computer Vision, Sydney, Australia, pp. 2200−2207, 2013.
    [32] Q. Sun, R. Chattopadhyay, S. Panchanathan, J. P. Ye. A two-stage weighting framework for multi-source domain adaptation. In Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, pp. 505−513, 2011.
    [33] Y. H. Jang, H. Lee, S. J. Hwang, J. Shin. Learning what and where to transfer. [Online], Available: https://arxiv.org/abs/1905.05901, 2019.
Publication history
  • Received: 2020-11-20
  • Accepted: 2020-12-23
  • Published online: 2020-03-08

    • Recently, human-robot interaction has developed rapidly. Gestures can be an important means of human-robot interaction since they give accurate and intuitive instructions to robots, and they have been widely studied for decades[1]. Gesture recognition enables effective and efficient interaction between human workers and robots. Many kinds of devices are used for vision-based gesture recognition. The camera is the main sensor in this field, and previously most researchers used red-green-blue (RGB) images for gesture recognition[2]. With the development of technology, new devices have emerged, such as Leap Motion, Kinect, etc. Leap Motion is an interactive hardware device based on infrared (IR) sensors that can precisely capture and extract the positions and angles of finger joints. Specifically, Leap Motion is designed to detect and track human hand gestures, and its tracking error for the 3D fingertip coordinates is about 200 μm[3].

      However, data from different domains may be distributed differently. Therefore, classifiers trained in one domain are likely to perform poorly in other domains, and for each domain it is too expensive to manually collect a large number of examples and build a separate classifier. How to make better use of a model trained in the source domain and reduce the learning cost in the target domain has therefore become an urgent problem.

      In recent years, transfer learning has attracted wide interest from researchers. Transfer learning refers to the application of existing knowledge to other related domains. Researchers have studied transfer learning with different methods, e.g., the broad learning system (BLS)[4, 5], neural networks (NN)[6], Bayesian models[7] and others. Although transfer learning has received a lot of attention[8], there are very few applications in gesture recognition. The goal of this paper is to propose a method for gesture recognition that enables a model trained in the source domain to be used directly in the target domain. Therefore, the time for collecting data is reduced, and the time for annotating data can be minimized or eliminated[9].

      At present, transfer learning has been used effectively in text classification[10, 11], sentiment classification[12-14], image classification[15-21] and other fields. It can be divided into feature representation transfer, instance transfer, parameter transfer and relational knowledge transfer[8]. Feature representation transfer learning reduces the difference between the source domain and the target domain through feature transformation[22-24], or converts the data of the source and target domains into a unified feature space[25-27], after which a classification algorithm is used for identification. It is one of the most popular approaches in transfer learning. This paper adopts this approach to convert the raw data of the RGB Camera and Leap Motion into a unified feature space, and then applies a classification algorithm for recognition.

      In the process of gesture recognition, it is generally necessary to assume: 1) the same feature space, i.e., the training and test data use the same set of sensors; 2) the same overall distribution, i.e., the experimenters′ preferences or habits in the training and test data are similar; 3) the same label space, i.e., the same label set in the training and test data[25]. When conventional unsupervised data mining methods are used for gesture recognition, the long data collection cycle becomes a practical problem. If a supervised method is used, it places a heavy burden on users, who must annotate enough data to train the algorithm. Labeling the raw sensor data manually is time-consuming and requires experts to spend a lot of time annotating the data. In addition, learning the model of each device independently and neglecting the knowledge learned in other domains results in redundant computation, excessive time cost, and loss of useful knowledge. Consequently, it is very profitable to develop models in a new field by reusing learned information. Using transferable knowledge can decrease the amount of data to be collected, reduce the need for data annotation, and increase the learning speed[9]. There is very little work on transferring knowledge between two or more sensor models. Kurz et al.[28] and Roggen et al.[29] used teacher/learner models to handle the transfer problem in action recognition. Hu and Yang[24] introduced a transfer learning method to transfer knowledge between models effectively, but larger knowledge transfer between different domains remains to be explored. Marin et al.[30] proposed how to jointly exploit the Camera and Leap Motion for accurate gesture recognition. However, that approach still needs to collect a large amount of data from various devices and does not reuse the model learned from a particular device. The focus of this paper is to effectively solve the transfer problem between the RGB Camera and Leap Motion, thereby improving the learning efficiency of cross-device transfer.

      This paper presents a method to apply the model learned on one device to another. The RGB Camera and Leap Motion were used to collect gesture data from several human users to verify the presented method. The main contributions are as follows:

      1) A transfer learning framework of gesture recognition across different devices is proposed. Here, these devices have different data distributions, but all of them have the same output labels.

      2) For the transfer of gesture recognition between the RGB Camera and Leap Motion, we extract several different features; the experimental results show that the new coordinate feature achieves the highest average accuracy.

      3) When using the back propagation neural network (BP NN) algorithm for classification, we found that in some cases the number of training epochs has an unusual effect on the transfer results. Too many training epochs may lead to overfitting in the source domain and reduce the generalization ability in the target domain.

      Fig. 1 shows a general overview of our approach. The structure of this paper is organized as follows. In Section 2, the preliminaries of transfer learning are reviewed. In Section 3, the data collection and feature extraction are described. Then, we introduce the experiment in Section 4. We further discuss the problems found in the experiment in Section 5, and Section 6 concludes our work.

      Figure 1.  Pipeline of the proposed approach

    • The difference between the source domain and the target domain is approximated by the distance between $ P(x_s) $ and $ P(x_t) $ plus the distance between $ P(y_s|x_s) $ and $ P(y_t|x_t) $, as shown in (1). The joint distribution adaptation (JDA) algorithm[31] realizes transfer by reducing the distance of the marginal distribution and the conditional distribution between different domains. In this paper, we use the JDA algorithm to reduce the distance between the RGB Camera and Leap Motion domains. For clarity, the related notations and descriptions are listed in Table 1.

      Table 1.  Notations and descriptions

      Notation   Description
      $ D_s, D_t $   Source/target domain
      $ N_s, N_t $   Source/target dimension
      $ \pmb{X} $   Input data matrix
      $ \pmb{A} $   Adaptation matrix
      $ \pmb{Z} $   Embedding matrix
      $ \pmb{H} $   Centering matrix
      $ \pmb{M}_c $   MMD matrices, $ c \in \{0, 1, \cdots, C\} $
      $ \pmb{\varPhi} $   $ k $ largest eigenvalues
      $ p $   Number of nodes per mapping feature
      $ n $   Number of mapping features
      $ q $   Number of enhancement nodes
      $ (x_{ci}, y_{ci}) $   Hand joint coordinates obtained by the RGB Camera, $ i \in \{0, 1, \cdots, 20\} $
      $ (x_{li}, y_{li}) $   Hand joint coordinates obtained by the Leap Motion, $ i \in \{0, 1, \cdots, 20\} $
      $ length $   Length feature
      $ \alpha $   Angle feature
      $ C $   SVM penalty coefficient
      $$ d(D_s,D_t)\approx||P(x_s)-P(x_t)||+||P(y_s|x_s)-P(y_t|x_t)||. $$ (1)
    • Dimensionality reduction could be used for the data transformation. For clarity, principal component analysis (PCA) is used to reconstruct the data. The goal of PCA is to find a transformation matrix $ \pmb{A} $ that maximizes the variance of the embedded data, as shown in (2).

      $$ \mathop{{\rm{max}}}\limits_{\pmb{A}^{\rm{T}}\pmb{A} = \pmb{I}} {\rm{tr}}(\pmb{A}^{\rm{T}}\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A}). $$ (2)

      This optimization problem can be solved effectively by the eigendecomposition $\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A} = \pmb{A}\pmb{\varPhi}$, where $\pmb{H} = \pmb{I}-\frac{1}{n}{\bf{1}}{\bf{1}}^{\rm{T}}$ is the centering matrix.
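
      For illustration, this PCA step can be written in a few lines of code. The following is a minimal sketch in Python/NumPy (the function and variable names are ours, not from the paper): form the centering matrix $\pmb{H}$ and keep the eigenvectors of $\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}$ associated with the $k$ largest eigenvalues.

```python
import numpy as np

def pca_embedding(X, k):
    """Sketch of the PCA step in (2): X is d x n (columns are samples).

    Returns the adaptation matrix A (d x k), built from the eigenvectors of
    X H X^T with the k largest eigenvalues, and the embedded data Z = A^T X.
    """
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    S = X @ H @ X.T                          # scatter of the centered data
    eigvals, eigvecs = np.linalg.eigh(S)     # symmetric eigendecomposition (ascending)
    idx = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    A = eigvecs[:, idx]
    Z = A.T @ X
    return A, Z
```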

    • Although PCA could extract $k$-dimensional features from the data, the distribution difference between domains may still be large. The difference of the marginal distributions needs to be reduced first; in other words, the distance between $ P(\pmb{A}^{\rm{T}}x_s) $ and $ P(\pmb{A}^{\rm{T}}x_t) $ should be as small as possible. The maximum mean discrepancy (MMD)[32] is used to compute the distance between the source domain and the target domain.

      $$ \Big|\Big|\dfrac{1}{n_s}\sum\limits_{i = 1}^{n_s}{\pmb{A}^{\rm{T}}}{x_i}-\dfrac{1}{n_t}\sum\limits_{j = n_s+1}^{n_s+n_t}{\pmb{A}^{\rm{T}}}{x_j}\Big|\Big|^2 = {\rm{tr}}({\pmb{A}^{\rm{T}}}\pmb{X}{\pmb{M}_0}{\pmb{X}^{\rm{T}}}\pmb{A}) $$ (3)

      where $ \pmb{M}_0 $ is the MMD matrix and is computed as follows:

      $${({\pmb{M}_0})_{ij}} = \left\{ {\begin{array}{*{20}{l}} {\dfrac{1}{{{n_s}{n_s}}}},&{\rm{if}}\;{{x_i},{x_j} \in {D_s}}\\ {\dfrac{1}{{{n_t}{n_t}}}},&{\rm{if}}\;{{x_i},{x_j} \in {D_t}}\\ {\dfrac{{ - 1}}{{{n_s}{n_t}}}},&{{\rm{otherwise}}}. \end{array}} \right.$$ (4)
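
      Because the three cases in (4) are just products of per-sample weights ($1/n_s$ for source samples and $-1/n_t$ for target samples), $\pmb{M}_0$ can be built with a single outer product. A minimal sketch in Python/NumPy (our own names; the source samples are assumed to come first in $\pmb{X}$):

```python
import numpy as np

def mmd_matrix_m0(n_s, n_t):
    """Build M0 from (4): source samples come first, target samples last."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s),
                        np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)   # (M0)_ij = e_i * e_j reproduces the three cases in (4)
```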
    • Then, the difference of the conditional distributions needs to be reduced, i.e., the distance between $P(y_s|\pmb{A}^{\rm{T}}x_s)$ and $P(y_t|\pmb{A}^{\rm{T}}x_t)$ should be decreased. A modified MMD is used to measure this distance. Since the target domain is unlabeled, JDA uses the pseudo labels predicted by a classifier trained on the source domain in place of the true labels of the target samples[31].

      $$ \begin{split} &\Big|\Big|\dfrac{1}{n_s^{(c)}}\sum\limits_{x_i\in D_s^{(c)}}{\pmb{A}^{\rm{T}}}{x_i}-\dfrac{1}{n_t^{(c)}}\sum\limits_{x_j\in D_t^{(c)}}{\pmb{A}^{\rm{T}}}{x_j}\Big|\Big|^2=\\ &\qquad {\rm{tr}}({\pmb{A}^{\rm{T}}}\pmb{X}{\pmb{M}_c}{\pmb{X}^{\rm{T}}}\pmb{A}) \end{split}$$ (5)

      where $ \pmb{M}_c $ is computed as follows:

      $${({\pmb{M}_c})_{ij}} = \left\{ {\begin{array}{*{20}{l}} {\dfrac{1}{{n_s^{(c)}{n_s}^{(c)}}}},&{\rm{if}}\;{{x_i},\;{x_j} \in D_s^{(c)}}\\ {\dfrac{1}{{n_t^{(c)}{n_t}^{(c)}}}},&{\rm{if}}\;{{x_i},\;{x_j} \in D_t^{(c)}}\\ {\dfrac{{ - 1}}{{n_s^{(c)}n_t^{(c)}}}},&{\rm{if}}\;{\left\{ {\begin{array}{*{20}{l}} {{x_i} \in D_s^{(c)},\; {x_j} \in D_t^{(c)}}\\ {{x_j} \in D_s^{(c)},\; {x_i} \in D_t^{(c)}} \end{array}} \right.}\\ 0,&{{\rm{otherwise}}}. \end{array}} \right.$$ (6)
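
      The class-conditional matrices $\pmb{M}_c$ in (6) can be built in the same way from the source labels and the pseudo labels of the target samples. A sketch under the same assumptions as above:

```python
import numpy as np

def mmd_matrix_mc(y_s, y_t_pseudo, c):
    """Build Mc from (6) for class c.

    y_s: array of labels of the n_s source samples; y_t_pseudo: array of pseudo
    labels of the n_t target samples (source samples come first in X).
    """
    n_s, n_t = len(y_s), len(y_t_pseudo)
    e = np.zeros(n_s + n_t)
    src_c = np.where(y_s == c)[0]                 # source samples of class c
    tgt_c = np.where(y_t_pseudo == c)[0] + n_s    # target samples of class c
    if len(src_c) > 0:
        e[src_c] = 1.0 / len(src_c)
    if len(tgt_c) > 0:
        e[tgt_c] = -1.0 / len(tgt_c)
    return np.outer(e, e)                         # zero rows/columns give the "otherwise" case
```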
    • In JDA, the distance of the marginal distributions and conditional distributions is minimized at the same time, which makes the transfer learning more robust. Thus, by combining the above two distances, a total optimization goal could be obtained:

      $$ {\rm{min}}\sum\limits_{c = 0}^C{\rm{tr}}(\pmb{A}^{\rm{T}}\pmb{X}\pmb{M}_c\pmb{X}^{\rm{T}}\pmb{A})+\lambda||\pmb{A}||^2_F . $$ (7)

      Since the variance of the data is maintained before and after the transformation, another constraint is obtained as

      $$ {\rm{max}}\;\pmb{A}^{\rm{T}}\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A}. $$ (8)

      Therefore, by combining the above constraints, the optimization goal is transformed into

      $$ {\rm{min}}\frac{\displaystyle\sum\nolimits_{c = 0}^C{\rm{tr}}(\pmb{A}^{\rm{T}}\pmb{X}\pmb{M}_c\pmb{X}^{\rm{T}}\pmb{A})+\lambda||\pmb{A}||^2_F }{\pmb{A}^{\rm{T}}\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A}}. $$ (9)

      Using the Rayleigh quotient, (9) could be translated as follows:

      $$ \begin{split} &{\rm{min}}\displaystyle\sum\limits_{c = 0}^C{\rm{tr}}(\pmb{A}^{\rm{T}}\pmb{X}\pmb{M}_c\pmb{X}^{\rm{T}}\pmb{A})+\lambda||\pmb{A}||^2_F \\ &\quad {\rm{ s.t. }} \;\; \pmb{A}^{\rm{T}}\pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A} = \pmb{I}. \end{split} $$ (10)

      According to the Lagrange method, the formula turns out to be as follows:

      $$ \left(\pmb{X}\sum\limits_{c = 0}^C\pmb{M}_c\pmb{X}^{\rm{T}}+\lambda\pmb{I}\right) \pmb{A} = \pmb{X}\pmb{H}\pmb{X}^{\rm{T}}\pmb{A}\varPhi. $$ (11)

      Thus, the transformation matrix $ \pmb{A} $ can be solved easily, e.g., with the eigs function in Matlab.
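
      Outside Matlab, the same generalized eigenproblem can be solved with SciPy. The sketch below (our own helper names; the regularization $\lambda$ and the subspace dimension $k$ are hyperparameters) assembles the two sides of (11) and keeps $k$ eigenvectors; following the reference JDA implementation, the eigenvectors with the smallest eigenvalues are retained.

```python
import numpy as np
import scipy.linalg

def jda_step(X, M_list, k, lam=1.0):
    """One JDA adaptation step, cf. (11).

    X: d x n data matrix (source columns first, then target columns).
    M_list: the MMD matrices [M0, M1, ..., MC] built as sketched above.
    Returns the adaptation matrix A (d x k) and the embedding Z = A^T X.
    """
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    M = sum(M_list)
    lhs = X @ M @ X.T + lam * np.eye(d)            # X (sum_c Mc) X^T + lambda I
    rhs = X @ H @ X.T                              # X H X^T
    eigvals, eigvecs = scipy.linalg.eig(lhs, rhs)  # generalized eigenproblem (11)
    idx = np.argsort(eigvals.real)[:k]             # keep the k smallest eigenvalues
    A = eigvecs[:, idx].real
    return A, A.T @ X
```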

    • We use the RGB Camera and Leap Motion to collect 10 static gestures from multiple experimenters (Figs. 2 and 3), about 800 sets of data in total. In order to find the most suitable features for transfer between the two devices, we extract various features from the obtained raw data for experimental comparison. Each feature is introduced in the following sections.

      Figure 2.  Ten gestures captured by the RGB Camera

      Figure 3.  Ten gestures captured by the Leap Motion

    • Thanks to existing hand key point detection technology, it is easy to extract the coordinates of the 21 joint points of the hand from the gesture images taken by the RGB Camera, as shown in Fig. 4. We use $(x_{ci},y_{ci}), i = 0, 1, 2, \cdots, 20$ to represent the hand joint point coordinates obtained by the RGB Camera. The upper left corner of the image is the origin of the coordinate system, and the positive directions of the x-axis and y-axis are shown in Fig. 4. Leap Motion can directly collect the three-dimensional coordinates of the 21 joint points of the hand. Fig. 5 shows the coordinate system with the center of the Leap Motion device as the origin. In this paper, $(x_{li},y_{li}),\;\;i = 0, 1, 2, \cdots, 20$ represents the hand joint point coordinates obtained by Leap Motion; the depth information is not used in this work.

      Figure 4.  Hand joints coordinate obtained from the picture

      Figure 5.  Leap Motion coordinate system

      The joint point coordinates extracted from the RGB Camera and Leap Motion correspond to the hand positions shown in Fig. 6. It can be seen that the coordinates obtained by the two devices correspond to the same joint points. However, the coordinate systems of the two devices are different, so their distributions are different.

      Figure 6.  Corresponding positions of 21 hand joints

    • Using the coordinates of the joint points obtained by the two devices, the length information can be easily calculated. Since the fingertip positions are the most variable points, we use (12) to calculate the following distances: 1) the distance between the root of the fingers and the fingertips; 2) the distance between the root of the palm (point 0 in Fig. 6) and each fingertip; 3) the distances between the fingertips.

      $$ length = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} $$ (12)

      where 1) $ i $ = 2, 5, 9, 13, 17, $ j $ = 4, 8, 12, 16, 20; 2) $ i $ = 0, $ j $ = 4, 8, 12, 16, 20; 3) $ i $ = 4, 8, 12, 16, 20, $ j $ = 4, 8, 12, 16, 20, with $ i \ne j $. The algorithm flow is given below, followed by a Python sketch.

      Algorithm 1. Calculation of the length

      Input: The coordinates of 21 hand joint points $ (x_i, y_i) $,     $ (x_j, y_j) $.

      Output: The length information.

       1) for $ i $ $ \in $ $ [0, 2, 4, 5, 8, 9, 12, 13 ,16, 17 ,20] $ do

       2)   for $ j $ $ \in $ $ [4, 8, 12, 16 , 20] $ do

       3)    if $ i $ $ \ne $ j then

       4)     $ length $ = $ \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} $

       5)    end if

       6)   end for

       7) end for

       8) return $ length $;
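
      For reference, a compact Python version of Algorithm 1 is sketched below (our own function name; joints is assumed to be a 21 × 2 array of the (x, y) joint coordinates of one sample; as in the pseudocode, symmetric fingertip pairs appear twice and could be deduplicated if desired).

```python
import numpy as np

def length_features(joints):
    """Distances of (12) following the loops of Algorithm 1."""
    roots_and_palm = [0, 2, 5, 9, 13, 17]
    fingertips = [4, 8, 12, 16, 20]
    feats = []
    # distances from the palm root / finger roots to the fingertips,
    # and between pairs of distinct fingertips
    for i in roots_and_palm + fingertips:
        for j in fingertips:
            if i != j:
                feats.append(np.linalg.norm(joints[i] - joints[j]))
    return np.array(feats)
```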

    • Using the obtained joint point coordinates, we can easily calculate the angle information. We use (13)−(15) to calculate the following features: 1) with points 2, 5, 9, 13, 17 as vertices, the angle formed by point 0 and any one of points 4, 8, 12, 16, 20 (an example is shown in Fig. 6); 2) the angle formed by any three of the fingertip points.

      $$ \pmb{A}_1 = (x_{i1} - x_j, y_{i1} - y_j) $$ (13)
      $$ \pmb{A}_2 = (x_{i2} - x_j, y_{i2} - y_j) $$ (14)
      $$ \alpha = {\rm{arccos}}\left( \frac {\pmb{A}_1 \cdot \pmb{A}_2} {\|\pmb{A}_1\| \|\pmb{A}_2\|}\right) $$ (15)

      where 1) $ i1 $ = 0, $ i2 $ = 4, 8, 12, 16, 20, $ j $ = 2, 5, 9, 13, 17; 2) $ i1 $, $ i2 $, $ j $ $ \in $ {4, 8, 12, 16, 20}, with $ i1 $, $ i2 $, $ j $ pairwise distinct. The algorithm flow is given below, followed by a Python sketch.

      Algorithm 2. Calculation of the angle

      Input: The coordinates of 21 hand joint points $ (x_{i1}, y_{i1}) $,     $ (x_{i2}, y_{i2}) $, $ (x_j, y_j) $.

      Output: The angle information.

       1) for $ j $ $ \in $ $ [2, 5, 9, 13, 17] $ do

       2)  for $ i2 $ $ \in $ $ [4, 8, 12, 16, 20] $ do

       3)   $ i1 = 0 $

       4)   $ \pmb{A}_1 $ = $ (x_{i1} - x_j, y_{i1} - y_j) $

       5)   $ \pmb{A}_2 $ = $ (x_{i2} - x_j, y_{i2} - y_j) $

       6)   $\alpha 1 = {\rm{arccos}}\left( \dfrac {\pmb{A}_1 \cdot \pmb{A}_2} {\|\pmb{A}_1\| \|\pmb{A}_2\|}\right)$

       7)  end for

       8) end for

       9) for $ i1, i2, j $ $ \in $ $ [4, 8, 12, 16, 20] $ do

       10)  if $ i1 $ $ \ne $ $ i2 $ $ \ne $ $ j $ then

       11)   $ \pmb{A}_1 $ = $ (x_{i1} - x_j, y_{i1} - y_j) $

       12)   $ \pmb{A}_2 $ = $ (x_{i2} - x_j, y_{i2} - y_j) $

       13)   $\alpha 2 = {\rm{arccos}}\left( \dfrac {\pmb{A}_1 \cdot \pmb{A}_2} {\|\pmb{A}_1\| \|\pmb{A}_2\|}\right)$

       14)  end if

       15) end for

       16) return $ \alpha 1,\alpha 2 $
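
      A corresponding Python sketch of Algorithm 2 (same assumptions as before; the ordered triples mirror the pseudocode, so symmetric duplicates could be removed if desired):

```python
import numpy as np
from itertools import permutations

def angle(p_i1, p_i2, p_j):
    """Angle at vertex p_j formed by p_i1 and p_i2, cf. (13)-(15)."""
    a1, a2 = p_i1 - p_j, p_i2 - p_j
    cos = np.dot(a1, a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angle_features(joints):
    """Sketch of Algorithm 2; joints is a 21 x 2 array of joint coordinates."""
    fingertips = [4, 8, 12, 16, 20]
    feats = []
    # 1) vertex at each finger root, between point 0 and each fingertip
    for j in [2, 5, 9, 13, 17]:
        for i2 in fingertips:
            feats.append(angle(joints[0], joints[i2], joints[j]))
    # 2) angles formed by any three distinct fingertips
    for i1, i2, j in permutations(fingertips, 3):
        feats.append(angle(joints[i1], joints[i2], joints[j]))
    return np.array(feats)
```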

    • In order to weaken the influence of the different coordinate systems on the joint point coordinates, the coordinate origin can be unified at the root of the middle finger (point 9 in Fig. 6). The direction from point 0 to point 9 is taken as the positive direction of the y-axis, and the direction perpendicular to the y-axis and pointing to the right is the positive direction of the x-axis.

      The positive y-axis direction vector is expressed as

      $$ \pmb{A}_y = (x_9-x_0, y_9-y_0). $$ (16)

      The positive x-axis direction vector is expressed as

      $$ \pmb{A}_x = \left(1, \frac{x_0 - x_9} {y_9 - y_0}\right). $$ (17)

      The point representation in the new coordinate system is

      $$ length = \sqrt{(x_i - x_9)^2 + (y_i - y_9)^2} $$ (18)
      $$ \pmb{A}_i = (x_i - x_9, y_i - y_9) $$ (19)
      $$ x_{inew} = length \times \frac {\pmb{A}_i \cdot \pmb{A}_x} {\|\pmb{A}_i\| \|\pmb{A}_x\|} $$ (20)
      $$ y_{inew} = length \times \frac {\pmb{A}_i \cdot \pmb{A}_y} {\|\pmb{A}_i\| \|\pmb{A}_y\|} $$ (21)

      where $ i $ = 0, 1, 2, ···, 20. Through the above formulas, the 21 joint coordinates can be converted into the new coordinate system. We convert the original coordinates from the RGB Camera and Leap Motion to obtain the new coordinates of the 21 joint points. The algorithm flow is given below, followed by a Python sketch.

      Algorithm 3. Calculation of the new coordinates

      Input: The coordinates of 21 hand joint points $ (x_i, y_i) $, $ i = 0, 1, \cdots, 20 $.

      Output: The new coordinates information.

       1) $ \pmb{A}_y \gets (x_9-x_0, y_9-y_0) $

       2) $\pmb{A}_x \gets \left(1, \dfrac{x_0 - x_9} {y_9 - y_0}\right)$

       3) for $ i $ $ \in $ $[0, 1, 2, \cdots, 20]$ do

       4)   $ length = \sqrt{(x_i - x_9)^2 + (y_i - y_9)^2} $

       5)   $ \pmb{A}_i $ = $ (x_i - x_9, y_i - y_9) $

       6)   $x_{inew} = length \times \dfrac {\pmb{A}_i \cdot \pmb{A}_x} {\|\pmb{A}_i\| \|\pmb{A}_x\|}$

       7)   $y_{inew} = length \times \dfrac {\pmb{A}_i \cdot \pmb{A}_y} {\|\pmb{A}_i\| \|\pmb{A}_y\|}$

       8) end for

       9) return $ x_{inew}, y_{inew} $
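
      A Python sketch of Algorithm 3 under the same assumptions (note that point 9 itself maps to the origin, and the degenerate case $ y_9 = y_0 $ would need special handling in practice):

```python
import numpy as np

def new_coordinates(joints):
    """Sketch of Algorithm 3: re-express all 21 joints in a hand-centered frame.

    joints: 21 x 2 array; point 9 (middle-finger root) becomes the origin and the
    direction from point 0 to point 9 becomes the positive y-axis, cf. (16)-(21).
    """
    a_y = joints[9] - joints[0]                                    # (16)
    a_x = np.array([1.0, (joints[0, 0] - joints[9, 0]) /
                         (joints[9, 1] - joints[0, 1])])           # (17)
    new_pts = np.zeros_like(joints, dtype=float)
    for i in range(21):
        a_i = joints[i] - joints[9]                                # (19)
        length = np.linalg.norm(a_i)                               # (18)
        if length == 0:                                            # point 9 maps to the origin
            continue
        new_pts[i, 0] = length * np.dot(a_i, a_x) / (length * np.linalg.norm(a_x))  # (20)
        new_pts[i, 1] = length * np.dot(a_i, a_y) / (length * np.linalg.norm(a_y))  # (21)
    return new_pts
```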

    • This section mainly introduces the relevant parameter settings of the algorithms used in the experiment. The relevant parameter settings of support vector machine (SVM) and BLS are shown in Table 2.

      Table 2.  SVM and BLS related parameter settings

      Related parameter Ranges
      $ C $ 1, 2, ···, 20
      $ \gamma $ 0.0001, 0.001, ··· , 1
      $ p $ 5, 10, ···, 30
      $ n $ 5, 10, ···, 30
      $ q $ 20, 40, ··· , 300

      We use a BP neural network in this paper; the number of nodes in the input layer is determined by the dimension N of each feature described in Section 3. The number of nodes in the first and second hidden layers is set between 0.5N and 2N, and the number of nodes in the output layer is 10. Fig. 7 shows the transfer process between different devices.

      Figure 7.  Process of transfer between different devices
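
      To make the cross-device evaluation protocol concrete, the following sketch trains an RBF-SVM on the features of one device and tests it on the other, with a grid search over the ranges in Table 2. It uses scikit-learn; the function name, the 5-fold cross-validation on the source domain for model selection, and the feature standardization are our assumptions, not details from the paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cross_device_svm(X_src, y_src, X_tgt, y_tgt):
    """Train an RBF-SVM on source-device features, evaluate on the target device."""
    param_grid = {
        "svc__C": list(range(1, 21)),                  # C = 1, 2, ..., 20 (Table 2)
        "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1.0],   # gamma = 0.0001, ..., 1 (Table 2)
    }
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    search = GridSearchCV(model, param_grid, cv=5)
    search.fit(X_src, y_src)                           # model selection on the source domain only
    return search.score(X_tgt, y_tgt)                  # accuracy on the target domain
```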

    • In Section 3, the different gesture features of the RGB Camera and Leap Motion are obtained. This section will use the algorithm introduced in Section 2 to transfer the data of these two domains.

    • The results of Experiment 1 are shown in Fig. 8. It can be found from the experimental results that if the raw images taken by the RGB Camera and the joint point coordinates obtained by the Leap Motion are transferred to each other directly, the results are poor.

      Figure 8.  Comparison of results in Experiment 1

    • The results of Experiment 2 are shown in Fig. 9. From the experimental results of Experiment 2, the following conclusions could be drawn:

      Figure 9.  Comparison of results in Experiment 2

      1) By comparing with Experiment 1, we could see that after extracting the coordinate features, the accuracy of the transfer result between the two devices has been greatly improved.

      2) The JDA algorithm could reduce the distance between two domains, and improve the accuracy of the experimental results in most cases.

    • The results of Experiment 3 are shown in Fig. 10.

      Figure 10.  Comparison of results in Experiment 3

    • The results of Experiment 4 are shown in Fig. 11.

      Figure 11.  Comparison of results in Experiment 4

    • The results of Experiment 5 are shown in Fig. 12.

      Figure 12.  Comparison of results in Experiment 5

      From the comparison of the experimental results of Experiments 3−5, it can be found that most of the results are improved little, or not at all, by using the JDA algorithm. The analysis shows that the length, angle and new coordinate features are only weakly affected by the original coordinate system, so different original coordinate systems do not have much influence on them. For this paper, the main function of the JDA algorithm is to reduce the impact of the different coordinate systems on the data, thus reducing the difference between domains. According to [31], the JDA algorithm needs complex computation to obtain the transformation matrix, which is time-consuming. In this paper, we can directly use the extracted length, angle and coordinate features for transfer learning, which not only guarantees the accuracy, but also greatly reduces the training time.

      By comparing the five experiments, it can be found that the average accuracy of Experiment 5 is the highest; in other words, the best results are obtained with the new coordinate feature. In addition, in the five experiments, the accuracy of transferring from the Leap Motion to the RGB Camera is generally higher than that of transferring from the RGB Camera to the Leap Motion. After analysis, we think this is because the coordinates originally extracted by Leap Motion are in three-dimensional space, while those extracted from the RGB Camera images are two-dimensional. Therefore, the Leap Motion features carry more information, and the transfer accuracy is higher when the Leap Motion data is used as the source domain.

    • Some interesting phenomena are found when using the neural network algorithm to transfer and classify the data. Figs. 13(a)-13(c) show the experimental results with the Camera as the source domain and the Leap Motion as the target domain. Figs. 13(d)-13(f) show the experimental results with the Leap Motion as the source domain and the Camera as the target domain. It can be seen from the comparison that when the Camera (training set/source domain) transfers to the Leap Motion (test set/target domain), the accuracy in the target domain is generally lower than in the source domain. However, something different happens when the Leap Motion (training set/source domain) transfers to the Camera (test set/target domain), which is discussed in more detail below.

      Figure 13.  Comparison of experimental results using neural network classification

      From the experimental results shown in Figs. 13(d)-13(f), we could draw the following conclusions:

      1) In a small interval to the left of the intersection (the rightmost intersection), the accuracy of the target domain is higher than that of the source domain, and the highest target-domain accuracy falls in this interval. This means that in this interval the model trained on the source domain is more suitable for the target domain. We speculate that this is because the Leap Motion data is originally in three-dimensional space, while the Camera data is two-dimensional; in other words, the Leap Motion provides a richer feature space, so a model trained on it generalizes well to the Camera data. Therefore, in this interval, the accuracy of the target domain is higher than that of the source domain.

      2) In the region to the right of the intersection (the rightmost intersection), the accuracy of the target domain decreases as the source domain accuracy improves. This may mean that, with more training epochs, the model becomes increasingly specialized to the source domain, which reduces its generalization ability in the target domain. Therefore, it can be concluded that, in some cases, the number of training epochs in the source domain affects the accuracy in the target domain.

      For transfer learning, we have not yet found a discussion of these two points. Compared with the discussion in [33] about "1) Which layers in the source domain could be transferred to the target domain? 2) How much knowledge from the source domain should be transferred to the target domain?", we ask "When is the best time to transfer during the training of the source domain?" A detailed discussion has been given based on the experiments.
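
      A practical way to locate this "best time to transfer" is to evaluate the model on both domains after every training epoch and keep the epoch with the highest target-domain accuracy. The sketch below uses scikit-learn's MLPClassifier trained incrementally with partial_fit as a stand-in for the BP network of Section 4; the layer sizes and learning rate are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def track_transfer_accuracy(X_src, y_src, X_tgt, y_tgt, epochs=200):
    """Record source/target accuracy per epoch to find the best epoch to transfer."""
    n_in = X_src.shape[1]
    clf = MLPClassifier(hidden_layer_sizes=(n_in, n_in),   # within the 0.5N-2N range
                        learning_rate_init=0.01)
    classes = np.unique(y_src)
    history = []
    for epoch in range(epochs):
        clf.partial_fit(X_src, y_src, classes=classes)     # one pass over the source data
        history.append((epoch,
                        clf.score(X_src, y_src),            # source-domain accuracy
                        clf.score(X_tgt, y_tgt)))           # target-domain accuracy
    best = max(history, key=lambda h: h[2])                 # epoch with best target accuracy
    return history, best
```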

    • In this paper, an effective transfer learning method for gesture recognition between the RGB Camera and Leap Motion has been put forward. The different distributions of the data collected by the Leap Motion and the RGB Camera raise challenging problems, for which effective solutions have been presented. We extracted various features from the obtained raw data, such as the coordinate, length and angle features, and compared the learning performance with and without the JDA algorithm. The experimental results show the performance of the different features under different algorithms. Through the comparison of several groups of experimental results, we found that the average accuracy of the new coordinate feature is the highest. In future work, we will focus on the following points:

      1) At present, only two-dimensional features are used in the transfer learning of gesture recognition, which imposes certain limitations on the direction of the palm. If the palm is not parallel to the device, the classification results will be affected. We will use Kinect to extract more reliable features in 3D space.

      2) We only discuss the experimental results of the coordinate, length and angle features; more features could be computed for transfer.

      3) In the future, the method could also be extended to transfer learning of action recognition among different devices.

    • This work was supported by National Nature Science Foundation of China (NSFC) (Nos. U20A20200, 61811530281, and 61861136009), Guangdong Regional Joint Foundation (No. 2019B1515120076), Fundamental Research for the Central Universities, and in part by the Foshan Science and Technology Innovation Team Special Project (No. 2018IT100322).

    • This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

      The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

      To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
