Recommend | Research on Transfer Learning of Vision-based Gesture Recognition

Release Date: 2021-09-02

Gesture recognition has been widely used for human-robot interaction. A current problem in gesture recognition is that knowledge learned in existing domains is not reused to discover and recognize gestures in new domains. This paper proposes a method for transferring gesture data across different domains. The experimental results show that the proposed method effectively solves the transfer problem between the RGB Camera and Leap Motion.




Human-robot interaction has developed rapidly in recent years. Gestures are an important channel for human-robot interaction because they can give accurate and intuitive instructions to robots, and they have been studied widely for decades. Gesture recognition enables effective and efficient interaction between human workers and robots. Many kinds of devices are available for vision-based gesture recognition. The camera is the main sensor used in this field, and most earlier work relied on red-green-blue (RGB) images. As technology has developed, new devices such as Leap Motion and Kinect have emerged. Leap Motion is an interactive hardware device based on infrared (IR) sensors that can precisely capture and extract the positions and angles of finger joints. It is designed specifically to detect and track human hand gestures, and its tracking error for the 3D coordinates of fingertips is about 200 μm.


However, data from different domains may follow different distributions, so a classifier trained in one domain is likely to perform poorly in other domains. Collecting a large number of examples manually and building a separate classifier for each domain is too expensive. How to make better use of the model trained in the source domain and reduce the learning cost in the target domain has therefore become an urgent problem.


In recent years, transfer learning has aroused wide interest among researchers. Transfer learning refers to the application of existing knowledge to other related domains. Researchers have studied transfer learning with different methods, e.g., the broad learning system (BLS), neural networks (NN), Bayesian models and others. Although transfer learning has received a lot of attention, it has rarely been applied to gesture recognition. The goal of this paper is to propose a method for gesture recognition that enables a model trained in the source domain to be used directly in the target domain. As a result, the time for collecting data is reduced and the time for annotating data can be minimized or eliminated.


At present, transfer learning has been used effectively in text classification, sentiment classification, image classification and other fields. It can be divided into feature representation transfer learning, instance transfer learning, parameter transfer learning and relational knowledge transfer learning. Feature representation transfer learning either applies a feature transformation to decrease the difference between the source domain and the target domain, or converts the data of the source and target domains into a unified feature space and then applies a classification algorithm for identification. It is one of the most popular research methods in the field of transfer learning. This paper uses this method to convert the original data of the RGB Camera and Leap Motion into a unified feature space, and then uses a classification algorithm for recognition.
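The idea of a unified feature space can be illustrated with a minimal sketch. It is not the authors' feature extraction: the hand templates, the device scales and offsets, the palm-relative normalization in `to_unified_features` and the nearest-centroid classifier are all hypothetical choices made for illustration, and both simulated devices are assumed to report 2D keypoints with the palm centre in row 0.

```python
import numpy as np

def to_unified_features(joints):
    """Map raw keypoints of shape (n_joints, dim) to a device-independent vector.

    Assumption: row 0 is a reference point (for example the palm centre).
    """
    joints = np.asarray(joints, dtype=float)
    centred = joints - joints[0]                    # remove the device-specific origin
    scale = np.linalg.norm(centred, axis=1).max()   # largest palm-to-joint distance
    return (centred / (scale if scale > 0 else 1.0)).ravel()  # remove the device unit

def simulate_device(template, scale, offset, n, rng):
    """Hypothetical device: the same hand shape seen with its own unit, origin and noise."""
    noise = rng.normal(0.0, 0.02 * scale, size=(n,) + template.shape)
    return template * scale + offset + noise

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per label."""
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])

def predict(X, labels, centroids):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

# Two made-up gestures: palm centre followed by five fingertips.
templates = {
    0: np.array([[0, 0], [-0.8, 0.6], [-0.4, 1.0], [0, 1.1], [0.4, 1.0], [0.8, 0.5]]),   # open hand
    1: np.array([[0, 0], [-0.3, 0.2], [-0.4, 1.0], [0, 0.4], [0.15, 0.35], [0.3, 0.2]]), # pointing
}

rng = np.random.default_rng(0)

def make_domain(scale, offset, n=50):
    raw = np.concatenate([simulate_device(t, scale, np.array(offset), n, rng)
                          for t in templates.values()])
    y = np.repeat(list(templates.keys()), n)
    return raw, y

raw_src, y_src = make_domain(scale=300.0, offset=(320.0, 240.0))  # source: pixel-like units
raw_tgt, y_tgt = make_domain(scale=80.0, offset=(-10.0, 150.0))   # target: millimetre-like units

# Train on the source device only, then classify target-device data.
X_src = np.array([to_unified_features(s) for s in raw_src])
X_tgt = np.array([to_unified_features(s) for s in raw_tgt])
labels, centroids = fit_centroids(X_src, y_src)
unified_accuracy = float((predict(X_tgt, labels, centroids) == y_tgt).mean())

# Baseline without the shared feature space: raw coordinates flattened as-is.
raw_labels, raw_centroids = fit_centroids(raw_src.reshape(len(raw_src), -1), y_src)
raw_accuracy = float(
    (predict(raw_tgt.reshape(len(raw_tgt), -1), raw_labels, raw_centroids) == y_tgt).mean()
)

print(f"target accuracy, unified features: {unified_accuracy:.2f}")
print(f"target accuracy, raw coordinates:  {raw_accuracy:.2f}")
```

Because the classifier only ever sees palm-relative, scale-normalized coordinates, a model fitted on the simulated source device can be applied to the simulated target device without any target labels. Real devices would also differ in dimensionality and viewpoint, which this sketch does not address.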


Gesture recognition generally assumes: 1) the same feature space, meaning that the training and test data use the same set of sensors; 2) the same overall distribution, i.e., the experimenters' preferences or habits are similar in the training and test data; 3) the same label space, i.e., the training and test data share the same label set. When conventional unsupervised data mining methods are used for gesture recognition, the long data collection cycle becomes a practical problem. When a supervised method is used, users bear a heavy burden because they must annotate enough data to train the algorithm. Labeling raw sensor data manually is time-consuming and requires substantial expert effort. In addition, learning a model for each device independently while neglecting the knowledge already learned in other domains results in redundant computation, excessive time cost, and loss of useful knowledge.


Consequently, it is highly beneficial to develop models in a new domain by using previously learned information. Using transferable knowledge can decrease the amount of data to collect, reduce the need for data annotation, and increase the learning speed. The focus of this paper is to effectively solve the transfer problem between the RGB Camera and Leap Motion, thereby improving the learning efficiency of cross-device transfer.


This paper presents a method for applying a model learned on one device to another device. The RGB Camera and Leap Motion were used to collect gesture data from several human users to verify the presented method. The main contributions are as follows:


1) A transfer learning framework of gesture recognition across different devices is proposed. Here, these devices have different data distributions, but all of them have the same output labels.


2) In the transfer of gesture recognition between the RGB Camera and Leap Motion, we extract several different features. In the experimental results, the new coordinate features achieve the highest average accuracy.


3) When using the back propagation neural network (BP NN) algorithm for classification, we found that in some cases the number of training epochs has an unusual effect on the transfer results. Too many training epochs may cause the model to overfit the source domain and reduce its generalization ability in the target domain.
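The epoch effect in the third contribution can be examined with a minimal sketch that trains a small back propagation network on source-domain data and records source and target accuracy after every epoch. Everything here is an assumption made for illustration rather than the paper's setup: the synthetic Gaussian data, the domain shift, the network size, the learning rate, and the availability of a small labeled target set for evaluation. Whether target accuracy actually drops at later epochs depends on the data.

```python
import numpy as np

def make_domain(shift, n_per_class, rng):
    """Two Gaussian classes; `shift` moves the whole domain (hypothetical device offset)."""
    centres = np.array([[-1.0, 0.0], [1.0, 0.0]]) + np.asarray(shift)
    X = np.vstack([rng.normal(c, 0.3, size=(n_per_class, 2)) for c in centres])
    y = np.repeat([0.0, 1.0], n_per_class).reshape(-1, 1)
    return X, y

def forward(X, params):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                      # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    return h, p

def accuracy(X, y, params):
    _, p = forward(X, params)
    return float(((p > 0.5) == (y > 0.5)).mean())

def train(X_src, y_src, X_tgt, y_tgt, epochs=200, lr=0.5, hidden=8, seed=0):
    """Full-batch gradient descent on the source domain only.

    Returns a list of (epoch, source_accuracy, target_accuracy) tuples.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X_src.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    history = []
    for epoch in range(1, epochs + 1):
        h, p = forward(X_src, (W1, b1, W2, b2))
        # Back propagation of the mean cross-entropy loss.
        dz2 = (p - y_src) / len(X_src)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
        dW1, db1 = X_src.T @ dz1, dz1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
        params = (W1, b1, W2, b2)
        history.append((epoch, accuracy(X_src, y_src, params), accuracy(X_tgt, y_tgt, params)))
    return history

rng = np.random.default_rng(0)
X_src, y_src = make_domain(shift=(0.0, 0.0), n_per_class=100, rng=rng)
X_tgt, y_tgt = make_domain(shift=(0.4, 0.3), n_per_class=100, rng=rng)

history = train(X_src, y_src, X_tgt, y_tgt)

# Epoch with the best target accuracy (requires labeled target data for evaluation).
best_epoch = max(history, key=lambda row: row[2])[0]
for epoch, src_acc, tgt_acc in history[::40]:
    print(f"epoch {epoch:3d}  source {src_acc:.2f}  target {tgt_acc:.2f}")
print("epoch with highest target accuracy:", best_epoch)
```

Comparing the two accuracy curves shows whether continued training on the source domain keeps helping the target domain or starts to hurt it. If it hurts, stopping at an earlier epoch is one simple remedy.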


Fig. 1 shows a general overview of our approach. The rest of this paper is organized as follows. In Section 2, the preliminaries of transfer learning are reviewed. In Section 3, the data collection and feature extraction are described. The experiment is introduced in Section 4. Section 5 discusses the problems found in the experiment, and Section 6 concludes our work.





Research on Transfer Learning of Vision-based Gesture Recognition

Bi-Xiao Wu, Chen-Guang Yang, Jun-Pei Zhong 





