Deep Learning Based Hand Gesture Recognition and UAV Flight Controls



Bin Hu, Jiacun Wang





Hand gesture recognition has been attracting growing attention from researchers worldwide. Beyond ordinary daily-life applications, gesture recognition is entering virtual reality, medical systems, education, communication systems, games, mobile devices, the automotive industry, and more.


There are basically three kinds of hand gesture recognition technologies: data glove based, vision based and radar based. A data glove is an interactive device, resembling a glove worn on the hand, that facilitates tactile sensing and fine-motion control in robotics and virtual reality. The sensor outputs can be used to control video games, presentations, music and visual entertainment. One advantage of data gloves is that no extraction of the hand from the background is needed. However, gloves are bulky to carry, and due to their high cost and calibration requirements, data gloves do not enjoy the wide range of applications that vision-based gesture recognition systems do.


The radar-based approach transmits a radio wave towards a target, and the radar receiver intercepts the energy reflected from that target. The radar waves bounce off the hand and back to the receiver, which interprets changes in the shape or movement of the hand. This technology is still under research; the most promising example is Google Soli, which was approved by the U.S. government in January 2019. The vision-based approach, on the other hand, is gaining momentum because the user does not need to carry a device and can perform gestures in a much more natural way. Early studies widely used color cameras for gesture recognition systems, whereas today's systems such as Microsoft Kinect, Leap Motion Controller and Intel RealSense usually use depth images as a modality. The Leap Motion Controller is a small USB-powered device that uses two monochromatic infrared cameras and three infrared LEDs to track hand and finger movements in a roughly 1 m hemispherical 3-D space. It has long been one of the most widely used cameras for gesture recognition because it allows users to act as freely as they do in real life. Its low-cost depth sensors capture video in real time under any ambient lighting and output skeletal data. Furthermore, a hand gesture for the Leap Motion Controller can be any simple hand movement or complex shape, making it an obvious choice for this study.


Hand gestures can be classified as either static postures or dynamic gestures. In this paper, we consider dynamic gestures only.


Most research on static gesture recognition focuses on neural-network-centered approaches. For recognition of dynamic gestures, one of the most common methodologies is to represent gestures as spatiotemporal sequences. Since Starner and Pentland first used hidden Markov models (HMMs) to recognize gestures, HMMs have become a common method for gesture recognition. Other approaches, such as hidden conditional random fields, autoregressive models, fuzzy logic, Kalman filtering, support vector machines and recurrent neural networks, have been used in some studies.
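To make the HMM-based methodology concrete, the sketch below scores an observation sequence against a toy hidden Markov model with the forward algorithm; in gesture recognition, each gesture class typically gets its own HMM, and the class whose model assigns the highest likelihood to the observed sequence wins. All model parameters here are illustrative toy values, not from any cited system.

```python
# Forward algorithm for a hidden Markov model (HMM): the classic way
# to treat a dynamic gesture as a spatiotemporal sequence. One HMM is
# trained per gesture class; classification picks the model with the
# highest likelihood for the observed sequence.

def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm.

    obs   : list of observation symbol indices
    start : start[i]    = P(state i at t=0)
    trans : trans[i][j] = P(state j at t+1 | state i at t)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # Initialization: alpha[i] = P(first observation, state i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Induction over the remaining observations
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    # Termination: sum over all final states
    return sum(alpha)

# Toy 2-state model over a binary observation alphabet
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

score = forward_likelihood([0, 1, 0], start, trans, emit)
```

The forward recursion computes the same quantity as summing over all hidden state paths, but in time linear in the sequence length.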


As a branch of machine learning, deep learning models have rapidly drawn interest from both the research community and industry because of their power in learning and classification. The technology has been applied in many research fields, such as speech recognition, computer vision and natural language processing. In recent years, the convolutional neural network (CNN), one of the most important deep neural networks, has achieved the best performance in gesture recognition.


This paper is an extension of a conference paper. It applies the deep learning approach to dynamic hand gesture recognition.


The engineering target of the study is the control of unmanned aerial vehicles (UAVs). We introduce a data model that represents a dynamic gesture sequence by converting the 4-D spatiotemporal data into a 2-D matrix and a 1-D array. We designed two fully connected neural networks and one convolutional neural network in order to find the one with the best performance, and created two data models for neural network training and testing. We also implemented the software system based on these deep learning networks. To our knowledge, this is the first reported work that uses Leap Motion Controllers as input devices in deep-learning-based hand gesture recognition.
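As a rough illustration of the 4-D-to-2-D conversion described above, the sketch below flattens a gesture sequence of timestamped 3-D skeletal joint positions into a 2-D matrix (one row per frame) plus a 1-D array of timestamps. The exact layout, frame count and joint count are illustrative assumptions, not the paper's specification.

```python
# One plausible flattening of a dynamic gesture for neural network
# input: 4-D spatiotemporal data (time plus 3-D joint positions) is
# converted into a 2-D matrix and a 1-D array. The shapes below are
# hypothetical; the paper defines its own data model.
import numpy as np

N_FRAMES, N_JOINTS = 60, 20  # hypothetical capture length and skeleton size

def flatten_gesture(frames):
    """frames: list of (timestamp, joints), joints shaped (N_JOINTS, 3)."""
    # 2-D matrix: one row per frame, joint coordinates concatenated
    matrix = np.stack([joints.reshape(-1) for _, joints in frames])
    # 1-D array: per-frame timestamps preserving the temporal axis
    times = np.array([t for t, _ in frames])
    return matrix, times

rng = np.random.default_rng(0)
frames = [(i / 120.0, rng.standard_normal((N_JOINTS, 3)))
          for i in range(N_FRAMES)]
matrix, times = flatten_gesture(frames)
```

A fixed-shape matrix like this can be fed directly to a fully connected or convolutional network without recurrent processing.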


The rest of the paper is organized as follows: Section 2 introduces basic concepts of deep learning. Section 3 gives an overview of the hand gesture recognition system, Leap Motion Controllers and UAVs. Section 4 introduces hand gestures and datasets that are used in the system. Section 5 presents the deep learning networks, the core of the system. Section 6 discusses neural network training and testing results. Section 7 concludes the paper with some future work suggestions.

There are three subsystems in our system: the gesture input component, the deep learning neural network component, and the UAV control component, as illustrated in Fig. 4.

In this study we designed, trained and tested three different neural networks so as to find the one with the best performance, as shown in Fig. 7. They are a 2-layer fully connected network, a 5-layer fully connected network, and an 8-layer convolutional network. Figs. 8–10 illustrate the architectures of the three types of networks. The high-level designs of these three networks are described in Tables 2–4.
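For the simplest of the three designs, a 2-layer fully connected classifier, a forward pass can be sketched in plain numpy as below. The layer sizes and class count are placeholders; the paper's Tables 2-4 give the actual architectures.

```python
# Minimal forward pass of a 2-layer fully connected gesture classifier:
# input -> hidden layer (ReLU) -> output layer (softmax over classes).
# All dimensions here are hypothetical placeholders.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fc2_forward(x, W1, b1, W2, b2):
    """Return class probabilities for a batch of flattened gestures."""
    return softmax(relu(x @ W1 + b1) @ W2 + b2)

rng = np.random.default_rng(1)
n_in, n_hidden, n_classes = 3600, 128, 8  # hypothetical sizes
W1 = rng.standard_normal((n_in, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_classes)) * 0.01
b2 = np.zeros(n_classes)

probs = fc2_forward(rng.standard_normal((4, n_in)), W1, b1, W2, b2)
```

The deeper 5-layer network stacks more such hidden layers, while the convolutional network replaces the first fully connected layers with convolution and pooling stages.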

