Connectionist Temporal Classification Model for Dynamic Hand Gesture Recognition using RGB and Optical Flow Data
Sunil Patel1 and Ramji Makwana2
1Computer Engineering Department, Gujarat Technological University, India
2Managing Director, AIIVINE PXL Pvt. Ltd, India
Abstract: Automatic classification of dynamic hand gestures is
challenging because of the large diversity among gesture classes, low
resolution, and the fact that gestures are performed with the fingers. These
challenges have drawn many researchers to this area. Recently, deep neural
networks have been used for implicit feature extraction, with a softmax layer
for classification. In
this paper, we propose a method based on a two-dimensional convolutional neural
network that performs detection and classification of hand gestures
simultaneously from multimodal Red, Green, Blue, Depth (RGBD) and optical flow
data and passes the extracted features to a Long Short-Term Memory (LSTM)
recurrent network for frame-to-frame probability generation, with a
Connectionist Temporal Classification (CTC) network for loss calculation. We
calculate optical flow from the Red,
Green, Blue (RGB) data to capture the motion information present in the
video. The CTC model efficiently evaluates all possible alignments of a hand
gesture via dynamic programming and checks frame-to-frame consistency of the
visual similarity of the hand gesture in the unsegmented input stream. The CTC
network finds the most probable sequence of frames for a gesture class. The
frame with the highest probability value is selected from the CTC network by
max decoding. The entire CTC network is trained end-to-end by computing the
CTC loss for gesture recognition. We use the challenging Vision for
Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture
recognition, captured with RGB and depth data. On this VIVA dataset, our
proposed hand gesture recognition technique outperforms competing
state-of-the-art algorithms and achieves an accuracy of 86%.
Keywords: Connectionist temporal classification, Long short-term
memory, Hand gesture, Convolutional neural network, VIVA.
Received February 2, 2019; accepted July 28, 2019
https://doi.org/10.34028/iajit/17/4/8
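
The two CTC operations named in the abstract, summing over all alignments by dynamic programming and max decoding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the choice of index 0 for the blank label, and the plain (non-log) probabilities are assumptions made for illustration only; a practical training setup would work in log space for numerical stability.

```python
import numpy as np

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_max_decode(probs):
    """Max (best-path) decoding: take the most probable label in each
    frame, merge consecutive repeats, then drop blanks."""
    best = np.argmax(probs, axis=1)
    decoded = []
    prev = BLANK
    for label in best:
        if label != prev and label != BLANK:
            decoded.append(int(label))
        prev = label
    return decoded


def ctc_label_probability(probs, labels):
    """Forward algorithm: total probability of all frame-level alignments
    that collapse to `labels`. probs has shape (frames, classes)."""
    num_frames = probs.shape[0]
    # Extended label sequence with blanks between and around the labels.
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    alpha = np.zeros((num_frames, size))
    alpha[0, 0] = probs[0, ext[0]]
    if size > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, num_frames):
        for s in range(size):
            total = alpha[t - 1, s]              # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]     # advance by one symbol
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    prob = alpha[num_frames - 1, size - 1]
    if size > 1:
        prob += alpha[num_frames - 1, size - 2]
    return prob


def ctc_loss(probs, labels):
    """Negative log-likelihood of the label sequence."""
    return -np.log(ctc_label_probability(probs, labels))
```

Here `probs` stands in for the per-frame class probabilities produced by the recurrent network. `ctc_loss(probs, labels)` is the quantity minimised during training, and `ctc_max_decode(probs)` returns the predicted gesture label sequence at inference time.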