STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition

 Citation: Z. W. Xu, X. J. Wu, J. Kittler. STRNet: Triple-stream spatiotemporal relation network for action recognition. International Journal of Automation and Computing. http://doi.org/10.1007/s11633-021-1289-9


###### Author Bio:

Zhi-Wei Xu received the B. Eng. degree in computer science and technology from Harbin Institute of Technology, China in 2017. He is a postgraduate student at the School of Artificial Intelligence and Computer Science, Jiangnan University, China. His research interests include computer vision, video understanding and action recognition. E-mail: zhiwei_xu@stu.jiangnan.edu.cn ORCID iD: 0000-0003-1472-431X

Xiao-Jun Wu received the B. Sc. degree in mathematics from Nanjing Normal University, China in 1991. He received the M. Sc. and Ph. D. degrees in pattern recognition and intelligent systems from Nanjing University of Science and Technology, China in 1996 and 2002, respectively. He is currently a professor of artificial intelligence and pattern recognition at Jiangnan University, China. His research interests include pattern recognition, computer vision, fuzzy systems, neural networks and intelligent systems. E-mail: wu_xiaojun@jiangnan.edu.cn (Corresponding author) ORCID iD: 0000-0002-0310-5778

Josef Kittler received the B. A. degree in electrical science tripos, the Ph. D. degree in pattern recognition, and the D. Sc. degree from University of Cambridge, UK in 1971, 1974, and 1991, respectively. He is a Distinguished Professor of machine intelligence at the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. He conducts research in biometrics, video and image database retrieval, medical image analysis, and cognitive vision. He has published the textbook Pattern Recognition: A Statistical Approach and over 700 scientific papers. His publications have been cited more than 66000 times (Google Scholar). He is series editor of Springer Lecture Notes in Computer Science. He currently serves on the Editorial Boards of Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, and Pattern Analysis and Applications. He also served as a member of the Editorial Board of IEEE Transactions on Pattern Analysis and Machine Intelligence during 1982–1985. He served on the Governing Board of the International Association for Pattern Recognition (IAPR) as one of the two British representatives during 1982–2005, and as President of the IAPR during 1994–1996. His research interests include robotics, feedback control systems, and control theory. E-mail: j.kittler@surrey.ac.uk ORCID iD: 0000-0002-8110-9205
Figure 1. Architecture overview of STRNet. STRNet consists of three individual branches that learn appearance, motion and temporal relation information, respectively. To represent the whole video comprehensively, we apply two-stage fusion and separable (2+1)D convolution to reinforce feature learning. Finally, we apply a decision-level weight assignment to the branch outputs to adjust the classification performance.
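
The decision-level weight assignment mentioned in the Figure 1 caption can be pictured with a minimal sketch. The caption does not give the weighting scheme, so everything here is an assumption for illustration only: the function names `softmax` and `fuse_streams`, the example logits, the weights, and the choice to average softmax probabilities are hypothetical and are not taken from the paper.

```python
from math import exp


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights):
    """Weighted average of per-stream class probabilities.

    stream_logits: one list of class scores per stream (hypothetical values).
    weights: one non-negative weight per stream (hypothetical values).
    """
    if len(stream_logits) != len(weights):
        raise ValueError("need exactly one weight per stream")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    probs = [softmax(scores) for scores in stream_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total_weight
        for c in range(n_classes)
    ]


# Made-up scores for three classes from the three branches named in the caption.
appearance = [2.0, 0.5, 0.1]
motion = [0.3, 1.8, 0.2]
relation = [1.5, 1.4, 0.1]

# Assumed weights; in practice they would be tuned on validation data.
fused = fuse_streams([appearance, motion, relation], weights=[1.0, 1.5, 1.0])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because each stream's probabilities sum to one and the weights are normalized, the fused scores also form a probability distribution, so changing a weight only shifts how much each branch contributes to the final decision.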

Figure 2. Feature visualization of STRNet. The first column shows the input frames. The second column shows the feature maps of the Stem. The third column shows the fused feature maps of stage 3. The last column shows the spatiotemporal-with-relation feature maps output by stage 5. The feature maps are rescaled to the original input size for easier comparison.

Figure 3. Schema of the relation unit, where $X$ denotes the original input sequence of feature maps and $\tilde{X}$ denotes the calculated relation maps. The function $F_{sm}(\cdot)$ computes the similarity measurement, $g$ denotes the similarity weight vector, and $Y$ denotes the final relation response maps.

##### Publication History
• Received: 2020-10-30
• Accepted: 2021-02-05
• Published online: 2021-03-23



