International Journal of Automation and Computing 2018, Vol. 15 Issue (2) :194-206    DOI: 10.1007/s11633-018-1118-y
Large-scale 3D Semantic Mapping Using Stereo Vision
Yi Yang1, Fan Qiu1, Hao Li1, Lu Zhang1, Mei-Ling Wang1, Meng-Yin Fu1,2
1 School of Automation and National Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing Institute of Technology, Beijing 100081, China;
2 Nanjing University of Science and Technology, Nanjing 210094, China
Abstract In recent years, there has been considerable interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach for generating a large-scale, dense 3D semantic map of outdoor environments based on binocular stereo vision. The inputs to the system are stereo color images captured from a moving vehicle. First, the dense 3D space around the vehicle is reconstructed, and the camera motion is estimated by visual odometry. Meanwhile, semantic segmentation is performed online using deep learning, and the semantic labels are also used to verify the feature matching in visual odometry. Together, these three processes compute the motion, depth, and semantic label of every pixel in the input views. Then, voxel conditional random field (CRF) inference is introduced to fuse the semantic labels into voxels. After that, we present a method that removes moving objects by incorporating the semantic labels, which improves the motion segmentation accuracy. Finally, a dense 3D semantic map of an urban environment is generated from an arbitrarily long image sequence. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
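The data flow summarized in the abstract (per-pixel depth, camera pose, and semantic label combined into labeled voxels) can be illustrated with a minimal sketch. This is not the authors' implementation: the voxel CRF inference is replaced here by a plain majority vote, the moving-object removal is reduced to dropping points that are flagged as moving and carry a dynamic class label, and all function names, parameters, and class names (`disparity_to_depth`, `fuse_labels`, `dynamic_labels`, and so on) are hypothetical. It assumes a rectified stereo pair and a pinhole camera model.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]
Pose = Tuple[List[List[float]], List[float]]  # (3x3 rotation R, translation t)


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, depth_m: float,
                 focal_px: float, cx: float, cy: float) -> Point:
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


def to_world(point_cam: Point, pose: Pose) -> Point:
    """Apply the camera-to-world pose (for example from visual odometry): p_w = R p_c + t."""
    rotation, translation = pose
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def voxel_index(point: Point, voxel_size_m: float) -> Voxel:
    """Map a world point to the integer index of the voxel that contains it."""
    return tuple(int(math.floor(c / voxel_size_m)) for c in point)


def fuse_labels(observations: Iterable[Tuple[Point, str, bool]],
                voxel_size_m: float,
                dynamic_labels: Set[str]) -> Dict[Voxel, str]:
    """Assign each voxel the most frequent label among its observations.

    Each observation is (world_point, semantic_label, is_moving). Points that
    are flagged as moving AND belong to a dynamic class are skipped, a crude
    stand-in for semantic-aware moving-object removal. Majority voting is a
    simplification; the paper uses voxel CRF inference instead.
    """
    votes: Dict[Voxel, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for point, label, is_moving in observations:
        if is_moving and label in dynamic_labels:
            continue
        votes[voxel_index(point, voxel_size_m)][label] += 1
    return {voxel: max(counts, key=counts.get) for voxel, counts in votes.items()}
```

A typical use would back-project every pixel of a frame, transform it with that frame's pose, and pass the accumulated observations of all frames to `fuse_labels`. The vote only counts labels per voxel; a CRF additionally enforces consistency between neighboring voxels, which this sketch does not attempt.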
Keywords: Semantic map; stereo vision; motion segmentation; visual odometry; simultaneous localization and mapping (SLAM)
Received: 2017-09-20; Revised: 2018-02-02; Published: 2018-02-02
Fund: This work was supported by the National Natural Science Foundation of China (Nos. 61473042 and 61105092) and the Beijing Higher Education Young Elite Teacher Project (No. YETP1215).
Corresponding Author: Yi Yang, E-mail: yang_yi@bit.edu.cn
About the authors:
Yi Yang received the Ph.D. degree in automation from Beijing Institute of Technology, China in 2010. E-mail: yang_yi@bit.edu.cn
Fan Qiu received the B.Eng. degree in automation from Beijing Institute of Technology. E-mail: 466547687@qq.com
Hao Li received the B.Eng. degree in automation from Beijing Institute of Technology. E-mail: lh506692791@qq.com
Lu Zhang received the B.Eng. degree in automation from Beijing Institute of Technology. E-mail: 214430810@qq.com
Mei-Ling Wang received the M.Eng. and Ph.D. degrees from the School of Automation, Beijing Institute of Technology. E-mail: wangml@bit.edu.cn
Meng-Yin Fu received the M.Eng. degree from the School of Automation, Beijing Institute of Technology. E-mail: fumy@bit.edu.cn
Cite this article:   
Yi Yang, Fan Qiu, Hao Li, Lu Zhang, Mei-Ling Wang, Meng-Yin Fu. Large-scale 3D Semantic Mapping Using Stereo Vision[J]. International Journal of Automation and Computing, vol. 15, no. 2, pp. 194-206, 2018.
URL: http://www.ijac.net/EN/10.1007/s11633-018-1118-y or http://www.ijac.net/EN/Y2018/V15/I2/194