[1] J. X. Xiao, K. A. Ehinger, J. Hays, A. Torralba, A. Oliva. SUN database: Exploring a large collection of scene categories. International Journal of Computer Vision, vol. 119, no. 1, pp. 3–22, 2016. DOI: 10.1007/s11263-014-0748-y.
[2] A. Torralba, R. Fergus, W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958–1970, 2008. DOI: 10.1109/TPAMI.2008.128.
[3] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010. DOI: 10.1007/s11263-009-0275-4.
[4] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, USA, pp. 248–255, 2009. DOI: 10.1109/CVPR.2009.5206848.
[5] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick. Microsoft COCO: Common objects in context. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 740–755, 2014. DOI: 10.1007/978-3-319-10602-1_48.
[6] B. L. Zhou, A. Lapedriza, A. Khosla, A. Oliva, A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 6, pp. 1452–1464, 2018. DOI: 10.1109/TPAMI.2017.2723009.
[7] I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, K. Murphy. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, [Online], Available: https://storage.googleapis.com/openimages/web/index.html, October 6, 2019.
[8] J. Tremblay, T. To, A. Molchanov, S. Tyree, J. Kautz, S. Birchfield. Synthetically trained neural networks for learning human-readable plans from real-world demonstrations. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Brisbane, Australia, pp. 5659–5666, 2018. DOI: 10.1109/ICRA.2018.8460642.
[9] J. Tremblay, T. To, S. Birchfield. Falling things: A synthetic dataset for 3D object detection and pose estimation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Salt Lake City, USA, pp. 2119–21193, 2018. DOI: 10.1109/CVPRW.2018.00275.
[10] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, A. M. Dollar. The YCB object and model set: Towards common benchmarks for manipulation research. In Proceedings of International Conference on Advanced Robotics, IEEE, Istanbul, Turkey, pp. 510–517, 2015. DOI: 10.1109/ICAR.2015.7251504.
[11] M. Arsenovic, S. Sladojevic, A. Anderla, D. Stefanovic, B. Lalic. Deep learning powered automated tool for generating image based datasets. In Proceedings of the 14th IEEE International Scientific Conference on Informatics, IEEE, Poprad, Slovakia, pp. 13–17, 2017. DOI: 10.1109/informatics.2017.8327214.
[12] J. Sun, P. Wang, Y. K. Luo, G. M. Hao, H. Qiao. Precision work-piece detection and measurement combining top-down and bottom-up saliency. International Journal of Automation and Computing, vol. 15, no. 4, pp. 417–430, 2018. DOI: 10.1007/s11633-018-1123-1.
[13] N. Poolsawad, L. Moore, C. Kambhampati, J. G. F. Cleland. Issues in the mining of heart failure datasets. International Journal of Automation and Computing, vol. 11, no. 2, pp. 162–179, 2014. DOI: 10.1007/s11633-014-0778-5.
[14] X. Y. Gong, H. Su, D. Xu, Z. T. Zhang, F. Shen, H. B. Yang. An overview of contour detection approaches. International Journal of Automation and Computing, vol. 15, no. 6, pp. 656–672, 2018. DOI: 10.1007/s11633-018-1117-z.
[15] A. Aldoma, T. Fäulhammer, M. Vincze. Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Chicago, USA, pp. 5016–5023, 2014. DOI: 10.1109/IROS.2014.6943275.
[16] K. Lai, L. F. Bo, X. F. Ren, D. Fox. A large-scale hierarchical multi-view RGB-D object dataset. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Shanghai, China, pp. 1817–1824, 2011. DOI: 10.1109/ICRA.2011.5980382.
[17] M. Di Cicco, C. Potena, G. Grisetti, A. Pretto. Automatic model based dataset generation for fast and accurate crop and weeds detection. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Vancouver, Canada, pp. 5188–5195, 2017. DOI: 10.1109/IROS.2017.8206408.
[18] S. Greuter, J. Parker, N. Stewart, G. Leach. Real-time procedural generation of 'pseudo infinite' cities. In Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, ACM, Melbourne, Australia, pp. 87–94, 2003. DOI: 10.1145/604487.604490.
[19] R. Van Der Linden, R. Lopes, R. Bidarra. Procedural generation of dungeons. IEEE Transactions on Computational Intelligence and AI in Games, vol. 6, no. 1, pp. 78–89, 2013. DOI: 10.1109/TCIAIG.2013.2290371.
[20] S. R. Richter, V. Vineet, S. Roth, V. Koltun. Playing for data: Ground truth from computer games. In Proceedings of 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 102–118, 2016. DOI: 10.1007/978-3-319-46475-6_7.
[21] P. Marion, P. R. Florence, L. Manuelli, R. Tedrake. LabelFusion: A pipeline for generating ground truth labels for real RGBD data of cluttered scenes. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Brisbane, Australia, pp. 3235–3242, 2018. DOI: 10.1109/ICRA.2018.8460950.
[22] T. Hodan, P. Haluza, Š. Obdržálek, J. Matas, M. Lourakis, X. Zabulis. T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, IEEE, Santa Rosa, USA, pp. 880–888, 2017. DOI: 10.1109/WACV.2017.103.
[23] H. Hattori, V. N. Boddeti, K. Kitani, T. Kanade. Learning scene-specific pedestrian detectors without real data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 3819–3827, 2015. DOI: 10.1109/CVPR.2015.7299006.
[24] H. S. Koppula, A. Anand, T. Joachims, A. Saxena. Semantic labeling of 3D point clouds for indoor scenes. In Proceedings of the 24th International Conference on Neural Information Processing Systems, ACM, Red Hook, USA, pp. 244–252, 2011.
[25] J. Xie, M. Kiefel, M. T. Sun, A. Geiger. Semantic instance annotation of street scenes by 3D to 2D label transfer. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 3688–3697, 2016. DOI: 10.1109/CVPR.2016.401.
[26] B. Zoph, E. D. Cubuk, G. Ghiasi, T. Y. Lin, J. Shlens, Q. V. Le. Learning data augmentation strategies for object detection. arXiv preprint arXiv:1906.11172, 2019.
[27] A. Dutta, A. Zisserman. The VIA annotation software for images, audio and video. arXiv preprint arXiv:1904.10699, 2019.
[28] L. von Ahn, L. Dabbish. Labeling images with a computer game. In Proceedings of SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, USA, pp. 319–326, 2004. DOI: 10.1145/985692.985733.
[29] C. H. Zhang, K. Loken, Z. Y. Chen, Z. Y. Xiao, G. Kunkel. Mask Editor: An image annotation tool for image segmentation tasks. arXiv preprint arXiv:1809.06461v1, 2018.
[30] B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, vol. 77, no. 1–3, pp. 157–173, 2008. DOI: 10.1007/s11263-007-0090-8.
[31] M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen, R. Vasudevan. Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks? In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Singapore, pp. 746–753, 2017. DOI: 10.1109/ICRA.2017.7989092.
[32] B. T. Phong. Illumination for computer generated pictures. Communications of the ACM, vol. 18, no. 6, pp. 311–317, 1975. DOI: 10.1145/360825.360839.
[33] S. Q. Ren, K. M. He, R. Girshick, J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, ACM, Cambridge, USA, pp. 91–99, 2015.
[34] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 21–37, 2016. DOI: 10.1007/978-3-319-46448-0_2.
[35] J. Redmon, S. Divvala, R. Girshick, A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 779–788, 2016. DOI: 10.1109/CVPR.2016.91.
[36] F. Q. Liu, Z. Y. Wang. PolishNet-2d and PolishNet-3d: Deep learning-based workpiece recognition. IEEE Access, vol. 7, pp. 127042–127054, 2019. DOI: 10.1109/ACCESS.2019.2940411.