Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance

Sheng-Ye Yan, Xin-Xing Xu, Qing-Shan Liu

Citation: Sheng-Ye Yan, Xin-Xing Xu and Qing-Shan Liu. Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance. International Journal of Automation and Computing, vol. 11, no. 5, pp. 480-488, 2014. doi: 10.1007/s11633-014-0833-2

Funds: 

This work was supported by the National Natural Science Foundation of China (Nos. 61300163, 61125106 and 61300162) and the Jiangsu Key Laboratory of Big Data Analysis Technology.

More Information
    Corresponding author: Sheng-Ye Yan
  • Abstract: This paper proposes a new two-phase approach to robust text detection that integrates visual appearance with geometric reasoning rules. In the first phase, geometric rules are used to achieve a higher recall rate. Specifically, a robust stroke width transform (RSWT) feature is proposed that better recovers the stroke width by additionally considering the crossing of two strokes and the continuity of the letter border. In the second phase, a classification scheme based on visual appearance features rejects false alarms while preserving the recall rate. To learn a better classifier from multiple visual appearance features, a novel classification method called double soft multiple kernel learning (DS-MKL) is proposed. DS-MKL is motivated by a novel kernel margin perspective on multiple kernel learning and can effectively suppress the influence of noisy base kernels. Comprehensive experiments on the benchmark ICDAR 2005 competition dataset demonstrate the effectiveness of the proposed two-phase text detection approach, which outperforms state-of-the-art approaches by up to 4.4% in terms of F-measure.
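The abstract's two-phase argument (a permissive first phase for recall, then a classifier that rejects false alarms without losing recall) can be made concrete with the F-measure it reports. The following is a minimal sketch, not the paper's evaluation protocol: the function names, the `alpha` weighting parameter, and all counts are hypothetical and chosen only for illustration. With `alpha = 0.5` the formula reduces to the usual balanced F-measure, 2PR / (P + R).

```python
def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic mean of precision and recall.

    alpha = 0.5 gives the balanced F-measure, 2PR / (P + R).
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def precision_recall(true_pos, false_pos, false_neg):
    """Count-based precision and recall (illustrative only; detection
    benchmarks may instead score results by bounding-box overlap)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, NOT results from the paper.
# Phase 1: permissive geometric rules keep most text but many false alarms.
p1, r1 = precision_recall(true_pos=80, false_pos=120, false_neg=20)
# Phase 2: an appearance classifier rejects false alarms, keeping true positives.
p2, r2 = precision_recall(true_pos=80, false_pos=20, false_neg=20)

f1 = f_measure(p1, r1)
f2 = f_measure(p2, r2)
print(f"phase 1: P={p1:.2f} R={r1:.2f} F={f1:.3f}")
print(f"phase 2: P={p2:.2f} R={r2:.2f} F={f2:.3f}")
```

In this toy setting recall is identical in both phases, so the entire F-measure improvement comes from the precision gained by rejecting false alarms, which is the role the abstract assigns to the second phase.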
Publication history
  • Received:  2014-01-14
  • Revised:  2014-06-20
