Abhijit Guha, Debabrata Samanta. Hybrid Approach to Document Anomaly Detection: An Application to Facilitate RPA in Title Insurance. International Journal of Automation and Computing, vol. 18, no. 1, pp. 55-72, 2021. https://doi.org/10.1007/s11633-020-1247-y

Hybrid Approach to Document Anomaly Detection: An Application to Facilitate RPA in Title Insurance

doi: 10.1007/s11633-020-1247-y
  • Author Bio:

    Abhijit Guha received the B. Sc. degree (Chemistry Honors) from Calcutta University, India in 2006, and the MCA (master of computer applications) degree from Academy of Technology under West Bengal University of Technology, India in 2009. He is a Ph. D. degree candidate in the Department of Data Science, CHRIST (Deemed to be University), India. Presently, he is working as a research and development scientist at First American India Private Limited. His research areas include document image processing, data mining, statistical modeling, and machine learning modelling in the title insurance domain. He has delivered multiple business solutions using AI technologies and received three consecutive “Innovation of the year” awards from 2015 to 2017 from First American India for his contribution towards his research. His research interests include artificial intelligence, natural language processing, text mining, statistical learning and machine learning. E-mail: abhijitguha.research@gmail.com (Corresponding author) ORCID iD: 0000-0002-3280-5730

    Debabrata Samanta received the B. Sc. degree (Physics Honors) from Calcutta University, India in 2007, the MCA degree from Academy of Technology under West Bengal University of Technology, India in 2010, and the Ph. D. degree in computer science and engineering from National Institute of Technology, India in 2014. In 2015, he was a faculty member at Dayananda Sagar University, India, and in 2019 he was at CHRIST (Deemed to be University), India. Currently, he is an assistant professor in the Department of Computer Science at CHRIST (Deemed to be University), India. He is a professional IEEE member, an associate life member of Computer Society of India (CSI) and a life member of Indian Society for Technical Education (ISTE). He has authored and coauthored over 127 papers in SCI/Scopus/Springer/Elsevier journals and IEEE/Springer/Elsevier conference proceedings in areas of artificial intelligence, natural language processing and image processing. He received the “Scholastic Award” at the 2nd International Conference on Computer Science and IT Application, CSIT-2011, India. He has published 9 books, available for sale on Amazon and Flipkart. He has edited 1 book available on Google Book server. He has authored and coauthored 2 Elsevier and 5 Springer book chapters. He has served as a convener, keynote speaker and technical programme committee (TPC) member in various conferences and workshops. He was an invited speaker at several institutions. His research interests include artificial intelligence, natural language processing and image processing. E-mail: debabrata.samanta369@gmail.com ORCID iD: 0000-0003-4118-2480

  • Received Date: 2020-04-05
  • Accepted Date: 2020-07-31
  • Publish Online: 2020-10-21
  • Publish Date: 2021-02-18
  • Abstract: Anomaly detection (AD) is an important aspect of various domains, and title insurance (TI) is no exception. Robotic process automation (RPA) is taking over manual tasks in TI business processes, but it has its limitations without the support of artificial intelligence (AI) and machine learning (ML). With increasing data dimensionality and in composite population scenarios, the complexity of detecting anomalies increases, and AD in automated document management systems (ADMS) is the least explored domain. Deep learning, being the fastest-maturing technology, can be combined with traditional anomaly detectors to facilitate and improve RPA in TI. We present a hybrid model for AD that uses autoencoders (AE) and a one-class support vector machine (OSVM). In the present study, the OSVM receives input features representing real-time documents from the TI business, orchestrated and reduced in dimensionality by the AE. The results obtained from multiple experiments are comparable with traditional methods and within a business-acceptable range regarding accuracy and performance.
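
The hybrid idea in the abstract (an autoencoder compresses document features, and a one-class SVM is fitted on the compressed codes) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, the single ReLU bottleneck layer, the helper names `encode`, `fit_hybrid` and `predict_hybrid`, and the values of `code_dim` and `nu` are all assumptions chosen for illustration, and any feature matrix passed in merely stands in for real document features.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def encode(autoencoder, X):
    """Return the bottleneck (hidden-layer) activations of a fitted autoencoder."""
    hidden = X @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
    return np.maximum(hidden, 0.0)  # ReLU, matching activation="relu" below


def fit_hybrid(X_train, code_dim=8, nu=0.05, seed=0):
    """Fit scaler -> autoencoder -> one-class SVM on 'normal' samples only.

    code_dim and nu are illustrative defaults, not values from the paper.
    """
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)
    # An MLP trained to reproduce its own input acts as a simple autoencoder.
    ae = MLPRegressor(hidden_layer_sizes=(code_dim,), activation="relu",
                      max_iter=500, random_state=seed)
    ae.fit(Xs, Xs)
    # The one-class SVM sees only the reduced-dimension codes.
    osvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    osvm.fit(encode(ae, Xs))
    return scaler, ae, osvm


def predict_hybrid(model, X):
    """Return +1 for samples judged normal and -1 for suspected anomalies."""
    scaler, ae, osvm = model
    return osvm.predict(encode(ae, scaler.transform(X)))
```

In this sketch, `nu` roughly bounds the fraction of training samples treated as outliers, so it would need tuning against whatever false-alarm rate is acceptable for the business process.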

     



    Figures (14) / Tables (15)
