Transfer Hierarchical Attention Network for Generative Dialog System


Recently, Prof. Qiang Yang from the Hong Kong University of Science and Technology published a paper in IJAC titled "Transfer Hierarchical Attention Network for Generative Dialog System". This work aims to build a more accurate dialog context representation model by proposing a novel attention mechanism. Read the details below:

 

Download Full Text

Transfer Hierarchical Attention Network for Generative Dialog System

Xiang Zhang, Qiang Yang

1) SpringerLink:

https://link.springer.com/article/10.1007/s11633-019-1200-0

2) IJAC URL:

http://www.ijac.net/en/article/doi/10.1007/s11633-019-1200-0

 



Introduction

The chit-chat dialog system is a promising natural language processing (NLP) technology that aims to enable computers to chat with humans in natural language. Traditional chit-chat dialog systems are built from hand-crafted rules or directly select a human-written response from a candidate pool using information retrieval (IR) techniques. These systems are not robust, and they are difficult to deploy in new domains. In recent years, deep learning has achieved great success across many domains, and a new paradigm, the generative dialog system, performs better than these traditional approaches. A generative dialog system uses deep neural networks to model the complex dependencies in the dialog context and directly generates natural language utterances to converse with the user. Several successful applications, such as Microsoft's XiaoIce, are built on generative dialog technology and interact with millions of people every day.

 

There are three basic components in a generative dialog system: dialog context representation learning, response content selection and response generation. Given the dialog context, the model first learns a representation that encodes the semantic information of the context. The model then decides the content of the reply based on this representation. Finally, a response is produced by the language generation algorithm. Using a large-scale human dialog corpus, all three components are optimized jointly in an end-to-end paradigm so that the model learns to emulate the agents in the training corpus.
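
To make this pipeline concrete, below is a minimal PyTorch sketch of the three components wired into a single end-to-end encoder-decoder model. The class and variable names are illustrative only, and the architecture is deliberately simplified; it is not the paper's THAN implementation, which is hierarchical and attention-based.

```python
import torch
import torch.nn as nn

class Seq2SeqDialog(nn.Module):
    """Toy end-to-end generative dialog model (illustrative, not THAN)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 1) Dialog context representation learning
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # 2) Response content selection (a plain linear bridge here;
        #    the paper improves this step with an attention mechanism)
        self.bridge = nn.Linear(hid_dim, hid_dim)
        # 3) Response generation
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context_ids, response_ids):
        _, h = self.encoder(self.embed(context_ids))    # encode the context
        h0 = torch.tanh(self.bridge(h))                 # select reply content
        dec_out, _ = self.decoder(self.embed(response_ids), h0)
        return self.out(dec_out)                        # per-token vocabulary logits

# All three components are trained jointly: a cross-entropy loss on the logits
# (with teacher forcing) back-propagates through decoder, bridge and encoder.
```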

 



The bottleneck of the state-of-the-art representation model is its inaccurate attention scores. We argue that this is because the information used to train the attention network is inadequate: the additive attention mechanism uses only the token information and the current decoder state to compute each weight score. Intuitively, it is trained in an unsupervised manner, so the model lacks the prior knowledge needed to identify the crucial words and sentences in the dialog context. We consider transfer learning an effective approach to enhancing the additive attention mechanism, with keyword extraction and sentence entailment serving as auxiliary tasks that help the target model obtain more reasonable weight scores. By transferring knowledge about parsing syntactic structure and analyzing semantic relationships to the target task, a prior bias is injected that helps the model determine the important linguistic elements. This idea is similar to recent advances in machine translation, where word alignment information is used to train the attention network in a supervised style.
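
For reference, here is a minimal sketch of a standard additive (Bahdanau-style) attention module of the kind described above: each score is computed only from a token's encoding and the current decoder state, which is exactly the limited training signal we criticize. Names and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Standard additive attention (illustrative sketch)."""
    def __init__(self, enc_dim, dec_dim, attn_dim=128):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                                    # (batch, seq_len)
        weights = torch.softmax(scores, dim=-1)           # the attention scores
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights                           # weighted context vector
```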

 



Based on the above motivation, we propose a novel transfer learning based attention mechanism and develop a new generative dialog framework: the transfer hierarchical attention network (THAN). We apply two transfer learning methods to carry knowledge from the source tasks to the target task: parameter pre-training and network stacking. Extensive experiments confirm the effectiveness of both methods. We build a single-turn and a multi-turn dialog model based on the THAN and conduct comprehensive experiments on large-scale public datasets, including quantitative evaluation and qualitative analysis. The results demonstrate that the THAN slightly outperforms state-of-the-art models and is able to generate logically consistent and semantically informative responses.
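
As a rough illustration only, the snippet below sketches the two transfer mechanisms, reusing the AdditiveAttention module from the sketch above and assuming the source-task and target-task attention networks share one architecture. The helper names and the way the stacked scores are combined are hypothetical; the actual THAN design is described in Section 4 of the paper.

```python
import torch

# (a) Parameter pre-training: train an attention network on a source task
# (e.g., keyword extraction), then copy its weights into the target network
# as an initialization and fine-tune end-to-end on the dialog task.
src_attn = AdditiveAttention(enc_dim=256, dec_dim=256)  # hypothetical source model
# ... train src_attn on the auxiliary task here ...
tgt_attn = AdditiveAttention(enc_dim=256, dec_dim=256)
tgt_attn.load_state_dict(src_attn.state_dict())         # warm start, then fine-tune

# (b) Network stacking: keep the trained source network (frozen here)
# and feed its output into the target model's attention computation.
for p in src_attn.parameters():
    p.requires_grad = False                              # freeze the source network
# Inside the target model's forward pass (hypothetical combination):
#   _, src_weights = src_attn(enc_states, dec_state)
#   scores = tgt_scores + src_weights  # bias target scores with source knowledge
```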

 



The remaining sections are organized as follows. Section 2 gives a brief review of related work on generative dialog systems and introduces cutting-edge designs of the attention mechanism; it also reviews the parameter pre-training and network stacking techniques of transfer learning that are applied in our work. The formal problem definition and the notation we use are introduced in Section 3. Section 4 gives a detailed description of the models, including the single-turn THAN, the multi-turn THAN and the auxiliary source task models. The experimental evaluation is covered in Section 5, and conclusions and future directions are discussed in Section 6.

 




 



◇◇◇◇◇◇

For more up-to-date information:

1) WeChat: IJAC

2) Twitter: IJAC_Journal

3) Facebook: International Journal of Automation and Computing

4) LinkedIn: Int. J. of Automation and Computing

5) Sina Weibo: IJAC-国际自动化与计算杂志
