Unsupervised Modeling of Dialog Acts in Asynchronous Conversation

From Cohen Courses

Citation

Shafiq Joty, Giuseppe Carenini, Chin-Yew Lin. Unsupervised Modeling of Dialog Acts in Asynchronous Conversations. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain.


Brief Summary

This paper models dialog acts in asynchronous conversations in an unsupervised setting. Twelve dialog acts were targeted: Statement, Polite Mechanism, Yes-no Question, Action Motivator, Wh-question, Accept Response, Open-ended Question, Acknowledge and Appreciate, Or-clause Question, Reject Response, Uncertain Response, and Rhetorical Question. Experiments were run on conversations from two domains: emails and discussion fora.

The authors first cast the problem as clustering. Using a graph-theoretic framework, they represented each conversation as a Fragment Quotation Graph (FQG), in which each email fragment or forum post is a node and an edge connects two nodes if one fragment or post is a response to the other. Edge weights were assigned using a number of similarity measures, described below, and the graph was then clustered with a normalized mincut (N-mincut). This approach did not perform well: despite specific measures taken to avoid topic-based clustering, the model still confused dialog acts with topics. The authors therefore turned to an HMM, so that they could exploit the sequential structure of the conversations. Since the clustering experiments left them doubtful that an HMM alone could separate topic modeling from dialog-act modeling, they instead used a combination of an HMM and Gaussian mixtures to model the dialog acts. The final results beat the baseline by a significant margin.
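The sequential-modeling idea behind the HMM approach can be illustrated with a minimal discrete HMM: hidden states are dialog acts, and each sentence emits an observation symbol. This is only a sketch of the general technique; the paper's actual emission model and features differ.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs: observation-symbol indices (e.g. quantized sentence features)
    pi:  initial distribution over K hidden dialog acts
    A:   K x K transition probabilities, A[i][j] = P(act j | act i)
    B:   K x V emission probabilities, B[k][v] = P(symbol v | act k)
    """
    K = len(pi)
    alpha = [pi[k] * B[k][obs[0]] for k in range(K)]
    log_lik = 0.0
    for t in obs[1:]:
        # propagate forward probabilities through transitions, then emit
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * B[j][t]
                 for j in range(K)]
        s = sum(alpha)                    # rescale to avoid underflow
        log_lik += math.log(s)
        alpha = [a / s for a in alpha]
    return log_lik + math.log(sum(alpha))
```

The transition matrix is what lets the model capture conversational structure, e.g. that a Yes-no Question is likely to be followed by an Accept or Reject Response.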

Data-set

Data
The dialog act tagset was taken from the Meeting Recorder Dialog Act (MRDA) tagset created by Dhillon et al. [1]. The training data was unlabeled, whereas the test data was labeled by 2 human annotators. The training data for emails was a set of 23,957 emails from the W3C email corpus; for the discussion fora, it was a set of 25,000 forum threads from the travel advising site TripAdvisor. The test data for emails was a set of 40 email threads from the BC3 corpus (Ulrich et al.) [2]; for the discussion fora, it was a set of 200 forum threads. The distribution of dialog act categories labeled by the human annotators was similar in the email and forum test sets, as shown in the figure below. The agreement between the two human annotators was 0.79 on the email dataset and 0.73 on the forum dataset.
[Figure: Distribution of dialog act categories in the email and forum test sets]
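Annotator agreement figures like 0.79 and 0.73 are conventionally chance-corrected; a minimal sketch of Cohen's kappa (the labels below are made up for illustration, and the paper does not state which agreement statistic it used):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(labels_a)
    # raw fraction of items where the annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # agreement expected by chance from each annotator's label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)
```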
Data Pre-processing
Fragment quotation graphs (FQGs) were built from the email and forum data, as described above.
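The FQG construction can be sketched as a small graph builder over reply relations. The record format (`id`, `parent_id`, `text`) is an illustrative assumption, not the paper's actual preprocessing, which recovers fragments from quotation structure:

```python
from collections import defaultdict

def build_fqg(posts):
    """Build a fragment quotation graph from (id, parent_id, text) records.

    Nodes are email fragments or forum posts; a directed edge links a
    fragment to the fragment it responds to. parent_id None marks a
    thread starter. Field names here are hypothetical.
    """
    nodes = {pid: text for pid, parent, text in posts}
    edges = defaultdict(list)
    for pid, parent, _ in posts:
        if parent is not None:
            edges[pid].append(parent)   # reply -> replied-to fragment
    return nodes, dict(edges)
```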

Graph-Theoretic Framework

The FQG was then transformed into a similarity graph, in which the sentences from the emails or forum posts form the set of nodes, and nodes representing sentences in adjacent posts (as inferred from the FQG) are joined by edges. Each edge is assigned a weight measuring the similarity of the two sentences. The nodes were then clustered under the assumption that sentences within the same cluster represent the same dialog act. The clustering problem was modeled as an N-mincut graph clustering problem with the cut criterion:

Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)

where cut(A, B) is the total connection from nodes in partition A to nodes in partition B, assoc(A, V) is the total connection from nodes in A to all nodes in the graph, and assoc(B, V) is defined similarly.

The authors experimented with a number of measures of sentence similarity:

- A bag-of-words measure: the cosine similarity between the TF-IDF vectors of the two sentences.
- A Word-Subsequence Kernel (WSK) measure, which maps the word sequences (POS tags, for the experiments in this paper) into a higher-dimensional space and computes similarity in that space.
- An extended WSK, in which syntactic/semantic features of the words are used alongside the POS tags.
- A dependency-based measure, which scores similarity by counting co-occurring Basic Elements (BEs) in the dependency parse trees of the two sentences (a BE is a (head, modifier, relation) triple).
- A syntactic tree similarity measure using the Tree Kernel function of Collins and Duffy [3].
- Finally, a linear combination of all of these measures.

As a baseline, every sentence was assigned the dialog act "Statement", the most frequently occurring dialog act in the annotated test set. The results of these experiments are shown in the figure below. For evaluation, a 1-to-1 metric was used: the clusters in the annotated test set are paired with the output clusters so as to maximize the total pairwise overlap between the two sets, and the mean percentage overlap across clusters is reported as the final score.
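The bag-of-words measure can be sketched as follows. This is a minimal version assuming raw term frequency times log IDF; the exact weighting scheme the authors used is not specified here.

```python
import math
from collections import Counter

def tfidf_cosine(sent_a, sent_b, corpus):
    """Cosine similarity of TF-IDF vectors for two tokenized sentences.

    `corpus` is the list of all tokenized sentences, used to compute IDF.
    """
    n = len(corpus)
    df = Counter()                       # document frequency per word
    for sent in corpus:
        df.update(set(sent))
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(sent):
        tf = Counter(sent)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    va, vb = vec(sent_a), vec(sent_b)
    dot = sum(va[w] * vb.get(w, 0.0) for w in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The IDF term downweights words that occur in most sentences, which is one of the measures that can reduce (but, per the paper's clustering results, not eliminate) topical similarity dominating the scores.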
[Figure: Results of the graph-theoretic framework with the different similarity measures (1-to-1 accuracy)]
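The 1-to-1 metric can be sketched as a search for the cluster-to-class mapping that maximizes total overlap. This brute-force version over permutations is illustrative and only practical for small numbers of clusters; a real implementation would use the Hungarian algorithm.

```python
from itertools import permutations

def one_to_one_accuracy(gold, pred):
    """Fraction of items covered by the best 1-to-1 mapping between
    predicted clusters and gold classes (brute force over mappings)."""
    gold_ids, pred_ids = sorted(set(gold)), sorted(set(pred))
    k = max(len(gold_ids), len(pred_ids))
    best = 0
    for perm in permutations(range(k)):
        total = 0
        for pi, gi in enumerate(perm):
            if pi < len(pred_ids) and gi < len(gold_ids):
                # overlap between predicted cluster pi and gold class gi
                total += sum(1 for g, p in zip(gold, pred)
                             if p == pred_ids[pi] and g == gold_ids[gi])
        best = max(best, total)
    return best / len(gold)
```

Because the mapping is 1-to-1, a degenerate clustering that merges everything into one cluster can only be credited for a single gold class, which is why the all-"Statement" baseline is a meaningful comparison point.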