Chambers and Jurafsky, Unsupervised Learning of Narrative Event Chains, ACL 2008

Reviews of this paper

Citation

Unsupervised Learning of Narrative Event Chains, by N. Chambers, D. Jurafsky. In Proceedings of ACL-08: HLT, 2008.

Online version

This paper is available online [1].

Summary

This paper addresses the problem of unsupervised extraction of narrative event chains, which are partially ordered sets of narrative events centered around a common protagonist. To extract narrative event chains, the paper assumes two things:

  • narrative coherence: verbs sharing co-referring arguments are semantically connected by virtue of narrative discourse structure
  • the protagonist: although a narrative has several participants, there is a central actor, the protagonist, who characterizes the narrative chain

Narrative chains are somewhat related to structured sequences of participants and events that are called scripts (Schank and Abelson, 1977).

As covered in the guest lecture by Brendan O'Connor, scripts were central to natural language understanding research in the 1970s and 1980s for tasks such as question answering, story understanding, and summarization. For example, Schank and Abelson's (1977) restaurant script was proposed to understand text about restaurants in terms of its participants (customer, waiter, etc.), their actions or movements (entering, sitting down, ordering, ingesting, etc.), and the effects resulting from each action; it can be seen as a collection of events with some sort of (temporal) ordering. These scripts or templates (or "sketchy scripts" in the case of DeJong's FRUMP system, 1982) were written manually. The purpose of this paper is to learn such "scripts" automatically from a collection of documents.

Brief description of the method

The objective of this paper is to automatically extract narrative event chains, which are collections of events that share a common protagonist and are temporally ordered.

The paper proposes a three-step process for learning narrative event chains:

  • Narrative relation learning: computing relatedness between pairs of events that share a co-referring argument (the protagonist), using pointwise mutual information
  • Temporal ordering of events: using Support Vector Machines to classify partial temporal ordering of the connected events
  • Structured selection: using pruning and clustering to extract self-contained chains from the space of events

In the first step, relatedness between pairs of narrative events is computed using pointwise mutual information:

<math>pmi(e(w,d),e(v,g)) = \log \frac{P(e(w,d),e(v,g))}{P(e(w,d))P(e(v,g))}</math>

where

<math>P(e(w,d),e(v,g)) = \frac{C(e(w,d),e(v,g))}{\sum_{x,y}\sum_{d,f} C(e(x,d),e(y,f))}</math>

Here e(w,d) is a narrative event (e.g. e(push,patient)), and C(e(x,d),e(y,f)) is the number of times the two narrative events e(x,d) and e(y,f) share a co-referring argument filling the values of the typed dependencies d and f.
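As a concrete illustration, here is a minimal Python sketch of this computation under two stated assumptions: the corpus has already been reduced to per-protagonist event lists (one list of (verb, dependency) events for each protagonist in each document), and the marginal probabilities are approximated from individual event frequencies rather than from pair counts. The names (pmi_table, chains) are illustrative, not from the paper.

<pre>
import math
from collections import Counter
from itertools import combinations

def pmi_table(protagonist_chains):
    """Compute pmi(e1, e2) for every pair of narrative events that
    share a co-referring argument (a protagonist chain)."""
    pair_counts = Counter()
    event_counts = Counter()
    for events in protagonist_chains:
        event_counts.update(events)
        for e1, e2 in combinations(sorted(set(events)), 2):
            pair_counts[(e1, e2)] += 1
    total_pairs = sum(pair_counts.values())
    total_events = sum(event_counts.values())
    pmi = {}
    for (e1, e2), c in pair_counts.items():
        joint = c / total_pairs                   # P(e1, e2)
        p1 = event_counts[e1] / total_events      # approximate P(e1)
        p2 = event_counts[e2] / total_events      # approximate P(e2)
        pmi[(e1, e2)] = math.log(joint / (p1 * p2))
    return pmi

# Toy corpus: two protagonist chains of (verb, dependency) events.
chains = [
    [("arrest", "obj"), ("charge", "obj"), ("convict", "obj")],
    [("arrest", "obj"), ("charge", "obj"), ("acquit", "obj")],
]
table = pmi_table(chains)
</pre>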

Thus, given a chain of narrative events, the most likely next event for the chain can be computed by:

<math>\max_{j:0<j<m} \sum_{i=0}^{n} pmi(e_i,f_j)</math>

where n is the number of events already in the chain, <math>e_i</math> is the i-th event in the chain, and m is the number of candidate events f available to be selected next. An unordered set of narrative events related by a common protagonist can therefore be extracted using this method.
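A hedged sketch of this selection rule, reusing the PMI table format from the sketch above; the hand-set PMI values here are purely illustrative, not learned from data:

<pre>
def most_likely_next_event(chain, candidates, pmi):
    """Return argmax over f_j of sum_i pmi(e_i, f_j), i.e. the candidate
    event most strongly related to the events already in the chain."""
    def score(f):
        # Pairs unseen in the table get a default PMI of 0.
        return sum(pmi.get(tuple(sorted((e, f))), 0.0) for e in chain)
    return max(candidates, key=score)

# Illustrative hand-set PMI values (keys are sorted event pairs).
pmi = {
    (("arrest", "obj"), ("convict", "obj")): 2.1,
    (("charge", "obj"), ("convict", "obj")): 1.8,
    (("arrest", "obj"), ("pay", "subj")): 0.1,
}
chain = [("arrest", "obj"), ("charge", "obj")]
candidates = [("convict", "obj"), ("pay", "subj")]
print(most_likely_next_event(chain, candidates, pmi))  # ('convict', 'obj')
</pre>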

In the second step, given the unordered set of narrative events, the paper finds a temporal ordering of the events, using the two-step approach of Chambers et al. (2007), Classifying temporal relations between events:

  • an SVM labels the temporal attributes of each event, such as tense and grammatical aspect
  • a second SVM, using the attributes predicted in the first step combined with other linguistic features, classifies the temporal relations between pairs of events (a minimal sketch follows this list)
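A minimal sketch of the second classifier with scikit-learn's LinearSVC standing in for the paper's SVM; the feature names and values are illustrative placeholders, not the exact feature set used by Chambers et al.:

<pre>
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# One feature dict per ordered event pair, combining attributes predicted
# in the first stage (tense, aspect) with other linguistic features.
pairs = [
    {"tense_e1": "past", "tense_e2": "past", "aspect_e1": "none",
     "same_sentence": True},
    {"tense_e1": "present", "tense_e2": "past", "aspect_e1": "progressive",
     "same_sentence": False},
]
labels = ["BEFORE", "AFTER"]  # temporal relation between e1 and e2

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(pairs), labels)
print(clf.predict(vec.transform([{"tense_e1": "past", "tense_e2": "past",
                                  "same_sentence": True}])))
</pre>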



Experimental Result

The experiments are conducted on documents from the Gigaword corpus. The temporal classifier is trained on the TimeBank corpus.

In terms of extracting related events, the proposed method shows a 36% improvement over a baseline that learns relatedness strictly from verb co-occurrence (PMI is computed between all occurrences of two verbs in the same document, without requiring the verbs to share a common protagonist). The proposed method also shows a 25% improvement for temporal ordering over a random ordering of the connected events.
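For contrast, a rough sketch of the baseline's counting scheme (function and variable names are mine, not the paper's): it counts every verb pair that co-occurs in a document, with no shared-protagonist requirement, and PMI is then computed from these counts as before.

<pre>
from collections import Counter
from itertools import combinations

def baseline_pair_counts(doc_verbs):
    """Count co-occurrences of verb pairs within the same document,
    with no requirement that the verbs share a co-referring argument."""
    pair_counts = Counter()
    for verbs in doc_verbs:  # one list of verbs per document
        for v1, v2 in combinations(sorted(set(verbs)), 2):
            pair_counts[(v1, v2)] += 1
    return pair_counts

print(baseline_pair_counts([["arrest", "charge", "say"],
                            ["arrest", "convict", "say"]]))
</pre>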



Related papers

A related paper is Bean and Riloff (2004), Unsupervised learning of contextual role knowledge for coreference resolution, which proposes the use of caseframe networks: pairs of related caseframes (a verb/event plus a semantic role) that indicate synonymy or relatedness, for example '<patient> kidnapped' and '<patient> released'. The present paper generalizes these caseframes to find an entire set of events (verbs) anchored on the same argument, rather than just pairs of related frames.

Another related paper is Chklovski and Pantel (2004), VerbOcean: Mining the web for fine-grained semantic verb relations, which, like this paper, uses pointwise mutual information to automatically find relations between verbs. The present paper differs in that it uses the protagonist as the indicator of relatedness between the verbs.

The most recent related paper is Chambers and Jurafsky, ACL 2010, which finds clusters of verbs that share arguments in order to define templates automatically from documents.