# Chambers and Jurafsky, Unsupervised Learning of Narrative Event Chains, ACL 2008

## Citation

Unsupervised Learning of Narrative Event Chains, by N. Chambers, D. Jurafsky. In Proceedings of ACL-08: HLT, 2008.

## Online version

This paper is available online [1].

## Summary

This paper addresses the problem of unsupervised extraction of narrative event chains, which are partially ordered sets of narrative events centered around a common protagonist. To extract narrative event chains, the paper assumes two things:

• narrative coherence: verbs sharing co-referring arguments are semantically connected
• the protagonist: although a narrative has several participants, there is a central actor who characterizes a narrative chain, which is the protagonist

Narrative chains are somewhat related to structured sequences of participants and events that are called scripts (Schank and Abelson, 1977).

As covered in the guest lecture by Brendan O'Connor, scripts were central to natural language understanding research in the 1970s and 1980s for tasks such as question answering, story understanding, and summarization. For example, Schank and Abelson's (1977) restaurant script was proposed to understand text about restaurants in terms of the participants (customer, waiter, etc.), their actions or movements (entering, sitting down, ordering, ingesting, etc.), and the effects resulting from each action; such a script can be seen as a collection of events with some sort of (temporal) ordering. These scripts or templates (or "sketchy scripts" in the case of DeJong's FRUMP system, 1982) were manually written. The purpose of this paper is to learn such "scripts" automatically from a collection of documents.

## Brief description of the method

The objective of this paper is to automatically extract narrative event chains, which are collections of events that share a common protagonist and are temporally ordered.

The paper proposes a three-step process for learning narrative event chains:

In the first step, relatedness between pairs of narrative events is computed using pointwise mutual information:

${\displaystyle pmi(e(w,d),e(v,g))=log{\frac {P(e(w,d),e(v,g))}{P(e(w,d))P(e(v,g))}}\,\!}$
${\displaystyle P(e(w,d),e(v,g))={\frac {C(e(w,d),e(v,g))}{\sum _{x,y}\sum _{d,f}C(e(x,d),e(y,f))}}\,\!}$

where e(w,d) is a narrative event (e.g. e(push,patient)) and C(e(x,d),e(y,f)) is the number of times two narrative events e(x,d) and e(y,f) share a co-referring argument filling the values of the typed dependencies d and f.
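The PMI computation above can be sketched as follows. The counts are toy values invented for illustration (they are not from the paper), and the marginal here is approximated from the pair counts rather than from separate event counts:

```python
from collections import Counter
from math import log

# Hypothetical co-occurrence counts C(e1, e2): how often two narrative
# events (verb, dependency) share a co-referring argument. Toy values.
pair_counts = Counter({
    (("push", "subject"), ("fall", "subject")): 8,
    (("push", "subject"), ("arrest", "object")): 2,
    (("arrest", "object"), ("convict", "object")): 6,
})

total = sum(pair_counts.values())

def joint_prob(e1, e2):
    """P(e1, e2): the pair's count normalized by the sum of all pair counts."""
    return pair_counts.get((e1, e2), 0) / total

def marginal_prob(e):
    """P(e): approximated here as e's share of all pair counts."""
    c = sum(n for (a, b), n in pair_counts.items() if e in (a, b))
    return c / total

def pmi(e1, e2):
    """log P(e1, e2) / (P(e1) P(e2)); -inf for unseen pairs."""
    p = joint_prob(e1, e2)
    if p == 0:
        return float("-inf")
    return log(p / (marginal_prob(e1) * marginal_prob(e2)))
```

With these toy counts, event pairs that frequently share a co-referring argument receive a higher PMI than rarely co-occurring pairs.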

Thus, given a chain of narrative events, the most likely event to add to the chain can be computed by:

${\displaystyle max_{j:0<j<m}\sum _{i=0}^{n-1}pmi(e_{i},f_{j})\,\!}$

where n is the number of events already in the chain, ${\displaystyle \textstyle e_{i}\,\!}$ is the i-th event, and m is the number of candidate events ${\displaystyle \textstyle f_{j}\,\!}$ available to be selected next. An unordered set of narrative events related by a common protagonist can therefore be extracted using this method.
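The next-event selection can be sketched as a simple argmax over candidates, using a toy symmetric PMI table (the events and scores below are assumptions for illustration, not values from the paper):

```python
# Toy symmetric PMI table over hypothetical events (not from the paper).
toy_pmi = {
    ("arrest", "charge"): 2.0,
    ("arrest", "eat"): -1.0,
    ("plead", "charge"): 1.5,
    ("plead", "eat"): -0.5,
}

def pmi(e1, e2):
    """Look up PMI in either order; 0.0 for unlisted pairs."""
    return toy_pmi.get((e1, e2), toy_pmi.get((e2, e1), 0.0))

def next_event_score(chain, candidate):
    """Sum of pmi(e_i, f_j) over the events e_i already in the chain."""
    return sum(pmi(e, candidate) for e in chain)

def predict_next(chain, candidates):
    """argmax over candidates f_j of the summed PMI with the chain."""
    return max(candidates, key=lambda f: next_event_score(chain, f))
```

For instance, given the chain `["arrest", "plead"]`, the candidate "charge" outscores "eat" because it has high PMI with both chain events.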

In the second step, given the unordered set of narrative events, the paper finds a temporal ordering of the events. It uses the two-step approach of Chambers et al. (2007), Classifying temporal relations between events:

• use an SVM to label the temporal attributes of events, such as tense and grammatical aspect
• use a second SVM, combining the attributes predicted in the first step with other linguistic features, to classify the temporal relations between events
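The second-stage classifier can be sketched with scikit-learn. Everything here is an assumption for illustration: the feature vectors, their meanings, and the label set are made up, and the paper's actual feature set is much richer:

```python
# Hedged sketch of the second-stage SVM: hypothetical feature vectors for
# event pairs, e.g. [tense_match, aspect_match, same_sentence, token_distance],
# mapped to a temporal relation label such as BEFORE / AFTER. Toy data only.
from sklearn.svm import SVC

X = [
    [1, 1, 1, 2],
    [0, 1, 0, 15],
    [1, 0, 1, 3],
    [0, 0, 0, 20],
]
y = ["BEFORE", "AFTER", "BEFORE", "AFTER"]

clf = SVC(kernel="linear")
clf.fit(X, y)

# Predict the relation for a new (hypothetical) event-pair feature vector.
pred = clf.predict([[1, 1, 1, 4]])
```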

In the third step, agglomerative clustering using the PMI scores from the first step is used to find sets of narrative events. Then, the ordering relations from the second step are used to produce a directed graph over each set of narrative events.
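A minimal agglomerative-clustering sketch of this step, using a toy similarity table standing in for the PMI scores (the events, scores, average-link criterion, and threshold are all assumptions, not details from the paper):

```python
# Toy pairwise similarities standing in for PMI scores (not from the paper).
sim = {
    frozenset({"arrest", "charge"}): 2.0,
    frozenset({"arrest", "convict"}): 1.8,
    frozenset({"charge", "convict"}): 1.9,
    frozenset({"eat", "cook"}): 2.2,
}

def pair_sim(a, b):
    """Similarity of two events; unlisted pairs default to a low score."""
    return sim.get(frozenset({a, b}), -1.0)

def cluster_sim(c1, c2):
    """Average-link similarity between two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(pair_sim(a, b) for a, b in pairs) / len(pairs)

def agglomerate(events, threshold=0.0):
    """Greedily merge the closest clusters until no pair exceeds threshold."""
    clusters = [{e} for e in events]
    while len(clusters) > 1:
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]),
        )
        if cluster_sim(clusters[i], clusters[j]) <= threshold:
            break
        clusters[i] |= clusters.pop(j)
    return clusters

clusters = agglomerate(["arrest", "charge", "convict", "eat", "cook"])
```

On the toy data this separates the prosecution-like events from the unrelated ones; the temporal relations from the second step would then orient edges within each cluster to form a directed graph.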

## Experimental Result

The experiment is conducted on documents from the Gigaword corpus. The temporal classifier is trained on the TimeBank corpus. For a given document, the protagonist is defined as the entity mentioned most often in that document. All the methods that follow are built around this protagonist.

In terms of extracting related events, the proposed method shows a 36% improvement over a baseline that learns relatedness strictly from verb co-occurrence (PMI computed between all occurrences of two verbs in the same document, without requiring the verbs to share a common protagonist). The proposed method also shows a 25% improvement in temporal ordering over a random ordering of the connected events.

An example of an extracted narrative chain (in this case a possible Prosecution chain) is shown below, where an arrow indicates the before relation.

## Discussion

This paper addresses the problem of unsupervised extraction of narrative event chains. The paper is interesting and novel in its idea of using a common protagonist to gather a set of related narrative events and temporally order them to form a narrative event chain. The implications of this work are manifold. For example, the automatic extraction of event chains from documents can be used for automatic template construction and template filling (as in Chambers and Jurafsky, ACL 2010), or to inform better automatic understanding of documents. However, the paper focuses on just one participant of the narrative event chain. In reality, a chain of events may involve more than one participant, so the paper presents only one viewpoint of an event. It would be interesting to see whether the method can be used to find and extract the different viewpoints on the same event chain held by its different participants.

## Related papers

A related paper is Bean and Riloff (2004), Unsupervised learning of contextual role knowledge for coreference resolution, which proposes the use of caseframe networks: pairs of related caseframes (a caseframe being a verb/event plus a semantic role) that indicate synonymy or relatedness; for example, '<patient> kidnapped' and '<patient> released' are related caseframes. The present paper generalizes these caseframes to find an entire set of events (verbs) anchored on the same argument, rather than just pairs of related frames.

Another related paper is Chklovski and Pantel (2004), VerbOcean: Mining the web for fine-grained semantic verb relations, which, like this paper, uses pointwise mutual information to automatically find relations between verbs. The present paper differs in that it uses a protagonist as the indicator of relatedness between verbs.

The most recent related paper is Chambers and Jurafsky (ACL 2010), which finds clusters of verbs that share arguments in order to induce templates automatically from documents.