Ling, X. and Weld, D. Temporal Information Extraction. AAAI-10

Reviews of this paper

Citation

Temporal Information Extraction, by X. Ling, D.S. Weld. In Proceedings of the Twenty Fifth National Conference on Artificial Intelligence, 2010.

Online version

This paper is available online [1].

Summary

This paper addresses the problem of temporal ordering. Unlike other work that focuses on time-agnostic extraction of facts, this paper presents TIE, a pipeline system that tries to extract as many facts from text as possible while also inducing as much temporal information as possible. Temporal relations between events and times in a sentence are identified, and transitivity is enforced to bound the start and end time points of each event. The paper also presents temporal entropy as a way to evaluate the recall of temporal information extraction systems.

Brief description of the method

The objective of this paper is, given a sentence, to output (1) a maximal set of temporal elements, which are either events (e.g. "officers were dispatched") or temporal references (e.g. "1999"), and (2) the tightest set of temporal constraints directly implied by the text.

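The constraints are linear inequalities of the form $p_1 \le p_2 + d$, where $d$ is a duration (often zero) and $p_1$ and $p_2$ each denote either the beginning ($\mathrm{start}(e)$) or ending time point ($\mathrm{end}(e)$) of a temporal element $e$.

For example, the sentence 'Steve Jobs revealed the iPhone in 2007' might produce these constraints:

$\mathrm{start}(2007) \le \mathrm{start}(\text{reveal})$
$\mathrm{end}(\text{reveal}) \le \mathrm{end}(2007)$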
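
A minimal sketch of how such constraints could be represented in code, assuming each constraint is a linear inequality between two time points plus a duration. The names `Point`, `Constraint`, and `contained_in` are hypothetical and do not come from the paper, and the numeric timeline is made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A start or end time point of a temporal element (hypothetical type)."""
    element: str  # e.g. "reveal" or "2007"
    kind: str     # "start" or "end"


@dataclass(frozen=True)
class Constraint:
    """Encodes left <= right + d, where d is a duration (zero by default)."""
    left: Point
    right: Point
    d: float = 0.0

    def holds(self, times):
        """Check the inequality against a dict mapping Point -> numeric time."""
        return times[self.left] <= times[self.right] + self.d


def contained_in(event, interval):
    """Constraints stating that `event` happens within `interval`."""
    return [
        Constraint(Point(interval, "start"), Point(event, "start")),
        Constraint(Point(event, "end"), Point(interval, "end")),
    ]


# 'Steve Jobs revealed the iPhone in 2007'
constraints = contained_in("reveal", "2007")

# Made-up timeline in fractional years, for illustration only.
times = {
    Point("2007", "start"): 2007.0,
    Point("2007", "end"): 2008.0,
    Point("reveal", "start"): 2007.02,
    Point("reveal", "end"): 2007.02,
}
all_hold = all(c.holds(times) for c in constraints)
```

Storing every constraint in the single form `left <= right + d` keeps the representation uniform, so containment, ordering, and duration bounds can all be checked or propagated by the same code.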


The paper introduces a pipeline system called TIE with these components:

  • Preprocessing: (1) parse the sentence using a syntactic parser (the Stanford parser), (2) detect semantic roles for verbs in the sentence using semantic role labeling (SRL), (3) use TARSQI to find temporal elements (events and times) in the sentence, (4) generate features for each temporal element as well as between elements.
The features include descriptive features (tense, grounded time values, etc.) generated for each element by TARSQI and syntactic features (dependency features and SRL features).
  • Classification: use a probabilistic model trained on the TimeBank Corpus, combined with transitivity rules, to classify the point-wise relation between each pair of points (start or end) of the elements.
A Markov Logic Network is used to make predictions. Formula weights are learned from the TimeBank data. Among the formula templates, one encodes transitivity (if $p$ is before $q$ and $q$ is before $r$, then $p$ is before $r$) and another integrates the temporal information provided by SRL.
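
The effect of the transitivity rule can be illustrated with a small sketch. Note the simplification: a Markov Logic Network treats formulas as weighted soft constraints, whereas the code below applies transitivity as a hard rule over an already-predicted set of "before" pairs. The function name and the string encoding of time points are assumptions made for this example.

```python
def transitive_closure(before):
    """Return all (p, q) pairs implied by transitivity of a `before` relation.

    `before` is an iterable of (p, q) pairs meaning "p is before q".
    """
    closure = set(before)
    changed = True
    while changed:
        changed = False
        for (p, q) in list(closure):
            for (q2, r) in list(closure):
                if q == q2 and (p, r) not in closure:
                    closure.add((p, r))
                    changed = True
    return closure


# Point-wise relations for 'Steve Jobs revealed the iPhone in 2007'
# (hypothetical string encoding of start and end points).
predicted = {
    ("start(2007)", "start(reveal)"),
    ("start(reveal)", "end(reveal)"),
    ("end(reveal)", "end(2007)"),
}
closed = transitive_closure(predicted)
```

Three predicted relations yield six after closure, which gives some intuition for why transitivity tends to raise the number of constraints recovered.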

Experimental Result

The experiment is conducted on sentences extracted from Wikipedia articles. The ground truth is constructed by asking human subjects to label the temporal relations between all pairs of elements by comparing their start and ending points. Each pair is labeled by at least two people with a third person to resolve any disagreement.

Performance of TIE is measured using precision, the number of correct predictions over all predictions made on the test data, and temporal entropy (TE), which measures the number and "tightness" of the induced temporal bounds. A human or a system can give time bounds for the start and end points of each element in the text. The temporal entropy of a time point is then the log of the length of the tightest bound produced for that time point. The lower the entropy, the better (tighter) the bound.
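
As a rough illustration of temporal entropy, the sketch below takes the tightest known lower and upper bound for a time point and returns the log of the bound's length. The unit (days), the natural logarithm, and the handling of zero-length bounds are assumptions of this sketch, not details taken from the paper.

```python
import math


def temporal_entropy(lower, upper):
    """Log of the length of the tightest bound [lower, upper] on a time point.

    Bounds are numbers in a common unit (days are assumed here).
    """
    length = upper - lower
    if length <= 0:
        # Choice made for this sketch: reject empty or zero-length bounds.
        raise ValueError("bound must have positive length")
    return math.log(length)


# A point known only to lie within a year versus within one month.
te_year = temporal_entropy(0, 365)
te_month = temporal_entropy(0, 31)
```

Here `te_month` is smaller than `te_year`, matching the reading that lower entropy means a tighter bound.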

The precision and TE of TIE are compared to three other systems: (1) PASCA, a pattern-based extraction system for question answering, (2) SRL, and (3) TARSQI. TIE extracts more correct constraints at precision comparable to PASCA and SRL. In terms of TE, TIE's temporal entropy is the closest to human performance. The experiments also show that the SRL features improve the precision of TIE, while the transitivity rules improve its recall. On the TempEval task, TIE achieves an accuracy of 0.695, which is comparable to the then state of the art of 0.716 (Yoshikawa et al. 2009).

Discussion

This paper addresses temporal information extraction and temporal ordering with TIE, a pipeline that assembles existing systems to extract temporal elements and their features from text and uses Markov Logic Networks to classify the temporal constraints between the start or end time points of the extracted elements. The paper also proposes a measure called temporal entropy that captures the number and tightness of the constraints produced.

TIE is shown to outperform the other three approaches (PASCA, SRL, and TARSQI) in terms of precision and temporal entropy.

Despite the good results, the contribution of this paper to temporal ordering and temporal information extraction seems rather minimal, as it simply combines an array of existing systems. The idea of using Markov Logic Networks for temporal ordering is also not new, and the earlier system of Yoshikawa et al. (2009) even outperforms the one proposed in this paper in terms of accuracy. The main argument for the paper is that it predicts a different kind of temporal ordering than other systems: instead of classifying temporal relations between elements into a set of classes (BEFORE, AFTER, OVERLAP), it classifies the temporal relations between the elements' start or end time points.

Related papers

A related paper is Yoshikawa et al. 2009, Jointly Identifying Temporal Relations with Markov Logic, which also uses Markov Logic Networks to perform a softer (non-deterministic) joint inference for the temporal ordering of events and times.

Another related paper is Denis and Muller, Predicting Globally-Coherent Temporal Structures from Texts via Endpoint Inference and Graph Decomposition, IJCAI 2011, which, like this paper, attempts to classify all types of temporal relations (not just before/after) in the TimeBank Corpus by translating temporal interval relations into relations between their end points (start and end).
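
The idea of expressing interval relations through end points can be sketched with the standard Allen interval relations. This is a textbook encoding shown for illustration; it is not the implementation of either paper, and TimeBank's relation inventory does not map one-to-one onto these names. The table and function names are hypothetical.

```python
# Each relation between intervals A and B as a test over their end points.
# a_s / a_e are the start / end of A; b_s / b_e are the start / end of B.
ENDPOINT_FORM = {
    "before":   lambda a_s, a_e, b_s, b_e: a_e < b_s,
    "meets":    lambda a_s, a_e, b_s, b_e: a_e == b_s,
    "overlaps": lambda a_s, a_e, b_s, b_e: a_s < b_s < a_e < b_e,
    "starts":   lambda a_s, a_e, b_s, b_e: a_s == b_s and a_e < b_e,
    "during":   lambda a_s, a_e, b_s, b_e: b_s < a_s and a_e < b_e,
    "finishes": lambda a_s, a_e, b_s, b_e: b_s < a_s and a_e == b_e,
    "equals":   lambda a_s, a_e, b_s, b_e: a_s == b_s and a_e == b_e,
}


def classify(a, b):
    """Return the relation between intervals a and b, given as (start, end).

    Both intervals must satisfy start < end. Relations that hold with the
    arguments swapped are reported with an "_inverse" suffix.
    """
    if not (a[0] < a[1] and b[0] < b[1]):
        raise ValueError("intervals must satisfy start < end")
    for name, test in ENDPOINT_FORM.items():
        if test(*a, *b):
            return name
        if test(*b, *a):
            return name + "_inverse"
    raise ValueError("no relation matched")


relation = classify((1, 3), (2, 4))
```

Because every interval relation reduces to a few comparisons between end points, a system that only predicts point-wise orderings can still recover the coarser interval labels when enough end points are ordered.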