Difference between revisions of "TimeBank Corpus"

From Cohen Courses
Line 11:
  The dataset is available at the [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08 Linguistic Data Consortium].
- Some statistics on the number of each type of relations tagged in the corpus is shown below. Each relation except simultaneous has an inverse relation. The majority class (including its inverse) is 28% of the total tagged relations.
+ Some statistics on the number of relations tagged in the corpus is shown below. Each relation except simultaneous has an inverse relation. The majority class (including its inverse) is 28% of the total tagged relations.
  [[File:TimeBank.png]]

Revision as of 00:45, 28 September 2011

The TimeBank corpus (version 1.2) is a corpus of 186 newswire articles tagged for events, time expressions, and temporal relations between events and times.

Time expressions are tagged using TimeML, a markup language for temporal and event expressions.

Events are also tagged with temporal information such as tense, modality and grammatical aspect.
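To make the annotation layers concrete, here is a minimal sketch that reads a tiny TimeML-style fragment with Python's standard library. The fragment is hand-written for illustration and is not taken from the corpus; the tag and attribute names (`TIMEX3`, `EVENT`, `MAKEINSTANCE`, `TLINK`, `relType`, and so on) and the `IS_INCLUDED` label follow TimeML as commonly documented, and should be checked against the corpus documentation before being relied on.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment (an assumption, not corpus data).
SAMPLE = """<TimeML>
<TIMEX3 tid="t1" type="DATE" value="1998-02-13">Friday</TIMEX3> the company
<EVENT eid="e1" class="OCCURRENCE">announced</EVENT> a merger.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
</TimeML>"""

root = ET.fromstring(SAMPLE)

# Time expressions: id -> surface text
times = {t.get("tid"): t.text for t in root.iter("TIMEX3")}

# Events: id -> surface text
events = {e.get("eid"): e.text for e in root.iter("EVENT")}

# Event instances carry the tense and aspect information.
instances = {
    m.get("eiid"): (m.get("eventID"), m.get("tense"), m.get("aspect"))
    for m in root.iter("MAKEINSTANCE")
}

# Temporal links: (event instance, relation type, related time)
links = [
    (l.get("eventInstanceID"), l.get("relType"), l.get("relatedToTime"))
    for l in root.iter("TLINK")
]

for source, relation, target in links:
    event_id = instances[source][0]
    print(events[event_id], relation, times[target])
```

Keeping events, instances, and links in separate dictionaries mirrors the way the annotation separates the event mention from its tense and aspect attributes.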

Temporal relations between events and times are categorized into 13 classes: before (event A is before event B) and its inverse, ibefore (event A is immediately before event B) and its inverse, includes (the time of event A includes the time of event B) and its inverse, begins (event A begins event B) and its inverse, ends (event A ends event B) and its inverse, overlap (the time of event A overlaps with the time of event B) and its inverse, and simultaneous (event A and event B happen at the same time), which has no separate inverse.
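The inverse structure of the 13 classes can be sketched as a small lookup table. This is only an illustration of the description above, not the corpus's own tooling: the `_inv` label names and the `invert` helper are hypothetical, and the corpus uses its own identifiers for the inverse relations.

```python
# Hypothetical labels: the "_inv" names are placeholders for the inverse
# relations, not the identifiers used in the corpus itself.
BASE_RELATIONS = ["before", "ibefore", "includes", "begins", "ends", "overlap"]
SYMMETRIC_RELATIONS = ["simultaneous"]  # its own inverse

INVERSE = {}
for rel in BASE_RELATIONS:
    INVERSE[rel] = rel + "_inv"
    INVERSE[rel + "_inv"] = rel
for rel in SYMMETRIC_RELATIONS:
    INVERSE[rel] = rel


def invert(a, relation, b):
    """Rewrite (a, relation, b) as the equivalent statement about (b, a)."""
    return (b, INVERSE[relation], a)


# Six relations with inverses (12 labels) plus simultaneous gives 13 classes.
print(len(INVERSE))
print(invert("e1", "before", "e2"))
```

A table like this is useful when counting classes "including the inverse", since a relation and its inverse describe the same ordering seen from opposite ends.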

The availability of tagged data in this corpus has encouraged numerous machine learning approaches to the temporal ordering of events, especially since the introduction of competitive challenges for the task, such as TempEval. The contest involves three tasks corresponding to three types of temporal relations: between events and time expressions in a sentence (task A), between the events of a document and the document creation time (task B), and between events in consecutive sentences (task C).

The dataset is available at the Linguistic Data Consortium.

Some statistics on the number of relations tagged in the corpus are shown below. Each relation except simultaneous has an inverse relation. The majority class (including its inverse) accounts for 28% of the total tagged relations.

TimeBank.png