TimeBank Corpus
Latest revision as of 00:56, 28 September 2011
The TimeBank corpus (version 1.2) is a corpus of 186 newswire articles tagged for events, time expressions, and temporal relations between events and times.
Time expressions are tagged using TimeML, a markup language for temporal and event expressions.
Events are also tagged with temporal information such as tense, modality and grammatical aspect.
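As a rough illustration of how such annotations can be read, the sketch below parses a small TimeML-style fragment with Python's standard library. The fragment is a hypothetical example written for this page, not an excerpt from the corpus, and the tag and attribute names (EVENT, TIMEX3, MAKEINSTANCE, TLINK, relType and so on) follow TimeML conventions as commonly described; they should be checked against the corpus documentation before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style fragment, written for illustration only.
SAMPLE = """<TimeML>
The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the deal on
<TIMEX3 tid="t1" type="DATE" value="1998-02-13">Friday</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
</TimeML>"""


def summarize(timeml_text):
    """Return (event text, tense, relation, time value) for each event-time link."""
    root = ET.fromstring(timeml_text)
    # Map event ids to the tagged event word.
    events = {e.get("eid"): (e.text or "").strip() for e in root.iter("EVENT")}
    # Map time expression ids to their normalized value.
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
    # Event instances carry the tense and aspect in this sketch.
    instances = {m.get("eiid"): m.attrib for m in root.iter("MAKEINSTANCE")}

    links = []
    for link in root.iter("TLINK"):
        time_id = link.get("relatedToTime")
        if time_id is None:
            continue  # only event-time links are handled here
        inst = instances[link.get("eventInstanceID")]
        links.append(
            (events[inst["eventID"]], inst["tense"], link.get("relType"), times[time_id])
        )
    return links


print(summarize(SAMPLE))
```

The function only handles links between an event instance and a time expression; links between two events would need a similar lookup on the second event.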
Temporal relations between events and times are categorized into 13 classes: before (event A is before event B) and its inverse, ibefore (immediately before) and its inverse, includes (the time of event A includes the time of event B) and its inverse, begins (event A begins event B) and its inverse, ends (event A ends event B) and its inverse, simultaneous (event A and event B happen at the same time), and overlap (the time of event A overlaps with the time of event B) and its inverse.
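The pairing of relations with their inverses can be sketched as a small lookup table. The six base labels and simultaneous are taken from the list above; the labels chosen for the inverses (after, iafter, is_included, begun_by, ended_by, overlapped_by) are assumptions made for illustration and may differ from the names used in the corpus.

```python
# Base relation -> inverse label. The inverse labels are illustrative assumptions.
INVERSE_PAIRS = [
    ("before", "after"),
    ("ibefore", "iafter"),
    ("includes", "is_included"),
    ("begins", "begun_by"),
    ("ends", "ended_by"),
    ("overlap", "overlapped_by"),
]

# simultaneous is symmetric, so it is treated as its own inverse.
INVERSE = {"simultaneous": "simultaneous"}
for relation, inverse in INVERSE_PAIRS:
    INVERSE[relation] = inverse
    INVERSE[inverse] = relation


def invert(a, relation, b):
    """Rewrite the statement 'a relation b' as the equivalent 'b inverse a'."""
    return (b, INVERSE[relation], a)


print(len(INVERSE))                  # number of relation classes in the table
print(invert("e1", "before", "e2"))  # the same fact stated from e2's side
```

A table like this is useful for normalizing annotations, for example by rewriting every inverse relation into its base form before counting classes.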
The availability of tagged data in this corpus has encouraged numerous machine learning approaches to the temporal ordering of events, especially since the introduction of competitive challenges for the task, such as TempEval. The contest involves three tasks corresponding to three types of temporal relations: between events and time expressions in the same sentence (task A), between the events of a document and the document creation time (task B), and between events in consecutive sentences (task C).
The dataset is available at the Linguistic Data Consortium.
Some statistics on the number of relations tagged in the corpus are shown below. Each relation except simultaneous has an inverse relation. The majority class (including its inverse) accounts for 28% of the total tagged relations.