TimeBank Corpus

From Cohen Courses

Revision as of 00:34, 28 September 2011

TimeBank (version 1.2) is a corpus of 183 newswire articles annotated with events, time expressions, and temporal relations between events and times.

Time expressions are tagged using TimeML (http://timeml.org/site/index.html), a markup language for temporal and event expressions.

Events are also tagged with temporal information such as tense, modality and grammatical aspect.
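
As a rough illustration of what this annotation looks like, the sketch below parses a small TimeML-style fragment with Python's standard library. The fragment is hand-written for this example and is not taken from the corpus; the tag and attribute names (EVENT, TIMEX3, MAKEINSTANCE, TLINK) are assumed to follow the TimeML 1.2 specification and should be checked against the actual files.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; NOT taken from TimeBank.
SAMPLE = """<TimeML>
<TIMEX3 tid="t0" type="DATE" value="1998-02-27" functionInDocument="CREATION_TIME">02/27/1998</TIMEX3>
The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the deal on
<TIMEX3 tid="t1" type="DATE" value="1998-02-26">Thursday</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>
</TimeML>"""


def parse_timeml(text):
    """Collect events, time expressions, event instances and temporal links."""
    root = ET.fromstring(text)
    # Event id -> the word(s) that express the event.
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    # Time expression id -> normalized value.
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
    # Event instance id -> event id plus tense and aspect (assumed to be
    # stored on MAKEINSTANCE, as in TimeML 1.2).
    instances = {
        m.get("eiid"): {
            "event": m.get("eventID"),
            "tense": m.get("tense"),
            "aspect": m.get("aspect"),
        }
        for m in root.iter("MAKEINSTANCE")
    }
    # Each link as (source id, relation type, target id).
    links = [
        (
            l.get("eventInstanceID") or l.get("timeID"),
            l.get("relType"),
            l.get("relatedToEventInstance") or l.get("relatedToTime"),
        )
        for l in root.iter("TLINK")
    ]
    return events, times, instances, links


events, times, instances, links = parse_timeml(SAMPLE)
```

Here `links` holds one triple saying that event instance `ei1` is included in time expression `t1`. Note that the relation labels in the files are upper-case TimeML names, which may differ from the lower-case names used on this page.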

Temporal relations between events and times are categorized into 13 classes: six relations that each have an inverse, plus simultaneous, which has none. The six are before (event A is before event B), ibefore (event A is immediately before event B), includes (the time of event A includes the time of event B), begins (event A begins event B), ends (event A ends event B), and overlap (the time of event A overlaps with the time of event B). Simultaneous means that event A and event B happen at the same time.
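
One way to make the inverse pairing concrete is a small lookup table. This is a minimal sketch, not part of the corpus tooling: the relation names follow the list above, while the `_inv` suffix and the helper functions `invert` and `canonical` are names invented for this example. Treating simultaneous as its own inverse is an assumption that keeps the table total.

```python
# Relations that come in inverse pairs, named as in the text above.
PAIRED = ("before", "ibefore", "includes", "begins", "ends", "overlap")

# Map every class to its inverse. The "_inv" labels are placeholders
# chosen for this sketch, not labels used by the corpus.
INVERSE = {}
for rel in PAIRED:
    INVERSE[rel] = rel + "_inv"
    INVERSE[rel + "_inv"] = rel
INVERSE["simultaneous"] = "simultaneous"  # assumption: symmetric relation


def invert(a, rel, b):
    """Rewrite 'a rel b' as the equivalent statement 'b inverse(rel) a'."""
    return (b, INVERSE[rel], a)


def canonical(a, rel, b):
    """Normalize a link so that only non-inverse labels appear."""
    if rel.endswith("_inv"):
        return invert(a, rel, b)
    return (a, rel, b)
```

The table has 6 × 2 + 1 = 13 entries, matching the 13 classes. Normalizing with `canonical` is one way to count relations so that a relation and its inverse are not counted separately.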

The availability of tagged data in this corpus has encouraged numerous machine learning approaches to the temporal ordering of events, especially since the introduction of competitive challenges such as TempEval (http://www.timeml.org/tempeval/). TempEval involves three tasks, one for each of three types of temporal relation: between events and time expressions in the same sentence (task A), between the events of a document and the document creation time (task B), and between events in consecutive sentences (task C).
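
The three task definitions can be read as a rule for deciding which task a candidate pair belongs to. The sketch below encodes only the description given above; the `Mention` class, its fields, and the `tempeval_task` function are hypothetical names for this example and do not come from the TempEval distribution, which may apply further restrictions on which pairs are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """A hypothetical container for one annotated item."""
    kind: str                       # "event", "timex", or "dct" (document creation time)
    sentence: Optional[int] = None  # sentence index; None for the DCT


def tempeval_task(x, y):
    """Return "A", "B" or "C" for a pair of mentions, or None if no task applies."""
    kinds = sorted((x.kind, y.kind))
    if kinds == ["dct", "event"]:
        return "B"  # event vs. document creation time
    if kinds == ["event", "timex"] and x.sentence == y.sentence:
        return "A"  # event vs. time expression in the same sentence
    if kinds == ["event", "event"] and abs(x.sentence - y.sentence) == 1:
        return "C"  # events in consecutive sentences
    return None


pair_task = tempeval_task(Mention("event", 3), Mention("timex", 3))
```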

The dataset is available from the Linguistic Data Consortium (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08).

Statistics on the number of relations of each type tagged in the corpus are shown below. Every relation except simultaneous has an inverse relation, which is not shown in the table.

TimeBank.png