TimeBank Corpus

From Cohen Courses
Revision as of 00:35, 28 September 2011 by Dwijaya

The TimeBank corpus is a collection of 186 newswire articles annotated for events, time expressions (marked up in the TimeML markup language), and temporal relations between events and times.

Events are also tagged with temporal information such as tense, modality and grammatical aspect.
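
As a rough illustration of how such annotations can be read, here is a minimal sketch that extracts events, time expressions, event attributes, and temporal links from a TimeML-style document. The sample text is invented for illustration and is not taken from the corpus. The tag and attribute names (EVENT, TIMEX3, MAKEINSTANCE, TLINK, and attributes such as eid, tid, tense, aspect, relType) are assumed to follow TimeML conventions and should be checked against the TimeML specification and the actual corpus files. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style fragment, written for illustration only;
# it is not taken from the TimeBank corpus.
SAMPLE = """<TimeML>
The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the deal on
<TIMEX3 tid="t1" type="DATE" value="1998-02-13">February 13, 1998</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" polarity="POS"/>
<TLINK lid="l1" relType="IS_INCLUDED" eventInstanceID="ei1" relatedToTime="t1"/>
</TimeML>"""


def summarize(timeml_text):
    """Return (events, times, instances, links) found in a TimeML-style string."""
    root = ET.fromstring(timeml_text)

    # Event id -> surface text of the event mention
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}

    # Time expression id -> normalized value
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}

    # Event instance id -> temporal attributes (missing attributes become None)
    instances = {
        m.get("eiid"): {
            "event": m.get("eventID"),
            "tense": m.get("tense"),
            "aspect": m.get("aspect"),
            "modality": m.get("modality"),
        }
        for m in root.iter("MAKEINSTANCE")
    }

    # (source instance, relation type, target time or target event instance)
    links = [
        (
            link.get("eventInstanceID"),
            link.get("relType"),
            link.get("relatedToTime") or link.get("relatedToEventInstance"),
        )
        for link in root.iter("TLINK")
    ]
    return events, times, instances, links


events, times, instances, links = summarize(SAMPLE)
```

The sketch only follows links that start from an event instance; links that start from a time expression would need an extra attribute lookup.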




addresses five primary tasks – the recognition of entities, values, temporal expressions, relations, and events.

The dataset is available from the Linguistic Data Consortium. The data is drawn from a variety of sources and is available for the tasks in the following languages: Arabic, Chinese, and English.

Four versions of each document are provided:

  • Source text files (.sgm): All source files, including the Chinese files, are encoded in UTF-8.
  • APF files (.apf.xml): The ACE Program Format.
  • AG files (.ag.xml): The LDC Annotation Graph Format.
  • TABLE files (.tab): Files that store mapping tables between the IDs used in each ag.xml file and those in its corresponding apf.xml file.
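
Since each document comes in four versions, it can be convenient to group the files by document before processing. The following is a minimal sketch under the assumption that all four versions of a document share the same base name and differ only in the suffixes listed above; the file names in the usage example and the function names `group_versions` and `incomplete` are hypothetical.

```python
from collections import defaultdict

# Suffixes as listed above, mapped to a short label for each version.
SUFFIXES = {
    ".sgm": "source",
    ".apf.xml": "apf",
    ".ag.xml": "ag",
    ".tab": "table",
}


def group_versions(filenames):
    """Group file names by document base name (assumed to be shared)."""
    groups = defaultdict(dict)
    for name in filenames:
        for suffix, kind in SUFFIXES.items():
            if name.endswith(suffix):
                base = name[: -len(suffix)]
                groups[base][kind] = name
                break  # a file name matches at most one suffix
    return dict(groups)


def incomplete(groups):
    """Return base names that are missing at least one of the four versions."""
    return sorted(doc for doc, files in groups.items() if len(files) < len(SUFFIXES))


# Usage with made-up file names
names = ["doc1.sgm", "doc1.apf.xml", "doc1.ag.xml", "doc1.tab", "doc2.sgm", "notes.txt"]
groups = group_versions(names)
missing = incomplete(groups)
```

Files with an unrecognized suffix are ignored, so the result lists only documents that have at least one of the four versions.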

The detailed statistics for the training portion of this corpus are as follows:

ACE05-1.png

External Link