ACE 2005 Dataset

From Cohen Courses
 

Latest revision as of 01:23, 28 September 2010

The ACE 2005 dataset addresses five primary tasks – the recognition of entities, values, temporal expressions, relations, and events.

The dataset is available from the Linguistic Data Consortium (LDC). The data is drawn from a variety of sources, and the tasks are supported in three languages: Arabic, Chinese, and English.

Four versions of each document are provided:

  • Source text files (.sgm): All source files, including the Chinese files, are encoded in UTF-8.
  • APF files (.apf.xml): The ACE Program Format.
  • AG files (.ag.xml): The LDC Annotation Graph Format.
  • TABLE files (.tab): Files that store mapping tables between the IDs used in each ag.xml file and their corresponding apf.xml file.
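
As a rough illustration of working with the four versions listed above, the following Python sketch groups file paths by document and reports documents that lack one of the versions. It assumes that all versions of a document share a base file name and differ only in suffix. The function names and the file names in the usage note are hypothetical and are not part of the corpus or of any LDC tooling.

```python
from pathlib import PurePosixPath

# Suffixes of the four versions listed above, mapped to short labels.
# The labels are illustrative names chosen for this sketch.
SUFFIXES = {
    ".apf.xml": "apf",
    ".ag.xml": "ag",
    ".sgm": "source",
    ".tab": "table",
}


def group_by_document(paths):
    """Map each document name to a dict of {version label: path}.

    Assumes every version of a document shares the same base file name.
    Paths with an unrecognised suffix are ignored.
    """
    documents = {}
    for path in paths:
        name = PurePosixPath(path).name
        for suffix, version in SUFFIXES.items():
            if name.endswith(suffix):
                doc_id = name[: -len(suffix)]
                documents.setdefault(doc_id, {})[version] = path
                break
    return documents


def incomplete_documents(documents):
    """Return the sorted names of documents missing any of the four versions."""
    expected = set(SUFFIXES.values())
    return sorted(
        doc_id for doc_id, versions in documents.items()
        if set(versions) != expected
    )
```

For example, calling `group_by_document` on a list containing `doc_a.sgm`, `doc_a.apf.xml`, `doc_a.ag.xml`, `doc_a.tab` and `doc_b.sgm` would yield two entries, and `incomplete_documents` would then flag `doc_b` as missing its annotation files.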

The detailed statistics for the training portion of this corpus are as follows:

[Figure: ACE05-1.png]

External Link: http://www.itl.nist.gov/iad/mig//tests/ace/2005/