ACE 2005 Dataset
The dataset is available from the Linguistic Data Consortium (LDC). The data is drawn from a variety of sources and covers three languages: Arabic, Chinese, and English.
Four versions of each document are provided:
- Source text files (.sgm): All source files, including the Chinese files, are encoded in UTF-8.
- APF files (.apf.xml): The ACE Program Format.
- AG files (.ag.xml): The LDC Annotation Graph Format.
- TABLE files (.tab): Files that store mapping tables between the IDs used in each ag.xml file and their corresponding
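As a rough illustration of how the four versions might be collected per document, the sketch below groups files under a corpus directory by their suffix. It assumes that all versions of a document share the same base filename, which the description above does not state. The function name `group_by_document` and the directory argument are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Suffixes of the four per-document versions listed above.
# Longer suffixes come first so ".apf.xml" and ".ag.xml" are matched whole.
SUFFIXES = (".apf.xml", ".ag.xml", ".sgm", ".tab")


def group_by_document(root):
    """Map each document ID to {suffix: path} for files found under root.

    Assumption: the document ID is the filename with its suffix removed,
    and all versions of one document share that ID.
    """
    docs = defaultdict(dict)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                doc_id = path.name[: -len(suffix)]
                docs[doc_id][suffix] = path
                break
    return dict(docs)


def read_source_text(path):
    """Read a .sgm source file; the source files are stated to be UTF-8."""
    return Path(path).read_text(encoding="utf-8")
```

A document whose entry lacks one of the four suffixes can then be flagged as incomplete before any further processing.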
The detailed statistics for the training portion of this corpus are as follows: