Ancora

From Cohen Courses

Revision as of 20:38, 24 September 2011

AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at the following levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations
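The lemma and part-of-speech layer is the one most tools consume directly. The Python below is a minimal sketch of reading (form, lemma, POS) triples. It assumes you are working with the Universal Dependencies conversion of the corpus (the UD_Spanish-AnCora and UD_Catalan-AnCora treebanks, distributed in the CoNLL-U format) rather than the original AnCora distribution. That conversion may not carry every annotation level listed above. The function name `read_conllu` and the sample sentence are hypothetical and are not taken from the corpus.

```python
from typing import Iterator, List, Tuple

# Hypothetical CoNLL-U sample (invented sentence, NOT taken from AnCora).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = """\
# text = El gato duerme.
1\tEl\tel\tDET\t_\t_\t2\tdet\t_\t_
2\tgato\tgato\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tduerme\tdormir\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def read_conllu(text: str) -> Iterator[List[Tuple[str, str, str]]]:
    """Yield each sentence as a list of (form, lemma, upos) triples."""
    sentence: List[Tuple[str, str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (e.g. "1-2") and empty nodes
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for form, lemma, upos in sent:
            print(f"{form}\t{lemma}\t{upos}")
```

Multiword-token range lines (such as a contraction like "del" spanning two syntactic words) are skipped, so only the syntactic words, each with its own lemma and POS, are returned.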

The AnCora corpora consist mainly of journalistic texts.

The corpus website is [1].