Ancora

From Cohen Courses
Revision as of 19:39, 24 September 2011 by Ysim

AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at several levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations

The AnCora corpora are based mainly on journalistic texts.

The corpus website is [1].