Ancora

From Cohen Courses

Revision as of 19:38, 24 September 2011 by Ysim

AnCora is a dataset consisting of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. Both corpora are annotated at several levels:

Lemma and Part of Speech
Syntactic constituents and functions
Argument structure and thematic roles
Semantic classes of the verb
Denotative type of deverbal nouns
Nouns linked to WordNet synsets
Named Entities
Coreference relations

The AnCora corpora consist mainly of journalistic texts.

The corpus website is [1].

Retrieved from "http://curtis.ml.cmu.edu/w/courses/index.php?title=Ancora&oldid=6688"