Ancora
From Cohen Courses
Latest revision as of 19:39, 24 September 2011
AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. Both corpora are annotated at the following levels:
- Lemma and Part of Speech
- Syntactic constituents and functions
- Argument structure and thematic roles
- Semantic classes of the verb
- Denotative type of deverbal nouns
- Nouns related to WordNet synsets
- Named Entities
- Coreference relations
The AnCora corpora consist mainly of journalistic texts.
The corpus website is http://clic.ub.edu/corpus/en.