Ancora

From Cohen Courses
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
AnCora [Category:Dataset] consist of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each of them of 500,000 words. The corpora are annotated at different levels:
+
[[Category::Dataset|Ancora]] consist of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each of them of 500,000 words. The corpora are annotated at different levels:
  
 
* Lemma and Part of Speech
 
* Lemma and Part of Speech

Latest revision as of 20:39, 24 September 2011

AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at several levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations

The AnCora corpus is based mainly on journalistic texts.

The corpus website is [1].