Ancora

From Cohen Courses

Latest revision as of 20:39, 24 September 2011

AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each of 500,000 words. The corpora are annotated at the following levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations
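As a rough illustration of how these annotation levels stack on a single token, here is a minimal sketch of a hypothetical in-memory record with one field per level. The class name, field names, and example values are assumptions made for illustration; they are not AnCora's actual tag set or distribution format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AnnotatedToken:
    """Hypothetical token record with one field per annotation level.

    Field names and values are illustrative only; they are not
    AnCora's actual tag set or file format.
    """
    form: str
    lemma: str                                  # Lemma
    pos: str                                    # Part of Speech
    syntactic_function: Optional[str] = None    # Syntactic constituents and functions
    thematic_role: Optional[str] = None         # Argument structure and thematic roles
    verb_semantic_class: Optional[str] = None   # Semantic classes of the verb
    deverbal_type: Optional[str] = None         # Denotative type of deverbal nouns
    wordnet_synset: Optional[str] = None        # Nouns related to WordNet synsets
    named_entity: Optional[str] = None          # Named Entities
    coreference_id: Optional[str] = None        # Coreference relations


def filled_layers(token: AnnotatedToken) -> List[str]:
    """Return the names of the optional layers that carry a value."""
    # fields() preserves definition order; the first three are mandatory.
    optional = [f.name for f in fields(token)][3:]
    return [name for name in optional if getattr(token, name) is not None]


# Usage with made-up values: a subject noun that is also a named entity.
tok = AnnotatedToken(
    form="Barcelona",
    lemma="Barcelona",
    pos="noun",
    syntactic_function="subject",
    named_entity="location",
    coreference_id="entity_1",
)
print(filled_layers(tok))  # ['syntactic_function', 'named_entity', 'coreference_id']
```

Keeping every layer above lemma and part of speech optional reflects that not every token carries every annotation (for example, only verbs get a verb semantic class).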

The AnCora corpus consists mainly of journalistic texts.

The corpus website is http://clic.ub.edu/corpus/en.