Difference between revisions of "Ancora"

From Cohen Courses
Line 1:
- AnCora consist of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each of them of 500,000 words. The corpora are annotated at different levels:
+ AnCora [Category:Dataset] consist of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each of them of 500,000 words. The corpora are annotated at different levels:
  * Lemma and Part of Speech
Line 12:
  AnCora corpus is mainly based on journalist texts.
- The corpus website is [[http://clic.ub.edu/corpus/en]].
+ The corpus website is [http://clic.ub.edu/corpus/en].

Revision as of 20:37, 24 September 2011

AnCora [Category:Dataset] consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at the following levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations
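
As a rough illustration of how the annotation levels listed above could be carried around in code, here is a minimal Python sketch. The `Token` class, its field names, and the example values are hypothetical and chosen only for illustration; they are not the AnCora file format or any official AnCora API.

```python
from dataclasses import dataclass
from typing import Optional

# Annotation levels exactly as listed on this page.
ANNOTATION_LEVELS = [
    "Lemma and Part of Speech",
    "Syntactic constituents and functions",
    "Argument structure and thematic roles",
    "Semantic classes of the verb",
    "Denotative type of deverbal nouns",
    "Nouns related to WordNet synsets",
    "Named Entities",
    "Coreference relations",
]


@dataclass
class Token:
    """Hypothetical per-token record; NOT the AnCora file format."""
    form: str
    lemma: Optional[str] = None
    pos: Optional[str] = None
    named_entity: Optional[str] = None
    coref_chain: Optional[int] = None


def annotated_fields(token):
    """Return the names of the optional layers that are filled in for a token."""
    return [
        name
        for name, value in vars(token).items()
        if name != "form" and value is not None
    ]


# Illustrative values only; not taken from the corpus.
example = Token(form="Barcelona", lemma="Barcelona", pos="proper noun",
                named_entity="location")
print(annotated_fields(example))
```

Only a few of the eight levels are modeled here; layers such as syntactic constituents or argument structure span several tokens and would need a tree or span representation instead of per-token fields.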

The AnCora corpus is based mainly on journalistic texts.

The corpus website is http://clic.ub.edu/corpus/en.