Difference between revisions of "Penn Treebank"
From Cohen Courses
Latest revision as of 17:09, 30 September 2011
The Penn Treebank Project is the first large-scale treebank, a dataset that annotates natural-language text with phrase structure and part-of-speech tags.
Example
For example, the sentence "John loves Mary" is annotated as follows:
(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))
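The bracketed notation above can be read mechanically: each opening parenthesis starts a constituent, the first token after it is the label, and the innermost pairs hold a tag and a word. The following is a minimal sketch in Python of reading one such tree and listing its word/tag pairs. The names `parse_tree` and `tagged_words` are hypothetical helpers written for this illustration, not part of any Penn Treebank tooling, and the sketch assumes a single well-formed tree with no empty labels.

```python
def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    # Pad parentheses with spaces so a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]           # constituent label or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def tagged_words(node):
    """Collect (word, tag) pairs from the innermost (tag word) nodes."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


EXAMPLE = "(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))"

if __name__ == "__main__":
    for word, tag in tagged_words(parse_tree(EXAMPLE)):
        print(word, tag)
```

Run on the example sentence, this lists John and Mary as NNP (proper noun), loves as VBZ (third-person singular present verb), and the final period with the tag ".". Real treebank files contain further conventions, such as traces and function tags, that this sketch does not handle.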
POS tags
Corpora
Annotated corpora include:
- Wall Street Journal (WSJ);
- The Brown Corpus;
- Switchboard;
- ATIS