Difference between revisions of "Penn Treebank"
From Cohen Courses
Line 1: | Line 1: | ||
− | The Penn Treebank Project is | + | The Penn Treebank Project is the first large-scale treebank [[Category::dataset]] that annotates phrase structure for natural language. |
+ | |||
+ | == Example == | ||
+ | For example, the sentence "''John loves Mary''" is annotated as follows: | ||
+ | (S (NP (NNP John)) | ||
+ | (VP (VBZ loves) | ||
+ | (NP (NNP Mary))) | ||
+ | (. .)) | ||
+ | |||
+ | == POS tags == | ||
+ | |||
+ | [ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz format] | ||
+ | |||
+ | == Corpora == | ||
+ | Annotated corpora include: | ||
+ | * Wall Street Journal; | ||
+ | * The Brown Corpus; | ||
+ | * Switchboard; | ||
+ | * ATIS | ||
== Relevant Papers == | == Relevant Papers == |
Revision as of 17:06, 30 September 2011
The Penn Treebank Project is the first large-scale treebank, a dataset that annotates phrase structure for natural language.
Example
For example, the sentence "John loves Mary" is annotated as follows:
 (S (NP (NNP John))
    (VP (VBZ loves)
        (NP (NNP Mary)))
    (. .))
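The bracketed notation can be read mechanically: each opening parenthesis starts a constituent, the first token after it is the label, and everything up to the matching closing parenthesis is a child. The following is a minimal sketch of such a reader in Python, not part of the Penn Treebank tooling. The function names parse_bracketed and tagged_words are hypothetical, the verb is tagged VBZ (the Penn Treebank tag for a third-person singular present verb), and well-formed input is assumed.

```python
import re


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Hypothetical helper for illustration; assumes balanced parentheses.
    """
    # Tokens are "(", ")", or any run of characters without spaces/parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # constituent label or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def tagged_words(node):
    """Collect (word, POS tag) pairs from the preterminal nodes, left to right."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


example = "(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))"
tree = parse_bracketed(example)
print(tagged_words(tree))
```

Here the root of tree is labelled S and has three children (NP, VP, and the final punctuation), and tagged_words recovers the POS-tagged sentence from the tree. Real treebank files also contain traces, function tags, and multiple trees per file, which this sketch does not handle.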
POS tags
Tag format: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz
Corpora
Annotated corpora include:
- Wall Street Journal;
- The Brown Corpus;
- Switchboard;
- ATIS