Penn Treebank
From Cohen Courses
The Penn Treebank Project produced the first large-scale treebank [[Category::dataset]], which annotates natural-language text with phrase structure (syntactic trees) and [[Part_of_Speech_Tagging|part-of-speech tags]].

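== Example ==
The sentence "''John loves Mary.''" is annotated as follows:
 (S (NP (NNP John))
    (VP (VBZ loves)
        (NP (NNP Mary)))
    (. .))
Each word sits under its part-of-speech tag (NNP for a singular proper noun, VBZ for a third-person singular present-tense verb, "." for sentence-final punctuation), and these tagged words nest inside phrase labels (NP for noun phrase, VP for verb phrase) under the sentence node S.
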
== POS tags ==

The tag set and its annotation conventions are documented in the [ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz Penn Treebank part-of-speech tagging guidelines] (gzipped PostScript).

== Corpora ==
Annotated corpora include:
* Wall Street Journal (WSJ)
* the Brown Corpus
* Switchboard
* ATIS

== Relevant Papers ==

{{#ask: [[UsesDataset::Penn_Treebank]]
| ?AddressesProblem
| ?UsesMethod
}}
Latest revision as of 17:09, 30 September 2011