Difference between revisions of "Penn Treebank"

From Cohen Courses
Latest revision as of 18:09, 30 September 2011

The Penn Treebank Project produced the first large-scale treebank, a dataset that annotates natural language text with phrase structure and part-of-speech tags.

Example

For example, the sentence "John loves Mary" is annotated as follows:

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
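
The bracketed notation above can be read mechanically: each "(" opens a constituent whose first symbol is its label, and a constituent holding a single bare word is a part-of-speech tag over that word. The following is a minimal sketch in plain Python, not the project's own tooling. The function names parse_bracketed and tagged_words are hypothetical, and the example string uses VBZ, the Penn Treebank tag for a third-person singular present verb.

```python
# Minimal sketch: read a Penn Treebank-style bracketed string.
# parse_bracketed and tagged_words are illustrative names, not an official API.

EXAMPLE = "(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))"


def parse_bracketed(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    # Pad parentheses with spaces so that split() yields "(", ")" and symbols.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # first symbol after "(" is the node label
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # bare word (leaf)
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def tagged_words(tree):
    """Return the (word, POS tag) pairs of a parsed tree, left to right."""
    label, children = tree
    # A preterminal node has exactly one child, and that child is a word.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


print(tagged_words(parse_bracketed(EXAMPLE)))
# prints [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
```

The nested tuples keep the phrase structure (S, NP, VP), while tagged_words keeps only the part-of-speech layer. The sketch does not handle the null elements or function tags that may appear in the distributed corpus files.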

POS tags

The tag set is described in the tagging guide: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz

Corpora

Annotated corpora include:

  • Wall Street Journal (WSJ);
  • The Brown Corpus;
  • Switchboard;
  • ATIS

Relevant Papers