Penn Treebank

From Cohen Courses

Revision as of 18:09, 30 September 2011 by Wpang (talk | contribs) (→‎Corpora)

The Penn Treebank Project produced the first large-scale treebank: a dataset of natural language text annotated with part-of-speech (POS) tags and phrase structure.

Contents

1 Example
2 POS tags
3 Corpora
4 Relevant Papers

Example

For example, the sentence "John loves Mary." is annotated as follows:

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
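
The bracketed notation can be read mechanically: each opening parenthesis starts a node whose first token is its label, and a node that contains a single word is a POS-tagged token. The following is a minimal sketch of such a reader in plain Python, not the tooling distributed with the treebank. The names `parse_tree`, `tagged_words`, and `EXAMPLE` are hypothetical and introduced here only for illustration, and the sketch assumes a single well-formed tree with no empty labels or traces.

```python
# Hypothetical example input: the sentence from this section, with the
# verb tagged VBZ (third-person singular present verb).
EXAMPLE = """
(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
"""


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    # Pad the parentheses with spaces so a plain split() tokenizes the input.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # first token after '(' is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def tagged_words(node):
    """Return the (word, POS tag) pairs of a parsed tree, left to right."""
    label, children = node
    # A node holding exactly one word is a POS tag over that word.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


if __name__ == "__main__":
    print(tagged_words(parse_tree(EXAMPLE)))
```

Run on the example, this yields the pairs `("John", "NNP")`, `("loves", "VBZ")`, `("Mary", "NNP")`, and `(".", ".")`, showing that the POS layer is recoverable from the phrase-structure annotation.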

POS tags

Corpora

Annotated corpora include:

Wall Street Journal (WSJ);
The Brown Corpus;
Switchboard;
ATIS

Relevant Papers

Retrieved from "http://curtis.ml.cmu.edu/w/courses/index.php?title=Penn_Treebank&oldid=8531"
