The Penn Treebank Project produced the first large-scale treebank: a dataset of natural language text annotated with phrase structure and part-of-speech tags.
For example, the sentence "John loves Mary." is annotated as follows:
(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))
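The bracketed notation is a nested structure in which each pair of parentheses holds a label followed by either a word or further constituents. As a rough illustration, the sketch below reads such a string into nested tuples and recovers the (word, tag) pairs. The function names `parse_bracketed` and `tagged_words` are made up for this example and are not part of any Penn Treebank tooling; the code assumes well-formed input with a single root constituent.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Hypothetical helper for illustration; assumes balanced parentheses.
    """
    # Split into "(", ")" and runs of non-space, non-parenthesis characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the root constituent")
    return tree


def tagged_words(node):
    """Collect (word, part-of-speech tag) pairs in sentence order."""
    label, children = node
    # A preterminal node has exactly one child, and that child is a word.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


tree = parse_bracketed("(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))")
print(tagged_words(tree))
# [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
```

Libraries such as NLTK offer a ready-made tree reader for this format; the hand-written parser here is only meant to show how little structure the notation requires.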
The annotated corpora include:
- Wall Street Journal (WSJ);
- The Brown Corpus;