Penn Treebank

Latest revision as of 18:09, 30 September 2011

The Penn Treebank Project produced the first large-scale treebank: a dataset of natural-language text annotated with phrase structure and part-of-speech (POS) tags.

For example, the sentence "John loves Mary." is annotated as follows:

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
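To make the bracketed notation concrete, here is a minimal Python sketch (not part of the Penn Treebank distribution) that reads a tree in this format and recovers the (word, tag) pairs and the labelled phrase spans. The names `Tree`, `parse`, `pos_tags` and `spans` are hypothetical. The sketch assumes well-formed, balanced input and treats a node whose only child is a word as a part-of-speech (preterminal) node. Treebank files often wrap each tree in an extra unlabelled pair of brackets, so the sketch also accepts that case by giving the outer node an empty label. Established toolkits such as NLTK (`nltk.Tree.fromstring`) provide a fuller reader.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Tree:
    # A labelled constituent; children are subtrees or bare words (str).
    label: str
    children: List[Union["Tree", str]] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Split into "(", ")" and runs of characters that are neither spaces nor brackets.
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text: str) -> Tree:
    # Recursive-descent reader; assumes balanced brackets (hypothetical helper).
    tokens = tokenize(text)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""  # empty label for an unlabelled wrapper such as "( (S ...) )"
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Tree(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def is_preterminal(node: Tree) -> bool:
    # A POS node such as (NNP John): exactly one child, and it is a word.
    return len(node.children) == 1 and isinstance(node.children[0], str)


def pos_tags(tree: Tree) -> List[Tuple[str, str]]:
    # Collect (word, tag) pairs from left to right.
    if is_preterminal(tree):
        return [(tree.children[0], tree.label)]
    pairs: List[Tuple[str, str]] = []
    for child in tree.children:
        if isinstance(child, Tree):
            pairs.extend(pos_tags(child))
    return pairs


def spans(tree: Tree) -> List[Tuple[str, int, int]]:
    # Labelled phrase spans (label, start, end) over word indices, end exclusive,
    # listed in post-order; POS nodes and empty-label wrappers are skipped.
    result: List[Tuple[str, int, int]] = []

    def walk(node: Tree, start: int) -> int:
        if is_preterminal(node):
            return start + 1  # a POS node covers exactly one word
        end = start
        for child in node.children:
            if isinstance(child, Tree):
                end = walk(child, end)
        if node.label:
            result.append((node.label, start, end))
        return end

    walk(tree, 0)
    return result


example = """(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))"""

tree = parse(example)
print(pos_tags(tree))  # [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
print(spans(tree))     # [('NP', 0, 1), ('NP', 2, 3), ('VP', 1, 3), ('S', 0, 4)]
```

Spans count words from 0 with an exclusive end, so ("VP", 1, 3) covers "loves Mary".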

Annotated corpora include:

* Wall Street Journal (WSJ);
* The Brown Corpus;
* Switchboard;