Difference between revisions of "Penn Treebank"

Revision as of 18:06, 30 September 2011

The Penn Treebank Project is the first large-scale treebank dataset that annotates natural-language text with phrase structure.

For example, the sentence "John loves Mary" is annotated as follows:

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
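
The bracketed notation above is a nested parenthesized structure, so it can be read with a small recursive parser. The following is a minimal sketch, not the project's own tooling: the helper names `parse_tree` and `tagged_words` are hypothetical, and the example string uses the standard Penn Treebank tag `VBZ` (verb, third person singular present) for "loves".

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Hypothetical helper for illustration; leaves are plain strings.
    """
    # Split into "(", ")", and runs of any other non-space characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # first token after "(" is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


EXAMPLE = """(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))"""

print(tagged_words(parse_tree(EXAMPLE)))
# [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
```

Representing each node as a `(label, children)` tuple keeps the sketch dependency-free; a real project would more likely rely on an existing treebank reader.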

Annotated corpora include:

@@ Line 1: / Line 1: @@
-The Penn Treebank Project is a [[Category::dataset]] annotates naturally-occuring text for linguistic structure.
+The Penn Treebank Project is the first large-scale treebank [[Category::dataset]] that annotates natural-language text with phrase structure.
+== Example ==
+For example, the sentence "''John loves Mary''" is annotated as follows:
+ (S (NP (NNP John))
+    (VP (VBZ loves)
+        (NP (NNP Mary)))
+    (. .))
+== POS tags ==
+[ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz POS tagging guidelines]
+== Corpora ==
+Annotated corpora include:
+* Wall Street Journal;
+* The Brown Corpus;
+* Switchboard;
+* ATIS
 == Relevant Papers ==