EUROPARL

From Cohen Courses
Revision as of 00:25, 30 September 2011 by Lingwang

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek, and Finnish.

The procedure for extracting the parallel data is described in Koehn (2005).