EUROPARL
From Cohen Courses
Latest revision as of 23:25, 29 September 2011
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek, and Finnish.
The procedure for extracting the parallel data is described in Koehn (2005), available at http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/europarl-mtsummit05.pdf.