FBIS corpus


The FBIS corpus is a collection of radio newscasts and includes datasets of parallel text in multiple languages. For example, the Chinese-English parallel corpus contains 237.6 million English words and 215.4 million Chinese words.

The Foreign Broadcast Information Service (FBIS) was an open-source intelligence component of the Central Intelligence Agency's Directorate of Science and Technology. It monitored and translated openly available news and information from media sources outside the United States, and disseminated them within the U.S. government.

The LDC catalog number is LDC2003E14.

Relevant Papers