FBIS corpus

From Cohen Courses
Revision as of 02:52, 2 November 2011 by Aanavas (talk | contribs)

The FBIS corpus is a collection of radio newscasts that includes parallel-text datasets in multiple languages. For example, the Chinese-English parallel corpus contains 237.6 million English words and 215.4 million Chinese words.

The Foreign Broadcast Information Service (FBIS) was an open-source intelligence component of the Central Intelligence Agency's Directorate of Science and Technology. It monitored and translated openly available news and information from media sources outside the United States, and disseminated the results within the U.S. government.

Relevant Papers