Information Extraction 10-707 in Fall 2010


Instructor and Venue

  • Instructor: William Cohen, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, 412-268-5196
  • When/where: Mon/Wed 1:30-2:50, Gates 4101.
  • Course Number: MLD 10-707, cross-listed in LTI as 11-748
  • Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
  • TA: None
  • Syllabus: Syllabus for Information Extraction 10-707 in Fall 2010
  • Office hours: 1-2pm Thursdays, starting 9/23.


Information extraction is the task of finding the names of entities in unstructured or partially structured text and determining the relationships that hold between those entities. More succinctly, information extraction is the problem of deriving structured factual information from text.

This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule-learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.

Readings will be based on research papers. Grades will be based on class participation, paper presentations, and a project. More specifically, students will be expected to:


I plan for the Fall 2010 course to spend about a third of its time on techniques for structured learning, a third on semi-supervised/bootstrapping methods, and the remainder on a wider variety of machine-learning methods that have been applied to information extraction.

Older syllabi:



Unless there's an announcement to the contrary, required readings should be done before the class. After you've done the reading you should either ask or answer a question on the Google Moderator page for the class:


Grades are based on:

  • The class project
  • The paper presentation
  • Contributions to the wiki
  • Class participation