Information Extraction 10-707 in Fall 2009

From Cohen Courses
Revision as of 11:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

Instructor and Venue

Note this is the fall 2009 version of the course, i.e., it's over!

  • Instructor: William Cohen, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, 412-268-5196
  • When/where: Mon/Wed 1:30-2:50, Gates 5222. (That's in the middle of the spiral ramp.)
  • Course Number: MLD 10-707, cross-listed in LTI as 11-748
  • Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
  • TA: there is no TA for this course
  • Syllabus: Syllabus for Information Extraction 10-707 in Fall 2009
  • Office hours: Thurs 11:30-12:30 or by appointment.


Information extraction is finding names of entities in unstructured or partially structured text, and determining the relationships that hold between these entities. More succinctly, information extraction is the problem of deriving structured factual information from text.

This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.
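To give a concrete feel for what "sequential classification" means here, the following is a minimal sketch of Viterbi decoding with a toy hidden Markov model that tags each token as part of a person name (PER) or not (O). It is not taken from the course materials: the state set, the function names `viterbi` and `extract_spans`, and all of the probabilities are assumptions made up for illustration, and in practice the parameters would be learned from labeled data.

```python
import math

# Hypothetical toy HMM for tagging tokens as PER (person name) or O (other).
# All probabilities below are invented for illustration, not learned from data.
STATES = ["PER", "O"]
START = {"PER": 0.3, "O": 0.7}
TRANS = {
    "PER": {"PER": 0.5, "O": 0.5},
    "O": {"PER": 0.2, "O": 0.8},
}
EMIT = {
    "PER": {"william": 0.4, "cohen": 0.4, "teaches": 0.01, "the": 0.01, "course": 0.01},
    "O": {"william": 0.01, "cohen": 0.01, "teaches": 0.3, "the": 0.4, "course": 0.28},
}
UNSEEN = 1e-6  # floor probability for words missing from the emission table


def viterbi(tokens):
    """Return the most probable state sequence for tokens under the toy HMM."""
    if not tokens:
        return []
    # scores[i][s] = best log-probability of any path ending in state s at position i
    scores = [{
        s: math.log(START[s]) + math.log(EMIT[s].get(tokens[0], UNSEEN))
        for s in STATES
    }]
    backptr = []
    for tok in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(EMIT[s].get(tok, UNSEEN)))
            ptr[s] = best
        scores.append(cur)
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def extract_spans(tokens, tags, label="PER"):
    """Group consecutive tokens tagged with `label` into entity strings."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == label:
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    tokens = ["william", "cohen", "teaches", "the", "course"]
    tags = viterbi(tokens)
    print(list(zip(tokens, tags)))
    print(extract_spans(tokens, tags))
```

The tagger assigns one label per token, and the entity names are then read off as maximal runs of PER tokens. Conditional random fields and structured SVMs use the same kind of dynamic-programming decoder but replace the generative probabilities with discriminatively trained scores.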

Readings will be based on research papers. Grades will be based on class participation, paper presentations, and a project. More specifically, students will be expected to:


I plan for the Fall 2009 course to spend about half of its time covering various techniques for structured learning, and the remainder on a wider variety of machine-learning methods that have been applied to information extraction.

Older syllabi:



Grades are based on

  • The class project (50% - including the presentation and the writeup).
  • The paper presentation (20%).
  • The paper summaries submitted throughout the course (20%).
  • Class participation (10%).