Information Extraction 10-707 in Fall 2010

From Cohen Courses
Revision as of 13:05, 21 July 2010 by Wcohen

Instructor and Venue

  • Instructor: William Cohen, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, sharonw+@cs.cmu.edu, 412-268-5196
  • When/where: Mon/Wed 1:30-2:50, Gates 4101.
  • Course Number: MLD 10-707, cross-listed in LTI as 11-748
  • Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
  • TA: there is no TA for this course
  • Syllabus: Syllabus for Information_Extraction 10-707 in Fall 2010
  • Office hours: TBA

Description

Information extraction is the task of finding names of entities in unstructured or partially structured text and determining the relationships that hold between these entities. More succinctly, information extraction is the problem of deriving structured factual information from text.
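To make the definition concrete, here is a minimal sketch of what "structured factual information" might look like when pulled from a sentence. It uses a single hand-written regular expression, which is not one of the learning methods this course covers; the sentence, the `works_for` relation name, and the helper `extract_relations` are all invented for illustration.

```python
import re

# Toy sentence invented for illustration; not taken from any course dataset.
TEXT = "Alice Smith works for Acme Corporation."

# Hypothetical pattern: one or more capitalized words (a name), the literal
# phrase "works for", then one or more capitalized words (an organization).
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(rf"(?P<person>{NAME}) works for (?P<org>{NAME})")


def extract_relations(text):
    """Return (person, relation, organization) triples found in text."""
    return [
        (m.group("person"), "works_for", m.group("org"))
        for m in PATTERN.finditer(text)
    ]


# Each triple names two entities and the relationship that holds between them.
print(extract_relations(TEXT))
```

A hand-written pattern like this breaks as soon as the wording changes, which is one motivation for the learned extractors surveyed in the course.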

This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule-learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.

Readings will be based on research papers. Grades will be based on class participation, paper presentations, and a project. More specifically, students will be expected to:

Syllabus

I plan for the Fall 2010 course to spend about a third of its time on techniques for structured learning, a third on semi-supervised/bootstrapping methods, and the remainder on a wider variety of machine-learning methods that have been applied to information extraction.

Older syllabi:

Bibliography

Grading

Grades are based on:

  • The class project (50%, including the presentation and the writeup).
  • The paper presentation (20%).
  • The paper summaries submitted throughout the course (20%).
  • Class participation (10%).
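
As a rough illustration of how the weights above combine, the sketch below computes a weighted average. It assumes each component is scored on a 0-100 scale, which this page does not state; the component names, the function `final_grade`, and the example scores are hypothetical, and no mapping to letter grades is implied.

```python
# Weights as listed in the Grading section; they sum to 100%.
WEIGHTS = {
    "project": 0.50,        # includes the presentation and the writeup
    "presentation": 0.20,   # the paper presentation
    "summaries": 0.20,      # paper summaries submitted throughout the course
    "participation": 0.10,  # class participation
}


def final_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"project": 90, "presentation": 80, "summaries": 85, "participation": 100}
print(final_grade(example))
```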