Difference between revisions of "Information Extraction 10-707 in Fall 2010"

From Cohen Courses
m (1 revision)
 
(4 intermediate revisions by 2 users not shown)
Line 6:
  * Course Number: MLD 10-707, cross-listed in LTI as 11-748
  * Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
- * TA: there is no TA for this course
+ * TA: None
  * Syllabus: [[Syllabus for Information_Extraction 10-707 in Fall 2010]]
- * Office hours: TBA
+ * Office hours: 1-2pm Thursdays, starting 9/23.

  == Description ==
Line 41:
  * [[Bibliography for Information Extraction 10-707 in Fall 2010]]
+
+ == Readings ==
+
+ Unless there's an announcement to the contrary, required readings should be done '''before''' the class. After you've done the reading you should either ask or answer a question on the [http://goo.gl/mod/mSsZ Google Moderator page for the class]:
+
+ http://goo.gl/mod/mSsZ

  == Grading ==

  Grades are based on
- * The class project (50% - including the presentation and the writeup).
+ * The class project
- * The paper presentation (20%).
+ * The paper presentation
- * The paper summaries submitted throughout the course (20%).
+ * Contributions to the wiki
- * Class participation (10%).
+ * Class participation

Latest revision as of 10:40, 20 September 2010

Instructor and Venue

  • Instructor: William Cohen, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, sharonw+@cs.cmu.edu, 412-268-5196
  • When/where: Mon/Wed 1:30-2:50, Gates 4101.
  • Course Number: MLD 10-707, cross-listed in LTI as 11-748
  • Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
  • TA: None
  • Syllabus: Syllabus for Information_Extraction 10-707 in Fall 2010
  • Office hours: 1-2pm Thursdays, starting 9/23.

Description

Information extraction is finding names of entities in unstructured or partially structured text, and determining the relationships that hold between these entities. More succinctly, information extraction is the problem of deriving structured factual information from text.

This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule-learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.

Readings will be based on research papers. Grades will be based on class participation, paper presentations, and a project. More specifically, students will be expected to:

Syllabus

I plan for the Fall 2010 course to spend about a third of the time on techniques for structured learning, a third on semi-supervised/bootstrapping methods, and the remainder on a wider variety of machine-learning methods that have been applied to information extraction.

Older syllabi:

Bibliography

Readings

Unless there's an announcement to the contrary, required readings should be done before the class. After you've done the reading you should either ask or answer a question on the Google Moderator page for the class:

http://goo.gl/mod/mSsZ

Grading

Grades are based on

  • The class project
  • The paper presentation
  • Contributions to the wiki
  • Class participation