Difference between revisions of "Structured Prediction 10-710 in Fall 2011"

From Cohen Courses
Line 74:
 
Here are sample pages for [[User:Wcohen|William]], [[User:Nasmith|Noah]], and [[User:Brendan|Brendan]].
 
  
== Possible Projects ==
+
== Projects ==
  
If you have an idea for a possible project, list it here, as William has done in the example.  This is for coordination and brainstorming at this point. You probably want to include your name and the names of your team-mates in the project description.
+
* [[Automated Template Extraction]] - [[User:Fkeith|Francis Keith]], [[User:amr1|Andrew Rodriguez]]
 +
* [[Project:Tweet | Finding out who you are from where, when, what and with whom you tweet]] - [[User:Dwijaya|Derry Wijaya]], [[User:taruns|Tarun Sharma]]
 +
* [[Including a knowledge base into Haghighi & Klein's coreference resolution system]] - [[User:Mg1|Matt Gardner]]
 +
* [[Information_Extraction_to_Predict_Judgement|Relevant Information Extraction from Court-room Hearings To Predict Judgement]] - [[User:manajs|Manaj Srivastava]], [[User:mridulg|Mridul Gupta]]
  
* [[Wikipedia Infobox Generator Using Cross Lingual Unstructured Text]] - [[User:Daegunw|Daegun Won]] and [[User:Aanavas|Tony Navas]]
 
* [[Including a knowledge base into Haghighi & Klein's coreference resolution system]] - [[User:Mg1|Matt Gardner]] '''Looking for another team member, because Avinava dropped.'''
 
 
* [[Stylistic Structure in Historic Legal Text|Stylistic Structure Extraction from Early United States Slave-related Legal Opinions]] [[User:Yww|William Y. Wang]] and [[User:Emayfiel|Elijah Mayfield]]
 
* [[Semi-supervised Generation of Wikipedia Infoboxes]] - [[User:wpang|Wangshu Pang]] and [[User:Yunwang|Yun Wang]]
 
 
* [[Word Alignments using an HMM-based model]] - [[User:Lingwang|Wang Ling]] and [[User:Ruipedrocorreia|Rui Correia]]
 
 
* [[Improving SMT word alignment with binary feedback]] - [[User:Asaluja|Avneesh Saluja]]
 
 +
* [[Linearizing Dependency Trees]] - [[User:Jmflanig| Jeff Flanigan]]
 +
 +
* [[Wikipedia Infobox Generator Using Cross Lingual Unstructured Text]] - [[User:Daegunw|Daegun Won]] and [[User:Aanavas|Tony Navas]]
 +
* [[Semi-supervised Generation of Wikipedia Infoboxes]] - [[User:wpang|Wangshu Pang]] and [[User:Yunwang|Yun Wang]]
 
* [[Building domain specific NERs by using information from domain-general annotations]] - [[User:Junyangn|Junyang Ng]], [[User:Ysim| Yan Chuan Sim]], [[User:Cheuktol|Kelvin Law]]
 
* [[Information_Extraction_to_Predict_Judgement|Relevant Information Extraction from Court-room Hearings To Predict Judgement]] - [[User:manajs|Manaj Srivastava]], [[User:mridulg|Mridul Gupta]]
+
* [[Automatic Segmentation of Receipts]] - [[User:howarth | Dan Howarth]]
 
* [[Project:Dmovshov_abbreviations | Identifying Abbreviations in Biomedical Text]] - [[User:Dmovshov|Dana Movshovitz-Attias]]
 
* [[Project:Tweet | Finding out who you are from where, when, what and with whom you tweet]] - [[User:Dwijaya|Derry Wijaya]], [[User:taruns|Tarun Sharma]]
+
* [[Project:Learning_Indian_Classical_Using_Sequential_Models| Learning Indian Classical Music Using Sequential Models]] - [[User:dkulkarn|Dhananjay Kulkarni]], [[User:tkumar|Tarun Kumar]]
* [[Automated Template Extraction]] - [[User:Fkeith|Francis Keith]], [[User:amr1|Andrew Rodriguez]]
 
* [[Linearizing Dependency Trees]] - [[User:Jmflanig| Jeff Flanigan]]
 
* [[Automatic Segmentation of Receipts]] - [[User:howarth | Dan Howarth]]
 
  
 
* [[Mapping entity names in a document to places on a map]].
 
 
* Automatically generating headings for sections (groups of contiguous paragraphs) in unstructured text
 
* [[Project:Learning_Indian_Classical_Using_Sequential_Models| Learning Indian Classical Music Using Sequential Models]] - [[User:dkulkarn|Dhananjay Kulkarni]], [[User:tkumar|Tarun Kumar]]
 
  
 
In general, a nice way to find already-made datasets is to read papers in the literature and see what they use and reference.  A few data ideas: [[Project Brainstorming for 10-710 in Fall 2011/Some data ideas]]
 

Revision as of 15:22, 27 September 2011

Instructor and Venue

  • Instructors: William Cohen and Noah Smith, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, sharonw+@cs.cmu.edu, 412-268-5196
  • When/where: Tuesdays and Thursdays, 3:00-4:20, in Gates-Hillman 4211
  • Course Number: ML 10-710 and LTI 11-763
  • Prerequisites: a machine learning course (e.g., 10-701 or 10-601) or consent of the instructor.
  • TA: Brendan O'Connor
  • Syllabus: Syllabus for Structured Prediction 10-710 in Fall 2011
  • Office hours:
    • Noah, GHC 5723, Thursdays 4:30-5:30 (starting 9/8)
    • Brendan, GHC 8005, Tuesdays 4:30-5:30
    • William, GHC 8217, Fridays 11:00-12:00 (starting 9/16)

Description

This course seeks to cover statistical modeling techniques for discrete, structured data such as text. It brings together content previously covered in Language and Statistics 2 (11-762) and Information Extraction (10-707 and 11-748), and aims to define a canonical set of models and techniques applicable to problems in natural language processing, information extraction, and other application areas. Upon completion, students will have a broad understanding of machine learning techniques for structured outputs, will be able to develop appropriate algorithms for use in new research, and will be able to critically read related literature. The course is organized around methods, with example tasks introduced throughout.

The prerequisite is Machine Learning (10-601 or 10-701), or permission of the instructors.

Syllabus

Older syllabi:

Readings

Unless there's an announcement to the contrary, required readings should be done before the class.

Grading

Grades are based on

  • The class project
    • Choose teams and a general project topic. (This can change in the coming weeks or month.) Create a team wiki page and add its members and the project topic. Every team member should then link to it from their own user homepage.
  • Wiki writeup assignments
  • Class participation

Attendees

People taking this class in Fall 2011 include:

Here are sample pages for William, Noah, and Brendan.

Projects

In general, a nice way to find already-made datasets is to read papers in the literature and see what they use and reference. A few data ideas: Project Brainstorming for 10-710 in Fall 2011/Some data ideas