Difference between revisions of "Machine Learning 10-601 in Fall 2013"

From Cohen Courses
 
(26 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
== Important Announcements ==
 +
 +
* Important announcements will be made here as well as on Piazza.
 +
 
== Important People and Places ==
 
  
Line 7: Line 11:
 
* Course Number: ML 10-601
 
 
* TAs and recitation schedule:
 
** Guanyu Wang (wgiveny@gmail.com, guanyuw@andrew), recitation: Mon. 6pm-7:30pm Porter Hall A18C
+
** Guanyu Wang (wgiveny@gmail.com, guanyuw@andrew), recitation: Mon. 6:30pm-7:30pm Porter Hall A18C
** William Yang Wang (ww@cmu.edu, yww@andrew), recitation: Tue. 5pm-6pm Porter Hall A18A
+
** William Yang Wang (ww@cmu.edu, yww@andrew), recitation: Tue. 5pm-6pm Gates 4215
 
** Shu-Hao Yu (shuhaoy@gmail.com, shuhaoy@andrew), recitation: Wed. 6:30pm-7:30pm Wean 5403
 
 
** Avinava Dubey (akdubey@andrew.cmu.edu), recitation: Thu. 5pm-6pm Porter Hall A18C
 
Line 15: Line 19:
 
** Ying Shen (yingshen@andrew.cmu.edu), recitation leader-at-large
 
 
** ''Recitations will start after Sept 4''
 
* Syllabus: [[Syllabus for Machine Learning  10-601]]
+
* Syllabus (including lecture slides and HWs): [[Syllabus for Machine Learning  10-601]]
 
* On-line lectures: [https://mediatech-stream.andrew.cmu.edu/Mediasite/Catalog/Full/05b468afef1d433d9f63ca39bf040f4a21 MediaSite] will post within 24 hrs of lecture, use your Andrew id to log in.
 
 
* Office hours for William and Eric:
 
** TBD
+
** William and Eric will hold office hours in DH 2315 immediately after class from 5:50 to 6:30pm. (I'm told the room is free until 7pm).  Typically Eric will have office hours Monday and William on Wed.
* We'll be using autolab for most assignments.
+
* We'll be using BlackBoard and Autolab for most assignments.
 +
* We've set up a [https://piazza.com/class/hjrdb0ci34531x Piazza page] for questions of general interest.
  
  
 
''For instructors only'':  
 
* The autolab directory is /afs/cs/academic/class/10601-f13/autolab'
+
* The autolab directory is /afs/cs/academic/class/10601-f13/autolab - you need to be in the right pts group to access it, ask wcohen if you don't.
 +
* '''New''': Save backup materials - eg handout .tex files, autolab scripts, etc - in /afs/cs.cmu.edu/academic/class/10601
 
* To-do lists and such are on  [https://docs.google.com/spreadsheet/ccc?key=0AqbWt5nnjNrYdEFheHNkVHRrWnRncV9fN2VST0VvR1E&usp=sharing| our GDoc spreadsheet]."
 
  
Line 38: Line 44:
 
10-601 is open to all but is recommended for CS Seniors & Juniors, Quantitative Masters students, and non-SCS PhD students.
 
  
== Syllabus ==
+
== Syllabus and Text ==
 +
Syllabus for Machine Learning  10-601, including lecture slides and HWs
  
 
* [[Syllabus for Machine Learning  10-601]]
 
Line 45: Line 52:
  
 
* http://www.cs.cmu.edu/~roni/10601/
 
 +
 +
No texts are required, but these are recommended:
 +
* [http://www.amazon.com/Machine-Learning-Probabilistic-Perspective-Computation-ebook/dp/B00AF1AYTQ Kevin Murphy's textbook, Machine Learning: A Probabilistic Perspective]. 
 +
*  [http://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/0070428077 Tom Mitchell's textbook, Machine Learning].  Older but clear and easy to read.
 +
 +
Most lectures will have readings suggested from both Murphy and Mitchell, and you can read either of these to get the necessary material.  Reading both is not required.  Mitchell doesn't cover all the topics in the course, but when it doesn't we will suggest other on-line materials.
  
 
== Prerequisites ==
 
Line 54: Line 67:
  
 
Self-assessment for students:
 
* Students, especially graduate students, come to CMU with a variety of different backgrounds, so formal course prereqs are hard to establish.  There is a short  [http://www.cs.cmu.edu/~wcohen/10-601/Intro_ML_Self_Evaluation.pdf self-assessment test] to see if you have the necessary background for 10-601.  We recommend that all students take this before enrolling in 10-601 to see if they have the necessary background knowledge already, or if they need to review and/or take additional courses.
+
* Students, especially graduate students, come to CMU with a variety of different backgrounds, so formal course prereqs are hard to establish.  There is a short  [http://www.cs.cmu.edu/~wcohen/10-601/self-assessment/Intro_ML_Self_Evaluation.pdf self-assessment test] to see if you have the necessary background for 10-601.  We recommend that all students take this before enrolling in 10-601 to see if they have the necessary background knowledge already, or if they need to review and/or take additional courses.
  
 
== Grading Policy ==
 
  
* Semi-final exam: 20%
+
'''To be announced.'''
** Instead of a final exam, we have an exam in class on the ''Monday before Thanksgiving (Nov 25)''
 
* Weekly homeworks (out Wed, due Wed): 60%
 
** Late assignment policy: We will grant up to 50% credit if an assignment is less than 48 hrs late.  Also, you can drop your lowest assignment grade entirely.
 
* Project: 20% (see below)
 
 
 
== Projects ==
 
 
 
More details will be posted later; here is an outline of the project. The goal is ''building and evaluating a robust out-of-the-box classifier learner.''
 
  
Some learning algorithms require more tuning to a new problem than others, but most of what is known about how to tune classifiers for a learning task is folklore, not science.  The question here is: which algorithms are most robust?  To address this I suggest a Kaggle-style competition with these rules.
+
== Policies ==
* Submitted learners will be scored by their average error rates (say) over 5 evaluation learning tasks, each of which has an associated train/test split.
 
* The evaluation tasks are not known in advance - instead there are 20 development learning tasks, each of which has an associated train/test split, to tune the learning system.
 
* The learning system could be, for example:
 
*# A plain classifier learner (eg, a standard implementation of random forests might be a good baseline)
 
*# A classifier learner with a wrapper around it that does a parameter sweep and picks a set of parameters.
 
*# A classifier learner with wrapper that is some sort of feature-selection mechanism.
 
*# A set of K classifier learners, which uses internal cross-validation to pick the best set.
 
*# A set of K classifier learners, including one or more than project team-mates have implemented and/or invented on their own.
 
*# A semi-automatic system, which requires some human input to make its final choice of classifier. (But we're not sure now how to score this....?)
 
*# Anything else you can think of.
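The scoring rule and the "set of K learners" option outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's actual evaluation harness: the representation of a task as a `(train_X, train_y, test_X, test_y)` tuple, the function names, and the two toy learners are all hypothetical, and the selector uses a simple interleaved k-fold split as one possible form of internal cross-validation.

```python
from collections import Counter

# Assumption: a "learner" is a function (train_X, train_y) -> predict(x),
# and a "task" is a tuple (train_X, train_y, test_X, test_y).
# None of these names come from the course materials.


def majority_learner(train_X, train_y):
    """Baseline learner: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_learner(train_X, train_y):
    """1-nearest-neighbor learner using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def error_rate(learner, task):
    """Train on the task's training split, return test-split error rate."""
    train_X, train_y, test_X, test_y = task
    predict = learner(train_X, train_y)
    mistakes = sum(predict(x) != y for x, y in zip(test_X, test_y))
    return mistakes / len(test_y)


def average_error(learner, tasks):
    """Score a learner by its mean error rate over a list of tasks."""
    return sum(error_rate(learner, t) for t in tasks) / len(tasks)


def cv_select(learners, folds=3):
    """Wrap K learners into one that picks the best by internal k-fold CV.

    Assumes each training set has at least `folds` examples.
    """
    def wrapped(train_X, train_y):
        n = len(train_y)

        def cv_error(learner):
            total = 0.0
            for f in range(folds):
                held = [i for i in range(n) if i % folds == f]
                kept = [i for i in range(n) if i % folds != f]
                fold_task = (
                    [train_X[i] for i in kept], [train_y[i] for i in kept],
                    [train_X[i] for i in held], [train_y[i] for i in held],
                )
                total += error_rate(learner, fold_task)
            return total / folds

        best = min(learners, key=cv_error)
        return best(train_X, train_y)  # retrain the winner on all the data
    return wrapped
```

A submission would then be scored with something like `average_error(cv_select([majority_learner, nearest_neighbor_learner]), evaluation_tasks)`, where `evaluation_tasks` stands in for the held-back evaluation tasks. The wrapper only ever sees a task's training split, which is the point of tuning on the development tasks rather than on the evaluation ones.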
 
  
== Policy on Collaboration among Students  ==
+
=== Collaboration between students ===
  
These policies are the same as were used in [http://www.cs.cmu.edu/~roni/10601/ Dr. Rosenfeld's previous version of 2013].
+
These policies are similar to those used by [http://www.cs.cmu.edu/~roni/10601/ Dr. Rosenfeld].
  
 
The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. It is also allowed to seek help from other students in understanding the material needed to solve a particular homework problem, provided no written notes are shared, or are taken at that time, and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone, and the student should be ready to reproduce their solution upon request.
 
Line 96: Line 91:
  
 
As a related point, some of the homework assignments used in this class may have been used in prior versions of this class, or in classes at other institutions.  Avoiding the use of heavily tested assignments will detract from the main purpose of these assignments, which is to reinforce the material and stimulate thinking.  Because some of these assignments may have been used before, solutions to them may be (or may have been) available online, or from other people.  It is explicitly forbidden to use any such sources, or to consult people who have solved these problems before.  '''You must solve the homework assignments completely on your own'''. I will mostly rely on your wisdom and honor to follow this rule, but if a violation is detected it will be dealt with harshly.  Collaboration with other students who are currently taking the class is allowed, but only under the conditions stated below.
 
 +
 +
=== Other Policies and FAQ ===
 +
 +
* '''Can I take the class pass/fail? Or, can I audit?'''  My policy is to give priority to students that are taking the class for a grade, so you cannot sign up for the class pass/fail or as an audit unless the waitlist clears.  However, I expect that this spring there will be no waitlist - we have a large room.
 +
* '''Can I get an extension on ....?''' Generally no, but you can get 50% credit for up to 48 hrs after the assignment is due.  If you have a documented medical issue or something similar email William or Nina.
 +
* '''What do I need to do if I want to audit?''' attend the lectures and sit for the mid-term and final, and quizzes.  You don't need to study for the exams - mainly we're interested to know how much you've absorbed in the audit.
 +
* '''What is the minimum grade to pass?''' Generally this depends on what your program considers a pass (typically D for undergrads, C for most MS programs, B for PhDs).  We will compute your actual grade for the course and then threshold it appropriately.

Latest revision as of 17:24, 6 January 2016

Important Announcements

  • Important announcements will be made here as well as on Piazza.

Important People and Places

  • Instructors: William Cohen and Eric Xing, Machine Learning Dept and LTI
  • Course secretary: Sharon Cavlovich, sharonw+@cs.cmu.edu, 412-268-5196
  • When/where: M/W 4:30-5:50, Doherty Hall 2315 (not 1:30-2:50 as was announced earlier!)
    • Classes will start on Wednesday, Sept 4 (the Wed after Labor Day)
  • Course Number: ML 10-601
  • TAs and recitation schedule:
    • Guanyu Wang (wgiveny@gmail.com, guanyuw@andrew), recitation: Mon. 6:30pm-7:30pm Porter Hall A18C
    • William Yang Wang (ww@cmu.edu, yww@andrew), recitation: Tue. 5pm-6pm Gates 4215
    • Shu-Hao Yu (shuhaoy@gmail.com, shuhaoy@andrew), recitation: Wed. 6:30pm-7:30pm Wean 5403
    • Avinava Dubey (akdubey@andrew.cmu.edu), recitation: Thu. 5pm-6pm Porter Hall A18C
    • Pengtao Xie (pengtaoxie2008@gmail.com, pxie1@andrew), recitation: Fri. 5pm-6pm GHC 4215
    • Shangqing Zhang (zsqhyhzyh@gmail.com, shangqiz@andrew), recitation leader-at-large
    • Ying Shen (yingshen@andrew.cmu.edu), recitation leader-at-large
    • Recitations will start after Sept 4
  • Syllabus (including lecture slides and HWs): Syllabus for Machine Learning 10-601
  • On-line lectures: MediaSite will post within 24 hrs of lecture; use your Andrew ID to log in.
  • Office hours for William and Eric:
    • William and Eric will hold office hours in DH 2315 immediately after class from 5:50 to 6:30pm. (I'm told the room is free until 7pm). Typically Eric will have office hours Monday and William on Wed.
  • We'll be using BlackBoard and Autolab for most assignments.
  • We've set up a Piazza page for questions of general interest.


For instructors only:

  • The autolab directory is /afs/cs/academic/class/10601-f13/autolab - you need to be in the right pts group to access it; ask wcohen if you aren't.
  • New: Save backup materials - e.g. handout .tex files, autolab scripts, etc. - in /afs/cs.cmu.edu/academic/class/10601
  • To-do lists and such are on our GDoc spreadsheet.

Description

Machine Learning (ML) asks "how can we design programs that automatically improve their performance through experience?" This includes learning to perform many types of tasks based on many types of experience, e.g. spotting high-risk medical patients, recognizing speech, classifying text documents, detecting credit card fraud, or driving autonomous robots.

Topics covered in 10-601 include concept learning, version spaces, decision trees, neural networks, computational learning theory, active learning, estimation & the bias-variance tradeoff, hypothesis testing, Bayesian learning, Naïve Bayes classifier, Bayes Nets & Graphical Models, the EM algorithm, Hidden Markov Models, K-Nearest-Neighbors and nonparametric learning, reinforcement learning, bagging and boosting, and other topics.

10-601 focuses on the mathematical, statistical and computational foundations of the field. It emphasizes the role of assumptions in machine learning. As we introduce different ML techniques, we work out together what assumptions are implicit in them. Grading is based on written assignments, programming assignments, and a final exam.

10-601 focuses on understanding what makes machine learning work. If your interest is primarily in learning the process of applying ML effectively, and in the practical side of ML for applications, you should consider Machine Learning in Practice (11-344/05-834).

10-601 is open to all but is recommended for CS Seniors & Juniors, Quantitative Masters students, and non-SCS PhD students.

Syllabus and Text

Syllabus for Machine Learning 10-601, including lecture slides and HWs

Previous syllabi, for the historically-minded:

  • http://www.cs.cmu.edu/~roni/10601/

No texts are required, but these are recommended:

  • Kevin Murphy's textbook, Machine Learning: A Probabilistic Perspective.
  • Tom Mitchell's textbook, Machine Learning. Older but clear and easy to read.

Most lectures will have readings suggested from both Murphy and Mitchell, and you can read either of these to get the necessary material. Reading both is not required. Mitchell doesn't cover all the topics in the course, but when it doesn't we will suggest other on-line materials.

Prerequisites

Formal prerequisites:

  • Prerequisites are 15-122: Principles of Imperative Computation AND 21-127: Concepts of Mathematics.
  • Additionally, a probability course is a co-requisite: 36-217: Probability Theory and Random Processes OR 36-225: Introduction to Probability and Statistics I
  • A minimum grade of 'C' is required in all these courses.

Self-assessment for students:

  • Students, especially graduate students, come to CMU with a variety of different backgrounds, so formal course prereqs are hard to establish. There is a short self-assessment test to see if you have the necessary background for 10-601. We recommend that all students take this before enrolling in 10-601 to see if they have the necessary background knowledge already, or if they need to review and/or take additional courses.

Grading Policy

To be announced.

Policies

Collaboration between students

These policies are similar to those used by Dr. Rosenfeld.

The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. It is also allowed to seek help from other students in understanding the material needed to solve a particular homework problem, provided no written notes are shared, or are taken at that time, and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone, and the student should be ready to reproduce their solution upon request.

The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved, on the first page of their assignment. Specifically, each assignment solution must start by answering the following questions:

(1) Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
If you answered 'yes', give full details: _______________ (e.g. "Jane explained to me what is asked in Question 3.4")
(2) Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
If you answered 'yes', give full details: _______________ (e.g. "I pointed Joe to section 2.3 to help him with Question 2")

Collaboration without full disclosure will be handled severely, in compliance with CMU's Policy on Cheating and Plagiarism.

As a related point, some of the homework assignments used in this class may have been used in prior versions of this class, or in classes at other institutions. Avoiding the use of heavily tested assignments will detract from the main purpose of these assignments, which is to reinforce the material and stimulate thinking. Because some of these assignments may have been used before, solutions to them may be (or may have been) available online, or from other people. It is explicitly forbidden to use any such sources, or to consult people who have solved these problems before. You must solve the homework assignments completely on your own. I will mostly rely on your wisdom and honor to follow this rule, but if a violation is detected it will be dealt with harshly. Collaboration with other students who are currently taking the class is allowed, but only under the conditions stated above.

Other Policies and FAQ

  • Can I take the class pass/fail? Or, can I audit? My policy is to give priority to students who are taking the class for a grade, so you cannot sign up for the class pass/fail or as an audit unless the waitlist clears. However, I expect that this spring there will be no waitlist - we have a large room.
  • Can I get an extension on ....? Generally no, but you can get 50% credit for up to 48 hrs after the assignment is due. If you have a documented medical issue or something similar, email William or Nina.
  • What do I need to do if I want to audit? Attend the lectures and sit for the mid-term, the final, and the quizzes. You don't need to study for the exams - mainly we're interested in knowing how much you've absorbed in the audit.
  • What is the minimum grade to pass? Generally this depends on what your program considers a pass (typically D for undergrads, C for most MS programs, B for PhDs). We will compute your actual grade for the course and then threshold it appropriately.