Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
(Created page with "This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016. Notes: * Homeworks, unless otherwise posted, will be due when the next HW comes out....")
 
 
(74 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 
This is the syllabus for [[Machine Learning with Large Datasets 10-605 in Fall 2016]].   
 
 +
 +
----
  
 
Notes:  
 
 
* Homeworks, unless otherwise posted, will be due when the next HW comes out.
 
 
* Lecture notes and/or slides will be (re)posted around the time of the lectures.
 
 
+
* Classes are cancelled for Oct 27
''note: this is under construction''
+
* '''No classes will be held on Nov 24 (Thanksgiving)'''
 
 
Schedule:
 
* Tues Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
 
* Thurs Sep 3. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
 
* Tues Sep 8.  [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
 
** HW1 out: streaming naive Bayes in Java. [https://s3.amazonaws.com/vincy/10605-15Fall/HW1_StreamingNB.pdf PDF Handout]
 
* Thurs Sep 10. [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
 
* Tues Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]].
 
** Lecture also discusses: map-reduce abstractions/dataflow
 
** Also: Guest lecture from Manik Varma, MSR.
 
* Thurs Sep 17. [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
 
** HW2 out: naive Bayes training on Hadoop in Java. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view PDF Handout]
 
* Tues Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 
** Lecture also discusses: hadoop streaming, mrjob, cascading, pipes, scaling, hive, pig, spark, flink
 
  
 
----
 
----
  
* Tues Sep 29.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
+
Schedule for 805 projects:
** HW3 out: Naive Bayes in GuineaPig. [https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view PDF Handout]
+
* 11:59pm Sun 10/2: [[Initial 805 project proposal]] due.
* Thurs Oct 1. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
+
* 11:59pm Sun 10/16: Final 805 project proposal due.
** For 805 students: an initial project proposal is due '''via email to wcohen+805@gmail.com'''. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest.  Please be clear about your proposal; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.
+
** This is a revised writeup that will address any comments William raises from the initial proposal.
* Tues Oct 6. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
+
* 11:59pm Sun 11/13: [[Midterm 805 project report]] due.
* Thurs Oct 8. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]].
+
* '''1:30-2:50pm Tues 12/6: Project presentations''' (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by '''10am 12/6''' (PDF is best).
* Tues Oct 13. [[Class meeting for 10-605 Advanced topics for SGD|More on parallel and streaming ML]]: Adaptive gradients, AllReduce, and Parameter Servers
+
* 11:59pm Sun 12/11: [[Machine_Learning_with_Large_Datasets_10-605_in_Fall_2016#Project_Info|Final 805 project writeup]] due.
** HW4 out: streaming logistic regression classifier [http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf PDF Handout]
 
* Thurs Oct 15. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]
 
** For 805 students: the final project proposal is due.
 
* Tues Oct 20. Exam review tips ([http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pptx ppt], [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf pdf]) and guest lecture from '''Mark Torrance of RocketFuel'''
 
* Thurs Oct 22. ''midterm exam''
 
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm - v1].  This document also references the relevant questions from two previous review sheets:
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf Some review tips - modified from last year's exam review session]
 
  
----
 
 
* Tues Oct 27. [[Class meeting for 10-605 Randomized|Randomized Algorithms 1]]
 
* Thurs Oct 29. [[Class meeting for 10-605 Randomized|Randomized Algorithms 2]]
 
** HW5 out: dSGD for modeling text ([https://drive.google.com/file/d/0BzQQ-spWKjhUYUM1LUVZakx0ZlE/view])
 
* Tues Nov 3. Finish up with randomized algorithms.
 
* Thurs Nov 5. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
 
* Tues Nov 10. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
 
* Thurs Nov 12. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
 
** HW6 out: approximate pagerank for sampling a graph ([https://goo.gl/ThtRc6])
 
* Tues Nov 17.  ''Guest lecture, Chris Dyer.'' [http://demo.clab.cs.cmu.edu/cdyer/bigdata-cuda.pdf Learning with GPUs].
 
* Thurs Nov 19. ''Guest lecture: Aurick Qiao'', parameter servers [http://curtis.ml.cmu.edu/w/courses/images/8/85/Aurick_release.pptx ppt slides].
 
* Tues Nov 24.  [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
 
** HW7 out: LDA with a param server ([http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf PDF handout])
 
* Thurs Nov 26. ''Happy Thanksgiving!''
 
  
 
----
 
----
  
* Tues Dec 1, Thurs Dec 3.  [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
+
Schedule for lectures and 605 assignments:
* Tues Dec 8.  Review and project presentations (15 min each):
 
** Schedule:
 
*** Bhuwan Dingra/Yun Fu
 
*** Rohit Girdhar
 
*** Siddha Ganju/Sravya Popuri/Srikant Avasarala
 
*** Jingkun Gao/Yiming Gu
 
** HW7 due
 
* Thurs Dec 10.  In-class final exam.
 
* Tues Dec 15.  Writeups for 10-805 projects are due (at 11:59pm).
 
  
== Topics covered in previous years but not in 2015 ==
+
* Tues Aug 30, 2016 [[Class meeting for 10-605 in Fall 2016 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
 
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605  in Fall 2016 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
*  [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
+
** '''Start work on''' Assignment 1a: Streaming NB. [http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hashtable-nb.pdf Writeup].
* [[Class meeting for 10-605 PIG|Workflows in PIG]]
+
* Tues Sep 6, 2016 [[Class meeting for 10-605  in Fall 2016 Streaming Naive Bayes|Streaming Naive Bayes]]. Notes on scalable naive Bayes, Local counting in stream and sort
* [[Class meeting for 10-605 Phase Finding|Phrase Finding]]
+
* Thurs Sep 8, 2016 [[Class meeting for 10-605 in Fall 2016 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
* [[Class meeting for 10-605 Parallel Similarity Joins|Scalable Similarity Joins]]
+
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw1bhadoopnaivebayes/writeup
* [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
+
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstractions for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
+
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Similarity joins, Similarity joins with TFIDF, Parallel simjoins
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]
+
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig, draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw2nbwithguineapig/writeup
* [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
+
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
 +
* Thurs Sep 22, 2016 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]].  Phrase-finding in Pig, Other work with phrases
 +
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
 +
* Thurs Sep 29, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  Also wrapup for SGD, debugging ML algorithms
 +
** '''Start work on''' Assignment 3: scalable SGD at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw3sgd/writeup
 +
* Tues Oct 4, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 2]]
 +
* Thurs Oct 6, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 3]].  Structured perceptrons, Iterative parameter mixing paper
 +
* Tues Oct 11, 2016 [[Class meeting for 10-605 SGD for MF|SGD for MF]].  Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
 +
* Thurs Oct 13, 2016 [[Class meeting for 10-605 Midterm review|Midterm review]].
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm from 2015].  This document also references the relevant questions from two previous review sheets:
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf Some review tips - modified from last year's exam review session]
 +
** '''Last assignment due'''
 +
* Tues Oct 18, 2016 [[Class meeting for 10-605 Midterm|Midterm]]. 
 +
* Thurs Oct 20, 2016 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]].  Sampling a graph, Local partitioning
 +
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw4approximatepagerank/writeup
 +
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
 +
* Thurs Oct 27, 2016. '''No class.'''
 +
* Tues Nov 1, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]]. Reverse-mode differentiation, Recursive ANNs, Word2vec
 +
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
 +
** '''Start work on''' Assignment 5: Autodiff with IPM.  This is a new assignment for Fall 2016. View writeup at https://github.com/KarandeepJohar/10605-f16-hw5/blob/master/automatic-reverse-mode.pdf
 +
* Tues Nov 8, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Locality sensitive hashing
 +
* Thurs Nov 10, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
 +
* Tues Nov 15, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
 +
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
 +
** '''Start work on''' Assignment 6:  Phrase-finding with Spark. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw6phrasefindingwithspark/writeup
 +
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA|LDA 1]].  DGMs for naive Bayes, Gibbs sampling for LDA
 +
* Tues Nov 29, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
 +
** '''Start work on''' Assignment 7: LDA with a Parameter Server, Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw7lda/attachments/677
 +
* Thurs Dec 1, 2016 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
 +
* Tues Dec 6, 2016 [[Class meeting for 10-605 Project Reports|Project Reports]].
 +
** '''Last assignment due'''
 +
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].  Note that we've posted:
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 +
** Comments:
 +
*** Most of the exam (approximately 80%) covers material from after the midterm.
 +
*** You may bring in '''two''' 8 1/2 by 11 sheets of paper with notes.

Latest revision as of 11:54, 11 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Oct 27
  • No classes will be held on Nov 24 (Thanksgiving)

Schedule for 805 projects:

  • 11:59pm Sun 10/2: Initial 805 project proposal due.
  • 11:59pm Sun 10/16: Final 805 project proposal due.
    • This is a revised writeup that will address any comments William raises from the initial proposal.
  • 11:59pm Sun 11/13: Midterm 805 project report due.
  • 1:30-2:50pm Tues 12/6: Project presentations (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by 10am 12/6 (PDF is best).
  • 11:59pm Sun 12/11: Final 805 project writeup due.



Schedule for lectures and 605 assignments: