Class meeting for 10-605 SGD and Hash Kernels
From Cohen Courses
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2017]].
=== Slides ===
Stochastic gradient descent:
* [http://www.cs.cmu.edu/~wcohen/10-605/2016/sgd.pptx Slides in Powerpoint]
* [http://www.cs.cmu.edu/~wcohen/10-605/2016/sgd.pdf Slides in PDF]
+ | |||
+ | === Quiz === | ||
+ | |||
+ | |||
+ | * [https://qna.cs.cmu.edu/#/pages/view/50 Today's quiz] | ||
=== Readings for the Class ===
Optional readings:
* For logistic regression, and the sparse updates for it: [http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression], Carpenter, Bob. 2008. See also [http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html his blog post] on logistic regression. I also recommend [http://www.cs.cmu.edu/~wcohen/10-605/notes/elkan-logreg.pdf Charles Elkan's notes on logistic regression] (local saved copy).
* For hash kernels: [http://arxiv.org/pdf/0902.2206.pdf Feature Hashing for Large Scale Multitask Learning], Weinberger et al., ICML 2009.
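The feature-hashing idea in the Weinberger et al. paper can be sketched in a few lines: each (possibly string-valued) feature is mapped to one of m buckets by a hash function, and a second hash bit supplies a sign so that collisions cancel in expectation. This is only an illustrative sketch, not code from the lecture or the paper; the bucket count m and the md5-based index and sign hashes are choices made here for determinism.

```python
# Illustrative sketch of the "hash trick" (feature hashing).
# The hash functions and bucket count here are assumptions for the demo.
import hashlib

def hashed_features(tokens, m=1024):
    """Map a bag of tokens to an m-dimensional sparse vector (dict)."""
    x = {}
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf8")).hexdigest(), 16)
        idx = h % m                                # bucket index in [0, m)
        sign = 1 if (h >> 64) % 2 == 0 else -1     # independent sign bit
        x[idx] = x.get(idx, 0) + sign
    return x
```

Because the learner only ever sees the m hashed coordinates, memory no longer grows with the vocabulary, which is the point of the trick at 10-605 scale.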
+ | |||
+ | === Things to Remember === | ||
+ | |||
+ | |||
+ | * Approach of learning by optimization | ||
+ | * Optimization goal for logistic regression | ||
+ | * Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent | ||
+ | * Updates for logistic regression, with and without regularization | ||
+ | * Formalization of logistic regression as matching expectations between data and model | ||
+ | * Regularization and how it interacts with overfitting | ||
+ | * How "sparsifying" regularization affects run-time and memory | ||
+ | * What the "hash trick" is and why it should work |
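The logistic-regression SGD update listed above can be sketched as follows. This is an illustrative sketch, not lecture code: the learning rate eta, regularization strength mu, and the sparse-dict representation are assumptions made here, and the L2 penalty is applied only to the features present in the current example, in the spirit of the lazy sparse updates discussed in the Carpenter reading.

```python
# Illustrative sketch of one SGD step for L2-regularized logistic regression.
# With p = sigmoid(w . x) and label y in {0, 1}, the per-example update is
#   w_j += eta * ((y - p) * x_j - mu * w_j)
# eta and mu below are demo settings, not values from the lecture.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, eta=0.1, mu=0.01):
    """One SGD update; w and x are sparse dicts {feature: value}."""
    p = sigmoid(sum(w.get(j, 0.0) * xj for j, xj in x.items()))
    for j, xj in x.items():  # sparse: touch only features in this example
        w[j] = w.get(j, 0.0) + eta * ((y - p) * xj - mu * w.get(j, 0.0))
    return w
```

Setting mu = 0 recovers the unregularized update; the sparse loop is why "sparsifying" the update matters for run-time when examples have few active features.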
Latest revision as of 12:09, 26 September 2017