Class meeting for 10-605 SGD and Hash Kernels
From Cohen Courses
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2017]].
=== Slides ===
Stochastic gradient descent:
* [http://www.cs.cmu.edu/~wcohen/10-605/2016/sgd.pptx Slides in Powerpoint]
* [http://www.cs.cmu.edu/~wcohen/10-605/2016/sgd.pdf Slides in PDF]
+ | |||
+ | === Quiz === | ||
+ | |||
+ | |||
+ | * [https://qna.cs.cmu.edu/#/pages/view/50 Today's quiz] | ||
=== Readings for the Class ===
Optional readings:
* For logistic regression, and the sparse updates for it: [http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression], Carpenter, Bob. 2008. See also [http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html his blog post] on logistic regression. I also recommend [http://www.cs.cmu.edu/~wcohen/10-605/notes/elkan-logreg.pdf Charles Elkan's notes on logistic regression] (local saved copy).
* For hash kernels: [http://arxiv.org/pdf/0902.2206.pdf Feature Hashing for Large Scale Multitask Learning], Weinberger et al., ICML 2009.
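The feature-hashing idea in the Weinberger et al. paper can be sketched in a few lines: each (possibly string-valued) feature is mapped to one of m buckets by a hash function, and a second hash bit supplies a sign so that collisions cancel in expectation. This is only an illustrative sketch, not code from the lecture or the paper; the bucket count m and the md5-based index and sign hashes are choices made here for determinism.

```python
# Illustrative sketch of the "hash trick" (feature hashing).
# The hash functions and bucket count here are assumptions for the demo.
import hashlib

def hashed_features(tokens, m=1024):
    """Map a bag of tokens to an m-dimensional sparse vector (dict)."""
    x = {}
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf8")).hexdigest(), 16)
        idx = h % m                                # bucket index in [0, m)
        sign = 1 if (h >> 64) % 2 == 0 else -1     # independent sign bit
        x[idx] = x.get(idx, 0) + sign
    return x
```

Because the learner only ever sees the m hashed coordinates, memory no longer grows with the vocabulary, which is the point of the trick at 10-605 scale.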
+ | |||
+ | === Things to Remember === | ||
+ | |||
+ | |||
+ | * Approach of learning by optimization | ||
+ | * Optimization goal for logistic regression | ||
+ | * Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent | ||
+ | * Updates for logistic regression, with and without regularization | ||
+ | * Formalization of logistic regression as matching expectations between data and model | ||
+ | * Regularization and how it interacts with overfitting | ||
+ | * How "sparsifying" regularization affects run-time and memory | ||
+ | * What the "hash trick" is and why it should work |
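The logistic-regression SGD update listed above can be sketched as follows. This is an illustrative sketch, not lecture code: the learning rate eta, regularization strength mu, and the sparse-dict representation are assumptions made here, and the L2 penalty is applied only to the features present in the current example, in the spirit of the lazy sparse updates discussed in the Carpenter reading.

```python
# Illustrative sketch of one SGD step for L2-regularized logistic regression.
# With p = sigmoid(w . x) and label y in {0, 1}, the per-example update is
#   w_j += eta * ((y - p) * x_j - mu * w_j)
# eta and mu below are demo settings, not values from the lecture.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, eta=0.1, mu=0.01):
    """One SGD update; w and x are sparse dicts {feature: value}."""
    p = sigmoid(sum(w.get(j, 0.0) * xj for j, xj in x.items()))
    for j, xj in x.items():  # sparse: touch only features in this example
        w[j] = w.get(j, 0.0) + eta * ((y - p) * xj - mu * w.get(j, 0.0))
    return w
```

Setting mu = 0 recovers the unregularized update; the sparse loop is why "sparsifying" the update matters for run-time when examples have few active features.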
Latest revision as of 12:09, 26 September 2017