Class meeting for 10-405 SGD and Hash Kernels
From Cohen Courses
Revision as of 10:13, 14 February 2018
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.
Slides
Stochastic gradient descent:
- Slides in Powerpoint: http://www.cs.cmu.edu/~wcohen/10-405/sgd.pptx
- Slides in PDF: http://www.cs.cmu.edu/~wcohen/10-405/sgd.pdf
Quiz
Readings for the Class
Optional readings
- For logistic regression and its sparse updates: Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression, Bob Carpenter, 2008. See also his blog post on logistic regression. I also recommend Charles Elkan's notes on logistic regression (local saved copy).
- For hash kernels: Feature Hashing for Large Scale Multitask Learning, Weinberger et al., ICML 2009.
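The lazy-update idea from Carpenter's paper can be sketched in a few lines. This is a minimal Python sketch, not the paper's algorithm verbatim: it assumes binary (not multinomial) logistic regression, an L2 penalty of mu * ||w||^2, and a fixed learning rate, and the names train_lazy_l2, rate, and mu are hypothetical. Under those assumptions the dense update shrinks every weight by a factor (1 - 2 * rate * mu) per example; the lazy version records when each weight was last touched and applies all the skipped shrinkage at once, the next time that feature appears.

```python
import math
from collections import defaultdict


def sigmoid(score):
    """Logistic function, written to avoid overflow in math.exp."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def train_lazy_l2(examples, rate=0.1, mu=0.01, epochs=1):
    """SGD for binary logistic regression with a mu * ||w||^2 penalty,
    applying the regularization shrinkage lazily (hypothetical sketch).

    examples: list of (features, y), features a dict {name: value}, y in {0, 1}.
    Each step touches only the features present in the example.
    """
    weights = defaultdict(float)
    last_step = defaultdict(int)  # number of examples already applied to each weight
    decay = 1.0 - 2.0 * rate * mu  # per-example shrink factor from the L2 gradient
    step = 0
    for _ in range(epochs):
        for features, y in examples:
            step += 1
            # Apply the shrinkage skipped while these features were absent.
            for name in features:
                weights[name] *= decay ** (step - 1 - last_step[name])
            score = sum(weights[name] * value for name, value in features.items())
            p = sigmoid(score)
            # Dense rule on these coordinates: w_j <- w_j + rate*(y - p)*x_j - 2*rate*mu*w_j
            for name, value in features.items():
                weights[name] = weights[name] * decay + rate * (y - p) * value
                last_step[name] = step
    # Bring every weight up to date with the final step.
    for name in weights:
        weights[name] *= decay ** (step - last_step[name])
    return dict(weights)
```

For example, `train_lazy_l2([({"a": 1.0}, 1), ({"b": 1.0}, 0)], rate=0.5, mu=0.1)` gives a weight of about 0.225 for "a" and -0.25 for "b", the same values the dense update produces, but each step costs time proportional to the number of nonzero features rather than the vocabulary size.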
Things to Remember
- Approach of learning by optimization
- Optimization goal for logistic regression
- Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
- Updates for logistic regression, with and without regularization
- Formalization of logistic regression as matching expectations between data and model
- Regularization and how it interacts with overfitting
- How "sparsifying" regularization affects run-time and memory
- What the "hash trick" is and why it should work
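The hash trick can be illustrated with a minimal Python sketch, assuming binary labels and string-valued features; hash_features, predict, and train_hashed are hypothetical names, and MD5 is used only as a convenient deterministic hash, not because Weinberger et al. prescribe it. Each feature is mapped to one of num_buckets slots with a pseudo-random +1/-1 sign, so the weight vector has a fixed size no matter how large the vocabulary grows. Collisions add noise, but the random sign makes them cancel in expectation (hashed inner products are unbiased estimates of the original ones), which is one reason the trick works well when num_buckets is large relative to the number of active features.

```python
import hashlib
import math


def hash_features(tokens, num_buckets):
    """Signed feature hashing: map string features to a sparse {bucket: value} dict."""
    hashed = {}
    for token in tokens:
        # MD5 is used only as a stable, well-mixed hash (built-in hash() is salted per run).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "little") % num_buckets
        sign = 1.0 if digest[8] & 1 else -1.0  # a separate byte chooses the sign
        hashed[bucket] = hashed.get(bucket, 0.0) + sign
    return hashed


def predict(weights, hashed):
    """P(y = 1 | x) under logistic regression on hashed features."""
    score = sum(weights[i] * v for i, v in hashed.items())
    score = max(-30.0, min(30.0, score))  # keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-score))


def train_hashed(data, num_buckets, rate=0.5, epochs=5):
    """Unregularized SGD; memory is num_buckets floats whatever the vocabulary size."""
    weights = [0.0] * num_buckets
    for _ in range(epochs):
        for tokens, y in data:
            hashed = hash_features(tokens, num_buckets)
            p = predict(weights, hashed)
            for i, v in hashed.items():
                weights[i] += rate * (y - p) * v
    return weights
```

Usage: `weights = train_hashed([(["good", "great"], 1), (["bad", "awful"], 0)], num_buckets=2 ** 20)` learns a model that never stores a feature-name dictionary; new, unseen tokens at prediction time simply hash into existing buckets.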