Class meeting for 10-605 SGD and Hash Kernels
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.
Slides
- Stochastic gradient descent: Slides in PDF (http://www.cs.cmu.edu/~wcohen/10-605/sgd.pdf)

Quiz
- Today's quiz: https://qna.cs.cmu.edu/#/pages/view/50

Readings for the Class
Optional readings
- For logistic regression, and the sparse updates for it: Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression, Bob Carpenter, 2008. See also his blog post on logistic regression. I also recommend Charles Elkan's notes on logistic regression (local saved copy). A sketch of the lazy-update idea appears right after this list.
- For hash kernels: Feature Hashing for Large Scale Multitask Learning, Weinberger et al., ICML 2009. A small sketch of the hashing trick appears after the Things to Remember list below.
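
The lazy updates in Carpenter's paper address a specific problem: with L2 regularization, the penalty shrinks every weight on every SGD step, which destroys the sparsity of the updates. The fix is to defer the shrinkage and apply the accumulated factor (1 - eta*lambda)^k to a weight only when its feature next occurs, so each step touches only the active features. The following is a minimal sketch of that idea, not Carpenter's implementation; the interface (a list of (feature-dict, 0/1-label) pairs) and all names are assumptions made for this example.

    import math
    from collections import defaultdict

    def sigmoid(z):
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def lazy_sgd_logistic(examples, n_epochs=5, eta=0.1, lam=1e-4):
        # examples: list of (features, label) pairs, where features is a
        # dict {feature_name: value} and label is 0 or 1.  (Hypothetical
        # interface chosen for this sketch.)
        w = defaultdict(float)        # weight vector, stored sparsely
        last_step = defaultdict(int)  # step at which each weight was last shrunk
        t = 0
        for _ in range(n_epochs):
            for x, y in examples:
                t += 1
                # Catch up on the deferred L2 shrinkage, but only for the
                # features active in this example.
                for j in x:
                    w[j] *= (1.0 - eta * lam) ** (t - last_step[j])
                    last_step[j] = t
                p = sigmoid(sum(w[j] * v for j, v in x.items()))
                # Sparse gradient step: only active features are touched.
                for j, v in x.items():
                    w[j] += eta * (y - p) * v
        # Bring every weight up to date before returning.
        for j in w:
            w[j] *= (1.0 - eta * lam) ** (t - last_step[j])
        return dict(w)

The point to notice is that the cost of each step depends on the number of non-zero features in the example, not on the total number of features, which is what makes SGD practical on large sparse datasets.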
Things to Remember
- Approach of learning by optimization
- Optimization goal for logistic regression
- Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
- Updates for logistic regression, with and without regularization (written out after this list)
- Formalization of logistic regression as matching expectations between data and model
- Regularization and how it interacts with overfitting
- How "sparsifying" regularization affects run-time and memory
- What the "hash trick" is and why it should work (see the sketch below)
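
The logistic regression updates mentioned above can be written out explicitly. The notation here (eta for the learning rate, lambda for the regularization strength, labels y_i in {0,1}) is a choice made for this page, but the updates themselves are the standard ones, consistent with Carpenter's paper and Elkan's notes:

    \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad p_i = \sigma(w \cdot x_i)

    \text{without regularization:} \qquad w \leftarrow w + \eta \, (y_i - p_i) \, x_i

    \text{with an L2 penalty } \tfrac{\lambda}{2}\|w\|^2: \qquad w \leftarrow w + \eta \left( (y_i - p_i) \, x_i - \lambda w \right)

The -lambda*w term is exactly what the lazy updates above defer: between two occurrences of a feature, its repeated application compounds into the multiplicative factor (1 - eta*lambda)^k.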
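
For the last bullet, a concrete picture: in the hashing trick, each feature name is hashed into one of 2^b buckets, and a second one-bit hash supplies a +/-1 sign; Weinberger et al. show that the sign makes the hashed inner product an unbiased estimate of the original one. Below is a minimal sketch, assuming Python 3; the use of zlib.crc32 and zlib.adler32 as the two hash functions, and all names, are arbitrary choices for the example.

    import zlib

    def hashed_features(raw_features, n_bits=20):
        # Map a dict {feature_name: value} into a sparse vector over
        # 2**n_bits buckets.  Colliding features simply add up.
        size = 1 << n_bits
        hashed = {}
        for name, value in raw_features.items():
            key = name.encode("utf8")
            idx = zlib.crc32(key) % size                    # bucket index
            sign = 1.0 if zlib.adler32(key) & 1 else -1.0   # one-bit sign hash
            hashed[idx] = hashed.get(idx, 0.0) + sign * value
        return hashed

A learner can then keep a single fixed-length weight array indexed by bucket, so memory is bounded by 2^b no matter how large the vocabulary grows, and collisions behave like a small amount of noise rather than a hard failure.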