Class meeting for 10-605 SGD and Hash Kernels

For logistic regression and its sparse updates: Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression, Bob Carpenter, 2008. See also his blog post on logistic regression. I also recommend Charles Elkan's notes on logistic regression (local saved copy).
For hash kernels: Feature Hashing for Large Scale Multitask Learning, Weinberger et al., ICML 2009.

Approach of learning by optimization
Optimization goal for logistic regression
Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
Updates for logistic regression, with and without regularization
Formalization of logistic regression as matching expectations between data and model
Regularization and how it interacts with overfitting
How "sparsifying" regularization affects run-time and memory
What the "hash trick" is and why it should work
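
Several of the points above (the sigmoid, the SGD update with and without regularization, and the hash trick) can be tied together in one small sketch. This is an illustration under stated assumptions, not the code from lecture or from the readings: the function names, the use of `zlib.crc32` as the hash, the learning rate, and the toy data are all made up here. Two simplifications are worth flagging. First, the L2 penalty is applied only to weights of features present in the current example; the lazy scheme in Carpenter's paper also catches up the deferred shrinkage for absent features. Second, the signed hash used by Weinberger et al. is omitted.

```python
import math
import zlib


def hash_features(tokens, num_buckets):
    """Hash trick: map each token to a bucket index instead of a dictionary id.

    Returns a sparse vector as {bucket: count}. Colliding tokens share a weight.
    """
    x = {}
    for tok in tokens:
        j = zlib.crc32(tok.encode("utf-8")) % num_buckets
        x[j] = x.get(j, 0.0) + 1.0
    return x


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """P(y = 1 | x) under weights w, touching only the nonzero features of x."""
    return sigmoid(sum(w[j] * v for j, v in x.items()))


def sgd_step(w, x, y, lr, l2=0.0):
    """One stochastic gradient ascent step on the log conditional likelihood.

    With l2 = 0 this is the unregularized update w_j += lr * (y - p) * x_j.
    With l2 > 0 the penalty term -2 * l2 * w_j is added, but (simplification)
    only for the features that appear in x.
    """
    p = predict(w, x)
    for j, v in x.items():
        w[j] += lr * ((y - p) * v - 2.0 * l2 * w[j])
    return p


def train(examples, num_buckets=2 ** 16, epochs=50, lr=0.5, l2=0.001):
    """Train on (tokens, label) pairs; memory is fixed at num_buckets weights."""
    w = [0.0] * num_buckets
    for _ in range(epochs):
        for tokens, y in examples:
            sgd_step(w, hash_features(tokens, num_buckets), y, lr, l2)
    return w


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    data = [
        (["good", "great", "fun"], 1),
        (["bad", "awful", "boring"], 0),
    ]
    weights = train(data)
    print(predict(weights, hash_features(["good", "fun"], len(weights))))
    print(predict(weights, hash_features(["bad", "boring"], len(weights))))
```

Note that the weight vector has a fixed size chosen up front, so no feature dictionary is needed, and each update costs time proportional to the number of nonzero features in the example rather than to the size of the weight vector.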
