Difference between revisions of "Class meeting for 10-405 SGD and Hash Kernels"

Latest revision as of 12:32, 5 March 2018

For logistic regression, and the sparse updates for it: Bob Carpenter, "Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression", 2008. See also his blog post on logistic regression. I also recommend Charles Elkan's notes on logistic regression (local saved copy).
For hash kernels: Weinberger et al., "Feature Hashing for Large Scale Multitask Learning", ICML 2009.

* Approach of learning by optimization
* Optimization goal for logistic regression
* Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
* Updates for logistic regression, with and without regularization
* The formal properties of sparse logistic regression
** Whether it is exact or approximate
** How it changes memory and time usage
* Formalization of logistic regression as matching expectations between data and model
* Regularization and how it interacts with overfitting
* How "sparsifying" regularization affects run-time and memory
* What the "hash trick" is and why it should work
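
To make several of the items above concrete, here is a minimal Python sketch that combines three of them: the SGD update for logistic regression, lazily applied L2 regularization for sparse examples, and the hash trick. It is an illustration under stated assumptions, not the code from the slides or from the cited papers. The names `LazySparseLR`, `hash_features`, `lr`, and `mu` are hypothetical, `zlib.crc32` is only a stand-in hash function, and the ordering used here (catch up on skipped shrinkage first, then take the gradient step) is one reasonable choice among several.

```python
import math
import zlib


def hash_features(tokens, num_buckets):
    """Hash trick: map string features to a sparse {bucket_index: count} dict."""
    x = {}
    for tok in tokens:
        # crc32 is a stand-in; any well-mixed hash function would do.
        j = zlib.crc32(tok.encode("utf-8")) % num_buckets
        x[j] = x.get(j, 0.0) + 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LazySparseLR:
    """Binary logistic regression trained by SGD with lazy L2 shrinkage."""

    def __init__(self, num_buckets=2 ** 18, lr=0.1, mu=1e-4):
        self.num_buckets = num_buckets
        self.lr = lr      # learning rate
        self.mu = mu      # L2 regularization strength
        self.w = {}       # bucket -> weight (only buckets seen so far)
        self.last = {}    # bucket -> step at which the weight was last brought up to date
        self.k = 0        # number of examples processed

    def _catch_up(self, j):
        # Apply, in one multiplication, every shrinkage step this weight skipped
        # while its feature was absent from the examples.
        skipped = self.k - self.last.get(j, 0)
        if skipped and j in self.w:
            self.w[j] *= (1.0 - 2.0 * self.lr * self.mu) ** skipped
        self.last[j] = self.k

    def _score(self, x):
        for j in x:
            self._catch_up(j)
        return sum(self.w.get(j, 0.0) * v for j, v in x.items())

    def predict_proba(self, tokens):
        x = hash_features(tokens, self.num_buckets)
        return sigmoid(self._score(x))

    def update(self, tokens, y):
        """One SGD step on a single example with label y in {0, 1}."""
        self.k += 1
        x = hash_features(tokens, self.num_buckets)
        p = sigmoid(self._score(x))
        # Gradient of the log conditional likelihood touches only active features.
        for j, v in x.items():
            self.w[j] = self.w.get(j, 0.0) + self.lr * (y - p) * v
        return p
```

Each update costs time proportional to the number of non-zero features in the example rather than to the size of the weight vector, and memory is bounded by `num_buckets` no matter how many distinct feature strings appear. Distinct features that collide in a bucket share a weight, which is the price the hash trick pays for that bound.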

@@ Line 5: / Line 5: @@
 Stochastic gradient descent:
-* [http://www.cs.cmu.edu/~wcohen/10-405/2016/sgd.pptx Slides in Powerpoint]
+* [http://www.cs.cmu.edu/~wcohen/10-405/sgd.pptx Slides in Powerpoint]
-* [http://www.cs.cmu.edu/~wcohen/10-405/2016/sgd.pdf Slides in PDF]
+* [http://www.cs.cmu.edu/~wcohen/10-405/sgd.pdf Slides in PDF]
 === Quiz ===
@@ Line 29: / Line 29: @@
 * Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
 * Updates for logistic regression, with and without regularization
+* The formal properties of sparse logistic regression
+** Whether it is exact or approximate
+** How it changes memory and time usage
 * Formalization of logistic regression as matching expectations between data and model
 * Regularization and how it interacts with overfitting
 * How "sparsifying" regularization affects run-time and memory
 * What the "hash trick" is and why it should work