# Class meeting for 10-405 SGD and Hash Kernels

From Cohen Courses

## Latest revision as of 12:32, 5 March 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

### Slides

Stochastic gradient descent:

### Quiz

### Readings for the Class

### Optional readings

- For logistic regression, and the sparse updates for it: Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression, Carpenter, Bob. 2008. See also his blog post on logistic regression. I also recommend Charles Elkan's notes on logistic regression (local saved copy).
- For hash kernels: Feature Hashing for Large Scale Multitask Learning, Weinberger et al, ICML 2009.

### Things to Remember

- Approach of learning by optimization
- Optimization goal for logistic regression
- Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
- Updates for logistic regression, with and without regularization
- The formal properties of sparse logistic regression
  - Whether it is exact or approximate
  - How it changes memory and time usage
- Formalization of logistic regression as matching expectations between data and model
- Regularization and how it interacts with overfitting
- How "sparsifying" regularization affects run-time and memory
- What the "hash trick" is and why it should work
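
Several of the points above (the regularized update, the sparse "lazy" version of it, and the hash trick) can be seen together in one small program. What follows is a minimal sketch under stated assumptions, not the code used in the course: the function names (`hash_features`, `train_lazy_sgd`, and so on) are hypothetical, labels are assumed to be 0 or 1, the regularizer is assumed to be L2 with strength `mu`, there is no bias term, and `zlib.crc32` stands in for whatever hash function one would actually choose. Features are hashed into unsigned counts only. In this particular sketch the lazy shrinkage is arranged to give the same weights as applying the shrinkage to every weight at every step, up to floating-point rounding, while touching only the nonzero features of each example.

```python
import math
import zlib


def hash_features(tokens, num_buckets):
    """Hash trick: map string features to a sparse {bucket_index: count} dict."""
    x = {}
    for tok in tokens:
        # crc32 is stable across runs (Python's built-in hash() is salted for str).
        j = zlib.crc32(tok.encode("utf-8")) % num_buckets
        x[j] = x.get(j, 0.0) + 1.0
    return x


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """P(y=1 | x) for a sparse example x under the weight list w."""
    return sigmoid(sum(w[j] * v for j, v in x.items()))


def train_lazy_sgd(examples, num_buckets, lr=0.1, mu=0.01, epochs=10):
    """SGD on L2-regularized log conditional likelihood with lazy weight decay.

    examples: list of (tokens, y) pairs with y in {0, 1} (an assumption here).
    """
    w = [0.0] * num_buckets
    last = [0] * num_buckets      # step at which each weight was last updated
    decay = 1.0 - 2.0 * lr * mu   # per-step shrinkage from the L2 penalty
    k = 0                         # global step counter
    for _ in range(epochs):
        for tokens, y in examples:
            k += 1
            x = hash_features(tokens, num_buckets)
            # Catch up on the shrinkage skipped while feature j was absent.
            for j in x:
                w[j] *= decay ** (k - 1 - last[j])
            p = predict(w, x)
            # Regularized update, applied only to the nonzero features.
            for j, v in x.items():
                w[j] = w[j] * decay + lr * (y - p) * v
                last[j] = k
    # Final pass so every weight has received all k shrinkage steps.
    for j in range(num_buckets):
        w[j] *= decay ** (k - last[j])
    return w


if __name__ == "__main__":
    data = [
        (["cheap", "pills"], 1),
        (["meeting", "notes"], 0),
        (["cheap", "meds"], 1),
        (["class", "notes"], 0),
    ]
    buckets = 2 ** 16
    weights = train_lazy_sgd(data, buckets)
    print(predict(weights, hash_features(["cheap", "pills"], buckets)))
```

The weight vector has a fixed size `num_buckets` regardless of how many distinct features appear, which is where the memory saving of the hash trick comes from. The cost is that distinct features can collide in one bucket. Each training step costs time proportional to the number of nonzero features in the example rather than to `num_buckets`, apart from the single final pass.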