Class meeting for 10-405 SGD and Hash Kernels

Latest revision as of 12:32, 5 March 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Stochastic gradient descent:

Quiz

Readings for the Class

Optional readings

Things to Remember

  • Approach of learning by optimization
  • Optimization goal for logistic regression
  • Key terms: logistic function, sigmoid function, log conditional likelihood, loss function, stochastic gradient descent
  • Updates for logistic regression, with and without regularization
  • The formal properties of sparse logistic regression
    • Whether it is exact or approximate
    • How it changes memory and time usage
  • Formalization of logistic regression as matching expectations between data and model
  • Regularization and how it interacts with overfitting
  • How "sparsifying" regularization affects run-time and memory
  • What the "hash trick" is and why it should work
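As a rough illustration of several items above (the SGD update for logistic regression with and without regularization, the lazy "sparse" way of applying that regularization, and the hash trick), here is a minimal Python sketch. It is not taken from the course slides. The function names (`hash_features`, `predict`, `train`), the bucket count, the learning rate `lr`, and the regularization strength `mu` are all assumptions chosen for illustration, and `zlib.crc32` is used only as a convenient deterministic stand-in for a hash function.

```python
import math
import zlib
from collections import defaultdict


def hash_features(tokens, num_buckets):
    """Hash trick: map string tokens to a sparse {bucket_index: count} dict."""
    x = defaultdict(float)
    for tok in tokens:
        # Tokens that collide in the same bucket simply have their counts added.
        x[zlib.crc32(tok.encode("utf-8")) % num_buckets] += 1.0
    return dict(x)


def sigmoid(z):
    # Clip the score so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """P(y = 1 | x) under logistic regression with weight list w."""
    return sigmoid(sum(w[j] * v for j, v in x.items()))


def train(examples, num_buckets=2**18, lr=0.1, mu=0.0, epochs=10):
    """SGD for logistic regression over hashed features.

    examples: list of (tokens, y) pairs with y in {0, 1}.
    mu: L2 regularization strength (mu = 0.0 means no regularization).
    """
    w = [0.0] * num_buckets      # fixed-size weight vector, no feature dictionary
    last = [0] * num_buckets     # step at which each weight was last decayed
    decay = 1.0 - 2.0 * lr * mu  # per-step shrinkage from the L2 penalty
    k = 0
    for _ in range(epochs):
        for tokens, y in examples:
            k += 1
            x = hash_features(tokens, num_buckets)
            # Lazy regularization: only weights of features present in this
            # example are touched; they catch up on the decay steps they missed.
            for j in x:
                w[j] *= decay ** (k - last[j])
                last[j] = k
            p = predict(w, x)
            # Gradient step on the log conditional likelihood.
            for j, v in x.items():
                w[j] += lr * (y - p) * v
    # Final pass so every weight has received all of its pending decay.
    for j in range(num_buckets):
        w[j] *= decay ** (k - last[j])
    return w
```

A small usage example on made-up data:

```python
data = [(["good", "great"], 1), (["bad", "awful"], 0)]
w = train(data, mu=0.01)
print(predict(w, hash_features(["good"], 2**18)))
```

In this sketch, memory is fixed by `num_buckets` rather than by vocabulary size, and each training step costs time proportional to the number of non-zero features in the example rather than to the length of the weight vector.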