Class meeting for 10-605 Advanced topics for SGD

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides


Optional Readings

  • To be assigned

What You Should Remember

  • How an AllReduce works and why it is more efficient than collecting and broadcasting parameters (a sketch follows this list).
  • Why adaptive, per-parameter learning rates can be useful (the Adagrad update below is one instance).
  • The general form of the Adagrad update (written out below).
  • The dual form of the perceptron, as a kernel method (see the sketch below).
  • Definitions: kernel, Gram matrix, RKHS (summarized below).
  • How the "hash trick" can be formalized as a kernel (see the formulation below).
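
To make the first bullet concrete, here is a minimal Python sketch of a tree-structured AllReduce (the function and variable names are illustrative, not from the lecture's implementation). Each worker's vector is summed up a binary tree to the root, and the total is sent back down, so every worker ends up holding the global sum.

    # Simulated tree AllReduce over n workers, each holding one vector.
    # Worker i's children in the implicit binary tree are 2i+1 and 2i+2.
    # Phase 1 (reduce): partial sums flow up to the root.
    # Phase 2 (broadcast): the root's total flows back down to everyone.

    def tree_allreduce(vectors):
        n, dim = len(vectors), len(vectors[0])
        partial = [list(v) for v in vectors]  # per-worker buffers

        # Reduce phase: visit workers bottom-up so children sum first.
        for i in reversed(range(n)):
            for c in (2 * i + 1, 2 * i + 2):
                if c < n:  # child exists
                    for j in range(dim):
                        partial[i][j] += partial[c][j]

        total = partial[0]  # the root now holds the global sum
        return [list(total) for _ in range(n)]  # broadcast back down

    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(tree_allreduce(grads))  # every worker ends with [9.0, 12.0]

Compared with collecting all n parameter vectors at a single coordinator and broadcasting the result back out (O(n) messages into and out of one node), the tree routes each vector through only a constant number of links per worker, and the reduce and broadcast phases each finish in O(log n) communication rounds.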
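The Adagrad update in its general diagonal form, written in standard notation (assumed here, since the slides are not reproduced on this page): each coordinate keeps a running sum of its squared gradients, and the global rate η is scaled down by its square root, which is where the adaptive, per-parameter rates of the second bullet come from.

    % g_{t,j}: gradient at step t, coordinate j; \eta: global learning rate;
    % \epsilon: small constant for numerical stability.
    G_{t,j} = \sum_{\tau=1}^{t} g_{\tau,j}^{2},
    \qquad
    w_{t+1,j} = w_{t,j} - \frac{\eta}{\sqrt{G_{t,j}} + \epsilon}\, g_{t,j}

Frequently updated coordinates accumulate a large G and take small steps, while rarely seen features keep effective rates close to η, which is useful for the sparse feature vectors common in large-scale learning.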
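A short sketch of the perceptron in its dual form, assuming binary labels in {-1, +1} (names such as rbf_kernel and alpha are illustrative). Instead of a weight vector, the learner stores, for each training example, a count of how many mistakes it has made on that example, and predicts with a kernel-weighted vote over those examples.

    import math

    def rbf_kernel(x, z, gamma=1.0):
        # K(x, z) = exp(-gamma * ||x - z||^2), one choice of kernel
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    def kernel_perceptron(X, y, kernel, epochs=10):
        # alpha[i] counts mistakes made on example i (the dual weights)
        alpha = [0] * len(X)
        for _ in range(epochs):
            for i, (xi, yi) in enumerate(zip(X, y)):
                score = sum(alpha[j] * y[j] * kernel(X[j], xi)
                            for j in range(len(X)) if alpha[j] > 0)
                if yi * score <= 0:  # mistake: add xi to the vote
                    alpha[i] += 1
        return alpha

With the linear kernel K(x, z) = <x, z>, this is exactly the ordinary perceptron, since the implicit primal weight vector is w = Σ_i alpha_i y_i x_i; swapping in a nonlinear kernel is what makes the dual form a kernel method.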
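For reference, the three definitions from the bullet above, in their standard form:

    % Kernel: an inner product under some feature map \phi.
    K(x, z) = \langle \phi(x), \phi(z) \rangle

    % Gram matrix: pairwise kernel values on a sample x_1, \dots, x_n.
    % K is a valid kernel iff every such G is symmetric and positive
    % semidefinite.
    G_{ij} = K(x_i, x_j)

    % RKHS: the Hilbert space spanned by the functions K(x_i, \cdot),
    % in which evaluating f at x is an inner product (the reproducing
    % property):
    f(x) = \langle f, K(x, \cdot) \rangle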
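One standard way to formalize the hash trick as a kernel is the "hash kernel" construction with a signed hash (a sketch following Weinberger et al.'s feature hashing; the exact form used in the lecture may differ). Features are mapped into m buckets by a hash h, with a second hash ξ assigning each feature a random sign so that collisions cancel in expectation.

    % Hashed feature map: bucket j collects all features i with h(i) = j,
    % each multiplied by its sign \xi(i) \in \{-1, +1\}.
    \phi_j(x) = \sum_{i : h(i) = j} \xi(i)\, x_i

    % The hash kernel is the inner product in the hashed space, and it is
    % an unbiased estimate of the original inner product:
    K_{\text{hash}}(x, z) = \langle \phi(x), \phi(z) \rangle,
    \qquad
    \mathbb{E}_{h, \xi}\left[ K_{\text{hash}}(x, z) \right] = \langle x, z \rangle

Since φ is an explicit (randomized) feature map, K_hash is a kernel by construction, and its Gram matrix on any finite sample is positive semidefinite.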