Difference between revisions of "Class meeting for 10-605 Advanced topics for SGD"

Revision as of 10:53, 16 October 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides


Optional Readings

  • Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.

What You Should Remember

  • How an AllReduce works and why it is more efficient than collecting and broadcasting parameters.
  • Why adaptive, per-parameter learning rates can be useful.
  • The general form for the Adagrad update.
  • The dual form of the perceptron, as a kernel method.
  • Definitions: kernel, Gram matrix, RKHS.
  • How the "hash trick" can be formalized as a kernel.
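
As a rough illustration of the AllReduce item in the list above, here is a minimal single-process sketch of a sum-AllReduce over nodes arranged in a binary tree. The function name tree_allreduce and the array-style tree layout (node i has children 2i+1 and 2i+2) are assumptions made for this example and are not taken from the lecture or from any particular library. The intuition it tries to show: partial sums flow up the tree and the total flows back down, so each node only talks to its parent and at most two children, instead of one node receiving every other node's parameters.

```python
def tree_allreduce(local_vectors):
    """Simulate a sum-AllReduce on a binary tree (illustrative sketch only).

    local_vectors[i] is the vector held by node i. Node i's children are
    assumed to be nodes 2*i + 1 and 2*i + 2 (a hypothetical layout).
    Returns the vector each node holds after the operation.
    """
    n = len(local_vectors)
    partial = [list(v) for v in local_vectors]

    # Reduce phase: walk from the last node up to node 1, adding each
    # node's partial sum into its parent. Children have larger indices
    # than their parent, so a node's subtree is complete before it sends.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[i])]

    # Broadcast phase: the root now holds the global sum; every node
    # receives a copy of it (passed down the same tree in a real system).
    total = partial[0]
    return [list(total) for _ in range(n)]


result = tree_allreduce([[1, 0], [0, 2], [3, 0], [0, 4]])
```

In this toy run every node ends up with the element-wise sum of all four input vectors.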
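
For the Adagrad item in the list above, one common way to write the per-parameter update is w_i ← w_i − η · g_i / (sqrt(G_i) + ε), where G_i is the running sum of squared gradients for parameter i. The sketch below assumes this diagonal form; the names adagrad_step, sq_sum, eta and eps are chosen for this example only.

```python
import math


def adagrad_step(w, grad, sq_sum, eta=0.1, eps=1e-8):
    """Apply one diagonal Adagrad update in place (illustrative sketch).

    w      : list of parameters
    grad   : list of gradient components for this step
    sq_sum : running per-parameter sum of squared gradients
    eta    : base learning rate (assumed value)
    eps    : small constant to avoid division by zero (assumed value)
    """
    for i, g in enumerate(grad):
        sq_sum[i] += g * g
        # Each parameter gets its own effective rate eta / sqrt(sq_sum[i]).
        w[i] -= eta * g / (math.sqrt(sq_sum[i]) + eps)
    return w


w = [0.0, 0.0]
sq_sum = [0.0, 0.0]
adagrad_step(w, [1.0, 0.0], sq_sum)   # only parameter 0 has a gradient
adagrad_step(w, [1.0, 10.0], sq_sum)  # parameter 1 sees its first gradient
```

Because the step is divided by the accumulated gradient magnitude, a parameter that has seen few or small gradients takes relatively larger steps than one updated often, which is one reason per-parameter rates can help with sparse features.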
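
The dual perceptron and Gram matrix items in the list above can be sketched together. In the dual form the weight vector is never stored; instead a mistake count alpha_i is kept per training example and the score of x is the sum over i of alpha_i · y_i · K(x_i, x). The function names, the epoch limit, and the quadratic kernel below are assumptions for illustration.

```python
def kernel_perceptron(X, y, kernel, epochs=10):
    """Train a dual-form perceptron (illustrative sketch).

    Returns alpha, where alpha[i] counts mistakes made on example i.
    """
    n = len(X)
    # Gram matrix: gram[i][j] = K(x_i, x_j) for all training pairs.
    gram = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified (or on the boundary)
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(x, X, y, alpha, kernel):
    """Classify x as +1 or -1 using the dual representation."""
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1


def quadratic_kernel(u, v):
    """K(u, v) = (1 + u.v)^2, chosen here only as an example kernel."""
    return (1 + sum(a * b for a, b in zip(u, v))) ** 2


# XOR-style data: not linearly separable in the original two features.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, -1, -1]
alpha = kernel_perceptron(X, y, quadratic_kernel)
predictions = [predict(x, X, y, alpha, quadratic_kernel) for x in X]
```

The training loop touches the data only through kernel values, which is what makes the perceptron usable as a kernel method.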
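
For the last item in the list above, one way to view the hash trick as a kernel is to define a feature map phi that hashes each token into one of m buckets with a ±1 sign, and then take K(x, x') = <phi(x), phi(x')>. The sketch below assumes that signed-hashing construction; the helper names, the use of MD5 as the hash function, and m = 16 are choices made for this example rather than anything prescribed by the lecture.

```python
import hashlib


def _stable_hash(token, salt):
    """Deterministic integer hash of a string (MD5 is an arbitrary choice)."""
    digest = hashlib.md5((salt + token).encode("utf-8")).hexdigest()
    return int(digest, 16)


def hashed_features(tokens, m=16):
    """Map a bag of tokens to an m-dimensional signed hashed vector."""
    phi = [0.0] * m
    for t in tokens:
        bucket = _stable_hash(t, "bucket:") % m
        sign = 1.0 if _stable_hash(t, "sign:") % 2 == 0 else -1.0
        phi[bucket] += sign
    return phi


def hash_kernel(tokens_a, tokens_b, m=16):
    """Inner product of the two hashed feature vectors."""
    pa = hashed_features(tokens_a, m)
    pb = hashed_features(tokens_b, m)
    return sum(a * b for a, b in zip(pa, pb))


k_same = hash_kernel(["learning"], ["learning"])
k_ab = hash_kernel(["large", "datasets"], ["large", "scale"])
k_ba = hash_kernel(["large", "scale"], ["large", "datasets"])
```

Since the result is an ordinary inner product in the hashed space, it is symmetric and positive semidefinite, so it can be plugged into a kernel method such as the dual perceptron.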