Difference between revisions of "Class meeting for 10-605 Advanced topics for SGD"

From Cohen Courses
(Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Datase...")
 
 
(2 intermediate revisions by the same user not shown)
Line 1:
- This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2015]].
+ This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].

  === Slides ===
Line 10:
  === Optional Readings ===

- * To be assigned
+ * Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.
+
+ === What You Should Remember ===
+
+ * How an AllReduce works and why it is more efficient than collecting and broadcasting parameters.
+ * Why adaptive, per-parameter learning rates can be useful.
+ * The general form for the Adagrad update.
+ * The dual form of the perceptron, as a kernel method
+ * Definitions: kernel, Gram matrix, RKHS
+ * How the "hash trick" can be formalized as a kernel

Latest revision as of 14:20, 8 August 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2016.

Slides


Optional Readings

  • Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.

What You Should Remember

  • How an AllReduce works and why it is more efficient than collecting and broadcasting parameters.
  • Why adaptive, per-parameter learning rates can be useful.
  • The general form for the Adagrad update.
  • The dual form of the perceptron as a kernel method.
  • Definitions: kernel, Gram matrix, RKHS.
  • How the "hash trick" can be formalized as a kernel.
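
As a concrete reminder of the Adagrad item above, here is a minimal sketch of the commonly used diagonal (per-parameter) form of the update: each weight w[j] moves by eta * g[j] / sqrt(sum of past squared g[j]). It is not taken from the lecture slides, which may present a variant. The function name adagrad_step, the base learning rate eta, and the small constant eps (added only to avoid division by zero) are assumptions made for illustration.

```python
import math


def adagrad_step(w, grad, accum, eta=0.1, eps=1e-8):
    """Apply one diagonal-Adagrad update in place (illustrative sketch).

    w     -- list of weights
    grad  -- list of gradients for the current example or mini-batch
    accum -- running per-parameter sum of squared gradients
    eta   -- base learning rate (hypothetical default)
    eps   -- small constant to avoid division by zero (hypothetical default)
    """
    for j, g in enumerate(grad):
        accum[j] += g * g                                # per-parameter history
        w[j] -= eta * g / (math.sqrt(accum[j]) + eps)    # per-parameter step size
    return w, accum


# Two parameters whose gradients differ in scale by a factor of 100.
w = [0.0, 0.0]
accum = [0.0, 0.0]
adagrad_step(w, [10.0, 0.1], accum)
first_step = list(w)

# A second step with the same gradients takes a smaller step for both,
# because the accumulated squared gradient has grown.
adagrad_step(w, [10.0, 0.1], accum)
second_step_size = [abs(w[j] - first_step[j]) for j in range(2)]
```

On the first step both parameters move by roughly eta even though their gradients differ by two orders of magnitude, which is one way to see why per-parameter rates help with features of very different scale or frequency. The second step is smaller for both parameters because the accumulated history only grows.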