Class meeting for 10-605 Advanced topics for SGD
From Cohen Courses
Latest revision as of 14:20, 8 August 2016
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
Optional Readings
- Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.
What You Should Remember
- How an AllReduce works and why it is more efficient than collecting and broadcasting parameters.
- Why adaptive, per-parameter learning rates can be useful.
- The general form for the Adagrad update.
- The dual form of the perceptron, as a kernel method.
- Definitions: kernel, Gram matrix, RKHS.
- How the "hash trick" can be formalized as a kernel.
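To make the first bullet concrete, here is a minimal simulation of a tree-structured AllReduce sum. The function name and structure are illustrative, not from the lecture: values are combined pairwise up a binary tree (reduce) and the total is then handed back to every worker (broadcast), so each worker participates in O(log n) combining steps rather than shipping its full vector to one central node that must receive and re-send n copies.

```python
import numpy as np

def tree_all_reduce(vectors):
    """Simulate an AllReduce sum over n workers' vectors.

    Reduce phase: combine values pairwise up a binary tree.
    Broadcast phase: every worker receives the combined result.
    """
    vals = [np.asarray(v, dtype=float) for v in vectors]
    n = len(vals)
    step = 1
    while step < n:
        # at each round, worker i absorbs the partial sum of worker i+step
        for i in range(0, n - step, 2 * step):
            vals[i] = vals[i] + vals[i + step]
        step *= 2
    total = vals[0]
    return [total.copy() for _ in range(n)]
```

For example, three workers holding `[1, 2]`, `[3, 4]`, `[5, 6]` each end up with `[9, 12]` after two combining rounds.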
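The Adagrad bullets can be sketched in a few lines. The general form of the update keeps, for each parameter, a running sum of squared gradients and scales the step by its inverse square root: w ← w − η g / (√(Σ g²) + ε). Frequently-updated parameters thus get smaller effective rates while rare ones keep larger steps. The function below is an illustrative sketch (names like `adagrad_sgd` and `grad_fn` are assumptions, not course code):

```python
import numpy as np

def adagrad_sgd(grad_fn, w0, data, lr=0.1, eps=1e-8):
    """SGD with Adagrad per-parameter learning rates (illustrative sketch)."""
    w = w0.copy()
    g_sq = np.zeros_like(w)  # per-parameter running sum of squared gradients
    for x, y in data:
        g = grad_fn(w, x, y)
        g_sq += g * g
        # each parameter gets its own effective rate: lr / sqrt(sum of g^2)
        w -= lr * g / (np.sqrt(g_sq) + eps)
    return w
```

Run on least-squares data with the gradient g = (w·x − y) x, the training loss drops from its starting value even though no single global learning rate was tuned.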
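The dual-form bullet can likewise be made concrete. In the dual perceptron, instead of a weight vector we keep a count α_i of how often training example i was misclassified; the score on example j is Σ_i α_i y_i K(x_i, x_j), where K is any kernel and the matrix K[i, j] is the Gram matrix. The sketch below is an assumed implementation for illustration, not the lecture's code:

```python
import numpy as np

def kernel_perceptron(K, y, epochs=10):
    """Train a perceptron in dual form given Gram matrix K and labels y in {-1,+1}.

    alpha[i] counts mistakes on example i; the decision function is
    sign(sum_i alpha[i] * y[i] * K[i, j]).
    """
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for j in range(n):
            score = np.sum(alpha * y * K[:, j])
            if y[j] * score <= 0:  # mistake (or zero score): bump this example's count
                alpha[j] += 1
    return alpha
```

Because only kernel values are touched, plugging in an RBF Gram matrix lets this linear-looking algorithm fit data (such as XOR) that no linear perceptron in the original feature space can separate.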
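Finally, the hash-trick bullet: hashing sparse features into d buckets defines a feature map φ, and hence a kernel K(x, z) = φ(x)·φ(z). With a second hash choosing a ±1 sign per feature, the hashed inner product is (in expectation) close to the original bag-of-words inner product, which is what lets hashed models stand in for exact sparse ones. A minimal sketch, with a deterministic hash so results are reproducible (the function name and sign scheme are illustrative assumptions):

```python
import hashlib

def hashed_features(tokens, d=16):
    """Map a bag of tokens to a d-dimensional vector via the hashing trick.

    Each token is hashed to a bucket index and a +/-1 sign; the inner
    product of two hashed vectors defines the hash-trick kernel.
    """
    v = [0.0] * d
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)  # deterministic hash
        idx = h % d
        sign = 1.0 if (h // d) % 2 == 0 else -1.0
        v[idx] += sign
    return v
```

Note the dimension d trades memory against collision rate: colliding tokens share a coordinate, and the random signs make their contributions cancel rather than add systematically.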