Class meeting for 10-605 Advanced topics for SGD

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides


Optional Readings

  • To be assigned

What You Should Remember

  • How an AllReduce works and why it is more efficient than collecting and broadcasting parameters (a sketch follows this list).
  • Why adaptive, per-parameter learning rates can be useful (the Adagrad update below is one instance).
  • The general form of the Adagrad update (written out below).
  • The dual form of the perceptron, as a kernel method (see the sketch below).
  • Definitions: kernel, Gram matrix, RKHS (summarized below).
  • How the "hash trick" can be formalized as a kernel (see the formulation below).
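
To make the first bullet concrete, here is a minimal Python sketch of a tree-structured AllReduce (the function and variable names are illustrative, not from the lecture's implementation). Each worker's vector is summed up a binary tree to the root, and the total is sent back down, so every worker ends up holding the global sum.

    # Simulated tree AllReduce over n workers, each holding one vector.
    # Worker i's children in the implicit binary tree are 2i+1 and 2i+2.
    # Phase 1 (reduce): partial sums flow up to the root.
    # Phase 2 (broadcast): the root's total flows back down to everyone.

    def tree_allreduce(vectors):
        n, dim = len(vectors), len(vectors[0])
        partial = [list(v) for v in vectors]  # per-worker buffers

        # Reduce phase: visit workers bottom-up so children sum first.
        for i in reversed(range(n)):
            for c in (2 * i + 1, 2 * i + 2):
                if c < n:  # child exists
                    for j in range(dim):
                        partial[i][j] += partial[c][j]

        total = partial[0]  # the root now holds the global sum
        return [list(total) for _ in range(n)]  # broadcast back down

    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(tree_allreduce(grads))  # every worker ends with [9.0, 12.0]

Compared with collecting all n parameter vectors at a single coordinator and broadcasting the result back out (O(n) messages into and out of one node), the tree routes each vector through only a constant number of links per worker, and the reduce and broadcast phases each finish in O(log n) communication rounds.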
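The Adagrad update in its general diagonal form, written in standard notation (assumed here, since the slides are not reproduced on this page): each coordinate keeps a running sum of its squared gradients, and the global rate η is scaled down by its square root, which is where the adaptive, per-parameter rates of the second bullet come from.

    % g_{t,j}: gradient at step t, coordinate j; \eta: global learning rate;
    % \epsilon: small constant for numerical stability.
    G_{t,j} = \sum_{\tau=1}^{t} g_{\tau,j}^{2},
    \qquad
    w_{t+1,j} = w_{t,j} - \frac{\eta}{\sqrt{G_{t,j}} + \epsilon}\, g_{t,j}

Frequently updated coordinates accumulate a large G and take small steps, while rarely seen features keep effective rates close to η, which is useful for the sparse feature vectors common in large-scale learning.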
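A short sketch of the perceptron in its dual form, assuming binary labels in {-1, +1} (names such as rbf_kernel and alpha are illustrative). Instead of a weight vector, the learner stores, for each training example, a count of how many mistakes it has made on that example, and predicts with a kernel-weighted vote over those examples.

    import math

    def rbf_kernel(x, z, gamma=1.0):
        # K(x, z) = exp(-gamma * ||x - z||^2), one choice of kernel
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    def kernel_perceptron(X, y, kernel, epochs=10):
        # alpha[i] counts mistakes made on example i (the dual weights)
        alpha = [0] * len(X)
        for _ in range(epochs):
            for i, (xi, yi) in enumerate(zip(X, y)):
                score = sum(alpha[j] * y[j] * kernel(X[j], xi)
                            for j in range(len(X)) if alpha[j] > 0)
                if yi * score <= 0:  # mistake: add xi to the vote
                    alpha[i] += 1
        return alpha

With the linear kernel K(x, z) = <x, z>, this is exactly the ordinary perceptron, since the implicit primal weight vector is w = Σ_i alpha_i y_i x_i; swapping in a nonlinear kernel is what makes the dual form a kernel method.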
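For reference, the three definitions from the bullet above, in their standard form:

    % Kernel: an inner product under some feature map \phi.
    K(x, z) = \langle \phi(x), \phi(z) \rangle

    % Gram matrix: pairwise kernel values on a sample x_1, \dots, x_n.
    % K is a valid kernel iff every such G is symmetric and positive
    % semidefinite.
    G_{ij} = K(x_i, x_j)

    % RKHS: the Hilbert space spanned by the functions K(x_i, \cdot),
    % in which evaluating f at x is an inner product (the reproducing
    % property):
    f(x) = \langle f, K(x, \cdot) \rangle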
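One standard way to formalize the hash trick as a kernel is the "hash kernel" construction with a signed hash (a sketch following Weinberger et al.'s feature hashing; the exact form used in the lecture may differ). Features are mapped into m buckets by a hash h, with a second hash ξ assigning each feature a random sign so that collisions cancel in expectation.

    % Hashed feature map: bucket j collects all features i with h(i) = j,
    % each multiplied by its sign \xi(i) \in \{-1, +1\}.
    \phi_j(x) = \sum_{i : h(i) = j} \xi(i)\, x_i

    % The hash kernel is the inner product in the hashed space, and it is
    % an unbiased estimate of the original inner product:
    K_{\text{hash}}(x, z) = \langle \phi(x), \phi(z) \rangle,
    \qquad
    \mathbb{E}_{h, \xi}\left[ K_{\text{hash}}(x, z) \right] = \langle x, z \rangle

Since φ is an explicit (randomized) feature map, K_hash is a kernel by construction, and its Gram matrix on any finite sample is positive semidefinite.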