Difference between revisions of "Class meeting for 10-605 Advanced topics for SGD"

Latest revision as of 14:20, 8 August 2016

Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.

How an AllReduce works, and why it is more efficient than collecting every node's parameters at a single master and broadcasting the combined result.
Why adaptive, per-parameter learning rates can be useful.
The general form of the Adagrad update.
The dual form of the perceptron, as a kernel method.
Definitions: kernel, Gram matrix, reproducing kernel Hilbert space (RKHS).
How the "hash trick" can be formalized as a kernel.
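
One common form of the Adagrad update keeps, for each parameter j, a running sum of squared gradients G_j = Σ_t g_{t,j}² and takes the step w_j ← w_j − η g_j / (√G_j + ε). Coordinates that have seen large gradients therefore get smaller effective learning rates. The NumPy sketch below assumes that form (some presentations put ε inside the square root instead). The helper name `adagrad_step`, the step size, and the toy quadratic are illustrative choices, not taken from the course materials.

```python
import numpy as np

def adagrad_step(w, g, G, eta=0.1, eps=1e-8):
    """One Adagrad update (hypothetical helper for illustration).

    w: current parameter vector
    g: gradient of the loss at w
    G: running per-parameter sum of squared gradients
    Returns the updated (w, G).
    """
    G = G + g * g                          # accumulate squared gradient per coordinate
    w = w - eta * g / (np.sqrt(G) + eps)   # large past gradients -> smaller step for that coordinate
    return w, G

# Toy problem: minimize f(w) = 0.5 * sum(a * w**2), which is badly scaled (a = 100 vs. a = 1)
a = np.array([100.0, 1.0])
w = np.array([1.0, 1.0])
G = np.zeros_like(w)
for _ in range(200):
    g = a * w                              # gradient of f at w
    w, G = adagrad_step(w, g, G, eta=0.5)
print(w)
```

Here g_j = a_j w_j and G_j = a_j² Σ_t w_{t,j}², so the ratio g_j / √G_j does not depend on a_j (ignoring ε), and both coordinates shrink at the same rate. Plain gradient descent with one shared rate would need η < 2/100 to stay stable on the first coordinate, which makes progress on the second coordinate very slow.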

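The dual perceptron never builds a weight vector. It stores one count α_i per training example and scores a point x as Σ_j α_j y_j K(x_j, x), where the K(x_j, x_i) values are entries of the Gram matrix. The hash trick fits this picture as the particular kernel K(x, z) = ⟨φ(x), φ(z)⟩, where φ sends each feature to one of `num_buckets` buckets with a ±1 sign. The sketch below assumes sparse examples stored as Python dicts from feature name to value. MD5 as the hash function, the signed-hash variant, and all helper names (`hash_feature`, `hashed_features`, `hash_kernel`, `dual_score`, `train_dual_perceptron`, `predict`) are illustrative assumptions, not the course's code.

```python
import hashlib

def hash_feature(name, num_buckets):
    """Map a feature name to (bucket, sign) with a deterministic hash (MD5 here, an arbitrary choice)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "little") % num_buckets
    sign = 1 if digest[4] & 1 else -1
    return bucket, sign

def hashed_features(x, num_buckets=2**10):
    """phi(x): turn a sparse dict {feature_name: value} into a sparse dict {bucket: value}."""
    phi = {}
    for name, value in x.items():
        bucket, sign = hash_feature(name, num_buckets)
        phi[bucket] = phi.get(bucket, 0.0) + sign * value  # colliding features add up
    return phi

def hash_kernel(x, z, num_buckets=2**10):
    """K(x, z) = <phi(x), phi(z)>: the hash trick viewed as a kernel."""
    px = hashed_features(x, num_buckets)
    pz = hashed_features(z, num_buckets)
    return sum(v * pz.get(b, 0.0) for b, v in px.items())

def dual_score(alpha, examples, kernel, x):
    """sum_j alpha_j * y_j * K(x_j, x); the primal weight vector is never built."""
    return sum(a * y_j * kernel(x_j, x)
               for a, (x_j, y_j) in zip(alpha, examples) if a)

def train_dual_perceptron(examples, kernel, epochs=5):
    """alpha[i] counts the mistakes made on example i."""
    alpha = [0] * len(examples)
    for _ in range(epochs):
        for i, (x_i, y_i) in enumerate(examples):
            if y_i * dual_score(alpha, examples, kernel, x_i) <= 0:
                alpha[i] += 1  # mistake: add x_i to the kernel expansion
    return alpha

def predict(alpha, examples, kernel, x):
    return 1 if dual_score(alpha, examples, kernel, x) > 0 else -1

# Usage on a tiny, made-up "spam vs. meeting" dataset
data = [({"free": 1, "money": 1}, +1),
        ({"meeting": 1, "agenda": 1}, -1),
        ({"free": 1, "offer": 1}, +1),
        ({"agenda": 1, "notes": 1}, -1)]
alpha = train_dual_perceptron(data, hash_kernel)
print(alpha, [predict(alpha, data, hash_kernel, x) for x, _ in data])
```

When no two feature names share a bucket, the hash kernel equals the ordinary sparse dot product, so the perceptron behaves exactly as it would on the unhashed features. Collisions perturb the kernel; with the random ±1 signs those perturbations cancel in expectation over the choice of hash function. Recomputing φ on every kernel call is wasteful and is done here only for brevity. A real implementation would cache the hashed vectors or the Gram matrix.
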
Revision as of 10:53, 16 October 2015 by Wcohen (→ Optional Readings) | Latest revision by Wcohen
− This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2015]].
+ This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].

=== Slides ===