Class meeting for 10-605 Advanced topics for SGD

From Cohen Courses
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.

Slides


Optional Readings

  • Agarwal, Alekh, et al. "A reliable effective terascale linear learning system." The Journal of Machine Learning Research 15.1 (2014): 1111-1133.

What You Should Remember

  • How an AllReduce works and why it is more efficient than collecting and broadcasting parameters.
  • Why adaptive, per-parameter learning rates can be useful.
  • The general form for the Adagrad update.
  • The dual form of the perceptron, viewed as a kernel method.
  • Definitions: kernel, Gram matrix, and reproducing kernel Hilbert space (RKHS).
  • How the "hash trick" can be formalized as a kernel.
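
As a reminder of what a per-parameter learning rate looks like in practice, here is a minimal sketch of the diagonal Adagrad update. It assumes the commonly stated form, in which each weight's step is the base rate divided by the square root of that weight's accumulated squared gradients. The function name adagrad_step and the parameters eta and eps are illustrative choices, not taken from the slides.

```python
import math

def adagrad_step(w, g, G, eta=0.1, eps=1e-8):
    """One diagonal Adagrad update (illustrative sketch).

    w: current weights, g: gradient at w,
    G: running sum of squared gradients, one entry per weight.
    Returns new (w, G) lists; the inputs are not modified.
    """
    # Accumulate the squared gradient for each coordinate.
    G_new = [Gi + gi * gi for Gi, gi in zip(G, g)]
    # Each coordinate gets its own effective rate eta / sqrt(G_i).
    w_new = [wi - eta * gi / (math.sqrt(Gi) + eps)
             for wi, gi, Gi in zip(w, g, G_new)]
    return w_new, G_new

# Toy objective f(w) = 0.5 * (100 * w0^2 + w1^2), whose gradient
# components differ in scale by a factor of 100.
w, G = [1.0, 1.0], [0.0, 0.0]
g = [100.0 * w[0], 1.0 * w[1]]
w, G = adagrad_step(w, g, G)
# On this first step both coordinates move by roughly eta,
# despite the 100x difference in gradient magnitude.
```

The point of the toy example is that the normalization removes the scale difference between coordinates, which is one reason adaptive rates help with sparse or badly scaled features.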
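
The dual form of the perceptron can be sketched in a few lines. Instead of a weight vector, it keeps one mistake count per training example and touches the data only through kernel evaluations, which are precomputed here as a Gram matrix. This is a minimal sketch under stated assumptions: the RBF kernel, the XOR toy data, and all function names are illustrative choices, not taken from the slides.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(X)
    # Gram matrix K[i][j] = kernel(X[i], X[j]).
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # Implicit w . phi(x_i) = sum_j alpha_j * y_j * K[j][i].
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:
                alpha[i] += 1  # dual analogue of w += y_i * phi(x_i)
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return alpha

def predict(x, X, y, alpha, kernel):
    """Sign of the kernel expansion at a new point x."""
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1

# XOR is not linearly separable in the input space, but it is
# separable in the feature space induced by the RBF kernel.
X = [(0, 0), (1, 1), (0, 1), (1, 0)]
y = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(X, y, rbf_kernel)
predictions = [predict(x, X, y, alpha, rbf_kernel) for x in X]
```

Swapping rbf_kernel for any other valid kernel function changes the feature space without changing the training loop, which is the sense in which the dual perceptron is a kernel method.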