Difference between revisions of "Class meeting for 10-605 Deep Learning"

From Cohen Courses
Jump to navigationJump to search
Line 9: Line 9:
  
 
* Automatic differentiation:
 
* Automatic differentiation:
** William's notes on [http://www.cs.cmu.edu/~wcohen/10-605/notes/autodiff.pdf automatic differentiation], and the sample Python code for an [http://www.cs.cmu.edu/~wcohen/10-605/code/xman.py expression manager] and a http://www.cs.cmu.edu/~wcohen/10-605/code/sample-use-of-xman.py] use of an expression manager.
+
** William's notes on [http://www.cs.cmu.edu/~wcohen/10-605/notes/autodiff.pdf automatic differentiation], and the sample Python code for an [http://www.cs.cmu.edu/~wcohen/10-605/code/xman.py expression manager] and a [http://www.cs.cmu.edu/~wcohen/10-605/code/sample-use-of-xman.py] use of an expression manager.
 
**  [https://justindomke.wordpress.com/2009/03/24/a-simple-explanation-of-reverse-mode-automatic-differentiation/ Domke's blog post] - clear but not much detail - and [http://colah.github.io/posts/2015-08-Backprop/  another nice blog post].
 
**  [https://justindomke.wordpress.com/2009/03/24/a-simple-explanation-of-reverse-mode-automatic-differentiation/ Domke's blog post] - clear but not much detail - and [http://colah.github.io/posts/2015-08-Backprop/  another nice blog post].
 
** The clearest paper I've found is [http://www.bcl.hamilton.ie/~barak/papers/toplas-reverse.pdf Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator]
 
** The clearest paper I've found is [http://www.bcl.hamilton.ie/~barak/papers/toplas-reverse.pdf Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator]

Revision as of 17:26, 17 October 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2016.

Slides

  • TBD


Readings

  • More general neural networks:
    • Neural Networks and Deep Learning An online book by Michael Nielsen, pitched at an appropriate level for 10-601, which has a bunch of exercises and on-line sample programs in Python.

For more detail, look at the MIT Press book (in preparation) from Bengio - it's very complete but also fairly technical.

Things to remember

  • The underlying reasons deep networks are hard to train
  • Exploding/vanishing gradients
  • Saturation
  • The importance of key recent advances in neural networks:
  • Matrix operations and GPU training
  • ReLU, cross-entropy, softmax
  • How backprop can be generalized to a sequence of assignment operations