10-601 Deep Learning 1
Revision as of 13:06, 5 April 2016
This is a lecture used in the Syllabus for Machine Learning 10-601B in Spring 2016.
Slides
Readings
This area is moving very fast and the textbooks are not up-to-date. Some recommended readings:
- Neural Networks and Deep Learning, an online book by Michael Nielsen, pitched at an appropriate level for 10-601, with exercises and on-line sample programs in Python.
- Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition has nice on-line notes.
I also used some on-line visualizations in the materials for the lecture, especially the part on ConvNets.
- the Wikipedia page for convolutions has nice animations of 1-D convolutions.
- [http://matlabtricks.com/post-5/3x3-convolution-kernels-with-online-demo On-line demo of 2-D convolutions for image processing].
- There's an on-line demo of CNNs which are trained in your browser (!)
- 3D visualization of a trained net.
For more detail, look at the [http://www.deeplearningbook.org/ MIT Press book (in preparation)] from Bengio
Summary
- The underlying reasons deep networks are hard to train
- Exploding/vanishing gradients
- Saturation
- The importance of key recent advances in neural networks:
- Matrix operations and GPU training
- ReLU, cross-entropy, softmax
- Convolutional networks
- 2-d convolution
- How to construct a convolution layer
- Architecture of CNN: convolution/downsampling pairs
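
To make the convolution/downsampling pair in the summary concrete, here is a minimal sketch in plain Python. It is not taken from the lecture slides: the function names (conv2d_valid, relu, max_pool), the toy image, and the 2x2 edge kernel are all illustrative assumptions. Like most CNN libraries, the sketch computes cross-correlation (the kernel is not flipped), uses no padding, and uses a stride of 1.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    Computes cross-correlation (kernel not flipped), which is what
    CNN "convolution" layers usually do.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Elementwise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling; ragged edges are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical toy input: a 5x5 image with a dark-to-bright vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to that kind of edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool(feature_map, size=2)           # 2x2 after downsampling
print(pooled)
```

In a real convolution layer the kernel weights are learned rather than hand-picked, and many kernels are applied in parallel to produce a stack of feature maps; the sketch only shows the arithmetic for a single kernel followed by one downsampling step.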