Training Deep Neural Networks

intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf

Dropout

Improving neural networks by preventing co-adaptation of feature detectors (Dropout)

arxiv: http://arxiv.org/abs/1207.0580

Regularization of Neural Networks using DropConnect

homepage: http://cs.nyu.edu/~wanli/dropc/
gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
github: https://github.com/iassael/torch-dropconnect

Regularizing neural networks with dropout and with DropConnect

blog: http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/

Fast dropout training

paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
github: https://github.com/sidaw/fastdropout

Dropout as data augmentation

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

arxiv: http://arxiv.org/abs/1512.05287
github: https://github.com/yaringal/BayesianRNN

Improved Dropout for Shallow and Deep Learning

arxiv: http://arxiv.org/abs/1602.02220

Gradient Descent

Fitting a model via closed-form equations vs. Gradient Descent vs Stochastic Gradient Descent vs Mini-Batch Learning. What is the difference?(Normal Equations vs. GD vs. SGD vs. MB-GD)

http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html

An Introduction to Gradient Descent in Python

blog: http://tillbergmann.com/blog/articles/python-gradient-descent.html

Train faster, generalize better: Stability of stochastic gradient descent

arxiv: http://arxiv.org/abs/1509.01240

A Variational Analysis of Stochastic Gradient Algorithms

arxiv: http://arxiv.org/abs/1602.02666

The vanishing gradient problem: Oh no — an obstacle to deep learning!

blog: https://medium.com/a-year-of-artificial-intelligence/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b#.50hu5vwa8

Gradient Descent For Machine Learning

http://machinelearningmastery.com/gradient-descent-for-machine-learning/

Revisiting Distributed Synchronous SGD

arxiv: http://arxiv.org/abs/1604.00981

Accelerate Training

Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices

arxiv: http://arxiv.org/abs/1603.07341

Image Data Augmentation

DataAugmentation ver1.0: Image data augmentation tool for training of image recognition algorithm

github: https://github.com/takmin/DataAugmentation

Caffe-Data-Augmentation: a branc caffe with feature of Data Augmentation using a configurable stochastic combination of 7 data augmentation techniques

github: https://github.com/ShaharKatz/Caffe-Data-Augmentation

Papers

Scalable and Sustainable Deep Learning via Randomized Hashing

arxiv: http://arxiv.org/abs/1602.08194

Tools

pastalog: Simple, realtime visualization of neural network training performance

github: https://github.com/rewonc/pastalog

torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance

github: https://github.com/Kaixhin/torch-pastalog