Generally, a good method to avoid this ordering bias is to randomly shuffle the data prior to each epoch of training.
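
For concreteness, here is a minimal sketch of a per-epoch shuffle inside an SGD loop, in Python/NumPy. The linear model, squared-error gradient, learning rate, and function names are illustrative assumptions, not taken from the tutorial; the point is only the `rng.permutation` call at the top of each epoch.

```python
import numpy as np

def sgd_train(X, y, weights, lr=0.01, epochs=10, rng=None):
    """Toy SGD loop for a linear least-squares model,
    reshuffling the example order before every epoch."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    for _ in range(epochs):
        # Re-shuffle each epoch so updates are not biased by a
        # fixed (possibly meaningful) ordering of the data.
        order = rng.permutation(n)
        for i in order:
            pred = X[i] @ weights
            grad = (pred - y[i]) * X[i]  # gradient of 0.5*(pred - y)^2
            weights -= lr * grad
    return weights

# Example usage (synthetic data):
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = sgd_train(X, y, np.zeros(3), lr=0.05, epochs=20, rng=rng)
```

Drawing a fresh permutation each epoch (rather than shuffling once up front) is the variant the quoted sentence describes; it keeps successive epochs from repeating the same update sequence.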

http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

Original article: https://www.cnblogs.com/rsapaper/p/7601086.html