Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.

http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/
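
As a rough illustration of the shuffling advice above, here is a minimal sketch of minibatch SGD that draws a fresh random order of the examples at the start of every epoch. The function name `sgd_linear_fit`, the toy linear model, and all hyperparameter values are assumptions chosen for the example; they do not come from the tutorial.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.02, epochs=1000, batch_size=2, seed=0):
    """Fit y ~ w*x + b by minibatch SGD, reshuffling the data every epoch.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Draw a new random visiting order so that consecutive minibatches
        # do not follow whatever order the data happened to be stored in.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this minibatch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data stored in sorted order, generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(w, b)
```

Shuffling a list of indices rather than the data itself keeps each input paired with its label, and seeding the generator makes the run repeatable.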

Original article: https://www.cnblogs.com/rsapaper/p/7601086.html