optimizer.zero_grad()

# standard PyTorch training step; net, criterion, optimizer and
# trainloader are assumed to have been defined earlier
for inputs, labels in trainloader:
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

optimizer.zero_grad() zeroes out the gradients, i.e. it resets the derivatives of the loss with respect to the weights to 0. This matters because PyTorch accumulates gradients by default: every call to loss.backward() adds the newly computed gradients to whatever is already stored in each parameter's .grad, so skipping the zeroing step would mix gradients from previous batches into the current update.
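
The accumulation behavior is easy to see on a toy example. The sketch below uses a single made-up parameter w (not part of the tutorial snippet above), just to show that backward() adds into .grad until zero_grad() clears it:

    import torch

    w = torch.tensor([1.0], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=0.1)

    # first backward pass: .grad holds d(loss)/dw = 2
    (w * 2).sum().backward()
    print(w.grad)  # tensor([2.])

    # second backward pass WITHOUT zeroing: gradients accumulate
    (w * 2).sum().backward()
    print(w.grad)  # tensor([4.])  <- stale gradient added in

    # zeroing first gives a fresh, correct gradient again
    optimizer.zero_grad()  # clears .grad (set to None on recent PyTorch versions)
    (w * 2).sum().backward()
    print(w.grad)  # tensor([2.])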

Original post: https://www.cnblogs.com/ziytong/p/11360147.html