RuntimeError: Function 'AddmmBackward' returned nan values in its 2th output.

1. Training error

When training with BCE loss, the following errors occurred:

| Error | batch_size | epoch | hidden_size | lr_D | lr_DZ | lr_Eref | lr_model | z_dim |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViewBackward | 8 | 50 | 128 | 5e-05 | 0.001 | 0.001 | 0.001 | 16 |
| MvBackward | 16 | 40 | 256 | 0.01 | 0.001 | 0.001 | 5e-05 | 16 |
| AddmmBackward | 32 | 40 | 256 | 0.01 | 5e-05 | 5e-05 | 5e-05 | 128 |
| ViewBackward | 32 | 75 | 128 | 5e-05 | 0.01 | 0.001 | 0.0001 | 64 |
| ViewBackward | 8 | 25 | 64 | 0.001 | 5e-05 | 0.01 | 0.0001 | 32 |

No obvious pattern emerges from these configurations.
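To make the pattern check less of an eyeball exercise, the failing configurations listed above can be tallied per hyperparameter. This is only a minimal sketch: the names `columns`, `failed_runs`, and `value_counts` are made up for illustration, and the data is just the five rows recorded here. Without the same counts for the runs that did not fail, a value showing up twice says little on its own.

```python
from collections import Counter

# Hyperparameter names, in the order used in the table above.
columns = ["batch_size", "epoch", "hidden_size", "lr_D",
           "lr_DZ", "lr_Eref", "lr_model", "z_dim"]

# (error, hyperparameter values) for each failing run listed above.
failed_runs = [
    ("ViewBackward",  (8, 50, 128, 5e-05, 0.001, 0.001, 0.001, 16)),
    ("MvBackward",    (16, 40, 256, 0.01, 0.001, 0.001, 5e-05, 16)),
    ("AddmmBackward", (32, 40, 256, 0.01, 5e-05, 5e-05, 5e-05, 128)),
    ("ViewBackward",  (32, 75, 128, 5e-05, 0.01, 0.001, 0.0001, 64)),
    ("ViewBackward",  (8, 25, 64, 0.001, 5e-05, 0.01, 0.0001, 32)),
]

# For every hyperparameter, count how often each value appears among the failures.
value_counts = {
    name: Counter(values[i] for _, values in failed_runs)
    for i, name in enumerate(columns)
}

for name, counts in value_counts.items():
    print(name, dict(counts))
```

Comparing these tallies against the same tallies over all 50 runs would show whether any value is over-represented among the failures.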

These failures are rare, though: only 6 of the 50 models produced NaN values. Can the problem simply be ignored?

2. Solution

The discussion at https://github.com/pytorch/pytorch/issues/51196 says:

This error is only here in anomaly mode to help you find where nans appeared in the backward pass. This is not related to a bug in PyTorch but just that your current code generate nan values.
You can remove this error by just disabling anomaly detection.

That is, comment out this line:

torch.autograd.set_detect_anomaly(True)

But this only silences the check; it is hardly a fundamental fix, is it?
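One possible way to go further than turning the check off is to guard each update so that a non-finite loss or gradient never reaches the optimizer. The following is a rough sketch, not the training code from these experiments: `safe_step`, the tiny `nn.Linear` model, the `max_norm` value, and the use of `nn.BCEWithLogitsLoss` in place of plain BCE are all assumptions made for illustration. Skipping a bad batch hides the symptom as well, so the return value is meant to be logged and investigated.

```python
import torch
import torch.nn as nn

def safe_step(model, optimizer, loss, max_norm=1.0):
    """Backpropagate and update only if the loss and gradients are finite.

    Returns True if the optimizer step was applied, False if it was skipped.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        return False  # loss is already NaN/inf: skip this batch
    loss.backward()
    # Clip the gradients; the returned total norm is NaN/inf if any gradient is.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    if not torch.isfinite(grad_norm):
        optimizer.zero_grad()  # discard the corrupted gradients
        return False
    optimizer.step()
    return True

# Hypothetical toy setup, only to exercise the guard.
torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()  # works on raw logits

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8, 1)).float()

ok = safe_step(model, optimizer, criterion(model(x), y))

# A batch containing NaN inputs yields a NaN loss, so the step is skipped.
x_bad = torch.full((8, 4), float("nan"))
bad = safe_step(model, optimizer, criterion(model(x_bad), y))

print(ok, bad)
```

If steps are skipped often, the usual suspects are worth checking one at a time: a learning rate that is too high, a `log` or division applied to values that can reach zero, or NaN already present in the input data.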