PyTorch: Custom Forward and Backward Functions

Reference: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html

torch.autograd.Function

  • Given random x, y, W1, and W2, the model is y_pred = ReLU(x · W1) · W2; we predict y from
    input x by gradient descent, minimizing the squared Euclidean distance (written out just below).
  • We redefine ReLU and implement its forward and backward passes by hand.
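
Written out (my restatement of the setup above, not from the original tutorial), the model and the loss being minimized are:

    \hat{y} = \mathrm{ReLU}(x W_1)\, W_2, \qquad
    L(W_1, W_2) = \lVert \hat{y} - y \rVert_2^2 = \sum_{n,k} \bigl(\hat{y}_{nk} - y_{nk}\bigr)^2
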
import torch

class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
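
# Optional sanity check (my addition, not part of the original tutorial):
# torch.autograd.gradcheck compares MyReLU's hand-written backward (gradient 1
# for positive inputs, 0 otherwise) against a numerically estimated gradient.
# It expects double-precision inputs with requires_grad=True.
x_check = torch.randn(8, 5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyReLU.apply, (x_check,)))  # expected: True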

dtype = torch.float
device = torch.device("cpu")

# device = torch.device("cuda:0")  # Uncomment this to run on GPU
# torch.backends.cuda.matmul.allow_tf32 = False  # Uncomment this if running on GPU to disable TF32

# The line above (when uncommented) disables TensorFloat32. This is a feature
# that allows networks to run at a much faster speed while sacrificing precision.
# Although TensorFloat32 works well on most real models, for the toy model
# in this tutorial the sacrificed precision causes convergence issues.
# For more information, see:
# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
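#
# (Not in the original tutorial) If you do run on GPU, a minimal sketch of
# inspecting and disabling TF32 for both matmuls and cuDNN convolutions,
# assuming a CUDA-enabled build of PyTorch:
#
#     print(torch.backends.cuda.matmul.allow_tf32)
#     torch.backends.cuda.matmul.allow_tf32 = False
#     torch.backends.cudnn.allow_tf32 = False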

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

lr = 1e-6

# To use the custom Function, call its .apply method; alias it as `relu`.
relu = MyReLU.apply
for i in range(500):
    # Forward pass: compute predicted y using our custom ReLU operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute the squared Euclidean loss.
    loss = (y_pred - y).pow(2).sum()

    if i % 100 == 99:
        print(i, loss.item())

    # Backward pass: autograd calls MyReLU.backward for the custom op.
    loss.backward()

    # Parameters are normally updated with `optim.step()`, which updates the
    # `model.parameters()` registered with the optimizer (an optimizer-based
    # equivalent is sketched after the output below). Since no optimizer is
    # used here, the weights are updated manually. No new gradients need to be
    # computed in this step; we only apply the gradients already computed by
    # loss.backward(), so the update runs inside torch.no_grad().
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad

        # Manually zero the gradients after updating the weights.
        w1.grad.zero_()
        w2.grad.zero_()

Output:

99 952.6715087890625
199 6.376166820526123
299 0.06997707486152649
399 0.0012868450721725821
499 0.00012174161383882165
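
For comparison (my own sketch, not from the original post; it assumes torch.optim.SGD with the same learning rate, which reproduces the plain gradient-descent update), the manual torch.no_grad() update above could be replaced by optimizer.step() and optimizer.zero_grad():

optimizer = torch.optim.SGD([w1, w2], lr=lr)
for i in range(500):
    y_pred = relu(x.mm(w1)).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    optimizer.zero_grad()   # clear gradients accumulated in the previous step
    loss.backward()         # autograd fills w1.grad and w2.grad via MyReLU.backward
    optimizer.step()        # applies w -= lr * w.grad (vanilla SGD, no momentum)
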
Original post: https://www.cnblogs.com/qiulinzhang/p/14269262.html