TensorFlow tf.gradients的用法详细解析以及具体例子

tf.gradients

官方定义：

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    stop_gradients=None,
)

Constructs symbolic derivatives of sum of ys w.r.t. x in xs.

ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by theys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

grad_ys is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ysis None, we fill in a tensor of '1's of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).

stop_gradients is a Tensor or a list of tensors to be considered constant with respect to all xs. These tensors will not be backpropagated through, as though they had been explicitly disconnected using stop_gradient. Among other things, this allows computation of partial derivatives as opposed to total derivatives.

翻译：

1. xs和ys可以是一个张量，也可以是张量列表，tf.gradients(ys,xs) 实现的功能是求ys（如果ys是列表，那就是ys中所有元素之和）关于xs的导数（如果xs是列表，那就是xs中每一个元素分别求导），返回值是一个与xs长度相同的列表。

例如ys=[y1,y2,y3], xs=[x1,x2,x3,x4]，那么tf.gradients(ys,xs)=[d(y1+y2+y3)/dx1,d(y1+y2+y3)/dx2,d(y1+y2+y3)/dx3,d(y1+y2+y3)/dx4].具体例子见下面代码第16-17行。

2. grad_ys 是ys的加权向量列表，和ys长度相同，当grad_ys=[q1,q2,g3]时，tf.gradients(ys,xs，grad_ys)=[d(g1*y1+g2*y2+g3*y3)/dx1,d(g1*y1+g2*y2+g3*y3)/dx2,d(g1*y1+g2*y2+g3*y3)/dx3,d(g1*y1+g2*y2+g3*y3)/dx4].具体例子见下面代码第19-21行。

3. stop_gradients使得指定变量不被求导，即视为常量，具体的例子见官方例子，此处省略。

 1 import tensorflow as tf
 2 w1 = tf.Variable([[1,2]])
 3 w2 = tf.Variable([[3,4]])
 4 res = tf.matmul(w1, [[2],[1]])
 5 
 6 #ys必须与xs有关，否则会报错
 7 # grads = tf.gradients(res,[w1,w2])
 8 #TypeError: Fetch argument None has invalid type <class 'NoneType'>
 9 
10 # grads = tf.gradients(res,[w1])
11 # # Result [array([[2, 1]])]
12 
13 res2a=tf.matmul(w1, [[2],[1]])+tf.matmul(w2, [[3],[5]])
14 res2b=tf.matmul(w1, [[2],[4]])+tf.matmul(w2, [[8],[6]])
15 
16 # grads = tf.gradients([res2a,res2b],[w1,w2])
17 #result:[array([[4, 5]]), array([[11, 11]])]
18 
19 grad_ys=[tf.Variable([[1]]),tf.Variable([[2]])]
20 grads = tf.gradients([res2a,res2b],[w1,w2],grad_ys=grad_ys)
21 # Result: [array([[6, 9]]), array([[19, 17]])]
22 
23 with tf.Session() as sess:
24     tf.global_variables_initializer().run()
25     re = sess.run(grads)
26     print(re)