深层神经网络

1.Forward and backward propagation:


  • Forward propagation for layer L:
    • Z[L]=W[L]A[L-1]+b[L]
    • A[L]=g[L](Z[L])
  • Backward propagation for layer L:
    • dZ[L]=dA[L]*g'[L](Z[L])
    • dW[L]=(1.0/m)dZ[L].A[L-1].T
    • db[L]=(1.0/m)*np.sum(dZ[L],axis=1,keepdims=True)
    • dA[L-1]=W'[L]dZ[L]

  • 核对矩阵的维数(Getting your matrix dimensions right)
    • w的维数是(下一层的维数,前一层的维数); W[L]:(n[L],n[L-1])
    • b的维度是(下一层的维数,1),b[L]:(n[L],1);
    • Z[L],A[L]的维度是(神经元个数,样本个数):(n[L],m)
    • dW[L]和W[L]维度相同,db[L]和b[L]维度相同,dZ[L]和Z[L]维度相同,dA[L]和A[L]维度相同

2.搭建神经网络块(Building blocks of deep neural networks):


正向函数:输入A[L-1]、W[L]、b[L],输出A[L]并且缓存Z[L]

反向函数:输入dA[L]Z[L]W[L]、b[L],输出dA[L-1]dW[L]db[L]

3.参数与超参数


  • 什么是超参数:
    • learning rate(学习率)
    • Iterations(梯度下降法循环的数量)
    • L(隐藏层数目)
    • n[L](隐藏层单元数目)
    • choice of activation function(激活函数的选择)
    • momentum、mini batch szie、regularization parameters 等等

4.编程实践:


激活函数:

import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    
    A = 1/(1+np.exp(-Z))
    cache = Z
    
    return A, cache

def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- a python dictionary containing "A" ; stored for computing the backward pass efficiently
    """
    
    A = np.maximum(0,Z)
    
    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache


def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    
    # When z <= 0, you should set dz to 0 as well. 
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)
    
    return dZ

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ

def tanh_backward(dA,cache):
    Z=cache
    dZ=dA*(1-np.power(np.tanh(Z),2))
    assert(dZ.shape==Z.shape)
    return dZ

L层神经网络的编程实现


  • Initialize Deep Neural Network parameters:
      •  1 def initialize_parameters_deep(layer_dims):
         2         """
         3     Arguments:
         4     layer_dims -- python array (list) containing the dimensions of each layer in our network
         5     
         6     Returns:
         7     parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
         8                     Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
         9                     bl -- bias vector of shape (layer_dims[l], 1)
        10     """
        11     np.random.seed(3)
        12     parameters={}
        13     L=len(layer_dims)
        14     for l in range(1,L):
        15         parameters['W'+str(l)]=np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
        16         parameters['b'+str(l)]=np.zeros((layer_dims[l],1))
        17         assert(parameters['W'+str(l)].shape==(layer_dims[l],layer_dims[l-1]))
        18         assert(parameters['b'+str(l)].shape==(layer_dims[l],1))
        19     return parameters
  • Forward propagation module:
    • Linear Forward
    • Linear_activation_forward:
    • L-Layer Model
      •  1 def linear_forward(A,W,b):
         2         """
         3     Implement the linear part of a layer's forward propagation.
         4 
         5     Arguments:
         6     A -- activations from previous layer (or input data): (size of previous layer, number of examples)
         7     W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
         8     b -- bias vector, numpy array of shape (size of the current layer, 1)
         9 
        10     Returns:
        11     Z -- the input of the activation function, also called pre-activation parameter 
        12     cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently
        13     """
        14     Z=np.dot(W,A)+b
        15     assert(Z.shape==(W.shape[0],A.shape[1])
        16     linear_cache=(A,W,b)
        17     return Z,cache
        18 
        19 def linear_activation_forward(A_prev,W,b,activation):
        20      """
        21     Implement the forward propagation for the LINEAR->ACTIVATION layer
        22 
        23     Arguments:
        24     A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
        25     W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
        26     b -- bias vector, numpy array of shape (size of the current layer, 1)
        27     activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
        28 
        29     Returns:
        30     A -- the output of the activation function, also called the post-activation value 
        31     cache -- a python dictionary containing "linear_cache" and "activation_cache";
        32              stored for computing the backward pass efficiently
        33     """
        34     if activation=='sigmoid':
        35         Z,linear_cache=linear_forward(A_prev,W,b)
        36         A,activation_cache=sigmoid(Z)
        37     elif activation=='relu':
        38         Z,linear_cache=linear_forward(A_prev,W,b)
        39         A,activation_cache=relu(Z)
        40     assert(A.shape==(W.shape[0],A_prev.shape[1]))
        41     cache=(linear_cache,activation_cache)
        42     return A,cache
        43 
        44 def L_model_forward(X,parameters):
        45         """
        46     Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
        47     
        48     Arguments:
        49     X -- data, numpy array of shape (input size, number of examples)
        50     parameters -- output of initialize_parameters_deep()
        51     
        52     Returns:
        53     AL -- last post-activation value
        54     caches -- list of caches containing:
        55                 every cache of linear_activation_forward() (there are L-1 of them, indexed from 0 to L-1)
        56     """
        57     caches=[]
        58     A=X
        59     L=len(parameters)//2
        60     for l in range(1,L):
        61         A_prev=A
        62         A,cache=linear_activation_forward(A_prev,parameters['W'+str(l)],parameters['b'+str(l)],'relu')
        63         caches.append(cache)
        64     A_prev=A
        65     AL,cache=linear_activation_forward(A_prev,parameters['W'+str(L)],parameters['b'+str(L)],"sigmoid")
        66     assert(AL.shape==(1,X.shape[1]))
        67     return AL,caches
    • Cost function:
      •  1 def comput_cost(AL,Y):
         2         """
         3     Implement the cost function defined by equation (7).
         4 
         5     Arguments:
         6     AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
         7     Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)
         8 
         9     Returns:
        10     cost -- cross-entropy cost
        11     """
        12     m=Y.shape[1]
        13     cost=(-1.0/m)*(np.dot(Y,np.log(AL).T)+np.dot(1-Y,np.log(1-AL).T))
        14     cost=np.squeeze(cost)
        15     asset(cost.shape==())
        16     return cost  
    •  Backward propagation module
      • Linear backward:
        •  1 def linear_backward(dZ, cache):
           2     """
           3     Implement the linear portion of backward propagation for a single layer (layer l)
           4 
           5     Arguments:
           6     dZ -- Gradient of the cost with respect to the linear output (of current layer l)
           7     cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer
           8 
           9     Returns:
          10     dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
          11     dW -- Gradient of the cost with respect to W (current layer l), same shape as W
          12     db -- Gradient of the cost with respect to b (current layer l), same shape as b
          13     """
          14     A_prev, W, b = cache
          15     m = A_prev.shape[1]
          16 
          17     ### START CODE HERE ### (≈ 3 lines of code)
          18     dW = (1.0/m)*np.dot(dZ,A_prev.T)
          19     db = (1.0/m)*np.sum(dZ,axis=1,keepdims=True)
          20     dA_prev =np.dot(W.T,dZ)
          21     ### END CODE HERE ###
          22     
          23     assert (dA_prev.shape == A_prev.shape)
          24     assert (dW.shape == W.shape)
          25     assert (db.shape == b.shape)
          26     
          27     return dA_prev, dW, db
        • Linear-Activation backward:
          •  1 def linear_activation_backward(dA, cache, activation):
             2     """
             3     Implement the backward propagation for the LINEAR->ACTIVATION layer.
             4     
             5     Arguments:
             6     dA -- post-activation gradient for current layer l 
             7     cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
             8     activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
             9     
            10     Returns:
            11     dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
            12     dW -- Gradient of the cost with respect to W (current layer l), same shape as W
            13     db -- Gradient of the cost with respect to b (current layer l), same shape as b
            14     """
            15     linear_cache, activation_cache = cache
            16     
            17     if activation == "relu":
            18         ### START CODE HERE ### (≈ 2 lines of code)
            19         dZ = relu_backward(dA,activation_cache)
            20         dA_prev, dW, db =linear_backward(dZ,linear_cache) 
            21         ### END CODE HERE ###
            22         
            23     elif activation == "sigmoid":
            24         ### START CODE HERE ### (≈ 2 lines of code)
            25         dZ = sigmoid_backward(dA,activation_cache)
            26         dA_prev, dW, db = linear_backward(dZ,linear_cache)
            27         ### END CODE HERE ###
            28     
            29     return dA_prev, dW, db
          • L-Model Backward:
            •  1 def L_model_backward(AL, Y, caches):
               2     """
               3     Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
               4     
               5     Arguments:
               6     AL -- probability vector, output of the forward propagation (L_model_forward())
               7     Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
               8     caches -- list of caches containing:
               9                 every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
              10                 the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
              11     
              12     Returns:
              13     grads -- A dictionary with the gradients
              14              grads["dA" + str(l)] = ... 
              15              grads["dW" + str(l)] = ...
              16              grads["db" + str(l)] = ... 
              17     """
              18     grads = {}
              19     L = len(caches) # the number of layers
              20     m = AL.shape[1]
              21     Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
              22     
              23     # Initializing the backpropagation
              24     ### START CODE HERE ### (1 line of code)
              25     dAL = -(np.divide(Y,AL)-np.divide(1-Y,1-AL))
              26     ### END CODE HERE ###
              27     
              28     # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]
              29     ### START CODE HERE ### (approx. 2 lines)
              30     current_cache =caches[-1]
              31     grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] =linear_activation_backward(dAL, current_cache, "sigmoid") 
              32     ### END CODE HERE ###
              33     
              34     # Loop from l=L-2 to l=0
              35     for l in reversed(range(L-1)):
              36         # lth layer: (RELU -> LINEAR) gradients.
              37         # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
              38         ### START CODE HERE ### (approx. 5 lines)
              39         current_cache =caches[l]
              40         dA_prev_temp, dW_temp, db_temp =linear_activation_backward(grads["dA"+str(l+1)], current_cache, "relu")
              41         grads["dA" + str(l)] = dA_prev_temp
              42         grads["dW" + str(l + 1)] = dW_temp
              43         grads["db" + str(l + 1)] = db_temp
              44         ### END CODE HERE ###
              45 
              46     return grads
      • Update Parameters
        •  1 def update_parameters(parameters, grads, learning_rate):
           2     """
           3     Update parameters using gradient descent
           4     
           5     Arguments:
           6     parameters -- python dictionary containing your parameters 
           7     grads -- python dictionary containing your gradients, output of L_model_backward
           8     
           9     Returns:
          10     parameters -- python dictionary containing your updated parameters 
          11                   parameters["W" + str(l)] = ... 
          12                   parameters["b" + str(l)] = ...
          13     """
          14     
          15     L = len(parameters) // 2 # number of layers in the neural network
          16 
          17     # Update rule for each parameter. Use a for loop.
          18     ### START CODE HERE ### (≈ 3 lines of code)
          19     for l in range(L):
          20         parameters["W" + str(l+1)] -=learning_rate*grads["dW" + str(l+1)]
          21         parameters["b" + str(l+1)] -= learning_rate*grads["db" + str(l+1)]
          22     ### END CODE HERE ###
          23     return parameters

5.深度神经网络的运用:


  • Build Deep Learning model step:
    • Initialize parameters / Define hyperparameters
    • Loop for num_iterations:
      • Forward propagation
      • Compute cost function
      • Backward propagation
      • Update parameters (using parameters,and grads from backprop)
    • Use trained parameters to predict labels
原文地址:https://www.cnblogs.com/easy-wang/p/10000629.html