Study Notes | Udacity CarND Term 1: Computer Vision and Deep Learning

Computer Vision and Deep Learning 

Overview (6/22/2017 - 9/30/2017)

  1. Welcome (6/22/2017)
  2. Project: Finding Lane Lines Project (6/23/2017 - 7/5/2017)
  3. Career Services Available to You (6/23/2017)
  4. Introduction to Neural Networks (6/26/2017)
  5. MiniFlow (6/28/2017)
  6. Introduction to TensorFlow (6/28/2017)
  7. Deep Neural Networks (7/5/2017)
  8. Convolutional Neural Networks (7/6/2017)
  9. Project: Traffic Sign Classifier Project (7/7/2017 - 7/16/2017)
  10. Keras (7/15/2017)
  11. Transfer Learning (7/20/2017)
  12. Project: Behavioral Cloning Project (7/20/2017 - 8/5/2017)
  13. Project: Advanced Lane Finding Project (8/5/2017 - 9/8/2017)
  14. Machine Learning and Stanley (7/25/2017)
  15. Support Vector Machines (8/5/2017 - 8/25/2017)
  16. Decision Trees (8/30/2017 - 8/31/2017)
  17. Project: Vehicle Detection and Tracking Project (9/9/2017 - 9/19/2017)
  18. The End (9/5/2017)
  19. Software Setup
  20. Get Ready for Term 2 C++

Note

4. Introduction to Neural Networks

  • Backpropagation
  • Gradient Descent
    • x = x - learning_rate * gradient_of_x
    • x = x − α · ∂cost/∂x
    • We adjust the old x by pushing it in the direction of gradx with the strength learning_rate, i.e. subtracting learning_rate * gradx. Remember that the gradient initially points in the direction of steepest ascent, so subtracting learning_rate * gradx from x turns the step into steepest descent. You can convince yourself of this by replacing the subtraction with an addition. (A small numeric sketch follows this list.)
  • Backpropagation is also called reverse-mode differentiation
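  • A concrete illustration of the update rule above (a minimal sketch; the quadratic cost and starting point are chosen only for the example, they are not from the course notes):

      # Minimal gradient descent on cost(x) = x**2 (illustrative choice).
      # The gradient of x**2 is 2*x, so the update is x = x - learning_rate * 2*x.

      learning_rate = 0.1
      x = 5.0  # arbitrary starting point

      for step in range(50):
          grad_x = 2.0 * x                 # gradient of the cost at the current x
          x = x - learning_rate * grad_x   # step in the direction of steepest descent

      print(x)  # x has moved very close to 0, the minimum of x**2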

6. Introduction to TensorFlow

  • logit: the log-odds, i.e. the raw unnormalized score fed into softmax
  • What does softmax do?
    •  Use the softmax function to turn your logits into probabilities

  • momentum ?
  • What does None do here? The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0. (A short placeholder-and-softmax sketch follows this list.)
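  • A minimal TensorFlow 1.x-style sketch (the API used in the course era) tying the two notes above together; the 784-dimensional input and 10 classes are assumptions for illustration:

      import tensorflow as tf

      # None leaves the batch dimension open; any batch size > 0 is accepted at runtime.
      features = tf.placeholder(tf.float32, [None, 784])
      weights = tf.Variable(tf.truncated_normal([784, 10]))
      bias = tf.Variable(tf.zeros([10]))

      logits = tf.add(tf.matmul(features, weights), bias)
      probabilities = tf.nn.softmax(logits)  # turn logits into probabilities that sum to 1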

11. Transfer Learning

  • Several kinds of training (a Keras feature-extraction sketch follows this list):
  1. Feature extraction (train only the top-level of the network, the rest of the network remains fixed)
  2. Finetuning (train the entire network end-to-end, start with pre-trained weights)
  3. Training from scratch (train the entire network end-to-end, start from random weights)
  • Consider feature extraction when ...

    ... the new dataset is small and similar to the original dataset. The higher-level features learned from the original dataset should transfer well to the new dataset.

    Consider finetuning when ...

    ... the new dataset is large and similar to the original dataset. Altering the original weights should be safe because the network is unlikely to overfit the new, large dataset.

    ... the new dataset is small and very different from the original dataset. You could also make the case for training from scratch. If you choose to finetune, it might be a good idea to only use features from the first few layers of the pre-trained network; features from the final layers of the pre-trained network might be too specific to the original dataset.

    Consider training from scratch when ...

    ... the dataset is large and very different from the original dataset. In this case we have enough data to confidently train from scratch. However, even in this case it might be beneficial to initialize the entire network with pretrained weights and finetune it on the new dataset.

    Finally, keep in mind that for a lot of problems you won't need an architecture as complicated and powerful as VGG, Inception, or ResNet. These architectures were made for the task of classifying thousands of complex classes. A smaller network might be a better fit for a smaller problem, especially if you can comfortably train it on moderate hardware.

  • ...
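  • A minimal Keras sketch of option 1 (feature extraction): freeze a pre-trained VGG16 base and train only a new top. The input shape and the 10-class head are assumptions for illustration, not from the course:

      from keras.applications import VGG16
      from keras.layers import Dense, Flatten
      from keras.models import Sequential

      # Pre-trained convolutional base; include_top=False drops the original classifier.
      base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

      # Feature extraction: keep the pre-trained weights fixed.
      for layer in base.layers:
          layer.trainable = False

      model = Sequential([
          base,
          Flatten(),
          Dense(10, activation='softmax'),  # new top for the (assumed) 10-class problem
      ])
      model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

      # For finetuning (option 2), set layer.trainable = True on some or all base layers
      # and recompile, typically with a small learning rate, before training.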

13. Project: Advanced Lane Finding Project [Post]

14. Machine Learning and Stanley

15. Support Vector Machines

  • SVM is a Support Vector Machine.
    • [Wikipedia] An SVM is a supervised machine learning model, with an associated learning algorithm, that analyzes data for classification and regression analysis.
    • In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
    • SVMs are supervised, so they require labeled data; a plain SVM cannot be used on unlabeled data. For unsupervised data, a support vector clustering algorithm is used instead.
  • Uses of SVM:
    • Text and hypertext classification.
    • Handwritten character recognition.
    • Image classification.
  • Kernel Trick
    • A kernel maps lower-dimensional features into a higher-dimensional space so that the original non-linear problem can become linearly separable in the new feature space. (A small numeric check of this idea follows this list.)
    • The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral.

    • While solving the SVM, we only need to know the inner product of vectors in the coordinate space. Say we choose a mapping K, and P1 and P2 are two points in the original space. K maps these points to K(P1) and K(P2) in the transformed space. To find the solution using SVMs, we only need to compute the inner product of the transformed points K(P1) and K(P2).

      If we denote S as the similarity function in transformed space (as expressed in the terms of the original space), then:

      S(P1,P2) = <K(P1),K(P2)>

      The Kernel trick essentially is to define S in terms of original space itself without even defining (or in fact, even knowing), what the transformation function K is. 

  • Parameters
    • Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors.

    • The C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.

    • RBF SVM parameters (a scikit-learn sketch using gamma and C follows this list)
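  • A small numeric check of the kernel-trick idea above (a sketch; the degree-2 polynomial kernel and the 2-D points are chosen only for illustration):

      import numpy as np

      def phi(x):
          # Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)
          return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

      def poly_kernel(p1, p2):
          # Kernel computed purely in the original space: K(p1, p2) = (p1 . p2)^2
          return np.dot(p1, p2) ** 2

      p1 = np.array([1.0, 2.0])
      p2 = np.array([3.0, 0.5])

      # Same number both ways: the kernel gives the inner product in the
      # transformed space without ever building the feature map explicitly.
      print(np.dot(phi(p1), phi(p2)))   # 16.0
      print(poly_kernel(p1, p2))        # 16.0
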
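  • A minimal scikit-learn sketch showing where gamma and C plug in (the toy dataset and the parameter values are arbitrary illustrations, not from the course):

      from sklearn.datasets import make_moons
      from sklearn.svm import SVC

      # Toy non-linear dataset: two interleaving half-circles.
      X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

      # RBF-kernel SVM: gamma controls how far a single example's influence reaches,
      # C trades off misclassifying training examples against a smooth decision surface.
      clf = SVC(kernel='rbf', gamma=1.0, C=1.0)
      clf.fit(X, y)

      print(clf.score(X, y))               # training accuracy
      print(len(clf.support_vectors_))     # number of support vectors the model selected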

16. Decision Trees

Original post: https://www.cnblogs.com/casperwin/p/7068357.html