Machine Learning Roadmap

Self-Study Guide to Machine Learning

by Jason Brownlee on December 4, 2013 in Start Machine Learning

Last Updated on April 21, 2018

There are lots of things you can do to learn about machine learning.

There are resources like books and courses you can follow, competitions you can enter and tools you can use.

In this post I want to put some structure around these activities and suggest a loose ordering of what to tackle when in your journey from programmer to machine learning master.

Four Levels of Machine Learning

Consider four levels of competence in machine learning. This is a model to help us think about the resources and activities available and when a good time to tackle them might be.

Beginner
Novice
Intermediate
Advanced

I want to separate beginner from novice here because I want to show that an absolute beginner (a programmer with an interest in the field) has a path before them if they choose.

We are going to tour through each of these four levels and look at resources and activities that can help someone at one level learn more and progress their understanding and skill levels.

The breakdown is just a suggestion, and it is very likely that some activity or resource at a level before or after can be very useful and appropriate at a given level in the breakdown.

I think the overall structure is useful, and I’m keen to hear what you think. Leave a comment below with your thoughts.

Credited to pugetsoundphotowalks, some rights reserved

Beginner

A beginner is a programmer with an interest in machine learning. They may have started to read a book or a Wikipedia page, or taken a few lessons in a course, but they don’t really “get it” yet. They’re frustrated because the advice they are getting is aimed at intermediate and advanced practitioners.

Beginners need a gentle introduction, away from code, textbooks and courses. They need the whys, whats and hows pointed out first to lay the foundation for novice-level material.

Some activities and resources for the absolute beginner are:

Introductions from Books: Read introductions to good machine learning and data mining books for programmers, such as Machine Learning for Hackers, Programming Collective Intelligence and Data Mining: Practical Machine Learning Tools and Techniques. These are great books for beginners, a topic you can read more about in the post Best Machine Learning Resources for Getting Started.
Overview Videos: Watch presentations that give an overview of machine learning to lay audiences. Some examples include an interview with Tom Mitchell and Peter Norvig's Facebook Tech Talk on big data.
Talk to People: Ask how they got started in the field, what resources they recommend for beginners, what excites them about the field.

Novice

A novice has had some contact with the field of Machine Learning. They have read a book or taken a course. They know they are interested and they want to know more. They are starting to get it and want to start to get things done.

Novices need something to do. They need to be put into action to have the material grounded and integrated into existing knowledge structures like the programming languages they know or the problems they are used to solving.

Some activities and resources for the novice are:

Complete a Course: Take and complete a course like the Stanford Machine Learning course. Take a lot of notes, complete the homework if possible, ask a lot of questions.
Read some Books: Not textbooks, but friendly books like those listed above targeted at beginner programmers.
Learn a Tool: Learn to drive a tool or library like Scikit-Learn, WEKA, R or similar. Specifically, learn how to use an algorithm you have read or learned about in a book or course. See it in action and get used to trying things out as you learn them.
Write Some Code: Implement a simple algorithm like a perceptron, k-nearest neighbour or linear regression. Write little programs to demystify methods and learn all the micro-decisions required to make them work.
Complete Tutorials: Follow and complete tutorials. Start building up a directory of small projects that you have completed with datasets, scripts and even source code you can look back on, read and think about.
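
As an illustration of the "Write Some Code" activity, here is a minimal sketch of a perceptron in plain Python. It is only one way such a little program might look: the function names, the toy AND dataset, the learning rate of 1.0 and the epoch count are all assumptions chosen for this example, not something prescribed by the post.

```python
# Minimal perceptron sketch (illustrative only; names and data are hypothetical).

def predict(weights, bias, row):
    """Return 1 if the weighted sum of the inputs is non-negative, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if activation >= 0.0 else 0


def train_perceptron(rows, labels, learning_rate=1.0, epochs=20):
    """Learn weights with the classic perceptron update rule."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = label - predict(weights, bias, row)  # -1, 0 or +1
            bias += learning_rate * error
            weights = [w + learning_rate * error * x for w, x in zip(weights, row)]
    return weights, bias


# Toy, linearly separable data: the logical AND of two binary inputs.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(rows, labels)
predictions = [predict(weights, bias, row) for row in rows]
print(predictions)
```

Even a program this small forces micro-decisions that a library hides: how to initialise the weights, what to do when the activation is exactly zero, and how many passes over the data to make.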

Intermediate

A novice has read some books and completed some courses. They know how to drive some tools and have written a bunch of code, both implementing simple algorithms and completing tutorials. An intermediate is breaking out on their own, devising their own projects to learn new techniques, and interacting with and learning from the greater community.

The intermediate is learning how to implement and wield algorithms accurately, competently and robustly. They are also building the habit of spending a lot of time with the data up front: cleaning it, summarizing it and thinking about the types of questions it can answer.

Some activities and resources for the intermediate are:

Small Projects: Devise small programming projects and experiments where machine learning can be used to solve a problem. This is like designing and executing your own tutorials in order to explore a technique you’re interested in. You may implement an algorithm yourself or link to a library that provides it. Learn more about small projects.
Data Analysis: Get used to exploring and summarizing datasets. Automate reports, know which tools to use when, and look for data you can explore, clean, and on which you can practice techniques and communicate something interesting.
Read Textbooks: Read and internalize textbooks on machine learning. This may well require the skill to grok mathematical descriptions of techniques and to recognize the formalisms that describe classes of problems and algorithms.
Write Plugins: Write plugins and packages for open source machine learning platforms and libraries. This is an exercise in learning how to write robust, production-level algorithm implementations. Use your own plugins on projects, ask for code reviews from the community and work to get the code included in the platform if possible. Getting feedback and learning is the goal.
Competitions: Participate in machine learning competitions, such as those associated with conferences or offered on platforms like Kaggle. Get involved in discussions, ask questions, learn how other practitioners are approaching the problem. Add to your repository of projects, methods and code from which you can draw.
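
For the "Data Analysis" activity above, the sketch below shows one way to automate a tiny summary report using only the Python standard library. The column names and values are made up for illustration, and in practice you would read a real file rather than an in-memory string; treat it as a starting point rather than a recommended tool.

```python
import csv
import io
import statistics

# Hypothetical toy dataset; a real project would open a CSV file instead.
RAW_CSV = """height_cm,weight_kg
170,65
182,80
165,
175,72
"""


def summarize(csv_text):
    """Return per-column count, missing count, min, max, mean and stdev."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    missing = {name: 0 for name in reader.fieldnames}
    for record in reader:
        for name in columns:
            value = record[name]
            if value is None or value.strip() == "":
                missing[name] += 1  # count blanks instead of silently dropping them
            else:
                columns[name].append(float(value))
    summary = {}
    for name, values in columns.items():
        if not values:
            continue  # nothing numeric to summarize in this column
        summary[name] = {
            "count": len(values),
            "missing": missing[name],
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


report = summarize(RAW_CSV)
for name, stats in report.items():
    print(name, stats)
```

Reporting the number of missing values alongside the usual statistics is a deliberate choice here: cleaning decisions are easier to make when gaps in the data are visible from the start.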

Advanced

An advanced practitioner has written a lot of code, either integrating machine learning algorithms or implementing algorithms themselves. They may have competed in competitions or written plugins. They have read the textbooks, completed the courses and have a broad knowledge of the field, as well as deep knowledge of a few key techniques that they prefer.

The advanced practitioner builds, deploys and maintains production systems that use machine learning. They keep abreast of new developments in the field and eagerly seek out and learn the nuances of a method and the tips passed around by other frontline practitioners like themselves.

Some activities and resources for the advanced practitioner are:

Customizing Algorithms: Modify algorithms to meet their needs, which may involve implementing customizations outlined in conference and journal papers for similar problem domains.
New Algorithms: Devise entirely new methods based on the underlying formalisms to meet the challenges they encounter. This is more about getting the best results possible than about advancing the frontier of the field.
Case Studies: Read and even recreate case studies completed for machine learning competitions and by other practitioners. These “how I did it” papers and posts are usually chock full of subtle pro tips for data preparation, feature engineering and technique usage.
Methodology: Systematize processes, whether formally or just for themselves. They have a way to approach problems and get results at this point, and they are actively looking for ways to further refine and improve that process with tips, best practices and new and better techniques.
Research: Attend conferences, read research papers and monographs, and have conversations with experts in the field. They may write up some of their work and submit it for publication, or just drop it in a blog post and get back to work.

Mastery is continuous; the learning does not end. One could pause and detour at any point along this journey and become the “competition guy” or the “pro library guy”. In fact, I expect such detours to be the norm.

This breakdown could be read as a linear path of the technician’s journey from beginner to advanced level, and it is intentionally programmer-centric. I’m keen to hear criticisms of this reading so that I can make it better. The breakdown is just my suggestion of the types of activities to tackle if you find yourself hungering for more at a given level.

So what level are you and what are you going to take on next? Leave a comment!

UPDATE: Continue the discussion on Reddit.

===============================================================================================


http://machinelearningmastery.com/machine-learning-roadmap-your-self-study-guide-to-machine-learning/

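In this post I lay out a specific self-study roadmap for machine learning that you can use to orient yourself and work out your next step.

I think a lot about frameworks and systematic approaches (as you can tell from my blog). I previously published a post, "Self-Study Guide to Machine Learning", that struck a chord with readers, and I think this post greatly expands on that earlier idea.

Let's dive in...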

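The Machine Learning Roadmap

Machine learning is a huge field of study. There are so many algorithms, theories, techniques and classes of problems to learn that it can feel overwhelming.

Machine learning is also deeply interdisciplinary. You can find yourself jumping from a researcher's material to a programmer's, or to a statistician's, and it is frustrating that each assumes so much prior knowledge.

We need a structured approach: one that provides a roadmap of the topics in machine learning and the level of detail at which to study them, and that can also integrate popular resources such as books and open courses.

A structured approach addresses the feeling of being overwhelmed by focusing attention on what you need to learn. It addresses frustration by presenting material in order, with an emphasis on practicality, and it is tailored to engineers and programmers.

The roadmap shows you where you are now and where you want to get to, so the direction of your study is clear.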

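Self-Study Is the Path

Self-study means learning at your own pace, on your own terms and on your own schedule.

Self-study is the best way to learn machine learning. That does not mean you have to do it all on your own; please don't. It means learning in the way that works best for you and using the best courses, books and guides available online.

Self-study is also compatible with more formal undergraduate and graduate coursework. It means you actively integrate machine learning material into your own body of knowledge and take ownership of the process. Along the way you can go deeper into the areas that interest you most.

Machine learning, like programming, is an applied discipline. Learning theory matters, but you must put in the time to apply it. You have to practice; this is critical. You need to build up intuition for the processes, algorithms and problems.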

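Levels of Competence

The structured approach to learning machine learning is divided into four levels of competence:

1. Beginner

2. Novice

3. Intermediate

4. Advanced

The four levels are defined by the problems people face at each one and by their learning goals. Each level in turn has its own activities for reaching its goal.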

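Problems at Each Level

Each level of competence faces a different set of problems:

Beginner: Confused about what machine learning actually is. Overwhelmed by the sheer volume of information. Frustrated that most available material assumes prior knowledge without saying so.

Novice: Intimidated by mathematical descriptions of algorithms. Struggling with how to apply machine learning to a problem. Unable to find problems on which to explore machine learning.

Intermediate: Bored with introductory material. Hungry for more detail and deeper insight. Eager to prove and demonstrate their knowledge and skills.

Advanced: Obsessed with getting more out of systems and solutions. Looking for opportunities to make a bigger contribution to machine learning, and constantly pushing themselves to go further.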

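Learning Goals

Each level of competence has a single goal, and many activities can serve it. The goals are:

Beginner: Lay a clear foundation and begin the machine learning journey.

Novice: Develop and practice the process of applied machine learning.

Intermediate: Develop a deeper understanding of algorithms, problems and tools.

Advanced: Develop extensions to algorithms, problems and tools.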

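Self-Study Activities

The goal at each level defines the types of activity needed to achieve it. You are strongly encouraged to design your own activities, although suggested activities for each level are listed below.

Beginner

Discover your "why" for machine learning (for example, why machine learning matters and why it matters to you).

Identify self-limiting beliefs that may be holding you back (for example, not having a degree or not being good at math).

Survey the basic definitions and concepts of the field (for example, machine learning problems and machine learning algorithms).

Novice

Study and learn each step in the process of applied machine learning.

Understand the details of the tools and libraries used at each step of applied machine learning (that is, gain a working familiarity with them).

Practice working through the applied machine learning process end-to-end on real problems.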
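
To make "end-to-end" concrete for the novice level, the sketch below walks through one possible version of the process in plain Python: prepare data, hold out a test set, fit a simple model and evaluate it. The four steps, the synthetic two-cluster dataset, the choice of k-nearest neighbours and every function name are assumptions made for this illustration; the roadmap itself does not fix a particular process or algorithm.

```python
import math
import random
from collections import Counter


def make_dataset(rng, points_per_class=20):
    """Step 1: prepare data. Two synthetic, well-separated 2-D clusters."""
    data = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(points_per_class):
            point = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
            data.append((point, label))
    return data


def train_test_split(data, rng, test_fraction=0.3):
    """Step 2: hold out part of the data for evaluation."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, point, k=3):
    """Step 3: the model. Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Step 4: evaluate on data the model has not seen."""
    correct = sum(knn_predict(train, point, k) == label for point, label in test)
    return correct / len(test)


rng = random.Random(42)  # fixed seed so the run is repeatable
train, test = train_test_split(make_dataset(rng), rng)
score = accuracy(train, test)
print(f"train={len(train)} test={len(test)} accuracy={score:.2f}")
```

The clusters here are deliberately easy to separate, so the score says little about real problems. The point is the shape of the workflow, which stays the same when you swap in a real dataset or a library implementation.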

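Intermediate

Carry out small, focused investigations of algorithms, problems and tools.

Keep improving your applied machine learning skills by taking part in, and learning from, machine learning competitions.

Advanced

Develop extensions to algorithms, problems and tools in a structured way.

Participate in and contribute to the community.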

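How to Use It

This roadmap is a useful tool, and you can use it in several ways on your path to mastering machine learning.

Study guide: Use it as a straight-line guide through the goals and activities. Patience and hard work will get you to the higher levels in short order.

Streamlined guide: As above, but narrow what you want to master to one specific area rather than applied machine learning in general. This might be a specific problem or a particular class of algorithms.

Information filter: The roadmap can be used to filter the information and resources you come across. This is very useful, because it lets you quickly judge whether a blog post, article or book is relevant to your current level.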

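This Path Is for You!

I designed this guide for other engineers and programmers.

You probably know how to program.

You may be (or may have been) a professional engineer or programmer.

You may be an undergraduate or graduate student.

You are interested in machine learning or data mining.

You may already be working in machine learning or data mining.

This approach is tailored to engineers and programmers who are already familiar with the process of developing and building systems. They think computationally or logically and approach things systematically. Programmers in particular already appreciate the power of automation and the complexity and behavior of algorithms.

The approach works not only for professional programmers but also for students of engineering, computer science or related disciplines.

You do not have to be a programmer, let alone an excellent one. You can use off-the-shelf tools such as Weka to work on machine learning problems and apply algorithms through a graphical interface.

You do not have to be a mathematician or a statistician. When you need to learn an algorithm, you only have to review the statistics, probability and linear algebra that the algorithm requires.

You can read guides and books and take open courses. They fit well into the four levels of competence above. A given book may be a good reference for the novice or the intermediate level (or span both). Likewise, a course may present examples of various machine learning activities and suit one particular level or span two or more levels.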

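Scope

I suggest focusing your study on the algorithms and tools for classification and regression problems. These are two very common underlying machine learning problems, and most other problems can be reduced to them.

There are subfields of machine learning such as computer vision, natural language processing, recommender systems and reinforcement learning. These can also be reduced to classification and regression problems, and the structured roadmap above applies to studying them too. I suggest moving into these areas once you reach the intermediate level.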

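Principles

I have a few pragmatic principles that can help you move toward your machine learning goals quickly and effectively. In effect, they make up the framework of the roadmap.

Machine learning is a journey. You need to know where you are now and where you want to go. It will take a lot of time and hard work, but it will be well worth it.

Create semi-formal work products. As you read blog posts, technical reports and open source code, write down what you learn and discover. You will quickly build up a body of skills you can demonstrate and a body of knowledge that you and others can reflect on.

Learn just in time. Do not study a complex topic until you need it. For example, learn just enough probability and linear algebra to understand the algorithm you are working with, rather than spending three years on statistics and mathematics before starting on machine learning.

Leverage your existing skills. If you can code, implement algorithms to understand them rather than studying the math. Use a programming language you already know. Focus on the one thing you are learning, and do not complicate it by learning a new language, tool or library at the same time.

Mastery is an ideal. Mastering machine learning takes continuous learning. You never actually arrive; you can only keep learning, understanding and improving.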

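Tips

Here are three tips to help you use this guide more effectively and get started on your machine learning journey.

Start with a small project that you can complete in an hour.

Complete one project a week to keep up your momentum, and build and maintain a workspace of projects.

Share your results on your social networks, such as a blog, Facebook, Google+, GitHub, or anywhere else you can show your interest, skills and knowledge and get feedback.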

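Action Steps

Take a moment to answer the following two questions:

What level do you think you are at, and what are you working on right now?

What level do you want to reach, and what do you think you can do to get there?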

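Author: 郭建聪
Link: https://www.jianshu.com/p/82f39ed4f089
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.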