Study Notes | What are the advantages of ReLU over the sigmoid function in deep neural networks?

The current state of the art for non-linearities is to use ReLU instead of the sigmoid function in deep neural networks. What are the advantages?

I know that training a network is faster when ReLU is used, and that it is more biologically inspired. What are the other advantages? (That is, are there any disadvantages to using sigmoid?)

Best answer on Stack Exchange:

Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient. But first recall that the definition of a ReLU is h = max(0, a), where a = Wx + b.
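
A minimal sketch of these definitions in NumPy (not part of the original answer; the shapes and names are illustrative):

```python
import numpy as np

def relu(a):
    # h = max(0, a), applied elementwise
    return np.maximum(0.0, a)

def sigmoid(a):
    # h = 1 / (1 + exp(-a)), applied elementwise
    return 1.0 / (1.0 + np.exp(-a))

# Pre-activation a = Wx + b for a single layer
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = rng.normal(size=3)
b = np.zeros(4)
a = W @ x + b

print(relu(a))     # negative entries become exactly 0
print(sigmoid(a))  # all entries strictly between 0 and 1
```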

One major benefit is the reduced likelihood of the gradient vanishing. This arises when a > 0. In this regime the gradient has a constant value. In contrast, the gradient of the sigmoid becomes increasingly small as the absolute value of x increases. The constant gradient of ReLUs results in faster learning.
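
To make the contrast concrete, here is a small numerical comparison of the two derivatives (an illustrative sketch, not from the original answer):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)            # at most 0.25, shrinks as |a| grows

def relu_grad(a):
    return 1.0 if a > 0 else 0.0    # constant 1 in the positive regime

for a in [0.0, 2.0, 5.0, 10.0]:
    print(f"a = {a:5.1f}   sigmoid' = {sigmoid_grad(a):.6f}   relu' = {relu_grad(a):.1f}")
# sigmoid' decays toward 0 as |a| grows; relu' stays at 1 for a > 0
```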

The other benefit of ReLUs is sparsity. Sparsity arises when a ≤ 0. The more such units exist in a layer, the sparser the resulting representation. Sigmoids, on the other hand, are always likely to generate some non-zero value, resulting in dense representations. Sparse representations seem to be more beneficial than dense representations.
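
A rough numeric illustration of this point (random pre-activations assumed; the key observation is how many outputs are exactly zero):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=10_000)            # pre-activations centered at 0

relu_out = np.maximum(0.0, a)
sigmoid_out = 1.0 / (1.0 + np.exp(-a))

print("ReLU fraction of zeros:   ", np.mean(relu_out == 0.0))     # roughly half the units are exactly 0
print("sigmoid fraction of zeros:", np.mean(sigmoid_out == 0.0))  # essentially none are exactly 0
```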

Reference: http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network

ReLU

ReLU stands for rectified linear unit. The answer above basically covers the ways in which it outperforms the sigmoid function:

  1. faster training
  2. more biologically inspired
  3. sparsity
  4. less chance of vanishing gradients (the vanishing gradient problem)

Early deep learning models that used sigmoid or tanh activation functions often failed to converge when doing unsupervised learning because of the vanishing gradient problem. ReLU does not have this problem.
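
A small sketch of why deep sigmoid stacks struggle (random weights and an arbitrary depth are assumed here; this is an illustration, not a reproduction of any specific experiment): backpropagation multiplies one activation derivative per layer, and since the sigmoid derivative is at most 0.25, the gradient shrinks geometrically with depth, whereas ReLU's unit derivative on active paths does not shrink it.

```python
import numpy as np

def sigmoid_grad(a):
    s = 1.0 / (1.0 + np.exp(-a))
    return s * (1.0 - s)

def relu_grad(a):
    return (a > 0).astype(float)

def grad_norm_through_stack(activation_grad, depth=30, width=64, seed=0):
    # Backpropagate a vector of ones through `depth` random linear + activation layers
    # and return the norm of the resulting gradient.
    rng = np.random.default_rng(seed)
    grad = np.ones(width)
    for _ in range(depth):
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        a = rng.normal(size=width)              # stand-in pre-activations
        grad = (W.T @ grad) * activation_grad(a)
    return np.linalg.norm(grad)

print("sigmoid stack gradient norm:", grad_norm_through_stack(sigmoid_grad))
print("ReLU stack gradient norm:   ", grad_norm_through_stack(relu_grad))
```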

Original post: https://www.cnblogs.com/casperwin/p/6235485.html