Local Response Normalization 60 million parameters and 500,000 neurons

CNN是工具，在图像识别中是发现图像中待识别对象的特征的工具，是剔除对识别结果无用信息的工具。

ImageNet Classification with Deep Convolutional Neural Networks

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

http://caffe.berkeleyvision.org/tutorial/layers/lrn.html

【侧抑制】

The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions.

https://prateekvjoshi.com/2016/04/05/what-is-local-response-normalization-in-convolutional-neural-networks/

Why do we need normalization layers in the first place?

A typical CNN consists of the following layers: convolution, pooling, rectified linear unit (ReLU), fully connected, and loss. If the previous sentence didn’t make sense, you may want to go through a quick CNN tutorial before proceeding further. Anyway, the reason we may want to have normalization layers in our CNN is that we want to have some kind of inhibition scheme.

In neurobiology, there is a concept called “lateral inhibition”. Now what does that mean? This refers to the capacity of an excited neuron to subdue its neighbors. We basically want a significant peak so that we have a form of local maxima. This tends to create a contrast in that area, hence increasing the sensory perception. Increasing the sensory perception is a good thing! We want to have the same thing in our CNNs.

What exactly is Local Response Normalization?

Local Response Normalization (LRN) layer implements the lateral inhibition we were talking about in the previous section. This layer is useful when we are dealing with ReLU neurons. Why is that? Because ReLU neurons have unbounded activations and we need LRN to normalize that. We want to detect high frequency features with a large response. If we normalize around the local neighborhood of the excited neuron, it becomes even more sensitive as compared to its neighbors.

At the same time, it will dampen the responses that are uniformly large in any given local neighborhood. If all the values are large, then normalizing those values will diminish all of them. So basically we want to encourage some kind of inhibition and boost the neurons with relatively larger activations. This has been discussed nicely in Section 3.3 of the original paper by Krizhevsky et al.