How to design a neural network architecture

Start small.
Gradually increase the model size.
With a small parameter budget, deeper is usually better than wider. Deep networks are harder to optimize, so use the ResNet idea (residual connections) to keep them trainable.
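The ResNet idea mentioned above can be sketched in a few lines. This is only an illustration in plain Python, not a real framework layer: `residual_block` and `zero_layer` are hypothetical names, and vectors are plain lists. The block returns x + f(x), so a layer that has learned nothing yet still passes its input through unchanged.

```python
def residual_block(x, layer):
    """Return x + layer(x); the identity path lets the signal bypass the layer."""
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("layer must preserve the input size")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_layer(x):
    # Hypothetical stand-in for a layer whose output is all zeros.
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
out = x
for _ in range(10):  # stack 10 residual blocks
    out = residual_block(out, zero_layer)
```

Here `out` equals `x` after all 10 blocks: a stack of residual blocks starts out as the identity function, which is one common explanation for why adding depth this way does not make optimization harder.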
Kernel size: 3x3 and 1x1 work best.
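One common argument for small kernels is parameter cost. The sketch below counts the parameters of a standard 2D convolution (weights plus bias). The helper name `conv2d_params` and the channel count of 64 are assumptions chosen for illustration. Two stacked 3x3 layers with stride 1 cover the same 5x5 receptive field as a single 5x5 layer.

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Parameter count of a standard 2D convolution with a square kernel."""
    weights = c_in * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


channels = 64  # assumed channel count, for illustration only

one_5x5 = conv2d_params(channels, channels, 5)      # 102464
two_3x3 = 2 * conv2d_params(channels, channels, 3)  # 73856
one_1x1 = conv2d_params(channels, channels, 1)      # 4160
```

Under these assumptions the two 3x3 layers need about 28% fewer parameters than the single 5x5 layer, and a 1x1 layer is a cheap way to mix or change the number of channels.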
Stride:
1. To preserve spatial resolution, use stride = 1
2. For downsampling, use stride = 2
3. For upsampling, use stride = 1 or 2
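The effect of the stride on spatial size follows from the standard output-size formulas for a convolution and a transposed convolution (dilation 1). The sketch below uses those formulas. The function names, the 32-pixel input, and the kernel/padding choices are assumptions for illustration, and the upsampling case assumes a stride-2 transposed convolution.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


same = conv_output_size(32, kernel=3, stride=1, padding=1)           # 32
down = conv_output_size(32, kernel=3, stride=2, padding=1)           # 16
up = transposed_conv_output_size(16, kernel=4, stride=2, padding=1)  # 32
```

With a 3x3 kernel and padding 1, stride 1 keeps the resolution and stride 2 halves it. A stride-2 transposed convolution with a 4x4 kernel and padding 1 doubles it again.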
Batch size:
1. A batch size of 32 is a common default
2. Noisy gradients: use a larger batch
3. Stuck in a local minimum: use a smaller batch
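The trade-off between points 2 and 3 can be seen in a small simulation. This is a toy sketch, not a training loop: the per-sample "gradients" are synthetic Gaussian numbers, and `minibatch_gradient_std` is a hypothetical helper. It measures how much the mini-batch mean gradient fluctuates for two batch sizes.

```python
import random
import statistics


def minibatch_gradient_std(per_sample_grads, batch_size, trials, rng):
    """Standard deviation of the mini-batch mean gradient over repeated draws."""
    means = []
    for _ in range(trials):
        batch = rng.sample(per_sample_grads, batch_size)
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


rng = random.Random(0)
# Synthetic per-sample gradients: mean 1.0, standard deviation 2.0 (assumed values).
grads = [rng.gauss(1.0, 2.0) for _ in range(10_000)]

noise_small = minibatch_gradient_std(grads, batch_size=8, trials=500, rng=rng)
noise_large = minibatch_gradient_std(grads, batch_size=128, trials=500, rng=rng)
```

The noise shrinks roughly as 1/sqrt(batch size), so going from 8 to 128 samples should cut it by about a factor of 4. A larger batch gives a steadier gradient estimate, while the extra noise of a smaller batch can help the optimizer move out of a poor minimum.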
Splitting the dataset:
1. Large datasets (e.g. 100,000+ samples): 99% train, 1% test and validation
2. Small datasets (e.g. around 10,000 samples): 80% train, 20% test and validation
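The two split ratios above can be sketched with a shuffled index split. The helper name `split_indices` is hypothetical, and dividing the held-out part evenly between validation and test is an assumption, since the notes only give the combined share.

```python
import random


def split_indices(n, train_frac, seed=0):
    """Shuffle indices 0..n-1 and split them into train / validation / test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    # Assumption: the held-out part is shared equally by validation and test.
    n_valid = (n - n_train) // 2
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


big_train, big_valid, big_test = split_indices(100_000, 0.99)
small_train, small_valid, small_test = split_indices(10_000, 0.80)
```

With 100,000 samples, 1% still leaves 500 validation and 500 test samples under this assumption. With 10,000 samples, the 20% share is needed to reach 1,000 of each.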