Hyperparameter tuning tips

For any model, you can tune it from the following aspects:

1. Initialize the weights and biases (this works well; it usually gives a 1-2% improvement)

 Point 1 (CNN):

from torch.nn import init
import numpy as np

for conv in self.convs1:
    init.xavier_normal(conv.weight, gain=np.sqrt(2.0))   # Xavier (normal) initialization of the weights
    # init.normal(conv.weight, mean=0, std=0.1)           # alternative: Gaussian initialization
    # init.constant(conv.bias, 0.1)                       # initialize the bias to 0.1

Point 2 (LSTM):

(1) Bias vectors are initialized to zero, except the bias b_f of the LSTM forget gate, which is initialized to 1.0 (see the paper End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF). For the weights, either a Gaussian or a uniform distribution works. For a detailed discussion, see the blog post "Deep Learning 之 参数初始化" (on parameter initialization).

(2) A simpler setting is to fix the weights to 0.1 and the biases to 0, for example with init.constant_, as in the sketch below.
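As a minimal sketch of option (2), here is constant initialization applied to a stand-in nn.Linear layer; the layer and its sizes are placeholders of my own, not part of the original code.

import torch.nn as nn
from torch.nn import init

layer = nn.Linear(200, 2)              # stand-in layer, sizes chosen arbitrarily
init.constant_(layer.weight, 0.1)      # weights -> 0.1 (init.constant in older PyTorch)
init.constant_(layer.bias, 0)          # biases  -> 0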

# nn.LSTM initialization following point (1): Xavier weights, forget-gate bias 1
init.xavier_normal(self.lstm.all_weights[0][0], gain=np.sqrt(2.0))   # weight_ih_l0
self.lstm.all_weights[0][3].data[20:40].fill_(1)    # forget gate bias -> 1
self.lstm.all_weights[0][3].data[0:20].fill_(0)     # input gate bias  -> 0
self.lstm.all_weights[0][3].data[40:80].fill_(0)    # cell and output gate biases -> 0

Note: for the packaged nn.LSTM, the all_weights interface is used to initialize its parameters as a whole; the gate parameters cannot be defined one by one, and with hidden_size = 20 the forget gate corresponds to indices 20-39 (PyTorch stores the gates in the order input, forget, cell, output). If you use an LSTMCell instead, you can modify each individual parameter you care about, as in the sketch below.
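As a rough illustration of that per-parameter access, here is a minimal sketch using nn.LSTMCell; the input_size of 100 is an arbitrary placeholder, and the slicing follows PyTorch's gate layout (input | forget | cell | output). It reproduces the scheme from point (1): Xavier weights, forget-gate bias 1, all other biases 0.

import torch.nn as nn
from torch.nn import init

hidden_size = 20                                              # matches the 20-39 forget-gate slice above
cell = nn.LSTMCell(input_size=100, hidden_size=hidden_size)   # input_size chosen arbitrarily

# Weights: Xavier (normal) initialization; xavier_normal_ is the in-place name in newer PyTorch
init.xavier_normal_(cell.weight_ih)
init.xavier_normal_(cell.weight_hh)

# Biases are laid out gate by gate: [input | forget | cell | output], each slice of length hidden_size
cell.bias_ih.data.fill_(0)
cell.bias_hh.data.fill_(0)
cell.bias_hh.data[hidden_size:2 * hidden_size].fill_(1)       # forget gate bias -> 1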

2. Clip gradients: constrain the gradient updates of the weights to a bounded range, which prevents exploding gradients at individual nodes.

import torch.nn.functional as F
from torch.nn import utils

optimizer.zero_grad()
logit = model(feature)
loss = F.cross_entropy(logit, target)
loss.backward()
# clip gradients: rescale them so that their overall norm is at most 5
utils.clip_grad_norm(model.parameters(), 5)   # renamed clip_grad_norm_ in newer PyTorch
optimizer.step()

3. L2 regularization

The L2 value, also called the penalty term, is there to prevent overfitting. PyTorch exposes it directly through the optimizer's weight_decay argument; a typical setting is 1e-8.

optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=0.01)   # weight_decay is the L2 penalty coefficient

4. Batch normalization: if set up correctly, it reportedly speeds up convergence considerably and has a clearly visible effect.

For BatchNorm2d(x) the input has shape (batch_size, channel, height, width) and x must equal the channel size, i.e. the size of dimension 1: a mean and variance are computed, and one normalization performed, separately for each channel. BatchNorm1d behaves the same way: x corresponds to the size of dimension 1, so if your features do not sit on dimension 1 you have to transpose first, as in the following example.

import torch
import torch.nn as nn
from torch.autograd import Variable

m = nn.BatchNorm1d(2)
input = Variable(torch.randn(2, 10))
input = Variable(torch.transpose(input.data, 0, 1))   # now (10, 2): the 2 features sit on dimension 1
print(input)
output = m(input)
print(output)

 Point 1 (CNN):

    def __init__(self, args):
        super(CNN, self).__init__()
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        a = []
        for conv in self.convs1:
            xx = conv(x)                                    # Variable [torch.FloatTensor of size 16x200x35x1]
            xx = Variable(torch.transpose(xx.data, 2, 3))   # -> 16x200x1x35
            xx = Variable(torch.transpose(xx.data, 1, 2))   # -> 16x1x200x35, so dimension 1 is the single BN channel
            xx = self.bn(xx)
            xx = F.relu(xx)
            xx = xx.squeeze(1)
            a.append(xx)

Point 2 (LSTM):

class BiLSTM(nn.Module):
    def __init__(self, args):
        super(BiLSTM, self).__init__()
        self.bn1 = nn.BatchNorm1d(2 * self.hidden_size)   # features = forward + backward hidden states

    def forward(self, sentence):
        out = self.bn1(out)                # out: (batch_size, 2 * hidden_size)
        out = F.tanh(out)
        y = self.hidden2label(out)

Result: neither of the two settings above improved accuracy.

Point 3 (BN-LSTM):

See the paper Recurrent Batch Normalization; the PyTorch framework does not provide this out of the box, so you have to implement it yourself. A rough sketch is given below.
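For orientation only, here is a minimal sketch of what such a cell could look like, loosely following the idea of the paper: batch normalization applied separately to the input-to-hidden and hidden-to-hidden projections, and to the cell state before the output gate. The class name BNLSTMCell and the use of a plain nn.BatchNorm1d are my own simplifications; the paper keeps separate BN statistics per time step and initializes the BN gain to 0.1, neither of which is done here.

import torch
import torch.nn as nn

class BNLSTMCell(nn.Module):
    """Sketch of a batch-normalized LSTM cell (simplified from Recurrent Batch Normalization)."""
    def __init__(self, input_size, hidden_size):
        super(BNLSTMCell, self).__init__()
        self.hidden_size = hidden_size
        self.weight_ih = nn.Linear(input_size, 4 * hidden_size, bias=False)
        self.weight_hh = nn.Linear(hidden_size, 4 * hidden_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        # Separate BN for the two projections and for the cell state
        self.bn_ih = nn.BatchNorm1d(4 * hidden_size)
        self.bn_hh = nn.BatchNorm1d(4 * hidden_size)
        self.bn_c = nn.BatchNorm1d(hidden_size)

    def forward(self, x, state):
        h, c = state                                    # each of shape (batch_size, hidden_size)
        gates = self.bn_ih(self.weight_ih(x)) + self.bn_hh(self.weight_hh(h)) + self.bias
        i, f, g, o = gates.chunk(4, dim=1)              # gate order: input, forget, cell, output
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(self.bn_c(c))
        return h, c

# Usage sketch: unroll over time yourself
# h = c = torch.zeros(batch_size, hidden_size)
# for t in range(seq_len):
#     h, c = cell(x[:, t, :], (h, c))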

Original post: https://www.cnblogs.com/Joyce-song94/p/7347775.html