jrae Source Code Walkthrough (Part 2)

This post looks in detail at the two classes introduced in the previous post: RAECost and SoftmaxCost.

SoftmaxCost

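As we already know, given features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the error (cost) of a given weight matrix (hidden×catSize) and reports the gradient at the current weights. Here is the code.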

@Override
    public double valueAt(double[] x)
    {
        if( !requiresEvaluation(x) )
            return value;
        int numDataItems = Features.columns;
         
        int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
        ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
        DoubleMatrix Prediction = getPredictions (Theta, Features);
         
        double MeanTerm = 1.0 / (double) numDataItems;
        double Cost = getLoss (Prediction, Labels).sum() * MeanTerm;
        double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);
         
        DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
        DoubleMatrix Delta = Features.mmul(Diff.transpose());
     
        DoubleMatrix gradW = Delta.getColumns(requiredRows);
        DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));
         
        //Regularizing. Bias does not have one.
        gradW = gradW.addi(Theta.W.mul(Lambda));
         
        Gradient = new ClassifierTheta(gradW,gradb);
        value = Cost + RegularisationTerm;
        gradient = Gradient.Theta;
        return value;
    }

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
    {
        int numDataItems = Features.columns;
        DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
        Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
        return Activation.valueAt(Input);
    }

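This is a typical two-layer neural network with no hidden layer. It first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradient.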


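In this two-layer network, the label is a column vector in which the target label is set to 1 and all other entries are 0. The transfer function is softmax, so the output is the probability of each label.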
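To make the forward pass concrete, here is a minimal plain-Java sketch of what getPredictions appears to do for a single sample: compute a score for each of the first catSize-1 categories, append a fixed score of 0 for the last category (mirroring the row of zeros that the original concatenates), and apply softmax. The class name `SoftmaxSketch`, the method names, and the array layout are assumptions made for illustration; the real code works on jblas matrices and delegates to `Activation.valueAt`.

```java
import java.util.Arrays;

class SoftmaxSketch {
    // Numerically stable softmax over the scores of one sample.
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().getAsDouble();
        double[] out = new double[scores.length];
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            out[k] = Math.exp(scores[k] - max);
            sum += out[k];
        }
        for (int k = 0; k < scores.length; k++) {
            out[k] /= sum;
        }
        return out;
    }

    // Class probabilities for one sample x.
    // w[i][j] is the weight from feature i to category j and b[j] its bias,
    // for the first catSize-1 categories only (an assumed layout).
    static double[] predict(double[][] w, double[] b, double[] x) {
        int free = b.length;
        double[] scores = new double[free + 1]; // last entry stays 0
        for (int j = 0; j < free; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) {
                s += w[i][j] * x[i];
            }
            scores[j] = s;
        }
        return softmax(scores);
    }

    public static void main(String[] args) {
        double[][] w = {{0.0, 0.0}, {0.0, 0.0}};
        double[] b = {0.0, 0.0};
        // With all-zero weights every category gets the same probability.
        System.out.println(Arrays.toString(predict(w, b, new double[]{1.0, 2.0})));
    }
}
```

Fixing the last score at 0 removes the redundancy in softmax (adding a constant to every score leaves the probabilities unchanged), which would also explain why valueAt keeps only the gradient entries for indices 0 through CatSize-2.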

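The cost is computed by getLoss. If the predicted output for the target label is p, then the per-sample cost, i.e. the error function, is:

$$\mathrm{cost} = E(p) = -\log(p)$$

Following the backpropagation algorithm described earlier, we get the expression below. The intermediate steps are written for j being the target label, where label_j = 1; for any other j, label_j is 0. Here h_j is the softmax output of unit j, so h_j = p_j.

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial p_j}\,\frac{\partial h_j}{\partial net_j}\,x_i = -\frac{1}{p_j}\,p_j(1-p_j)\,x_i = -(1-p_j)\,x_i = -(\mathrm{label}_j - p_j)\,\mathrm{feature}_i$$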

This explains the meaning of the following line of code:

DoubleMatrix Delta = Features.mmul(Diff.transpose());
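One way to convince yourself that this gradient is right is to compare it against a finite-difference estimate. The sketch below does that for a single sample, using the per-sample loss -log(p_target) and the gradient (p_j - label_j) * feature_i. It is an illustration under simplifying assumptions, not the jrae implementation: it uses plain arrays, keeps a full weight column for every category, and omits the bias, the 1/numDataItems averaging, and the regularization term. All names in it are hypothetical.

```java
class SoftmaxGradientSketch {
    // Softmax probabilities for one sample; w[i][j] maps feature i to category j.
    static double[] probs(double[][] w, double[] x) {
        int cats = w[0].length;
        double[] net = new double[cats];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < cats; j++) {
            for (int i = 0; i < x.length; i++) {
                net[j] += w[i][j] * x[i];
            }
            max = Math.max(max, net[j]);
        }
        double sum = 0.0;
        for (int j = 0; j < cats; j++) {
            net[j] = Math.exp(net[j] - max);
            sum += net[j];
        }
        for (int j = 0; j < cats; j++) {
            net[j] /= sum;
        }
        return net;
    }

    // Per-sample cost: E = -log(p_target).
    static double loss(double[][] w, double[] x, int target) {
        return -Math.log(probs(w, x)[target]);
    }

    // Analytic gradient: dE/dw[i][j] = (p_j - label_j) * x_i.
    static double[][] gradient(double[][] w, double[] x, int target) {
        double[] p = probs(w, x);
        double[][] g = new double[x.length][p.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < p.length; j++) {
                double label = (j == target) ? 1.0 : 0.0;
                g[i][j] = (p[j] - label) * x[i];
            }
        }
        return g;
    }

    // Central-difference estimate of dE/dw[i][j]; w is restored before returning.
    static double numericalGradient(double[][] w, double[] x, int target, int i, int j) {
        double eps = 1e-6;
        double saved = w[i][j];
        w[i][j] = saved + eps;
        double up = loss(w, x, target);
        w[i][j] = saved - eps;
        double down = loss(w, x, target);
        w[i][j] = saved;
        return (up - down) / (2 * eps);
    }

    public static void main(String[] args) {
        double[][] w = {{0.1, -0.2, 0.3}, {0.0, 0.4, -0.1}};
        double[] x = {1.5, -0.5};
        double[][] g = gradient(w, x, 1);
        System.out.println("analytic  dE/dw[0][1] = " + g[0][1]);
        System.out.println("numerical dE/dw[0][1] = " + numericalGradient(w, x, 1, 0, 1));
    }
}
```

The two printed values should agree to several decimal places. In the original, the same product is computed for all samples at once: Diff holds p - label for every sample (already scaled by MeanTerm), and multiplying Features by its transpose sums the per-sample outer products.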

 

RAECost

First, the implementation:

@Override
    public double valueAt(double[] x)
    {
        if(!requiresEvaluation(x))
            return value;
         
        Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
        FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
        Theta2.setWe( Theta2.We.add(WeOrig) );
         
        final RAEClassificationCost classificationCost = new RAEClassificationCost(
                catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
        final RAEFeatureCost featureCost = new RAEFeatureCost(
                AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);
     
        Parallel.For(DataCell,
            new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
                public void perform(int index, LabeledDatum<Integer,Integer> Data)
                {
                    try {
                        LabeledRAETree Tree = featureCost.Compute(Data);
                        classificationCost.Compute(Data, Tree);                
                    } catch (Exception e) {
                        System.err.println(e.getMessage());
                    }
                }
        });
         
        double costRAE = featureCost.getCost();
        double[] gradRAE = featureCost.getGradient().clone();
             
        double costSUP = classificationCost.getCost();
        gradient = classificationCost.getGradient();
             
        value = costRAE + costSUP;
        for(int i=0; i<gradRAE.length; i++)
            gradient[i] += gradRAE[i];
         
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();
         
        return value;
    }

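The cost has two parts: featureCost and classificationCost. The program iterates over every sample and calls featureCost.Compute(Data) to build a recursive tree, accumulating cost and gradient as it goes. It then calls classificationCost.Compute(Data, Tree) to compute and accumulate cost and gradient from the generated tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.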
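The accumulation pattern in valueAt can be sketched in isolation. The toy example below is not jrae code: `Accumulator`, `totalCost`, and the per-sample terms are hypothetical stand-ins for the two cost objects, and a parallel `IntStream` stands in for `Parallel.For`. It only illustrates the shape of the computation: two accumulators updated concurrently per sample, then summed into one cost and one gradient.

```java
import java.util.stream.IntStream;

class CostAccumulationSketch {
    // Hypothetical thread-safe accumulator for a running cost and gradient.
    static final class Accumulator {
        private double cost = 0.0;
        private final double[] gradient;

        Accumulator(int size) {
            gradient = new double[size];
        }

        synchronized void add(double c, double[] g) {
            cost += c;
            for (int i = 0; i < g.length; i++) {
                gradient[i] += g[i];
            }
        }

        synchronized double getCost() {
            return cost;
        }

        synchronized double[] getGradient() {
            return gradient.clone();
        }
    }

    // Sums two toy per-sample costs over all samples and writes the combined
    // gradient (length 2 in this toy) into gradientOut.
    static double totalCost(double[] samples, double[] gradientOut) {
        Accumulator feature = new Accumulator(2);
        Accumulator classification = new Accumulator(2);

        IntStream.range(0, samples.length).parallel().forEach(n -> {
            double s = samples[n];
            // Toy stand-ins for the per-sample feature and classification terms.
            feature.add(s * s, new double[]{2 * s, 0.0});
            classification.add(s, new double[]{0.0, 1.0});
        });

        double[] gf = feature.getGradient();
        double[] gc = classification.getGradient();
        for (int i = 0; i < gradientOut.length; i++) {
            gradientOut[i] = gf[i] + gc[i];
        }
        return feature.getCost() + classification.getCost();
    }

    public static void main(String[] args) {
        double[] grad = new double[2];
        double cost = totalCost(new double[]{1.0, 2.0, 3.0}, grad);
        System.out.println("cost = " + cost + ", gradient = [" + grad[0] + ", " + grad[1] + "]");
    }
}
```

Because several samples are processed at the same time, the accumulators have to guard their running totals. The sketch uses synchronized methods for that, which is a design choice of the sketch and not a claim about how RAEFeatureCost or RAEClassificationCost handle concurrency.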

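In its Compute function, the RAEFeatureCost class calls RAEPropagation's ForwardPropagate to build a tree, then calls BackPropagate to compute and accumulate the gradient. The details of the algorithm are covered in the next part.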

Original article: https://www.cnblogs.com/mfrbuaa/p/5344125.html