jrae Source Code Walkthrough (Part 2)

This post takes a closer look at the two classes introduced in the previous post: RAECost and SoftmaxCost.

SoftmaxCost

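As we already know, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the error value (cost) of a given weight matrix (hidden×catSize) and reports the gradient at the current weights. Here is the code.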

    @Override
    public double valueAt(double[] x)
    {
        if( !requiresEvaluation(x) )
            return value;
        int numDataItems = Features.columns;

        int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
        ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
        DoubleMatrix Prediction = getPredictions (Theta, Features);

        double MeanTerm = 1.0 / (double) numDataItems;
        double Cost = getLoss (Prediction, Labels).sum() * MeanTerm;
        double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);

        DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
        DoubleMatrix Delta = Features.mmul(Diff.transpose());

        DoubleMatrix gradW = Delta.getColumns(requiredRows);
        DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));

        //Regularizing. Bias does not have one.
        gradW = gradW.addi(Theta.W.mul(Lambda));

        Gradient = new ClassifierTheta(gradW,gradb);
        value = Cost + RegularisationTerm;
        gradient = Gradient.Theta;
        return value;
    }

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
    {
        int numDataItems = Features.columns;
        DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
        Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
        return Activation.valueAt(Input);
    }

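This is a typical 2-layer neural network with no hidden layer: it first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradient.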


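In this 2-layer network, each label is a column vector with the target label set to 1 and all other entries set to 0. The transfer function is softmax, so the output is the probability of each label.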
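To make the forward pass concrete, here is a minimal plain-array sketch of what getPredictions appears to do for a single sample. It is not jrae code: the class and method names are hypothetical, and it assumes that W has CatSize-1 columns (which the appended row of zeros suggests) and that Activation is the softmax function, as the text describes.

```java
/** Plain-array sketch of a softmax forward pass (hypothetical names, not jrae code). */
final class SoftmaxForwardSketch {

    /** Numerically stable softmax over the scores of one sample. */
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] p = new double[scores.length];
        for (int k = 0; k < scores.length; k++) {
            p[k] = Math.exp(scores[k] - max);   // subtract max to avoid overflow
            sum += p[k];
        }
        for (int k = 0; k < p.length; k++) p[k] /= sum;
        return p;
    }

    /**
     * Probabilities for one sample. Assumption: w is featureLength x (catSize - 1)
     * and b has catSize - 1 entries; the last category keeps a fixed score of 0,
     * mirroring the zero row appended in getPredictions.
     */
    static double[] predict(double[][] w, double[] b, double[] x) {
        int free = b.length;
        double[] scores = new double[free + 1];   // last entry stays 0.0
        for (int j = 0; j < free; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) s += w[i][j] * x[i];
            scores[j] = s;
        }
        return softmax(scores);
    }

    public static void main(String[] args) {
        double[][] w = {{0.5, -0.2}, {0.1, 0.3}};   // 2 features, 3 categories
        double[] b = {0.0, 0.1};
        double[] p = predict(w, b, new double[]{1.0, 2.0});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

Pinning one category's score to zero removes a redundant degree of freedom from the softmax, which would explain why only the first CatSize-1 rows and columns are kept when the gradient is assembled.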

The cost is computed by getLoss. If the predicted output for the target label is p*, then the cost (that is, the error function) of each sample is:

$$ cost = E(p^*) = -\log(p^*) $$
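As a small illustration of this formula and of how valueAt combines it with the regularization term, the following sketch computes the mean cost over a batch plus 0.5 * lambda * ||W||^2. The names are hypothetical, and it assumes that getLoss returns the -log(p*) term per sample and that SquaredNorm is the sum of squared entries of W; neither is confirmed by the code shown here.

```java
/** Sketch of the softmax cost (hypothetical names, not jrae code). */
final class SoftmaxLossSketch {

    /** Cost of one sample: -log of the probability assigned to the target label. */
    static double sampleCost(double[] probs, int target) {
        return -Math.log(probs[target]);
    }

    /**
     * Mean per-sample cost plus 0.5 * lambda * ||w||^2.
     * Assumption: the squared norm is the sum of squared entries of w.
     */
    static double totalCost(double[][] probs, int[] targets, double[][] w, double lambda) {
        double sum = 0.0;
        for (int n = 0; n < targets.length; n++) {
            sum += sampleCost(probs[n], targets[n]);
        }
        double squaredNorm = 0.0;
        for (double[] row : w) {
            for (double v : row) squaredNorm += v * v;
        }
        return sum / targets.length + 0.5 * lambda * squaredNorm;
    }

    public static void main(String[] args) {
        double[][] probs = {{0.5, 0.5}, {0.25, 0.75}};   // one row per sample
        int[] targets = {0, 1};
        double[][] w = {{1.0, 2.0}};
        System.out.println(totalCost(probs, targets, w, 0.1));
    }
}
```

A confident correct prediction (p* close to 1) costs almost nothing, while p* = 0.5 already costs log 2, so the loss punishes uncertain or wrong predictions increasingly hard.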

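Applying the backpropagation algorithm described earlier to the target label $j$ (the only label for which $\partial E / \partial p_j$ is nonzero; it is 0 for every other label), with $h_j = p_j$ the softmax output of unit $j$, we get:

$$ \frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial p_j} \frac{\partial h_j}{\partial net_j} x_i = -\frac{1}{p_j} \, p_j (1 - p_j) \, x_i = -(1 - p_j) \, x_i = -(label_j - p_j) \, feature_i $$

For a non-target label $k$ the error still reaches $w_{ik}$ through the softmax normalization: $\partial p^* / \partial net_k = -p^* p_k$, so $\partial E / \partial w_{ik} = p_k x_i = -(label_k - p_k) \, feature_i$. The final expression therefore holds for every label.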

This explains the meaning of the following line of code:

    DoubleMatrix Delta = Features.mmul(Diff.transpose());
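One way to convince yourself that (prediction - label) times the feature really is the gradient is a numerical gradient check. The sketch below does this for a single sample with plain arrays. All names are hypothetical, and for simplicity it uses a full softmax in which every category has its own weight column and there is no bias, which differs slightly from the jrae layout.

```java
/** Numerical check of the softmax gradient (p_j - label_j) * x_i. Hypothetical names. */
final class SoftmaxGradientCheck {

    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] p = new double[scores.length];
        for (int k = 0; k < p.length; k++) { p[k] = Math.exp(scores[k] - max); sum += p[k]; }
        for (int k = 0; k < p.length; k++) p[k] /= sum;
        return p;
    }

    /** Probabilities for one sample; w is featureLength x catSize, no bias. */
    static double[] predict(double[][] w, double[] x) {
        double[] scores = new double[w[0].length];
        for (int j = 0; j < scores.length; j++)
            for (int i = 0; i < x.length; i++) scores[j] += w[i][j] * x[i];
        return softmax(scores);
    }

    static double cost(double[][] w, double[] x, int target) {
        return -Math.log(predict(w, x)[target]);
    }

    /** Analytic gradient: (p_j - label_j) * x_i for every weight w[i][j]. */
    static double[][] analyticGradient(double[][] w, double[] x, int target) {
        double[] p = predict(w, x);
        double[][] g = new double[w.length][w[0].length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < w[0].length; j++)
                g[i][j] = (p[j] - (j == target ? 1.0 : 0.0)) * x[i];
        return g;
    }

    /** Largest gap between the analytic gradient and central differences. */
    static double maxGradientError(double[][] w, double[] x, int target) {
        double eps = 1e-6, worst = 0.0;
        double[][] g = analyticGradient(w, x, target);
        for (int i = 0; i < w.length; i++) {
            for (int j = 0; j < w[0].length; j++) {
                double saved = w[i][j];
                w[i][j] = saved + eps; double up = cost(w, x, target);
                w[i][j] = saved - eps; double down = cost(w, x, target);
                w[i][j] = saved;   // restore the weight
                worst = Math.max(worst, Math.abs((up - down) / (2 * eps) - g[i][j]));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] w = {{0.2, -0.4, 0.1}, {0.7, 0.3, -0.5}};
        System.out.println(maxGradientError(w, new double[]{1.0, -2.0}, 1));
    }
}
```

In matrix form, stacking this per-sample product over all samples is what Features.mmul(Diff.transpose()) computes, with the 1/N averaging already folded into Diff.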

RAECost

First, the implementation:

    @Override
    public double valueAt(double[] x)
    {
        if(!requiresEvaluation(x))
            return value;

        Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
        FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
        Theta2.setWe( Theta2.We.add(WeOrig) );

        final RAEClassificationCost classificationCost = new RAEClassificationCost(
                catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
        final RAEFeatureCost featureCost = new RAEFeatureCost(
                AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

        Parallel.For(DataCell,
            new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
                public void perform(int index, LabeledDatum<Integer,Integer> Data)
                {
                    try {
                        LabeledRAETree Tree = featureCost.Compute(Data);
                        classificationCost.Compute(Data, Tree);
                    } catch (Exception e) {
                        System.err.println(e.getMessage());
                    }
                }
        });

        double costRAE = featureCost.getCost();
        double[] gradRAE = featureCost.getGradient().clone();
        double costSUP = classificationCost.getCost();
        gradient = classificationCost.getGradient();

        value = costRAE + costSUP;
        for(int i=0; i<gradRAE.length; i++)
            gradient[i] += gradRAE[i];

        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();

        return value;
    }

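The cost consists of two parts, featureCost and classificationCost. The program iterates over every sample, calling featureCost.Compute(Data) to build a recursive tree while accumulating cost and gradient, and then calling classificationCost.Compute(Data, Tree) to compute and accumulate cost and gradient from that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.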
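The accumulate-then-combine pattern described above can be sketched with standard Java only. This is not how jrae's Parallel.For or its cost classes are implemented: java.util.stream stands in for Parallel.For, the Accumulator class is hypothetical, and the two per-sample cost terms are toy stand-ins for the feature and classification costs.

```java
/** Sketch of accumulating two cost terms over samples in parallel (hypothetical, not jrae code). */
final class CostAccumulationSketch {

    /** Thread-safe running sum of a scalar cost and a gradient vector. */
    static final class Accumulator {
        private double cost;
        private final double[] gradient;

        Accumulator(int size) { gradient = new double[size]; }

        synchronized void add(double sampleCost, double[] sampleGradient) {
            cost += sampleCost;
            for (int i = 0; i < gradient.length; i++) gradient[i] += sampleGradient[i];
        }

        synchronized double getCost() { return cost; }
        synchronized double[] getGradient() { return gradient.clone(); }
    }

    /** Combined value and gradient, analogous to the value/gradient fields in valueAt. */
    static final class Result {
        final double value;
        final double[] gradient;
        Result(double value, double[] gradient) { this.value = value; this.gradient = gradient; }
    }

    static Result evaluate(double[] theta, double[][] xs, double[] ys) {
        Accumulator featurePart = new Accumulator(theta.length);
        Accumulator classifierPart = new Accumulator(theta.length);

        // Each sample contributes to both accumulators, possibly from different threads.
        java.util.stream.IntStream.range(0, xs.length).parallel().forEach(n -> {
            double costA = 0.0, costB = 0.0;
            double[] gradA = new double[theta.length];
            double[] gradB = new double[theta.length];
            for (int i = 0; i < theta.length; i++) {
                double d = theta[i] - xs[n][i];
                costA += 0.5 * d * d;                         // toy term: 0.5 * ||theta - x||^2
                gradA[i] = d;
                costB += 0.5 * ys[n] * theta[i] * theta[i];   // toy term: 0.5 * y * ||theta||^2
                gradB[i] = ys[n] * theta[i];
            }
            featurePart.add(costA, gradA);
            classifierPart.add(costB, gradB);
        });

        // Combine the two parts: costs add up, gradients add element-wise.
        double[] gradient = classifierPart.getGradient();
        double[] other = featurePart.getGradient();
        for (int i = 0; i < gradient.length; i++) gradient[i] += other[i];
        return new Result(featurePart.getCost() + classifierPart.getCost(), gradient);
    }

    public static void main(String[] args) {
        Result r = evaluate(new double[]{1, 2}, new double[][]{{0, 0}, {1, 1}}, new double[]{1, 0});
        System.out.println(r.value + " " + java.util.Arrays.toString(r.gradient));
    }
}
```

Because both parts are functions of the same parameter vector, their gradients have the same length and can simply be summed, which is what the final loop in valueAt does with gradRAE.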

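In its Compute function, RAEFeatureCost calls the ForwardPropagate function of RAEPropagation to build a tree, then calls BackPropagate to compute and accumulate the gradient. The details of the algorithm are covered in the next part.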