Loss Computation

Outline

  • MSE

  • Cross Entropy Loss

  • Hinge Loss

MSE

  • $\text{loss} = \frac{1}{N}\sum(y - \text{out})^2$

  • $L_2\text{-norm} = \|y - \text{out}\|_2 = \sqrt{\sum(y - \text{out})^2}$

import tensorflow as tf
y = tf.constant([1, 2, 3, 0, 2])
y = tf.one_hot(y, depth=4)  # labels take values 0..3, so depth=4
y = tf.cast(y, dtype=tf.float32)

out = tf.random.normal([5, 4])  # simulated network output: 5 samples, 4 classes
out

<tf.Tensor: id=117, shape=(5, 4), dtype=float32, numpy=
array([[ 0.8138832 , -1.1521571 ,  0.05197939,  2.3684442 ],
       [ 0.28827545, -0.35568208, -0.3952962 , -1.2576817 ],
       [-0.4354525 , -1.9914867 ,  0.37045303, -0.38287213],
       [-0.7680094 , -0.98293644,  0.62572837, -0.5673917 ],
       [ 1.5299634 ,  0.38036177, -0.28049606, -0.708137  ]],
      dtype=float32)>
# three equivalent formulations of the MSE (loss1's printed value differs only
# because `out` was apparently re-sampled between runs in the original session)
loss1 = tf.reduce_mean(tf.square(y - out))
loss1
<tf.Tensor: id=122, shape=(), dtype=float32, numpy=1.5140966>
loss2 = tf.square(tf.norm(y - out)) / (5 * 4)
loss2
<tf.Tensor: id=99, shape=(), dtype=float32, numpy=1.3962512>
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))
loss3
<tf.Tensor: id=105, shape=(), dtype=float32, numpy=1.3962513>

Entropy

  • Uncertainty

  • measure of surprise

  • lower entropy --> less uncertainty (a more peaked, more confident distribution)

$\text{Entropy} = -\sum_i P(i)\log P(i)$

a = tf.fill([4], 0.25)                # uniform distribution over 4 outcomes
a * tf.math.log(a) / tf.math.log(2.)  # change of base: log2(a) = ln(a)/ln(2)
<tf.Tensor: id=134, shape=(4,), dtype=float32, numpy=array([-0.5, -0.5, -0.5, -0.5], dtype=float32)>
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))
<tf.Tensor: id=143, shape=(), dtype=float32, numpy=2.0>
a = tf.constant([0.1, 0.1, 0.1, 0.7])
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))
<tf.Tensor: id=157, shape=(), dtype=float32, numpy=1.3567797>
a = tf.constant([0.01, 0.01, 0.01, 0.97])
-tf.reduce_sum(a * tf.math.log(a) / tf.math.log(2.))
<tf.Tensor: id=167, shape=(), dtype=float32, numpy=0.24194068>

Cross Entropy

$H(p,q) = -\sum_x p(x)\log q(x)$

$H(p,q) = H(p) + D_{KL}(p\,\|\,q)$

  • for p = q

    • Minimum: $H(p,q) = H(p)$
  • for p: one-hot encoding

    • $H(p{=}[0,1,0]) = -1\log 1 = 0$
    • $H([0,1,0],[q_0,q_1,q_2]) = 0 + D_{KL}(p\,\|\,q) = -\log q_1$ # if p and q (ground truth and prediction) are equal, the cross entropy is 0; a numeric check follows below
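
A quick numeric check of the decomposition for a one-hot target (a minimal sketch; the distributions are made up for illustration):

import tensorflow as tf
p = tf.constant([0., 1., 0.])     # one-hot ground truth
q = tf.constant([0.1, 0.7, 0.2])  # made-up predicted distribution
# H(p, q) = -sum_x p(x) log q(x); with one-hot p this collapses to -log q_1
-tf.reduce_sum(p * tf.math.log(q))          # ~0.3567 = -log 0.7
tf.losses.categorical_crossentropy(p, q)    # same value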

Binary Classification

  • Two cases (the single-output form predicts only one probability and derives the other as its complement, saving computation; a second output would be redundant)

Single output

$H(P,Q) = -P(\text{cat})\log Q(\text{cat}) - (1 - P(\text{cat}))\log(1 - Q(\text{cat}))$, where $P(\text{dog}) = 1 - P(\text{cat})$

$H(P,Q) = -\sum_{i \in \{\text{cat},\,\text{dog}\}} P(i)\log Q(i) = -P(\text{cat})\log Q(\text{cat}) - P(\text{dog})\log Q(\text{dog}) = -\big(y\log p + (1 - y)\log(1 - p)\big)$
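
Checking the single-output formula by hand (a minimal sketch; y and p are made-up values, and the same numbers reappear in the binary_crossentropy calls below):

import tensorflow as tf
y, p = 1., 0.1  # made-up label and predicted P(cat)
# -(y log p + (1 - y) log(1 - p))
-(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))  # ~2.3026 = -log 0.1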

Classification

  • $H([0,1,0],[q_0,q_1,q_2]) = 0 + D_{KL}(p\,\|\,q) = -\log q_1$

$P_1 = [1, 0, 0, 0, 0]$, $Q_1 = [0.4, 0.3, 0.05, 0.05, 0.2]$

$H(P_1,Q_1) = -\sum_i P_1(i)\log Q_1(i) = -(1\log 0.4 + 0\log 0.3 + 0\log 0.05 + 0\log 0.05 + 0\log 0.2) = -\log 0.4 \approx 0.916$

$P_1 = [1, 0, 0, 0, 0]$, $Q_1 = [0.98, 0.01, 0, 0, 0.01]$

$H(P_1,Q_1) = -\sum_i P_1(i)\log Q_1(i) = -\log 0.98 \approx 0.02$
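
The same numbers can be reproduced directly (a small sketch; the zeros in the second $Q_1$ are bumped to tiny probabilities, since 0*log(0) evaluates to NaN in floating point):

import tensorflow as tf
P1 = tf.constant([1., 0., 0., 0., 0.])
Q1 = tf.constant([0.4, 0.3, 0.05, 0.05, 0.2])
-tf.reduce_sum(P1 * tf.math.log(Q1))  # ~0.916 = -log 0.4

Q1 = tf.constant([0.98, 0.01, 0.001, 0.001, 0.008])  # zeros bumped to avoid NaN
-tf.reduce_sum(P1 * tf.math.log(Q1))  # ~0.020 = -log 0.98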

# the closer the predicted distribution is to the one-hot target, the lower the loss
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.25, 0.25, 0.25, 0.25])
<tf.Tensor: id=186, shape=(), dtype=float32, numpy=1.3862944>
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.1, 0.8, 0.1])
<tf.Tensor: id=205, shape=(), dtype=float32, numpy=2.3978953>
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.1, 0.7, 0.1, 0.1])
<tf.Tensor: id=243, shape=(), dtype=float32, numpy=0.35667497>
tf.losses.categorical_crossentropy([0, 1, 0, 0], [0.01, 0.97, 0.01, 0.01])
<tf.Tensor: id=262, shape=(), dtype=float32, numpy=0.030459179>
tf.losses.BinaryCrossentropy()([1],[0.1])
<tf.Tensor: id=306, shape=(), dtype=float32, numpy=2.3025842>
tf.losses.binary_crossentropy([1],[0.1])
<tf.Tensor: id=333, shape=(), dtype=float32, numpy=2.3025842>
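
Hinge Loss

  • max-margin loss for classifiers: $\text{loss} = \max(0,\ 1 - y\,\hat{y})$ with labels $y \in \{-1, +1\}$

A minimal sketch with tf.losses.hinge (made-up scores; per the Keras docs, 0/1 labels would be converted to -1/+1 internally):

import tensorflow as tf
y_true = tf.constant([[1., -1., 1.]])     # labels in {-1, +1}
y_pred = tf.constant([[0.8, -0.5, 0.2]])  # raw scores, not probabilities
# mean over the last axis of max(0, 1 - y_true * y_pred)
tf.losses.hinge(y_true, y_pred)  # [0.5] = mean of [0.2, 0.5, 0.8]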

Why not MSE?

  • sigmoid + MSE

    • vanishing gradients (the sigmoid saturates; see the sketch after this list)
  • slower convergence

  • However, MSE can still be a reasonable choice in some settings

    • e.g. meta-learning
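
A small illustration of the saturation effect (a sketch with an arbitrarily chosen logit; the binary cross entropy is written out by hand):

import tensorflow as tf
w = tf.Variable(10.)  # a saturated logit: sigmoid(10) ≈ 0.99995
y = tf.constant(0.)   # target label
with tf.GradientTape(persistent=True) as tape:
    p = tf.sigmoid(w)
    mse = tf.square(y - p)
    ce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
tape.gradient(mse, w)  # ~9e-5: the sigmoid saturates the MSE gradient
tape.gradient(ce, w)   # ~1.0: cross entropy keeps a usable gradient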

logits --> CrossEntropy: pass raw logits straight to the cross-entropy loss with from_logits=True instead of applying softmax yourself; the fused computation is numerically stable.
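
A minimal sketch (the shapes are made up):

import tensorflow as tf
x = tf.random.normal([1, 784])
w = tf.random.normal([784, 2])
b = tf.zeros([2])
logits = x @ w + b  # no softmax here
# let the loss apply the (numerically stable) softmax internally
tf.losses.categorical_crossentropy([[0., 1.]], logits, from_logits=True)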

Original post: https://www.cnblogs.com/abdm-989/p/14123282.html