TensorFlow2_200729 series---9. Top-k accuracy example

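I. Summary

One-sentence summary:

tf.math.top_k returns, for each row, the indices of the k highest-probability entries; the accuracy then follows from ordinary tensor operations (transpose, broadcast, compare, sum).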

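import tensorflow as tf

def accuracy(output, target, topk=(1,)):
    # output: [b, num_classes] scores or probabilities; target: [b] integer labels
    maxk = max(topk)
    # print("maxk: ",maxk)
    batch_size = target.shape[0]
    # print("batch_size: ",batch_size)
    # Indices of the maxk largest entries in each row: [b, maxk]
    pred = tf.math.top_k(output, maxk).indices
    # Transpose to [maxk, b] so that row i holds every sample's (i+1)-th guess
    pred = tf.transpose(pred, perm=[1, 0])
    target_ = tf.broadcast_to(target, pred.shape)
    # [maxk, b]
    correct = tf.equal(pred, target_)

    res = []
    for k in topk:
        # The first k rows cover the top-k guesses; each sample has at most one hit
        correct_k = tf.cast(tf.reshape(correct[:k], [-1]), dtype=tf.float32)
        correct_k = tf.reduce_sum(correct_k)
        acc = float(correct_k * (100.0 / batch_size))
        res.append(acc)

    return res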

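1. Random normal distribution, 4 rows by 3 columns?

output = tf.random.normal([4, 3])

2. Softmax so that the probabilities in each row sum to 1?

output = tf.math.softmax(output, axis=1) # the probabilities of item 1, item 2, and item 3 sum to 1

3. Index of the largest value in each row of output?

pred = tf.argmax(output, axis=1)

4. Broadcast target to the shape of pred?

target_ = tf.broadcast_to(target, pred.shape)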

II. Top-k accuracy example

Course video corresponding to this post:

import tensorflow as tf
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.random.set_seed(2467)

In [4]:

# Random normal distribution, 4 rows by 3 columns
output = tf.random.normal([4, 3])
output

Out[4]:

<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[ 0.13880357,  0.48238105,  0.23294356],
       [ 0.66247475, -0.6401166 ,  0.38740396],
       [-0.63246864, -0.18041448, -0.63046396],
       [-1.7205269 ,  1.5346191 ,  0.741934  ]], dtype=float32)>

In [5]:

# Softmax along axis=1, so the probabilities in each row sum to 1
# i.e. the probabilities of item 1, item 2, and item 3 sum to 1
output = tf.math.softmax(output, axis=1)
output

Out[5]:

<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[0.28500617, 0.40185377, 0.31314012],
       [0.4922847 , 0.133816  , 0.37389928],
       [0.27983427, 0.43976992, 0.2803958 ],
       [0.0258685 , 0.6705995 , 0.303532  ]], dtype=float32)>
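As a cross-check of the softmax step, here is a minimal sketch that recomputes the probabilities in plain Python rather than TensorFlow. The logits are copied from Out[4]; the helper name softmax_row is made up for this illustration and is not a TensorFlow API.

```python
import math

# Logits copied from Out[4] above.
logits = [
    [0.13880357, 0.48238105, 0.23294356],
    [0.66247475, -0.6401166, 0.38740396],
    [-0.63246864, -0.18041448, -0.63046396],
    [-1.7205269, 1.5346191, 0.741934],
]

def softmax_row(row):
    """Softmax of a single row (hypothetical helper for this sketch)."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

probs = [softmax_row(row) for row in logits]
for row in probs:
    # Print each row of probabilities followed by its sum.
    print(row, sum(row))
```

Each printed row should agree with Out[5] up to float32 rounding, and each row sum should be 1 up to floating-point error.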

In [8]:

# Random labels in 0..2 to serve as target
# uniform: values are drawn uniformly from [minval, maxval); maxval is exclusive, so here they are 0, 1, or 2
target = tf.random.uniform([4], maxval=3, dtype=tf.int32)
target

Out[8]:

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 0, 2, 2])>

In [9]:

print('prob:', output.numpy())

prob: [[0.28500617 0.40185377 0.31314012]
 [0.4922847  0.133816   0.37389928]
 [0.27983427 0.43976992 0.2803958 ]
 [0.0258685  0.6705995  0.303532  ]]

In [10]:

# Index of the largest value in each row of output
pred = tf.argmax(output, axis=1)
print('pred:', pred.numpy())
print('label:', target.numpy())

pred: [1 0 1 1]
label: [1 0 2 2]

Compare output against target

output:
```
array([[0.28500617, 0.40185377, 0.31314012],
       [0.4922847 , 0.133816  , 0.37389928],
       [0.27983427, 0.43976992, 0.2803958 ],
       [0.0258685 , 0.6705995 , 0.303532  ]], dtype=float32)
```
Index of the highest-probability item in each row of output:
pred: [1 0 1 1]

Corresponding target, label: [1 0 2 2]
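Top-1 accuracy can be read straight off these two vectors. A small plain-Python sketch, with the values copied from the printout above (the variable names are only for illustration):

```python
# Values copied from the pred/label printout above.
pred = [1, 0, 1, 1]
label = [1, 0, 2, 2]

# A sample counts as correct when the arg-max class equals its label.
hits = [p == t for p, t in zip(pred, label)]
top1_acc = 100.0 * sum(hits) / len(label)

print(hits)      # [True, True, False, False]
print(top1_acc)  # 50.0
```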

In [11]:

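def accuracy(output, target, topk=(1,)):
    # output: [b, num_classes] scores or probabilities; target: [b] integer labels
    maxk = max(topk)
    # print("maxk: ",maxk)
    batch_size = target.shape[0]
    # print("batch_size: ",batch_size)
    # Indices of the maxk largest entries in each row: [b, maxk]
    pred = tf.math.top_k(output, maxk).indices
    # Transpose to [maxk, b] so that row i holds every sample's (i+1)-th guess
    pred = tf.transpose(pred, perm=[1, 0])
    target_ = tf.broadcast_to(target, pred.shape)
    # [maxk, b]
    correct = tf.equal(pred, target_)

    res = []
    for k in topk:
        # The first k rows cover the top-k guesses; each sample has at most one hit
        correct_k = tf.cast(tf.reshape(correct[:k], [-1]), dtype=tf.float32)
        correct_k = tf.reduce_sum(correct_k)
        acc = float(correct_k * (100.0 / batch_size))
        res.append(acc)

    return res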

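Index of the highest-probability item in each row of output: pred: [1 0 1 1]

Corresponding target, label: [1 0 2 2]

Only the first two entries match, so top-1 accuracy is 2/4 = 50%.

There are only three items, so top-3 accuracy is necessarily 100%.

A closer look at the data shows that top-2 accuracy is also exactly 100%.

label:

[1 0 2 2]

output:

array([[0.28500617, 0.40185377, 0.31314012],
       [0.4922847 , 0.133816  , 0.37389928],
       [0.27983427, 0.43976992, 0.2803958 ],
       [0.0258685 , 0.6705995 , 0.303532  ]], dtype=float32)

Values at the label positions [1 0 2 2]:

array([[***       , 0.40185377, ***       ],
       [0.4922847 , ***       , ***       ],
       [***       , ***       , 0.2803958 ],
       [***       , ***       , 0.303532  ]], dtype=float32)

In rows 1 and 2 the label has the largest probability; in rows 3 and 4 it has the second largest (0.2803958 is beaten only by 0.43976992, and 0.303532 only by 0.6705995). Every label is therefore within the top 2, so top-2 accuracy is 100%.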
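The same check can be done mechanically by ranking each label's probability within its row. This is a plain-Python sketch with the probabilities copied from output; it is a different route to the same numbers, not the method the accuracy function uses.

```python
# Probabilities copied from output above.
probs = [
    [0.28500617, 0.40185377, 0.31314012],
    [0.4922847, 0.133816, 0.37389928],
    [0.27983427, 0.43976992, 0.2803958],
    [0.0258685, 0.6705995, 0.303532],
]
label = [1, 0, 2, 2]

# 1-based rank of the true label in each row:
# 1 + the number of classes with a strictly higher probability.
ranks = [1 + sum(p > row[t] for p in row) for row, t in zip(probs, label)]
print(ranks)  # [1, 1, 2, 2]

# A sample is a top-k hit exactly when its rank is at most k.
acc = {k: 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in (1, 2, 3)}
print(acc)  # {1: 50.0, 2: 100.0, 3: 100.0}
```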

In [13]:

# topk lists the values of k for which top-k accuracy is computed
acc = accuracy(output, target, topk=(1,2,3))
print('top-1-3 acc:', acc)

top-1-3 acc: [50.0, 100.0, 100.0]

Stepping through the accuracy function in detail

In [14]:

topk=(1,2,3)
maxk = max(topk)
print("maxk: ",maxk)

maxk:  3

In [15]:

print(target)
batch_size = target.shape[0]
print("batch_size: ",batch_size)

tf.Tensor([1 0 2 2], shape=(4,), dtype=int32)
batch_size:  4

In [16]:

output

Out[16]:

<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[0.28500617, 0.40185377, 0.31314012],
       [0.4922847 , 0.133816  , 0.37389928],
       [0.27983427, 0.43976992, 0.2803958 ],
       [0.0258685 , 0.6705995 , 0.303532  ]], dtype=float32)>

In [18]:

# Indices of the k largest (highest-probability) entries in each row of output
# For example, sorting [0.28500617, 0.40185377, 0.31314012] in descending order gives indices [1 2 0]
pred = tf.math.top_k(output, maxk).indices
print(pred)
print("===========================================")
# Transpose
pred = tf.transpose(pred, perm=[1, 0])
print(pred)

tf.Tensor(
[[1 2 0]
 [0 2 1]
 [1 2 0]
 [1 2 0]], shape=(4, 3), dtype=int32)
===========================================
tf.Tensor(
[[1 0 1 1]
 [2 2 2 2]
 [0 1 0 0]], shape=(3, 4), dtype=int32)
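To see what the indices mean, the sketch below reproduces them in plain Python by sorting each row's column indices by descending probability. The helper top_k_indices is hypothetical and written only for this illustration; for rows with tied values its ordering is not guaranteed to match tf.math.top_k in every case, but this data has no ties.

```python
# Probabilities copied from output above.
probs = [
    [0.28500617, 0.40185377, 0.31314012],
    [0.4922847, 0.133816, 0.37389928],
    [0.27983427, 0.43976992, 0.2803958],
    [0.0258685, 0.6705995, 0.303532],
]

def top_k_indices(row, k):
    """Indices of the k largest values, largest first (hypothetical helper)."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    return order[:k]

pred = [top_k_indices(row, 3) for row in probs]
print(pred)  # [[1, 2, 0], [0, 2, 1], [1, 2, 0], [1, 2, 0]]

# Transpose: row i now holds every sample's (i+1)-th most likely class.
pred_t = [list(col) for col in zip(*pred)]
print(pred_t)  # [[1, 0, 1, 1], [2, 2, 2, 2], [0, 1, 0, 0]]
```

After the transpose, slicing the first k rows selects the top-k guesses of all samples at once, which is what makes the later correct[:k] slice work.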

In [21]:

# Broadcast target to the shape of pred
target_ = tf.broadcast_to(target, pred.shape)
print("Compare pred with target_; the matches are easy to see:")
print(pred)
print(target_)
print("===============================================")
# [maxk, b]
correct = tf.equal(pred, target_)
print(correct)

Compare pred with target_; the matches are easy to see:
tf.Tensor(
[[1 0 1 1]
 [2 2 2 2]
 [0 1 0 0]], shape=(3, 4), dtype=int32)
tf.Tensor(
[[1 0 2 2]
 [1 0 2 2]
 [1 0 2 2]], shape=(3, 4), dtype=int32)
===============================================
tf.Tensor(
[[ True  True False False]
 [False False  True  True]
 [False False False False]], shape=(3, 4), dtype=bool)
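Why summing the first k rows of correct counts correct samples can be illustrated without TensorFlow. In this plain-Python sketch the nested lists stand in for the tensors, and repeating target once per row plays the role of tf.broadcast_to; the values are copied from the printout above.

```python
# Transposed top-k indices and labels copied from the printout above.
pred = [[1, 0, 1, 1],
        [2, 2, 2, 2],
        [0, 1, 0, 0]]
target = [1, 0, 2, 2]

# Broadcasting a length-4 vector to shape (3, 4) repeats it once per row.
target_ = [list(target) for _ in pred]

# Element-wise comparison, the plain-Python counterpart of tf.equal.
correct = [[p == t for p, t in zip(p_row, t_row)]
           for p_row, t_row in zip(pred, target_)]

# Column j collects the comparisons for sample j. The indices returned for one
# sample are distinct, so a column can contain at most one True.
hits_per_sample = [sum(col) for col in zip(*correct)]
print(hits_per_sample)  # [1, 1, 1, 1]

# Summing every entry of the first k rows therefore counts samples, never duplicates.
counts = [sum(sum(row) for row in correct[:k]) for k in (1, 2, 3)]
print(counts)  # [2, 4, 4]
```

Dividing these counts by the batch size of 4 gives 50%, 100%, and 100%.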

In [22]:

# Top-k accuracy can now be read directly off correct
res = []
for k in topk:
    correct_k = tf.cast(tf.reshape(correct[:k], [-1]), dtype=tf.float32)
    correct_k = tf.reduce_sum(correct_k)
    acc = float(correct_k* (100.0 / batch_size) )
    res.append(acc)
res

Out[22]:

[50.0, 100.0, 100.0]
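As an independent cross-check, TensorFlow also provides tf.math.in_top_k, which tests directly whether each label is among the k largest entries of its row. The sketch below assumes a TensorFlow 2.x installation and rebuilds output and target as constants from the values printed above; it is an alternative route to the same numbers, not the approach used in the accuracy function.

```python
import tensorflow as tf

# Probabilities and labels copied from the cells above.
output = tf.constant([
    [0.28500617, 0.40185377, 0.31314012],
    [0.4922847, 0.133816, 0.37389928],
    [0.27983427, 0.43976992, 0.2803958],
    [0.0258685, 0.6705995, 0.303532],
], dtype=tf.float32)
target = tf.constant([1, 0, 2, 2], dtype=tf.int32)

res = []
for k in (1, 2, 3):
    # hits[i] is True when target[i] is among the k largest entries of output[i].
    hits = tf.math.in_top_k(targets=target, predictions=output, k=k)
    res.append(float(tf.reduce_mean(tf.cast(hits, tf.float32))) * 100.0)

print(res)  # [50.0, 100.0, 100.0]
```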


Copyright notice: reposting is welcome, but please credit the source.
