Micro- vs. Macro-Average Performance: Study Notes [Repost]

Reposted from: https://datascience.stackexchange.com/questions/15989/micro-average-vs-macro-average-performance-in-a-multiclass-classification-settin

1. Different ways of computing the average

A macro-average computes the metric (precision, recall, or F1) independently for each class and then takes the unweighted mean, so every class counts equally regardless of its size.

A micro-average aggregates the contributions of all classes and computes the metric once from the pooled counts.
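
Written out for precision, the two definitions look as follows. The symbols are introduced here for illustration and are not taken from the original answer: K is the number of classes, and TP_k and FP_k are the true-positive and false-positive counts of class k.

```latex
P_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k},
\qquad
P_{\text{micro}} = \frac{\sum_{k=1}^{K} TP_k}{\sum_{k=1}^{K} \left( TP_k + FP_k \right)}
```

The macro form averages K ratios; the micro form is a single ratio of sums, so classes with more predictions contribute more to it.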

In a multi-class classification setup, the micro-average is preferable if you suspect class imbalance (i.e., you may have many more examples of one class than of the others).

2. Example

Take precision as the metric, and suppose a one-vs-all multi-class classification task with four classes:

Class A: 1 TP and 1 FP
Class B: 10 TP and 90 FP
Class C: 1 TP and 1 FP
Class D: 1 TP and 1 FP

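These are the test counts. With precision defined as TP / (TP + FP), they give:

Precision(A) = Precision(C) = Precision(D) = 1 / 2 = 0.5
Precision(B) = 10 / 100 = 0.1
Macro-average precision = (0.5 + 0.1 + 0.5 + 0.5) / 4 = 0.4
Micro-average precision = (1 + 10 + 1 + 1) / (2 + 100 + 2 + 2) = 13 / 106 ≈ 0.123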

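Because classes A, C, and D each have a precision of 0.5, the macro-average comes out at a decent-looking 0.4. That figure is misleading, because the large majority of the examples assigned to class B were not classified correctly.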

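In this example class B accounts for 94.3% of the test data (100 of the 106 counted predictions), so there is clearly class imbalance, and the micro-average reflects the result better.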
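
The arithmetic in this example can be reproduced with a short Python sketch. The names `counts`, `precision`, `macro_precision`, and `micro_precision` are made up for this illustration; only the TP/FP numbers come from the example.

```python
# Per-class (TP, FP) counts from the four-class example.
counts = {"A": (1, 1), "B": (10, 90), "C": (1, 1), "D": (1, 1)}


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# Macro: compute precision per class, then average with one vote per class.
per_class = {name: precision(tp, fp) for name, (tp, fp) in counts.items()}
macro_precision = sum(per_class.values()) / len(per_class)

# Micro: pool the counts over all classes, then compute precision once.
total_tp = sum(tp for tp, _ in counts.values())
total_fp = sum(fp for _, fp in counts.values())
micro_precision = precision(total_tp, total_fp)

print(f"macro = {macro_precision:.3f}")  # macro = 0.400
print(f"micro = {micro_precision:.3f}")  # micro = 0.123
```

The gap between the two numbers comes entirely from class B: it has one vote out of four in the macro-average but supplies 100 of the 106 pooled predictions in the micro-average.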

3. Calculation methods

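3.1 One option is to compute the per-class precision first, then the macro-average, and then report the standard deviation across classes:

Precision: A = 0.5, B = 0.1, C = 0.5, D = 0.5
Macro-average = 0.4
Standard deviation ≈ 0.173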
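
A minimal sketch of this calculation with Python's standard `statistics` module is shown here. It assumes the population standard deviation (dividing by the number of classes), since that is the variant that reproduces the 0.173 quoted in the text; the variable names are illustrative.

```python
import statistics

# Per-class precision for classes A, B, C, D from the example.
per_class_precision = [0.5, 0.1, 0.5, 0.5]

macro = statistics.mean(per_class_precision)

# Population standard deviation: divides by the number of classes (4).
# Assumption: this variant is used because it matches the 0.173 in the text.
spread = statistics.pstdev(per_class_precision)

print(f"macro = {macro:.3f}, std = {spread:.3f}")  # macro = 0.400, std = 0.173
```

The sample standard deviation (`statistics.stdev`, dividing by 3) would give about 0.2 instead, so it is worth stating which variant is reported.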

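3.2 Another option is a weighted average, where each class's weight is its share of the total sample count (2/106 ≈ 0.0189 for A, C, and D; 100/106 ≈ 0.943 for B):

0.0189 × 0.5 + 0.943 × 0.1 + 0.0189 × 0.5 + 0.0189 × 0.5 ≈ 0.123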

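3.3 As the above shows, a standard deviation of 0.173 means that a precision of 0.4 does not imply the classes perform uniformly. The second, weighted calculation is exactly what the micro-average does.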
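
The claim that the weighted average is the micro-average can be checked with a short derivation. The notation is an assumption made for this sketch: n_k = TP_k + FP_k is the number of predictions assigned to class k, N is the sum of all n_k, and each class is weighted by n_k / N, which matches the 94.3% share of class B in the example.

```latex
\sum_{k=1}^{K} \frac{n_k}{N} \cdot P_k
  = \sum_{k=1}^{K} \frac{n_k}{N} \cdot \frac{TP_k}{n_k}
  = \frac{\sum_{k=1}^{K} TP_k}{N}
  = P_{\text{micro}},
\qquad
\text{here } \frac{1 + 10 + 1 + 1}{106} = \frac{13}{106} \approx 0.123
```

The n_k factors cancel, which is why weighting by class share and pooling the counts give the same number for precision.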