ZJU-ICPC Summer 2020 Contest 8 C-Thresholding

题面：

In digital image processing, thresholding is the simplest method of segmenting images. From a grayscale image, thresholding can be used to create binary images. Here, a grayscale image is an image with only gray colors (but the gray scale may not be the same), and a binary image is an image with only black and white colors.

To create a binary image from a grayscale image, we set a global threshold. For those pixels with gray scales greater than the threshold, we set them white. For the others, we set them black. The quality of the result image depends on the threshold we set.

One of the most famous method of finding an optimal threshold is Otsu's method. In brief, we divide the pixels in the image into two classes, so that the intra-class variance is minimized.

Assume that if we randomly select a pixel, the probability that it is in the first class is (omega_0), and the probability that is is in the second class is (omega_1). The variance of the first class is (sigma_{0}^{2}), and the variance of the second class is (sigma_{1}^{2}). We define intra-class variance as a weighted sum of variances of the two classes:

[sigma_{omega}^{2} = omega_{0}sigma_{0}^{2} + omega_{1}sigma_{1}^{2} ]

Note that the variance of (k) numbers (a_1,a_2,a_3, dots, a_k) is defined as follows:

[sigma^{2} = frac{sum_{i=1}^{k} (a_i - overline{a})^2 }{k} \ overline{a}={sum_{i=1}^k a_i over k}\ ]

The variance of an empty class is defined as (0).

You are given a grayscale image. For some reason, this image stores one pixel using two bytes, so the gray scale of a pixel is from (0) to (65535). Please find a threshold using Otsu's method. Considering the insufficient precision, you only need to output the minimal intra-class variance. Your answer will be considered correct if it has an absolute or relative error not greater than (10^{−6}).

Input
The first line contains two integers (h) and (w) ((1leq h,w leq 640)), representing the height and the weight of the image.

Each of the next (h) lines contains (w) integers (ai,j) ((0 leq ai,j leq 65535)), representing the gray scale of each pixel.

Output
The only line contains a real number, representing the minimal intra-class variance.

Example
input
2 3
5 7 6
9 8 2
output
1.5833333333

思路

首先所有的数会分成两类，所以排序所有数，枚举分类的断点。

很容易得到两类数的总和/平均数/数的个数。

再展开方差的公式

[sigma^{2} = frac{sum_{i=1}^{k} (a_i - overline{a})^2 }{k} \ k * sigma^{2} = sum_{i=1}^{k} a_i^2 + overline{a}^2 - 2 * a_i*overline{a} \ ]

明显预处理(a_i^2,a_i)的前缀和即可

#include<bits/stdc++.h>
using namespace std;

int n,m;
int cnt;

int a[450000];
long double tot1,tot2;
long double s1[450000],s2[450000];

int main(){
    scanf("%d%d",&n,&m);
    for(int i = 1; i <= n; ++ i)
    for(int j = 1; j <= m; ++ j){
        int x; scanf("%d",&x);
        a[++ cnt] = x; tot2 += x;
    }
    sort(a + 1, a + cnt + 1);
    
    for(int i = 1; i <= cnt; ++ i){
        s1[i] = a[i];
        s2[i] = 1ll * a[i] * a[i];
        s1[i] += s1[i - 1];
        s2[i] += s2[i - 1];
    }
    
    long double ans = 1e30;
    
    tot1 = 0;
    for(int i = 0; i <= cnt; ++ i){
        tot1 += a[i]; tot2 -= a[i];
        if(a[i] == a[i + 1]) continue;
        long double t1 = 0, t2 = 0;
        
        if(tot1 != 0){
            t1 += s2[i];
            t1 += (tot1 * tot1 - 2 * s1[i] * tot1) / i;
        }
        if(tot2 != 0){
            t2 += s2[cnt] - s2[i];
         	t2 += (tot2 * tot2 - 2 * (s1[cnt] - s1[i]) * tot2 )/ (cnt - i);
        }
        
        //printf("%Lf
",t1 + t2);
        if(ans > t1 + t2) ans = t1 + t2;
    }
    printf("%.20Lf
",ans / cnt);
    return 0;
}