statistics of Python

statistics

统计模块支持普通的int float类型,还支持封装的 Decimal 和 Fraction的统计计算。

且输入数据的类型要保持一致。

统计功能分为两个部分:

(1)均值和中心位置度量。-- 均值和中位数。

(2)延展度度量。-- 偏差和标准差。

https://docs.python.org/3.7/library/statistics.html

This module provides functions for calculating mathematical statistics of numeric (Real-valued) data.

Note

Unless explicitly noted otherwise, these functions support int, float, decimal.Decimal and fractions.Fraction. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Mixed types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use map() to ensure a consistent result, e.g. map(float, input_data).

Averages and measures of central location

These functions calculate an average or typical value from a population or sample.

mean()

Arithmetic mean (“average”) of data.

harmonic_mean()

Harmonic mean of data.

median()

Median (middle value) of data.

median_low()

Low median of data.

median_high()

High median of data.

median_grouped()

Median, or 50th percentile, of grouped data.

mode()

Mode (most common value) of discrete data.

Measures of spread

These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.

pstdev()

Population standard deviation of data.

pvariance()

Population variance of data.

stdev()

Sample standard deviation of data.

variance()

Sample variance of data.

DEMO

https://pymotw.com/3/statistics/index.html

使用方差或者标准差来度量数据的分散度。

其值越小,表示数据更加向均值聚拢。

Statistics uses two values to express how disperse a set of values is relative to the mean. The variance is the average of the square of the difference of each value and the mean, and the standard deviation is the square root of the variance (which is useful because taking the square root allows the standard deviation to be expressed in the same units as the input data). Large values for variance or standard deviation indicate that a set of data is disperse, while small values indicate that the data is clustered closer to the mean.

from statistics import *
import subprocess


def get_line_lengths():
    cmd = 'wc -l ../[a-z]*/*.py'
    out = subprocess.check_output(
        cmd, shell=True).decode('utf-8')
    for line in out.splitlines():
        parts = line.split()
        if parts[1].strip().lower() == 'total':
            break
        nlines = int(parts[0].strip())
        if not nlines:
            continue  # skip empty files
        yield (nlines, parts[1].strip())


data = list(get_line_lengths())

lengths = [d[0] for d in data]
sample = lengths[::2]

print('Basic statistics:')
print('  count     : {:3d}'.format(len(lengths)))
print('  min       : {:6.2f}'.format(min(lengths)))
print('  max       : {:6.2f}'.format(max(lengths)))
print('  mean      : {:6.2f}'.format(mean(lengths)))

print('
Population variance:')
print('  pstdev    : {:6.2f}'.format(pstdev(lengths)))
print('  pvariance : {:6.2f}'.format(pvariance(lengths)))

print('
Estimated variance for sample:')
print('  count     : {:3d}'.format(len(sample)))
print('  stdev     : {:6.2f}'.format(stdev(sample)))
print('  variance  : {:6.2f}'.format(variance(sample)))

计算方差和标准差,有抽样 和 全集模式。

stdev -- 表示抽样

pstdev -- 表示统计所有数据。

Python includes two sets of functions for computing variance and standard deviation, depending on whether the data set represents the entire population or a sample of the population. This example uses wc to count the number of lines in the input files for all of the example programs and then uses pvariance() and pstdev() to compute the variance and standard deviation for the entire population before using variance() and stddev() to compute the sample variance and standard deviation for a subset created by using the length of every second file found.

$ python3 statistics_variance.py

Basic statistics:
  count     : 1282
  min       :   4.00
  max       : 228.00
  mean      :  27.79

Population variance:
  pstdev    :  17.86
  pvariance : 319.04

Estimated variance for sample:
  count     : 641
  stdev     :  16.94
  variance  : 286.99
原文地址:https://www.cnblogs.com/lightsong/p/13994422.html