Covariance estimation in sklearn

Covariance estimation

https://scikit-learn.org/stable/modules/covariance.html#

      The covariance matrix can be seen as an estimate of how scattered the data set is.

      Understanding: the more high correlation coefficients the matrix contains, the more concentrated the data distribution; conversely, the more scattered it is. For example, if the correlation coefficients between all pairs of features are 0, the data is in a fully scattered state.

                Highly correlated features can be merged into a single dimension during dimensionality reduction.

     Estimation is usually done on a sample, and the sample's properties (size, structure, homogeneity) affect the quality of the estimate.

     We assume the observed data are independent and identically distributed (i.i.d.).

Many statistical problems require the estimation of a population’s covariance matrix, which can be seen as an estimation of data set scatter plot shape. Most of the time, such an estimation has to be done on a sample whose properties (size, structure, homogeneity) have a large influence on the estimation’s quality. The sklearn.covariance package provides tools for accurately estimating a population’s covariance matrix under various settings.

We assume that the observations are independent and identically distributed (i.i.d.).

How to explain the concepts of "covariance" and "correlation coefficient" intuitively?

It should be noted that both covariance and the correlation coefficient observe relationships from a linear perspective; nonlinear relationships are not considered.

https://www.zhihu.com/question/20852004

1. Covariance: it can be understood intuitively as answering two questions: do two variables change in the same direction or in opposite directions as they vary, and to what degree?

If you get bigger and I get bigger at the same time, the two variables move in the same direction, and the covariance is positive.

If you get bigger while I get smaller, the two variables move in opposite directions, and the covariance is negative.

Numerically, the larger the covariance, the more strongly the two variables move together.
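The sign rule above can be checked with a minimal numpy sketch; the data here is made up for illustration.

```python
# Sign of the covariance reflects the direction of co-movement.
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = a * 2 + 1   # moves with a      -> positive covariance
c = -a + 10     # moves against a   -> negative covariance

# np.cov returns the 2x2 covariance matrix; [0, 1] is cov(x, y)
print(np.cov(a, b)[0, 1])  # positive
print(np.cov(a, c)[0, 1])  # negative
```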

Correlation and independence in probability

https://www.cnblogs.com/hkycs/p/5111201.html

      "Correlation" here is short for linear correlation.

     Variables that are not linearly correlated may still be nonlinearly correlated.

First, correlation in probability refers to linear correlation (see the section "Covariance and correlation coefficient" in Probability Theory and Mathematical Statistics by Sheng Zhou).

Second, (linear) correlation and independence in probability are not equivalent: independence implies (linear) uncorrelatedness, but (linear) uncorrelatedness does not imply independence.

This is actually easy to understand: correlation can be linear or nonlinear, and when variables are nonlinearly correlated they are still related, hence not independent.
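The classic counterexample can be reproduced in a few lines: with a symmetric x and the deliberately nonlinear relationship y = x², y is fully determined by x (so they are not independent), yet their covariance is exactly zero. The data points are made up for illustration.

```python
# Uncorrelated does not imply independent.
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # y is a deterministic function of x -> NOT independent

# Off-diagonal entry of the sample covariance matrix
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # -> 0.0: linearly uncorrelated despite the dependence
```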

Empirical covariance

     Uses the maximum likelihood estimation approach.

    It is an asymptotically unbiased estimator.

    Understanding: this estimate is computed directly from the basic statistical formula.

https://scikit-learn.org/stable/modules/covariance.html#empirical-covariance

The covariance matrix of a data set is known to be well approximated by the classical maximum likelihood estimator (or “empirical covariance”), provided the number of observations is large enough compared to the number of features (the variables describing the observations). More precisely, the Maximum Likelihood Estimator of a sample is an asymptotically unbiased estimator of the corresponding population’s covariance matrix.

The empirical covariance matrix of a sample can be computed using the empirical_covariance function of the package, or by fitting an EmpiricalCovariance object to the data sample with the EmpiricalCovariance.fit method. Be careful that results depend on whether the data are centered, so one may want to use the assume_centered parameter accurately. More precisely, if assume_centered=False, then the test set is supposed to have the same mean vector as the training set. If not, both should be centered by the user, and assume_centered=True should be used.
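Both routes mentioned above can be sketched briefly; the toy Gaussian sample below is made up for illustration.

```python
# Empirical (maximum likelihood) covariance via function and estimator.
import numpy as np
from sklearn.covariance import EmpiricalCovariance, empirical_covariance

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, 0.6], [0.6, 1.0]],
                            size=500)

# Function form: one call, returns the MLE covariance matrix
cov_fn = empirical_covariance(X, assume_centered=False)

# Estimator form: fit the object, then read covariance_
est = EmpiricalCovariance(assume_centered=False).fit(X)
print(est.covariance_)

# Both routes compute the same maximum likelihood estimate
assert np.allclose(cov_fn, est.covariance_)
```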

Shrunk covariance

https://scikit-learn.org/stable/modules/covariance.html#shrunk-covariance

      Maximum likelihood estimation is asymptotically unbiased, but it is not a good estimator of the eigenvalues of the covariance matrix.

     This is why shrinkage estimation is introduced.

Despite being an asymptotically unbiased estimator of the covariance matrix, the Maximum Likelihood Estimator is not a good estimator of the eigenvalues of the covariance matrix, so the precision matrix obtained from its inversion is not accurate. Sometimes, it even occurs that the empirical covariance matrix cannot be inverted for numerical reasons. To avoid such an inversion problem, a transformation of the empirical covariance matrix has been introduced: the shrinkage.

In scikit-learn, this transformation (with a user-defined shrinkage coefficient) can be directly applied to a pre-computed covariance with the shrunk_covariance method. Also, a shrunk estimator of the covariance can be fitted to data with a ShrunkCovariance object and its ShrunkCovariance.fit method. Again, results depend on whether the data are centered, so one may want to use the assume_centered parameter accurately.
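A minimal sketch of the two routes, with a fixed shrinkage coefficient and made-up data:

```python
# Shrinkage of a precomputed covariance vs. the ShrunkCovariance estimator.
import numpy as np
from sklearn.covariance import (ShrunkCovariance, empirical_covariance,
                                shrunk_covariance)

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, 0.8], [0.8, 1.0]],
                            size=50)

emp = empirical_covariance(X)
# Pull the matrix toward a scaled identity by a fixed coefficient
shrunk = shrunk_covariance(emp, shrinkage=0.2)

# The same transformation through the estimator API
est = ShrunkCovariance(shrinkage=0.2).fit(X)
print(est.covariance_)
assert np.allclose(shrunk, est.covariance_)
```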

Sparse inverse covariance

   The inverse of the covariance matrix, also called the precision matrix, is proportional to the partial correlation matrix.

   If two features are independent conditionally on the others, the corresponding coefficient in the precision matrix is 0.

https://scikit-learn.org/stable/modules/covariance.html#sparse-inverse-covariance

The matrix inverse of the covariance matrix, often called the precision matrix, is proportional to the partial correlation matrix. It gives the partial independence relationship. In other words, if two features are independent conditionally on the others, the corresponding coefficient in the precision matrix will be zero. This is why it makes sense to estimate a sparse precision matrix: the estimation of the covariance matrix is better conditioned by learning independence relations from the data. This is known as covariance selection.

In the small-samples situation, in which n_samples is on the order of n_features or smaller, sparse inverse covariance estimators tend to work better than shrunk covariance estimators. However, in the opposite situation, or for very correlated data, they can be numerically unstable. In addition, unlike shrinkage estimators, sparse estimators are able to recover off-diagonal structure.

The GraphicalLasso estimator uses an l1 penalty to enforce sparsity on the precision matrix: the higher its alpha parameter, the more sparse the precision matrix. The corresponding GraphicalLassoCV object uses cross-validation to automatically set the alpha parameter.
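A minimal sketch of GraphicalLasso on synthetic data; the sparse ground-truth precision matrix below is made up, with feature 2 conditionally independent of the others.

```python
# Sparse precision estimation with an l1 penalty (graphical lasso).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.RandomState(0)
# True precision: features 0 and 1 coupled, feature 2 decoupled
precision = np.array([[2.0, 0.9, 0.0],
                      [0.9, 2.0, 0.0],
                      [0.0, 0.0, 2.0]])
cov = np.linalg.inv(precision)
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=200)

# Higher alpha -> stronger l1 penalty -> sparser precision matrix
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))
```

In practice GraphicalLassoCV can be substituted for GraphicalLasso here to pick alpha by cross-validation instead of fixing it by hand.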

Comparison

     From the figure:

     the empirical matrix carries a lot of noise;

    the shrunk matrix shrinks too much, masking out the correlations;

     the sparse inverse matrix best recovers the original correlation structure.

[Figure sphx_glr_plot_sparse_cov_001.png: a comparison of maximum likelihood, shrinkage and sparse estimates of the covariance and precision matrix in the very-small-samples setting.]

Robust Covariance Estimation

Robust covariance matrix estimation, for the case where outliers are present.

The estimators above are all quite sensitive to outliers.

https://scikit-learn.org/stable/modules/covariance.html#robust-covariance-estimation

Real data sets are often subject to measurement or recording errors. Regular but uncommon observations may also appear for a variety of reasons. Observations which are very uncommon are called outliers. The empirical covariance estimator and the shrunk covariance estimators presented above are very sensitive to the presence of outliers in the data. Therefore, one should use robust covariance estimators to estimate the covariance of its real data sets. Alternatively, robust covariance estimators can be used to perform outlier detection and discard/downweight some observations according to further processing of the data.

The sklearn.covariance package implements a robust estimator of covariance, the Minimum Covariance Determinant (MCD) estimator.
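The effect of outliers on the classical estimate, and MCD's resistance to them, can be sketched with made-up data: a clean Gaussian cloud plus a few injected extreme points.

```python
# Robust (MinCovDet) vs. classical (EmpiricalCovariance) under outliers.
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.RandomState(0)
inliers = rng.multivariate_normal([0.0, 0.0],
                                  [[1.0, 0.5], [0.5, 1.0]], size=95)
outliers = rng.uniform(low=8.0, high=10.0, size=(5, 2))  # 5% contamination
X = np.vstack([inliers, outliers])

robust = MinCovDet(random_state=0).fit(X)
classic = EmpiricalCovariance().fit(X)

# The robust variances stay near 1; the classical ones are inflated
print(np.diag(robust.covariance_))
print(np.diag(classic.covariance_))
```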

Original post: https://www.cnblogs.com/lightsong/p/14247326.html