Principal Component Analysis


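Principal Component Analysis (PCA)

  • An unsupervised machine learning algorithm
  • Used mainly for dimensionality reduction
  • Reducing the dimension can reveal features that are easier for people to interpret
  • Other applications: visualization; denoising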

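PCA concepts

For two-dimensional data, keeping only one of the two features already reduces the dimension.
Step 1: demean, i.e. shift all samples so that their mean is zero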

$$Var(x)=\frac{1}{m}\sum^{m}_{i=1}(x_i-\overline x)^2$$

$$\overline x = 0$$
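To see what the demean step buys, here is a minimal sketch. The sample values and the names `x`, `x_demean`, `var_general` and `var_simplified` are made up for illustration. It checks that once the mean is zero, the variance formula above reduces to the mean of the squares, $\frac{1}{m}\sum_{i=1}^{m} x_i^2$.

```python
import numpy as np

# Illustrative 1-D sample; the values are arbitrary.
x = np.array([2.0, 4.0, 4.0, 6.0, 9.0])

# Demean: subtract the sample mean so the new mean is zero.
x_demean = x - np.mean(x)

# General definition: Var(x) = (1/m) * sum((x_i - mean)^2)
var_general = np.mean((x_demean - np.mean(x_demean)) ** 2)

# With a zero mean, the same quantity is just the mean of the squares.
var_simplified = np.mean(x_demean ** 2)

print(np.isclose(np.mean(x_demean), 0.0))
print(np.isclose(var_general, var_simplified))
```

Demeaning only shifts the samples, so the variance of `x_demean` is the same as the variance of `x`.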

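The PCA procedure:
1. Demean all samples
2. Find the direction of an axis $w = (w_1, w_2)$
3. such that, after all samples are projected onto $w$, $Var(X_{project}) = \frac{1}{m}\sum_{i=1}^{m}(X^{(i)}_{project}-\overline X_{project})^2$ is as large as possible
This eventually reduces to optimizing an objective function, which can be solved with gradient ascent.
Difference between linear regression and PCA: linear regression ultimately predicts an output label, and the fit makes the MSE (measured along the label axis) as small as possible; PCA has no label and instead picks the axis that maximizes the variance of the projected samples.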
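The optimization described above can be sketched as follows. After demeaning, $\overline X_{project} = 0$, so for a unit vector $w$ the objective becomes $f(w) = \frac{1}{m}\sum_{i=1}^{m}(X^{(i)} \cdot w)^2$ with gradient $\nabla f = \frac{2}{m} X^T (X w)$. The function names (`f`, `df`, `direction`, `first_component`), the hyperparameters (`eta`, `n_iters`, `epsilon`) and the fixed random seed are assumptions chosen for illustration, not taken from the original post.

```python
import numpy as np

def demean(X):
    # Subtract each feature's mean so every column has mean zero.
    return X - np.mean(X, axis=0)

def f(w, X):
    # Variance of the projections of the demeaned samples onto w.
    return np.sum(X.dot(w) ** 2) / len(X)

def df(w, X):
    # Gradient of f with respect to w.
    return X.T.dot(X.dot(w)) * 2.0 / len(X)

def direction(w):
    # Rescale w to unit length; only its direction matters.
    return w / np.linalg.norm(w)

def first_component(X, initial_w, eta=0.01, n_iters=10000, epsilon=1e-8):
    # Gradient ascent: step along the gradient, then renormalize w.
    w = direction(initial_w)
    for _ in range(n_iters):
        last_w = w
        w = direction(w + eta * df(w, X))
        if abs(f(w, X) - f(last_w, X)) < epsilon:
            break
    return w

# Same kind of synthetic data as in this post, with a fixed seed (assumption).
rng = np.random.default_rng(0)
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0., 100., size=100)
X[:, 1] = 0.75 * X[:, 0] + 3 + rng.normal(0, 10., size=100)

X_demean = demean(X)
initial_w = rng.random(X.shape[1])  # the start vector must not be all zeros
w = first_component(X_demean, initial_w)
print(w)
```

Renormalizing `w` after every step keeps the constraint that `w` is a unit vector; without it the objective could be made arbitrarily large just by scaling `w`.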

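import numpy as np
import matplotlib.pyplot as plt

# 100 samples with 2 features
X = np.empty((100, 2))
# Feature 0: uniform on [0, 100)
X[:, 0] = np.random.uniform(0., 100., size=100)
# Feature 1: linear in feature 0 plus Gaussian noise (standard deviation 10)
X[:, 1] = 0.75 * X[:, 0] + 3 + np.random.normal(0, 10., size=100)

plt.scatter(X[:, 0], X[:, 1])
plt.show()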

Dimensionality reduction step: demean

def demean(x):
    # Subtract each feature's (column's) mean so every feature has mean zero
    return x - np.mean(x, axis=0)

X_demean = demean(X)
plt.scatter(X_demean[:, 0], X_demean[:, 1])
plt.show()

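PCA is a transformation from one coordinate system to another.
To find the next principal component, change the data first: remove from every sample its component along the first principal component, then run the same search on what remains.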
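Removing the component along the first principal component could look like the sketch below. To keep it short, the first direction `w1` is taken from `np.linalg.eigh` on the covariance matrix as a stand-in for the gradient ascent search; that substitution, the fixed seed, and the names `w1`, `w2`, `X2` are assumptions for illustration.

```python
import numpy as np

# Synthetic data of the same kind as in this post, with a fixed seed (assumption).
rng = np.random.default_rng(0)
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0., 100., size=100)
X[:, 1] = 0.75 * X[:, 0] + 3 + rng.normal(0, 10., size=100)
X_demean = X - np.mean(X, axis=0)

# Stand-in for the gradient ascent search: eigh returns eigenvalues in
# ascending order, so the last eigenvector is the direction of largest variance.
cov = X_demean.T.dot(X_demean) / len(X_demean)
eigvals, eigvecs = np.linalg.eigh(cov)
w1 = eigvecs[:, -1]

# Remove from every sample its component along w1:
# X2[i] = X[i] - (X[i] . w1) * w1
X2 = X_demean - X_demean.dot(w1).reshape(-1, 1) * w1

# What is left has no extent along w1 ...
print(np.allclose(X2.dot(w1), 0.0))

# ... so the direction of largest variance in X2 is perpendicular to w1.
cov2 = X2.T.dot(X2) / len(X2)
w2 = np.linalg.eigh(cov2)[1][:, -1]
print(np.isclose(w1.dot(w2), 0.0))
```

For two-dimensional data the second component is fully determined by the first; in higher dimensions the same removal step is repeated once per component.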

Mapping high-dimensional data to low-dimensional data

$$X_k = X \cdot W_k^T$$

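The information lost during dimensionality reduction is hard to recover: mapping back with $X_k \cdot W_k = X_m$ returns points in the original space, but $X_m$ is in general not equal to the original $X$.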
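A round trip through the two mappings can be sketched as below. It assumes that `X` holds m demeaned samples with n features and that `W_k` is a k x n matrix whose rows are the first k principal directions as unit vectors. Here `W_k` is obtained with `np.linalg.eigh` rather than gradient ascent, purely for brevity, and the fixed seed is also an assumption.

```python
import numpy as np

# Synthetic data of the same kind as in this post, with a fixed seed (assumption).
rng = np.random.default_rng(0)
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0., 100., size=100)
X[:, 1] = 0.75 * X[:, 0] + 3 + rng.normal(0, 10., size=100)
X_demean = X - np.mean(X, axis=0)

# Principal directions from the covariance matrix, largest variance first.
cov = X_demean.T.dot(X_demean) / len(X_demean)
eigvals, eigvecs = np.linalg.eigh(cov)
k = 1
W_k = eigvecs[:, ::-1][:, :k].T   # shape (k, n): one row per component

# High-dimensional -> low-dimensional: X_k = X . W_k^T
X_k = X_demean.dot(W_k.T)         # shape (m, k)

# Low-dimensional -> back to the original space: X_m = X_k . W_k
X_m = X_k.dot(W_k)                # shape (m, n)

print(X_k.shape, X_m.shape)

# X_m lies entirely on the first principal axis; the part of each sample
# perpendicular to that axis was discarded and does not come back.
print(np.allclose(X_m, X_demean))
print(np.mean(np.sum((X_demean - X_m) ** 2, axis=1)))
```

The last line prints the mean squared distance between each sample and its reconstruction, which is the variance that the discarded component carried.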

Original article: https://www.cnblogs.com/pandaboy1123/p/10343565.html