PCA Algorithm Implementation (Updated)

1. Transductive PCA

Basic steps:

  • Center (zero-mean) the sample data — this step is important, especially for deriving the formulas;
  • Compute the sample covariance matrix;
  • Eigendecompose the covariance matrix and project onto the eigenvectors corresponding to the top k eigenvalues.

The optimization objective of PCA is:

  X = D + N, i.e. a low-rank matrix D plus i.i.d. Gaussian noise N
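This model can be illustrated numerically: a low-rank matrix plus small i.i.d. Gaussian noise has a singular spectrum where a few values dominate, which is why projecting onto the top directions recovers D. A small sketch (variable names and sizes are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
# low-rank part D: a rank-2 matrix built from two factor matrices
D = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
N = 0.01 * rng.normal(size=(50, 20))   # small i.i.d. Gaussian noise
X = D + N

# the singular spectrum of X shows two dominant values, the rest near zero
s = np.linalg.svd(X, compute_uv=False)
print(s[:4])
```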

import numpy as np


def pca(X, d):
    """Transductive PCA
    input:
        X: sample matrix, one sample per row
        d: target dimension
    output:
        reduced vector samples, plus an orthonormality check
    """
    # 1. zero-equalization: zero-mean each feature (column)
    mean_x = np.mean(X, axis=0)
    new_X = X - mean_x                                  # centred data
    # 2. Cov(X)
    cov_X = np.dot(new_X.T, new_X) / (X.shape[0] - 1)   # covariance matrix
    # 3. eig_value & eig_vectors (eigh: cov_X is symmetric)
    e_val, e_vecs = np.linalg.eigh(cov_X)
    eval_idx = np.argsort(e_val)[::-1]
    # 4. eigenvectors of the d largest eigenvalues
    sorted_vec = e_vecs[:, eval_idx]
    d_vec = sorted_vec[:, :d]               # each eigenvector is a principal direction
    # 5. project the centred samples onto the principal directions
    reduced_X = np.dot(new_X, d_vec)
    i = np.dot(d_vec.T, d_vec)              # should be the d x d identity

    return reduced_X, i


# -----------------------Test Part-------------------------
if __name__ == '__main__':
    a = np.array([[1, 2, 3],
                  [2, 5, 6],
                  [3, 6, 9]])
    reduced, i = pca(a, 2)
    print(reduced)
    print(i)
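As a cross-check, the principal directions from the covariance eigendecomposition should span the same subspace as the top right singular vectors of the centred data matrix. A small sketch (not part of the original post; data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                      # centred samples (rows)

# top-2 directions from the eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (len(X) - 1)
e_val, e_vecs = np.linalg.eigh(cov)
top2_eig = e_vecs[:, np.argsort(e_val)[::-1][:2]]

# top-2 right singular vectors of the centred data give the same directions
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
top2_svd = Vt[:2].T

# individual vectors may differ in sign, so compare the projectors instead
P_eig = top2_eig @ top2_eig.T
P_svd = top2_svd @ top2_svd.T
print(np.allclose(P_eig, P_svd))
```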

Notes:

# outer product of 1-D vectors
Reshape the vectors into n*1 and 1*n form:
a = a.reshape(size, 1)
b = a.reshape(1, size)
np.dot(a, b)
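The same outer product can be written with np.outer, which avoids the manual reshapes (a small illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
m1 = a.reshape(3, 1) @ a.reshape(1, 3)   # reshape trick from the note above
m2 = np.outer(a, a)                      # equivalent, no reshape needed
print(np.array_equal(m1, m2))
```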

2. RPCA


# code not written yet

  The optimization objective of RPCA is:

    D = L + S, i.e. a low-rank matrix L plus a sparse, spiky noise matrix S
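Since the post's RPCA code is still missing, here is one possible sketch: Principal Component Pursuit solved with an inexact augmented Lagrangian method, which alternates singular value thresholding for L and entrywise soft thresholding for S. The function name, the default lam = 1/sqrt(max(m, n)), and the mu heuristic are illustrative assumptions, not the author's implementation:

```python
import numpy as np


def rpca(D, lam=None, tol=1e-7, max_iter=1000):
    """Principal Component Pursuit sketch (inexact ALM).

    Splits D into a low-rank part L and a sparse part S by solving
        min ||L||_* + lam * ||S||_1   s.t.  D = L + S
    """
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # common default choice
    mu = m * n / (4.0 * np.abs(D).sum())     # penalty heuristic (assumption)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                     # Lagrange multiplier
    d_norm = np.linalg.norm(D)
    for _ in range(max_iter):
        # L-step: singular value thresholding of D - S + Y/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding of D - L + Y/mu
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # dual update on the constraint residual D - L - S
        Z = D - L - S
        Y = Y + mu * Z
        mu = min(mu * 1.05, 1e7)             # slowly increase the penalty
        if np.linalg.norm(Z) / d_norm < tol:
            break
    return L, S
```

On synthetic data (an exact low-rank matrix plus a few large spikes), this kind of splitting typically separates the two parts well, which is what distinguishes RPCA from plain PCA under Gaussian noise.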


Original article: https://www.cnblogs.com/KrianJ/p/12178237.html