Conjugate Gradient Descent (1)

  • Algorithm features:
    ①. The linear system is recast as an equivalent optimization problem; ②. Conjugate directions are used as the search directions.
  • Algorithm derivation:
    Part Ⅰ Algorithm details
    The algorithm is derived for the following linear system,
    \begin{equation}
    Ax = b
    \label{eq_1}
    \end{equation}
    If a solution of $\eqref{eq_1}$ exists, the system is equivalent to the following optimization problem,
    \begin{equation}
    \min\quad \frac{1}{2}\|Ax - b\|_2^2 \quad\Rightarrow\quad \min\quad \frac{1}{2}x^\mathrm{T}A^\mathrm{T}Ax - b^\mathrm{T}Ax
    \label{eq_2}
    \end{equation}
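    The equivalence in $\eqref{eq_2}$ follows by expanding the squared norm and dropping a constant that does not affect the minimizer,
    \begin{equation*}
    \frac{1}{2}\|Ax - b\|_2^2 = \frac{1}{2}(Ax-b)^\mathrm{T}(Ax-b) = \frac{1}{2}x^\mathrm{T}A^\mathrm{T}Ax - b^\mathrm{T}Ax + \frac{1}{2}b^\mathrm{T}b
    \end{equation*}
    The dropped constant $\frac{1}{2}b^\mathrm{T}b$ is why the left-hand objective tends to 0 at the solution while the right-hand one generally does not.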
    The optimal solution $x^\star$ of $\eqref{eq_2}$ is then a solution of $\eqref{eq_1}$. To simplify the notation, let $C = A^\mathrm{T}A$, $c = A^\mathrm{T}b$; then $\eqref{eq_2}$ becomes
    \begin{equation}
    \min\quad \frac{1}{2}x^\mathrm{T}Cx - c^\mathrm{T}x
    \label{eq_3}
    \end{equation}
    where $C$ is a symmetric positive semidefinite matrix. Problem $\eqref{eq_3}$ is an unconstrained convex quadratic program; denoting its objective function by $J$, the gradient is
    \begin{equation}
    g = \nabla J = Cx - c
    \label{eq_4}
    \end{equation}
    Without loss of generality, let the $k$-th iteration take the form
    \begin{equation}
    x_{k+1} = x_k + \alpha_kd_k
    \label{eq_5}
    \end{equation}
    where $\alpha_k$ and $d_k$ denote the step size and the search direction of the $k$-th iteration, respectively.
    Exact line search requires
    \begin{equation}
    \alpha_k = \mathop{argmin}_{\alpha\in \mathbb{R}} J(x_k+\alpha d_k)\quad\Rightarrow\quad
    \frac{\mathrm{d}J}{\mathrm{d}\alpha}\bigg|_{k+1} = 0
    \label{eq_6}
    \end{equation}
    Solving this gives
    \begin{equation}
    \alpha_k = \frac{-g_k^\mathrm{T}d_k}{d_k^\mathrm{T}Cd_k}
    \label{eq_7}
    \end{equation}
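    The step behind $\eqref{eq_7}$ is a direct application of $\eqref{eq_4}$ and $\eqref{eq_5}$,
    \begin{equation*}
    \frac{\mathrm{d}J}{\mathrm{d}\alpha}\bigg|_{k+1} = d_k^\mathrm{T}\nabla J(x_{k+1}) = d_k^\mathrm{T}\left(C(x_k+\alpha_k d_k) - c\right) = g_k^\mathrm{T}d_k + \alpha_k d_k^\mathrm{T}Cd_k = 0
    \end{equation*}
    which rearranges to $\eqref{eq_7}$.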
    The conjugacy requirement on the search directions gives
    \begin{gather}
    d_k = -g_k + \beta_kd_{k-1}\label{eq_8} \\
    d_{k}^\mathrm{T}Cd_{k-1} = 0\label{eq_9}
    \end{gather}
    Solving these gives
    \begin{equation}
    \beta_k = \frac{g_k^\mathrm{T}Cd_{k-1}}{d_{k-1}^\mathrm{T}Cd_{k-1}} =
    \frac{g_k^\mathrm{T}(g_k-g_{k-1})}{d_{k-1}^\mathrm{T}(g_k-g_{k-1})}
    \label{eq_10}
    \end{equation}
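    The second equality in $\eqref{eq_10}$ removes the explicit product with $C$: from $\eqref{eq_4}$ and $\eqref{eq_5}$,
    \begin{equation*}
    g_k - g_{k-1} = C(x_k - x_{k-1}) = \alpha_{k-1}Cd_{k-1}
    \end{equation*}
    so substituting $Cd_{k-1} = (g_k - g_{k-1})/\alpha_{k-1}$ into the first expression and cancelling $\alpha_{k-1}$ yields the second.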
    Combining $\eqref{eq_5}$, $\eqref{eq_7}$, $\eqref{eq_8}$ and $\eqref{eq_10}$, problem $\eqref{eq_3}$ can be solved iteratively, and its optimal solution $x^\star$ is a solution of $\eqref{eq_1}$.
    Part Ⅱ Algorithm flow
    Initialize the convergence criterion $\epsilon$ and the starting point $x_0$.
    Compute the initial gradient $g_0 = Cx_0 - c$, set $d_0 = -g_0$, $k = 0$, and repeat the following steps (a compact sketch of this loop is given after the step list),
      step1: if $\|g_k\|<\epsilon$, convergence is reached and the iteration stops
      step2: compute the step size $\displaystyle \alpha_k=\frac{-g_k^\mathrm{T}d_k}{d_k^\mathrm{T}Cd_k}$
      step3: update the iterate $\displaystyle x_{k+1}=x_k + \alpha_kd_k$
      step4: update the gradient $g_{k+1}=Cx_{k+1}-c$
      step5: compute the combination coefficient $\displaystyle \beta_{k+1}=\frac{g_{k+1}^\mathrm{T}(g_{k+1}-g_k)}{d_{k}^\mathrm{T}(g_{k+1}-g_k)}$
      step6: update the conjugate direction $\displaystyle d_{k+1}=-g_{k+1}+\beta_{k+1}d_k$
      step7: set $k = k+1$ and go to step1
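    The step list above maps directly onto a short NumPy loop. The sketch below is illustrative only (the function name cgd_solve is ours); the full class-based implementation follows in the next section.
      import numpy

      def cgd_solve(C, c, epsilon=1.e-6, maxIter=30000):
          # minimize 0.5 * x^T C x - c^T x for symmetric positive semidefinite C and column vector c
          x = numpy.zeros_like(c)
          g = numpy.matmul(C, x) - c                                 # initial gradient g_0
          d = -g                                                     # initial direction d_0 = -g_0
          for _ in range(maxIter):
              if numpy.linalg.norm(g, ord=numpy.inf) < epsilon:      # step1
                  break
              Cd = numpy.matmul(C, d)
              alpha = -numpy.matmul(g.T, d)[0, 0] / numpy.matmul(d.T, Cd)[0, 0]                 # step2
              x = x + alpha * d                                      # step3
              gNew = numpy.matmul(C, x) - c                          # step4
              beta = numpy.matmul(gNew.T, gNew - g)[0, 0] / numpy.matmul(d.T, gNew - g)[0, 0]   # step5
              d = -gNew + beta * d                                   # step6
              g = gNew                                               # step7
          return x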
  • Code implementation:
    The algorithm is demonstrated by solving a randomly generated linear system $Ax = b$; the conjugate gradient descent implementation is as follows,
      # Implementation of conjugate gradient descent
      # Solves the linear system Ax = b

      import numpy
      from matplotlib import pyplot as plt


      numpy.random.seed(0)


      # Randomly generate the linear system Ax = b to be solved
      def get_A_b(n=30):
          A = numpy.random.uniform(-10, 10, (n, n))
          b = numpy.random.uniform(-10, 10, (n, 1))
          return A, b


      # Implementation of the conjugate gradient method
      class CGD(object):

          def __init__(self, A, b, epsilon=1.e-6, maxIter=30000):
              self.__A = A                               # coefficient matrix
              self.__b = b                               # right-hand-side vector
              self.__epsilon = epsilon                   # convergence criterion
              self.__maxIter = maxIter                   # maximum number of iterations

              self.__C, self.__c = self.__build_C_c()    # parameters of the equivalent optimization problem
              self.__JPath = list()


          def get_solu(self):
              '''
              compute the numerical solution
              '''
              self.__JPath.clear()

              x = self.__init_x()
              JVal = self.__calc_JVal(self.__C, self.__c, x)
              self.__JPath.append(JVal)
              grad = self.__calc_grad(self.__C, self.__c, x)
              d = -grad
              for idx in range(self.__maxIter):
                  # print("iterCnt: {:3d},   JVal: {}".format(idx, JVal))
                  if self.__converged1(grad, self.__epsilon):
                      self.__print_MSG(x, JVal, idx)
                      return x, JVal, True

                  alpha = self.__calc_alpha(self.__C, grad, d)
                  xNew = x + alpha * d
                  JNew = self.__calc_JVal(self.__C, self.__c, xNew)
                  self.__JPath.append(JNew)
                  if self.__converged2(xNew - x, JNew - JVal, self.__epsilon ** 2):
                      self.__print_MSG(xNew, JNew, idx + 1)
                      return xNew, JNew, True

                  gNew = self.__calc_grad(self.__C, self.__c, xNew)
                  beta = self.__calc_beta(gNew, grad, d)
                  dNew = self.__calc_d(gNew, d, beta)

                  x, JVal, grad, d = xNew, JNew, gNew, dNew
              else:
                  if self.__converged1(grad, self.__epsilon):
                      self.__print_MSG(x, JVal, self.__maxIter)
                      return x, JVal, True

              print("CGD not converged after {} steps!".format(self.__maxIter))
              return x, JVal, False


          def get_path(self):
              return self.__JPath


          def __print_MSG(self, x, JVal, iterCnt):
              print("Iteration steps: {}".format(iterCnt))
              print("Solution:\n{}".format(x.flatten()))
              print("JVal: {}".format(JVal))


          def __converged1(self, grad, epsilon):
              if numpy.linalg.norm(grad, ord=numpy.inf) < epsilon:
                  return True
              return False


          def __converged2(self, xDelta, JDelta, epsilon):
              val1 = numpy.linalg.norm(xDelta, ord=numpy.inf)
              val2 = numpy.abs(JDelta)
              if val1 < epsilon or val2 < epsilon:
                  return True
              return False


          def __init_x(self):
              x0 = numpy.zeros((self.__C.shape[0], 1))
              return x0


          # equivalent quadratic objective 0.5 * x^T C x - c^T x
          # (differs from the version below only by the constant 0.5 * b^T b)
          # def __calc_JVal(self, C, c, x):
          #     term1 = numpy.matmul(x.T, numpy.matmul(C, x))[0, 0] / 2
          #     term2 = numpy.matmul(c.T, x)[0, 0]
          #     JVal = term1 - term2
          #     return JVal


          def __calc_JVal(self, C, c, x):
              term1 = numpy.matmul(self.__A, x) - self.__b
              JVal = numpy.sum(term1 ** 2) / 2
              return JVal


          def __calc_grad(self, C, c, x):
              grad = numpy.matmul(C, x) - c
              return grad


          def __calc_d(self, grad, dOld, beta):
              d = -grad + beta * dOld
              return d


          def __calc_alpha(self, C, grad, d):
              term1 = numpy.matmul(grad.T, d)[0, 0]
              term2 = numpy.matmul(d.T, numpy.matmul(C, d))[0, 0]
              alpha = -term1 / term2
              return alpha


          def __calc_beta(self, grad, gOld, dOld):
              term0 = grad - gOld
              term1 = numpy.matmul(grad.T, term0)[0, 0]
              term2 = numpy.matmul(dOld.T, term0)[0, 0]
              beta = term1 / term2
              return beta


          def __build_C_c(self):
              C = numpy.matmul(self.__A.T, self.__A)     # use the instance attributes rather than the module-level A, b
              c = numpy.matmul(self.__A.T, self.__b)
              return C, c


      class CGDPlot(object):

          @staticmethod
          def plot_fig(cgdObj):
              x, JVal, tab = cgdObj.get_solu()
              JPath = cgdObj.get_path()

              fig = plt.figure(figsize=(6, 4))
              ax1 = plt.subplot()

              ax1.plot(numpy.arange(len(JPath)), JPath, "k.")
              ax1.plot(0, JPath[0], "go", label="starting point")
              ax1.plot(len(JPath)-1, JPath[-1], "r*", label="solution")

              ax1.legend()
              ax1.set(xlabel="$iterCnt$", ylabel="$JVal$", title="JVal-Final = {}".format(JVal))
              fig.tight_layout()
              fig.savefig("plot_fig.png", dpi=100)


      if __name__ == "__main__":
          A, b = get_A_b()

          cgdObj = CGD(A, b)
          CGDPlot.plot_fig(cgdObj)
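    As a quick sanity check, the numerical solution can be compared against a direct solver. The snippet below is a usage sketch only (run it in place of the plotting call in __main__); it reuses get_A_b and CGD from the listing above.
      A, b = get_A_b()
      x, JVal, converged = CGD(A, b).get_solu()

      # the deviation from numpy's direct solver should be small when converged is True
      xDirect = numpy.linalg.solve(A, b)
      print("max abs deviation from numpy.linalg.solve:", numpy.abs(x - xDirect).max())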
  • Results:
    [Figure: JVal versus iteration count, as produced by CGDPlot.plot_fig and saved to plot_fig.png]
    Note that JVal in the figure is computed with the left-hand expression of $\eqref{eq_2}$; when this value converges towards 0, the numerical solution is reliable.
  • Usage suggestions:
    ①. Suitable for iteratively solving large linear systems;
    ②. For a convex quadratic objective, the expanding subspace theorem can be used to further simplify the computation of the combination coefficient $\beta$ (see the note below).
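    Concretely, for the quadratic objective $\eqref{eq_3}$ with exact line search, $g_{k+1}^\mathrm{T}g_k = 0$ and $d_k^\mathrm{T}g_k = -g_k^\mathrm{T}g_k$, so $\eqref{eq_10}$ reduces to the familiar form
    \begin{equation*}
    \beta_{k+1} = \frac{g_{k+1}^\mathrm{T}g_{k+1}}{g_k^\mathrm{T}g_k}
    \end{equation*}
    which needs only two inner products of gradients per iteration.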
  • References:
    https://zhuanlan.zhihu.com/p/338838078
Original post: https://www.cnblogs.com/xxhbdk/p/15008748.html