CVX notes

CVX notes

Preliminaries

1. PSD

M is positive semidefinite matrix (iff) all principal submatrices (P) of (M) are PSD

Note: This follows by considering the quadratic form (x^T Mx) and looking at the components of (x) corresponding to the defining subset of principal submatrix. The converse is trivially true.

M is PSD (iff) all principal minors are non-negative (所有主子式非负)

将M写成二次型：

[x^T M x = sum_{i,j}M_{ij}x_ix_j ]

于是取 (x) 为标准基 (e_i ~implies M_{ii} ge 0 implies mathbf{tr}(M) ge 0) , 再取(x)为零向量只有 i，j两个位置为 1，则

[egin{gathered} x^T M x = M_{ii}M_{jj} - M_{ij}^2 ge 0 ~~(PSD) \ implies M_{ij} le sqrt{M_{ii}M_{jj}} le frac{M_{ii} + M_{jj}}{2} end{gathered} ]

2. Matrix norm

General definition of a norm:![2018-06-05 20-39-34 的屏幕截图](/home/yychi/Pictures/2018-06-05 20-39-34 的屏幕截图.png)

Matrix norm:![2018-06-05 20-44-07 的屏幕截图](/home/yychi/Pictures/2018-06-05 20-44-07 的屏幕截图.png)

Frobenius norm: (|A|_F := sqrt{langle A,A angle_F} = sqrt{mathbf{tr}(A^*A)})
Induced norm: (|A|_p := sup_limits{|x|_p = 1} |Ax|_p)
Nuclear norm: (|A|_{nuclear} := sum sigma_i(A)) (奇异值之和)
Spectral norm: (|A|_{spectral} := lambda_1) (最大特征值)

Spectrial radius

![2018-06-05 21-01-46 的屏幕截图](/home/yychi/Pictures/2018-06-05 21-01-46 的屏幕截图.png)

![2018-06-05 21-02-47 的屏幕截图](/home/yychi/Pictures/2018-06-05 21-02-47 的屏幕截图.png)

3. Duality

Two equivalent ways to represent a convex set:

standard representation: The family of points in the set
dual representation: The set of halfspaces containing the set (半平面的交集)

A closed convex set (S) is the intersection of all closed halfspaces (H) containing it.![2018-06-05 21-27-43 的屏幕截图](/home/yychi/Pictures/2018-06-05 21-27-43 的屏幕截图.png)

![2018-06-05 21-29-20 的屏幕截图](/home/yychi/Pictures/2018-06-05 21-29-20 的屏幕截图.png)

![2018-06-05 21-31-13 的屏幕截图](/home/yychi/Pictures/2018-06-05 21-31-13 的屏幕截图.png)

Polar
Let (S subseteq mathbb{R}^n) be a convex set containing the origin. The polar of (S) is defined as follows

[S^{circ} := {y ~|~ y^Tx le 1, ~forall x in S} ]

Note

polar is one way of representing the all halfspaces containing a convex set
every halfspace (a^Tx le b) with (b eq 0) can be written as a “normalized” inequality (y^T x le 1), by dividing by (b)
(S^{circ}) can be thought of as the normalized representations of halfspaces containing (S)

Properties of the polar:

(S^{circcirc} = S)
(S^{circ}) is a closed convex set containing the origin
When 0 is in the interior of (S), then (S^{circ}) is bounded
When (S) is non-convex, (S^{circ} = (mathbf{conv}(S))^{circ}), and (S^{circcirc} = mathbf{conv}(S))

![2018-06-05 22-03-51 的屏幕截图](/home/yychi/Pictures/2018-06-05 22-03-51 的屏幕截图.png)

![2018-06-05 22-04-27 的屏幕截图](/home/yychi/Pictures/2018-06-05 22-04-27 的屏幕截图.png)

Polar duality of convex cones![2018-06-05 22-17-25 的屏幕截图](/home/yychi/Pictures/2018-06-05 22-17-25 的屏幕截图.png)

![2018-06-05 22-18-54 的屏幕截图](/home/yychi/Pictures/2018-06-05 22-18-54 的屏幕截图.png)

Notes

(K^{circcirc} = K)
(K^{circ}) is closed and convex

Conjugation of convex functions
Let (f: mathbb{R}^n mapsto mathbb{R}cup{infty}) be a convex function. The conjugation of (f) is

[f^*(y) := sup_limits{x}(y^Tx - f(x)) ]

![2018-06-05 23-57-38 的屏幕截图](/home/yychi/Pictures/2018-06-05 23-57-38 的屏幕截图.png)

Properties of the conjugate

(f^{**} = f)
(f^*) is convex (supremum of affine functions of (y))

![2018-06-06 00-03-01 的屏幕截图](/home/yychi/Pictures/2018-06-06 00-03-01 的屏幕截图.png)

Convex sets

Convex functions

![2018-05-14 09-26-39 的屏幕截图](/home/yychi/Pictures/2018-05-14 09-26-39 的屏幕截图.png)

affine is convex: (f(x) = a^T x+b)

affine 既凸也凹
任何_范数_是凸的

Proof: let (pi(x)) be a norm of (x), then![2018-05-14 09-32-58 的屏幕截图](/home/yychi/Pictures/2018-05-14 09-32-58 的屏幕截图.png)
(f) is convex (iff) epi((f)) is convex

1. Closed convex

A convex function (f) is called closed if its epigraph is a closed set.

(f) which is convex and continuous on a closed domain is a closed function. (norms)
all differentiable convex functions are closed with dom(f = mathbb{R}^n).
当考虑一个凸函数时，通常认为在dom(f)外取值为(infty)
Jensen's inequality:![2018-05-14 09-52-18 的屏幕截图](/home/yychi/Pictures/2018-05-14 09-52-18 的屏幕截图.png)
Corollary:![2018-06-04 18-35-09 的屏幕截图](/home/yychi/Pictures/2018-06-04 18-35-09 的屏幕截图.png)

pf: (f(x) = f(sumalpha_i x_i) le sum alpha_i f(x_i) le max_limits{i} f(x_i))

2. Level sets

![2018-05-14 09-57-20 的屏幕截图](/home/yychi/Pictures/2018-05-14 09-57-20 的屏幕截图.png)

Note: the convexity of level sets does not characterize convex functions, but quasiconvex functions.

convex (f) is closed (implies) all its level sets are closed

Some convex sets

norm ball (({xin mathbb{R}^n | |x| le 1})) is convex and closed
椭球(({x | (x-a)^T Q (x-a) le r^2})) is convex and closed
pf: (x^TQy := langle x, y angle) 满足内积的三条性质
- bilinearity
- symmetry
- positivity
  上述三条性质 (iff) Q is PSD
(epsilon)-neighborhood: ![2018-06-04 20-56-40 的屏幕截图](/home/yychi/Pictures/2018-06-04 20-56-40 的屏幕截图.png)

3. Operations perserving convexity of functions

stability under taking weighted sums: (f,g mapsto lambda f + mu g, ; lambda,mu ge 0)
stability under affine substitutions of the argument: (x mapsto Ax+b) or (f(x) mapsto phi(x) = f(Ax+b))
stability under taking pointwise sup: ({f_i}_{i in mathcal{I}} mapsto g(x) := sup_limits{i in mathcal{I}}f_i(x)), 凸函数族 ({f_i}_{i in mathcal{I}}) 逐点取上确界而成的函数也是凸的
stability under partial minimization: (f(x,y)) jointly convex in ((x,y)), then (g(x) = inf_limits{y} f(x,y)) is convex (suppose g is proper, i.e., > -(infty) everywhere and is finite at least at one point)
stability under perspective: (f(x) mapsto g(x,t) = tf(x/t), mathbf{dom}g = {(x,t) | x/t in mathbf{dom}f, t > 0})

4. Detect convexity

Necessary and Sufficient Convexity Condition for smooth function:

一阶可微的光滑函数 (f) 是凸的 (iff) (f'(x)) 单调非减
二阶可微的光滑函数 (f) 是凸的 (iff) (f''(x) ge 0)

subgradient property is characteristic of convex functions:![2018-06-05 08-38-57 的屏幕截图](/home/yychi/Pictures/2018-06-05 08-38-57 的屏幕截图.png)

5. Subgradient

Examples

![2018-06-05 10-08-50 的屏幕截图](/home/yychi/Pictures/2018-06-05 10-08-50 的屏幕截图.png)
![2018-06-05 10-09-54 的屏幕截图](/home/yychi/Pictures/2018-06-05 10-09-54 的屏幕截图.png)
![2018-06-05 10-10-54 的屏幕截图](/home/yychi/Pictures/2018-06-05 10-10-54 的屏幕截图.png)
![2018-06-05 10-08-04 的屏幕截图](/home/yychi/Pictures/2018-06-05 10-08-04 的屏幕截图.png)

6. Optimality conditions

凸函数的局部最优等价于全局最优。

第一充要条件（凸函数）
(x^* in mathbf{dom}f) is the minimizer (iff) (0 in partial f(x^*))

7. Strong convexity

A differentiable function f is strongly convex if

[f(y) ge f(x) + abla f(x)^T(y-x) + frac{mu}{2} |y-x|^2 ]

Note

(f) is not necessarily differentiable, (see the equivalent definition)
if (f) is non-smooth, gradient -> subgradient
strong convexity (implies) strict convexity

Note: Intuitively speaking, strong convexity means that there exists a quartic lower bound on the growth of the function.

Equivalent definition

[egin{align} &(i)~f(y)ge f(x)+ abla f(x)^T(y-x)+frac{mu}{2}lVert y-x Vert^2,~forall x, y.\ &(ii)~g(x) = f(x)-frac{mu}{2}lVert x Vert^2~ ext{is convex},~forall x.\ &(iii)~langle abla f(x) - abla f(y),x-y angle ge mu lVert x-y Vert^2,~forall x, y.\ &(iv)~f(alpha x+ (1-alpha) y) le alpha f(x) + (1-alpha) f(y) - frac{alpha (1-alpha)mu}{2}Vert x-y Vert^2,~alpha in [0,1].\ &(v)~ abla^2 f(x) succeq mu oldsymbol{I} end{align} ]

Lagrange Duality

Consider an optimization problem in standard form (not necessarily convex)

[egin{array}{ll} underset{x}{ ext{minimize}} & f_0(x) \ ext{subject to} & f_i(x) le 0, ~i=1,cdots,m \ ~ & h_i(x) = 0, ~i=1,cdots,p end{array} ]

The Lagrangian is

[L(x,oldsymbol{lambda},oldsymbol{mu}) = f_0(x) + sum_{i=1}^m lambda_i f_i(x) + sum_{i=1}^p mu_i h_i(x) ]

The Lagrange dual function is defined as

[g(lambda, mu) = inf_{x} L(x,lambda,mu) ]

![2018-06-11 12-48-10 的屏幕截图](/home/yychi/Pictures/2018-06-11 12-48-10 的屏幕截图.png)

Lagrange dual problem

[egin{array}{ll} underset{lambda, mu}{ ext{maximize}} & g(lambda, mu) \ ext{subject to} & oldsymbol{lambda} succeq mathbf{0} end{array} ]

Weak duality

[d^* le p^* ]

(d^*): optima of dual problem
(p^*): optima of primal problem
duality gap: (p^* - d^*)
always hold

Strong dualiy

[d^* = p^* ]

constraint qualifications (implies) strong duality
Slater’s Constraint Qualification: a convex problem is strictly feasible (i.e., (exists ~x in mathbf{int} mathcal{D}: x in Omega))

Complementary slackness
![2018-06-11 13-05-01 的屏幕截图](/home/yychi/Pictures/2018-06-11 13-05-01 的屏幕截图.png)

KKT conditions
![2018-06-11 13-06-37 的屏幕截图](/home/yychi/Pictures/2018-06-11 13-06-37 的屏幕截图.png)

![2018-06-11 13-08-39 的屏幕截图](/home/yychi/Pictures/2018-06-11 13-08-39 的屏幕截图.png)

Cones

Tagent cone
Let M be a (nonempty) convex set and (x^* in M), the tagent cone of (M) at (x^*) is the cone

[egin{split} T_M(x^*) &= {h in mathbb{R}^n | x^* + th in M, ; forall t > 0 } \ &= {y in mathbb{R}^n ~|~ y - x^* in M} end{split} ]

Note:

Geometrically, this is the set of all directions leading from (x^*) inside (M)
convex but not necessarily closed
fact: if (x^*) is a minimizer, then (forall h in T_M(x^*) implies h^T abla f(x^*) ge 0). （因为tangent cone里面都是可行解，所以必须不是下降方向）
(T_M(x^*) = mathbb{R}^n iff x^* in mathbf{int}M)

e.g. 多面体

[M = {x | Ax le b} = {x | a_i^Tx le b_i, ; i = 1,dots,m} ]

the tangent cone at (x^*) is

[T_M(x^*) = {h~|~a_i^T h le 0, ~forall i, ~a_i^T x^* = b_i} ]

Normal cone: the polar cone of tangent cone

[N_M(x^*) = {g in mathbb{R}^n ~|~ langle g, y-x^* angle le 0, ~forall y in M} ]

Note:

normal cone is the polar to tangent cone, i.e.,

[egin{split} T_M(x^*) &= {g in mathbb{R}^n ~|~ langle g, y-x^* angle ge 0, ~forall y in M} \ N_M(x^*) &= {g in mathbb{R}^n ~|~ langle g, y-x^* angle le 0, ~forall y in M} end{split} ]

fact: if (x^*) is a minimizer, then (- abla f(x^*) in N_M(x^*)).
![2018-06-05 15-11-35 的屏幕截图](/home/yychi/Pictures/2018-06-05 15-11-35 的屏幕截图.png)

Algorithm convergence

~	Stepsize Rule	Convergence Rate	Iteration Complexity
Gradient descent
strongly convex & smooth	(eta_t = frac{2}{mu + L})	(Oleft(frac{kappa -1}{kappa +1} ight)^t)	(Oleft(frac{logfrac{1}{epsilon}}{logfrac{kappa+1}{kappa-1}} ight))
convex & smooth	(eta_t = frac{1}{L})	(O(frac{1}{sqrt{t}}))	(O(frac{1}{epsilon}))
Frank-Wolfe
(strongly) convex & smooth	(eta_t = frac{1}{t})	(O(frac{1}{sqrt{t}}))	(O(frac{1}{epsilon}))
Projected GD
convex & smooth	(eta_t = frac{1}{L})	(O(frac{1}{sqrt{t}}))	(O(frac{1}{epsilon}))
strongly convex & smooth	(eta_t = frac{1}{L})	(Oleft((1-frac{1}{kappa})^t ight))	(O(kappalogfrac{1}{epsilon}))
Subgradient method
convex & Lipschitz	(eta_t = frac{1}{sqrt{t}})	(O(frac{1}{sqrt{t}}))	(O(frac{1}{epsilon^2}))
strongly convex & Lipschitz	(eta_t = frac{1}{t})	(Oleft(frac{1}{t} ight))	(O(frac{1}{epsilon}))
Proximal GD
convex & smooth (w.r.t. (f))	(eta_t = frac{1}{L})	(O(frac{1}{t}))	(O(frac{1}{epsilon}))
strongly convex & smooth (w.r.t. (f))	(eta_t = frac{1}{L})	(Oleft((1-frac{1}{kappa})^t ight))	(O(kappalogfrac{1}{epsilon}))