Methods for Problems with Special Structure¶
Sparse Secant Methods¶
Sparse symmetric secant updates do not necessarily preserve positive definiteness.
Experiments with these updates did not yield good results.
Feasible alternatives are the finite-difference technique of Section 11.1 and conjugate gradient methods.
Conjugate gradient methods are inefficient for small problems, but have proven to be a leading contender for large problems.
Exercise 1¶
From definition \(4.1.7\), \(J(x) = F'(x) \in \mathbb{R}^{n \times n}\) because \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\).
For a banded matrix, \(J(x)_{ij} = F'(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x) = 0\) if \(\left\vert i - j \right\vert > m\) and \(m < \lfloor \frac{n}{2} \rfloor\).
As shown in (4.2.1), evaluating a finite difference along parameter \(j\)’s dimension (\(h_j e_j\)) is equivalent to approximating the \(j\text{th}\) column of \(J(x)\).
Generalizing the technique in Section 11.1, partition the columns into the groups \(\Gamma_k = \{ j \mid j \equiv k \pmod{2m + 1} \}\) for \(k = 1, \ldots, 2m + 1\); columns within a group have disjoint row supports, so each group can be estimated from a single perturbed evaluation,
which means \((1 + 2m)\) additional evaluations of \(F(x)\) are required at each iteration to approximate \(J(x)\).
[1]:
def printSparseMatrix(n, m):
    # A half-bandwidth beyond n // 2 no longer leaves any zero entries.
    m = min(m, n // 2)
    for i in range(n):
        for j in range(n):
            if abs(i - j) > m:
                print(' 0', end='')
            else:
                print(' 1', end='')
        print()

printSparseMatrix(8, 3)
1 1 1 1 0 0 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 0 1 1 1 1
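The evaluation count can be checked numerically. Below is a minimal sketch of the grouped finite-difference scheme for a Jacobian of half-bandwidth \(m\); the helper name `banded_jacobian_fd` and the test map `F` are illustrative choices, not from the text.

```python
import numpy as np

def banded_jacobian_fd(F, x, m, h=1e-7):
    # Approximate J(x) with half-bandwidth m using only 2m + 1 extra
    # evaluations of F: columns j = k, k + (2m+1), k + 2(2m+1), ...
    # have disjoint row supports, so one perturbed evaluation per
    # group recovers every column in that group.
    n = x.size
    Fx = F(x)
    A = np.zeros((n, n))
    for k in range(2 * m + 1):
        cols = range(k, n, 2 * m + 1)
        xp = x.copy()
        for j in cols:
            xp[j] += h
        Fp = F(xp)
        for j in cols:
            lo, hi = max(0, j - m), min(n, j + m + 1)
            A[lo:hi, j] = (Fp[lo:hi] - Fx[lo:hi]) / h
    return A

# Illustrative map with a tridiagonal Jacobian (half-bandwidth m = 1):
# f_i(x) = x_i^2 + x_{i-1} + x_{i+1}, boundary terms dropped.
def F(x):
    y = x**2
    y[1:] += x[:-1]
    y[:-1] += x[1:]
    return y

n = 8
x = np.linspace(0.0, 1.0, n)
A = banded_jacobian_fd(F, x, m=1)  # 1 + 2m = 3 extra evaluations
```

Only \(1 + 2m = 3\) evaluations beyond \(F(x)\) are needed, independent of \(n\).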
Exercise 2¶
\(\Gamma = \varnothing\) when \(J(x)\) is not symmetric because each column will have at least two non-zero elements. Thus this technique will require the full five evaluations of \(F(x)\) in addition to \(F(x_c)\) to approximate \(J(x)\).
When \(J(x)\) is symmetric, \(F(x_c + h_5 e_5)\) could be evaluated to indirectly approximate the bottom row of \(J(x)\). The left four columns of \(J(x)\) can be evaluated simultaneously using \(w = F(x_c + \sum_{j = 1}^4 h_j e_j)\) while ignoring the result \(w_5\).
Exercise 3¶
Let \(A_{i.}, B_{i.} \in \mathbb{R}^n\) denote the \(i\text{th}\) row of each matrix and \(\mathrm{b_i} = B_{i.}\) be defined by (11.2.6). This minimization problem is (11.2.4) with one fewer constraint:

\[\min_{\mathrm{b_i} \in \mathbb{R}^n} \sum_{j \in \mathcal{Z}_{i1}} \left( B_{ij} - A_{ij} \right)^2 + \sum_{j \in \mathcal{Z}_{i0}} A_{ij}^2\]

where \(\mathcal{Z}_{i1} = \{ j \mid Z_{ij} = 1 \}\) and \(\mathcal{Z}_{i0} = \{ j \mid Z_{ij} = 0 \}\). By inspection, this is minimized when \(B = P_Z(A)\).
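As a quick sanity check, \(P_Z\) is just an elementwise mask, and no other matrix in \(SP(Z)\) is closer to \(A\) in the Frobenius norm. The helper name `project_onto_SPZ` below is hypothetical.

```python
import numpy as np

def project_onto_SPZ(A, Z):
    # Nearest matrix to A (Frobenius norm) with sparsity pattern Z:
    # keep A_ij where Z_ij = 1, zero it where Z_ij = 0.
    return np.where(Z == 1, A, 0.0)

# Illustrative tridiagonal pattern Z and a dense A to project.
Z = (np.abs(np.subtract.outer(np.arange(4), np.arange(4))) <= 1).astype(int)
A = np.arange(16.0).reshape(4, 4)
B = project_onto_SPZ(A, Z)
```

Any other candidate in \(SP(Z)\) pays the same fixed cost \(\sum_{j \in \mathcal{Z}_{i0}} A_{ij}^2\) plus a nonnegative term on \(\mathcal{Z}_{i1}\), so \(B\) is optimal.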
Exercise 4¶
Exercise 5¶
Notice that applying lemma 4.1.9 yields

\[F(x + s) - F(x) = \int_0^1 J(x + ts)\, s \,dt.\]
A component-by-component application of (4.1.2) (see Exercise 4.4) reveals

\[y_i = f_i(x + s) - f_i(x) = \int_0^1 \nabla f_i(x + ts)^\top \mathrm{s_i} \,dt\]

where \(\mathrm{s_i}\) denotes the result of imposing the sparsity pattern of the \(i\text{th}\) row of \(F'(x) = J(x) \in SP(Z)\) on \(s\). When \(\mathrm{s_i} = 0\), the entire integral evaluates to \(0 = y_i\).
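This collapse can be checked numerically. The map `F` below, with a tridiagonal sparsity pattern, is an illustrative choice, not the book's: zeroing \(s\) on row \(i\)'s pattern leaves \(f_i\) unchanged.

```python
import numpy as np

# Illustrative map whose Jacobian lies in SP(Z) with Z_ij = 1 iff
# |i - j| <= 1: f_i depends only on x_{i-1}, x_i, x_{i+1}.
def F(x):
    y = x**2
    y[1:] += x[:-1]
    y[:-1] += x[1:]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
s = rng.standard_normal(6)
i = 2
s[[1, 2, 3]] = 0.0        # impose row i's sparsity pattern: s_i = 0
y = F(x + s) - F(x)
print(y[i])               # exactly 0: f_i never sees the perturbation
```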
Exercise 6¶
[2]:
import numpy

n = 5
A = numpy.zeros((n, n))
for i in range(n):
    for j in range(n):
        d = abs(i - j)  # distance from the main diagonal
        if d == 0:
            v = 4
        elif d == 1:
            v = 1
        else:
            v = 0
        A[i, j] = v

print(A)
print(numpy.linalg.inv(A))
[[4. 1. 0. 0. 0.]
[1. 4. 1. 0. 0.]
[0. 1. 4. 1. 0.]
[0. 0. 1. 4. 1.]
[0. 0. 0. 1. 4.]]
[[ 0.26794872 -0.07179487 0.01923077 -0.00512821 0.00128205]
[-0.07179487 0.28717949 -0.07692308 0.02051282 -0.00512821]
[ 0.01923077 -0.07692308 0.28846154 -0.07692308 0.01923077]
[-0.00512821 0.02051282 -0.07692308 0.28717949 -0.07179487]
[ 0.00128205 -0.00512821 0.01923077 -0.07179487 0.26794872]]
Exercise 7¶
The Thomas algorithm can be used to solve the tridiagonal system of \(n\) unknowns, but it is not stable in general. Since \(A\) is symmetric positive definite, another approach is to factor \(A = LL^\top\) (Cholesky decomposition) and solve the two resulting triangular systems [BOCL97].
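A minimal sketch of the Thomas algorithm, applied to the matrix of Exercise 6; the function name `thomas` and the example right-hand side are illustrative.

```python
import numpy as np

def thomas(a, b, c, d):
    # Solve a tridiagonal system in O(n). b (length n) is the main
    # diagonal, a and c (length n - 1) the sub- and super-diagonals,
    # d the right-hand side. Stable for diagonally dominant or SPD
    # matrices such as the one in Exercise 6, but not in general.
    n = b.size
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):          # forward elimination
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# The 5x5 matrix of Exercise 6: 4 on the diagonal, 1 off it.
n = 5
b = np.full(n, 4.0)
a = np.full(n - 1, 1.0)
c = np.full(n - 1, 1.0)
d = np.ones(n)
x = thomas(a, b, c, d)
```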
Exercise 8¶
Let \(f(x) = \rho(R(x))\) where \(R \colon \mathbb{R}^n \rightarrow \mathbb{R}^m\) and \(\rho \colon \mathbb{R}^m \rightarrow \mathbb{R}\).
Only \(\nabla \rho(R(x))\) is derived here because the derivation for \(\nabla^2 \rho(R(x))\) is essentially Exercise 6.17 and Exercise 6.18. See (11.3.4) for a more general derivation.
(a)¶
Recall that \(\lvert x \rvert = \sqrt{x^2}\) and, for \(x \neq 0\), \(\frac{d}{dx} \lvert x \rvert = \frac{d}{dx} \sqrt{x^2} = \frac{x}{\lvert x \rvert}\).
Let \(\rho(z) = \sum_{i = 1}^m \rho_1(z_i)\) where \(\rho_1\) is defined in (10.4.4). Applying definition 4.1.1 with the chain rule yields
(a.1)¶
(b)¶
Let \(\rho(z) = \sum_{i = 1}^m \rho_2(z_i)\) where \(\rho_2\) is defined in (10.4.5). Applying definition 4.1.1 with the chain rule yields
(b.1)¶
References
- BOCL97
Ilan Bar-On, Bruno Codenotti, and Mauro Leoncini. A fast parallel Cholesky decomposition algorithm for tridiagonal symmetric matrices. SIAM Journal on Matrix Analysis and Applications, 18(2):403–418, 1997.