Methods for Problems with Special Structure

Sparse Secant Methods

  • Sparse symmetric secant updates do not necessarily preserve positive definiteness.

    • Experiments with these updates did not yield good results.

  • Viable alternatives are the finite-difference technique of Section 11.1 and conjugate gradient methods.

    • Conjugate gradient methods are inefficient for small problems, but they have proven to be a leading contender for large problems.

Exercise 1

From definition 4.1.7, \(J(x) = F'(x) \in \mathbb{R}^{n \times n}\) because \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\).

For a banded matrix, \(J(x)_{ij} = F'(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x) = 0\) if \(\left\vert i - j \right\vert > m\), where \(m < \lfloor \frac{n}{2} \rfloor\).

As shown in (4.2.1), taking a finite difference along the \(j\text{th}\) coordinate direction (\(h_j e_j\)) approximates the \(j\text{th}\) column of \(J(x)\).

Generalizing the technique in Section 11.1 partitions the columns into the groups

\[\{ j \mid j \equiv k \pmod{2m + 1} \}, \qquad k = 0, \ldots, 2m,\]

which means \((2m + 1)\) additional evaluations of \(F(x)\) are required at each iteration to approximate \(J(x)\).
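
As a minimal illustration of this grouping (the helper `columnGroups` below is mine, not from the text), columns whose indices agree modulo \(2m + 1\) touch disjoint sets of rows, so each group can be handled by a single additional evaluation of \(F\):

def columnGroups(n, m):
    # Columns j and j' with |j - j'| >= 2m + 1 have disjoint row
    # supports in a band matrix of half-bandwidth m, so all columns in
    # a group can share one finite-difference evaluation of F.
    return [[j for j in range(n) if j % (2 * m + 1) == k]
            for k in range(2 * m + 1)]

print(columnGroups(8, 3))  # 7 groups; columns 0 and 7 share one evaluation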

[1]:
def printSparseMatrix(n, m):
    # Print the sparsity pattern of a band matrix of half-bandwidth m:
    # entry (i, j) is non-zero only when |i - j| <= m.
    m = min(m, n // 2)
    for i in range(n):
        for j in range(n):
            print(' 0' if abs(i - j) > m else ' 1', end='')
        print()

printSparseMatrix(8, 3)
 1 1 1 1 0 0 0 0
 1 1 1 1 1 0 0 0
 1 1 1 1 1 1 0 0
 1 1 1 1 1 1 1 0
 0 1 1 1 1 1 1 1
 0 0 1 1 1 1 1 1
 0 0 0 1 1 1 1 1
 0 0 0 0 1 1 1 1

Exercise 2

\(\Gamma = \varnothing\) when \(J(x)\) is not symmetric because each column has at least two non-zero elements. Thus this technique requires the full five evaluations of \(F(x)\) in addition to \(F(x_c)\) to approximate \(J(x)\).

When \(J(x)\) is symmetric, \(F(x_c + h_5 e_5)\) could be evaluated to indirectly approximate the bottom row of \(J(x)\). The left four columns of \(J(x)\) can then be evaluated simultaneously using \(w = F(x_c + \sum_{j = 1}^4 h_j e_j)\) while ignoring the component \(w_5\).

Exercise 3

Let \(A_{i.}, B_{i.} \in \mathbb{R}^n\) denote the \(i\text{th}\) row of each matrix, and let \(\mathrm{b_i} = B_{i.}\) be defined by (11.2.6). This minimization problem is (11.2.4) with one fewer constraint:

\[\begin{split}\left\Vert B - A \right\Vert_F^2 &= \sum_{i = 1}^n \left\Vert B_{i.} - A_{i.} \right\Vert_2^2\\ &= \sum_{i = 1}^n \sum_{j = 1}^n (\mathrm{b_i}_j - A_{ij})^2\\ &= \sum_{i = 1}^n \left( \sum_{k \in \mathcal{Z}_{i1}} (\mathrm{b_i}_k - A_{ik})^2 + \sum_{l \in \mathcal{Z}_{i0}} A_{il}^2 \right)\end{split}\]

where \(\mathcal{Z}_{i1} = \{ j \mid Z_{ij} = 1 \}\) and \(\mathcal{Z}_{i0} = \{ j \mid Z_{ij} = 0 \}\). By inspection, this is minimized when \(B = P_Z(A)\).
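
Concretely, \(P_Z(A)\) keeps the entries selected by \(Z\) and zeroes the rest, which in numpy is an elementwise mask (a minimal sketch; `P_Z` here is an illustrative helper, not from the text):

import numpy

def P_Z(A, Z):
    # Keep A[i, j] wherever Z[i, j] == 1 and zero it elsewhere; by the
    # row-wise decomposition above, this minimizes ||B - A||_F over SP(Z).
    return numpy.where(Z == 1, A, 0.0)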

Exercise 4

\[\begin{split}\left\Vert A_{+_{i.}} - A_{i.} \right\Vert_2 &= \left\Vert (\mathrm{s_i}^\top \mathrm{s_i})^+ (y - As)_i \mathrm{s_i}^\top \right\Vert_2 & \quad & \text{(11.2.9)}\\ &= \left\Vert (y - As)_i \right\Vert_2 \left\Vert (\mathrm{s_i}^\top \mathrm{s_i})^+ \mathrm{s_i} \right\Vert_2 & \quad & \text{(3.1.17)}\\ &= \left\Vert (B_{i.} - A_{i.})^\top \mathrm{s_i} \right\Vert_2 \left\Vert (\mathrm{s_i}^\top \mathrm{s_i})^+ \mathrm{s_i} \right\Vert_2 & \quad & \text{(11.2.8)}\\ &\leq \left\Vert (B_{i.} - A_{i.}) \right\Vert_2 \left\Vert \mathrm{s_i} \right\Vert_2 \left\Vert (\mathrm{s_i}^\top \mathrm{s_i})^+ \mathrm{s_i} \right\Vert_2 & \quad & \text{Cauchy-Schwarz inequality}\\ &= \left\Vert (B_{i.} - A_{i.}) \right\Vert_2 (\mathrm{s_i}^\top \mathrm{s_i})^+ \left\Vert \mathrm{s_i} \right\Vert_2^2 & \quad & \text{(3.1.1b)}\\ &= \left\Vert (B_{i.} - A_{i.}) \right\Vert_2 & \quad & \text{(3.6.6)}\end{split}\]

Exercise 5

Notice that applying lemma 4.1.9 yields

\[y = F(x_+) - F(x) = F(x + s) - F(x) = \int_0^1 J(x + ts) s dt.\]

A component-by-component application of (4.1.2) (see Exercise 4.4) reveals

\[y_i = F(x_+)_i - F(x)_i = F(x + s)_i - F(x)_i = \int_0^1 \nabla f_i(x + ts)^\top \mathrm{s_i} dt\]

where \(\mathrm{s_i}\) denotes the result of imposing the sparsity pattern of the \(i\text{th}\) row of \(F'(x) = J(x) \in SP(Z)\) on \(s\). When \(\mathrm{s_i} = 0\), the integral vanishes and hence \(y_i = 0\).

Exercise 6

[2]:
import numpy

# Build the symmetric tridiagonal matrix with 4 on the main diagonal
# and 1 on the first off-diagonals.
n = 5
A = numpy.zeros((n, n))
for i in range(n):
    for j in range(n):
        d = abs(i - j)
        if d == 0:
            A[i, j] = 4
        elif d == 1:
            A[i, j] = 1

print(A)
# Although A is tridiagonal, its inverse is dense.
print(numpy.linalg.inv(A))
[[4. 1. 0. 0. 0.]
 [1. 4. 1. 0. 0.]
 [0. 1. 4. 1. 0.]
 [0. 0. 1. 4. 1.]
 [0. 0. 0. 1. 4.]]
[[ 0.26794872 -0.07179487  0.01923077 -0.00512821  0.00128205]
 [-0.07179487  0.28717949 -0.07692308  0.02051282 -0.00512821]
 [ 0.01923077 -0.07692308  0.28846154 -0.07692308  0.01923077]
 [-0.00512821  0.02051282 -0.07692308  0.28717949 -0.07179487]
 [ 0.00128205 -0.00512821  0.01923077 -0.07179487  0.26794872]]

Exercise 7

The Thomas algorithm can be used to solve the tridiagonal system of \(n\) unknowns, but it is not stable in general. Since \(A\) is symmetric positive definite, another approach is to factor \(A = LL^\top\) (Cholesky decomposition) and solve the two resulting triangular systems [BOCL97].
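
A minimal sketch of the Cholesky approach on the matrix from Exercise 6, using dense numpy routines for brevity (a banded implementation would factor and solve in \(O(n)\) time):

import numpy

# Solve A x = b via A = L L^T: forward-substitute L y = b,
# then back-substitute L^T x = y.
n = 5
A = 4 * numpy.eye(n) + numpy.eye(n, k=1) + numpy.eye(n, k=-1)
b = numpy.ones(n)

L = numpy.linalg.cholesky(A)     # lower triangular factor
y = numpy.linalg.solve(L, b)     # forward substitution
x = numpy.linalg.solve(L.T, y)   # back substitution

print(numpy.allclose(A @ x, b))  # True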

Exercise 8

Let \(f(x) = \rho(R(x))\) where \(R \colon \mathbb{R}^n \rightarrow \mathbb{R}^m\) and \(\rho \colon \mathbb{R}^m \rightarrow \mathbb{R}\).

Only \(\nabla \rho(R(x))\) is derived because the derivation for \(\nabla^2 \rho(R(x))\) is essentially Exercise 6.17 and Exercise 6.18. See (11.3.4) for a more general derivation.

(a)

Recall that \(\lvert x \rvert = \sqrt{x^2}\) and, for \(x \neq 0\), \(\frac{d}{dx} \lvert x \rvert = \frac{d}{dx} \sqrt{x^2} = \frac{x}{\lvert x \rvert}\).

Let \(\rho(z) = \sum_{i = 1}^m \rho_1(z_i)\) where \(\rho_1\) is defined in (10.4.4). Applying definition 4.1.1 with the chain rule yields

\[\begin{split}\nabla \rho(z = R(x)) &= \frac{d}{dx} \sum_{i = 1}^m \rho_1(z_i)\\ &= \sum_{i = 1}^m \begin{bmatrix} \frac{\partial}{\partial x_1} \rho_1(z_i)\\ \vdots\\ \frac{\partial}{\partial x_n} \rho_1(z_i)\\ \end{bmatrix}\\ &= \sum_{i = 1}^m \rho_1'(z_i) \begin{bmatrix} \frac{\partial z_i}{\partial x_1}\\ \vdots\\ \frac{\partial z_i}{\partial x_n}\\ \end{bmatrix} & \quad & \text{(a.1)}\\ &= \sum_{i = 1}^m \rho_1'(z_i) \nabla z_i.\end{split}\]

(a.1)

\[\begin{split}\frac{\partial}{\partial x_j} \rho_1(z_i) &= \begin{cases} \frac{\partial}{\partial x_j} \frac{z_i^2}{2}, & \text{$|z_i| \leq k$},\\ \frac{\partial}{\partial x_j} \left( k |z_i| - \frac{k^2}{2} \right), & \text{$|z_i| > k$}. \end{cases}\\ &= \begin{cases} z_i \frac{\partial z_i}{\partial x_j}, & \text{$|z_i| \leq k$},\\ k \frac{z_i}{|z_i|} \frac{\partial z_i}{\partial x_j}, & \text{$|z_i| > k$}. \end{cases} & \quad & \frac{\partial}{\partial x_j} = \frac{\partial}{\partial z_i} \frac{\partial z_i}{\partial x_j}\\ &= \rho_1'(z_i) \frac{\partial z_i}{\partial x_j}\end{split}\]

(b)

Let \(\rho(z) = \sum_{i = 1}^m \rho_2(z_i)\) where \(\rho_2\) is defined in (10.4.5). Applying definition 4.1.1 with the chain rule yields

\[\begin{split}\nabla \rho(z = R(x)) &= \frac{d}{dx} \sum_{i = 1}^m \rho_2(z_i)\\ &= \sum_{i = 1}^m \rho_2'(z_i) \nabla z_i & \quad & \text{(a) and (b.1)}\end{split}\]

(b.1)

\[\begin{split}\frac{\partial}{\partial x_j} \rho_2(z_i) &= \begin{cases} \frac{\partial}{\partial x_j} \frac{k^2}{6} \left( 1 - \left( 1 - \left( \frac{z_i}{k} \right)^2 \right)^3 \right), & \text{$|z_i| \leq k$},\\ \frac{\partial}{\partial x_j} \frac{k^2}{6}, & \text{$|z_i| > k$}. \end{cases}\\ &= \begin{cases} z_i \left( 1 - \left( \frac{z_i}{k} \right)^2 \right)^2 \frac{\partial z_i}{\partial x_j}, & \text{$|z_i| \leq k$},\\ 0, & \text{$|z_i| > k$}. \end{cases} & \quad & \frac{\partial}{\partial x_j} = \frac{\partial}{\partial z_i} \frac{\partial z_i}{\partial x_j}\\ &= \rho_2'(z_i) \frac{\partial z_i}{\partial x_j}\end{split}\]
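
A minimal numpy sketch of \(\rho_1'\), \(\rho_2'\), and the resulting gradient \(\nabla f(x) = J(x)^\top \rho'(R(x))\) (the function names are mine, not from the text):

import numpy

def rho1Prime(z, k):
    # (a.1): z when |z| <= k, k * sign(z) otherwise.
    return numpy.where(numpy.abs(z) <= k, z, k * numpy.sign(z))

def rho2Prime(z, k):
    # (b.1): z * (1 - (z/k)^2)^2 when |z| <= k, 0 otherwise.
    return numpy.where(numpy.abs(z) <= k, z * (1 - (z / k) ** 2) ** 2, 0.0)

def gradRho(J, r, rhoPrime, k):
    # grad f(x) = sum_i rho'(r_i) grad r_i(x) = J(x)^T rho'(R(x)),
    # where J is the m-by-n Jacobian of R and r = R(x).
    return J.T @ rhoPrime(r, k)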

References

BOCL97

Ilan Bar-On, Bruno Codenotti, and Mauro Leoncini. A fast parallel Cholesky decomposition algorithm for tridiagonal symmetric matrices. SIAM Journal on Matrix Analysis and Applications, 18(2):403–418, 1997.