Methods for Problems with Special Structure¶
Sparse Secant Methods¶
Sparse symmetric secant updates do not necessarily preserve positive definiteness.
Experiments with these updates did not yield good results.
Feasible alternatives are the finite-difference technique of Section 11.1 and conjugate gradient methods.
Conjugate gradient methods are inefficient for small problems, but have proven to be a leading contender for large problems.
Exercise 1¶
From definition \(4.1.7\), \(J(x) = F'(x) \in \mathbb{R}^{n \times n}\) because \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\).
For a banded matrix, \(J(x)_{ij} = F'(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x) = 0\) if \(\left\vert i - j \right\vert > m\) and \(m < \lfloor \frac{n}{2} \rfloor\).
As shown in (4.2.1), evaluating a finite difference along parameter \(j\)’s dimension (\(h_j e_j\)) is equivalent to approximating the \(j\text{th}\) column of \(J(x)\).
Generalizing the technique in Section 11.1, partition the columns into the groups \(\Gamma_k = \{ j \mid j \equiv k \pmod{2m + 1} \}\) for \(k = 1, \ldots, 2m + 1\); columns within a group have disjoint row supports, so each group can be estimated from a single perturbed evaluation,
which means \((1 + 2m)\) additional evaluations of \(F(x)\) are required at each iteration to approximate \(J(x)\).
[1]:
def printSparseMatrix(n, m):
    # A half-bandwidth beyond n // 2 no longer leaves any zero entries.
    m = min(m, n // 2)
    for i in range(n):
        for j in range(n):
            if abs(i - j) > m:
                print(' 0', end='')
            else:
                print(' 1', end='')
        print()

printSparseMatrix(8, 3)
1 1 1 1 0 0 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 0 1 1 1 1 1
0 0 0 0 1 1 1 1
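The evaluation count can be checked numerically. Below is a minimal sketch of the grouped finite-difference scheme for a Jacobian of half-bandwidth \(m\); the helper name `banded_jacobian_fd` and the test map `F` are illustrative choices, not from the text.

```python
import numpy as np

def banded_jacobian_fd(F, x, m, h=1e-7):
    # Approximate J(x) with half-bandwidth m using only 2m + 1 extra
    # evaluations of F: columns j = k, k + (2m+1), k + 2(2m+1), ...
    # have disjoint row supports, so one perturbed evaluation per
    # group recovers every column in that group.
    n = x.size
    Fx = F(x)
    A = np.zeros((n, n))
    for k in range(2 * m + 1):
        cols = range(k, n, 2 * m + 1)
        xp = x.copy()
        for j in cols:
            xp[j] += h
        Fp = F(xp)
        for j in cols:
            lo, hi = max(0, j - m), min(n, j + m + 1)
            A[lo:hi, j] = (Fp[lo:hi] - Fx[lo:hi]) / h
    return A

# Illustrative map with a tridiagonal Jacobian (half-bandwidth m = 1):
# f_i(x) = x_i^2 + x_{i-1} + x_{i+1}, boundary terms dropped.
def F(x):
    y = x**2
    y[1:] += x[:-1]
    y[:-1] += x[1:]
    return y

n = 8
x = np.linspace(0.0, 1.0, n)
A = banded_jacobian_fd(F, x, m=1)  # 1 + 2m = 3 extra evaluations
```

Only \(1 + 2m = 3\) evaluations beyond \(F(x)\) are needed, independent of \(n\).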
Exercise 2¶
\(\Gamma = \varnothing\) when \(J(x)\) is not symmetric because each column will have at least two non-zero elements. Thus this technique will require the full five evaluations of \(F(x)\) in addition to \(F(x_c)\) to approximate \(J(x)\).
When \(J(x)\) is symmetric, \(F(x_c + h_5 e_5)\) could be evaluated to indirectly approximate the bottom row of \(J(x)\). The left four columns of \(J(x)\) can be evaluated simultaneously using \(w = F(x_c + \sum_{j = 1}^4 h_j e_j)\) while ignoring the result \(w_5\).
Exercise 3¶
Let \(A_{i.}, B_{i.} \in \mathbb{R}^n\) denote the \(i\text{th}\) row of each matrix and \(\mathrm{b_i} = B_{i.}\) be defined by (11.2.6). This minimization problem is (11.2.4) with one fewer constraint:

\[\min_{\mathrm{b_i} \in \mathbb{R}^n} \sum_{j \in \mathcal{Z}_{i1}} \left( B_{ij} - A_{ij} \right)^2 + \sum_{j \in \mathcal{Z}_{i0}} A_{ij}^2\]

where \(\mathcal{Z}_{i1} = \{ j \mid Z_{ij} = 1 \}\) and \(\mathcal{Z}_{i0} = \{ j \mid Z_{ij} = 0 \}\). By inspection, this is minimized when \(B = P_Z(A)\).
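As a quick sanity check, \(P_Z\) is just an elementwise mask, and no other matrix in \(SP(Z)\) is closer to \(A\) in the Frobenius norm. The helper name `project_onto_SPZ` below is hypothetical.

```python
import numpy as np

def project_onto_SPZ(A, Z):
    # Nearest matrix to A (Frobenius norm) with sparsity pattern Z:
    # keep A_ij where Z_ij = 1, zero it where Z_ij = 0.
    return np.where(Z == 1, A, 0.0)

# Illustrative tridiagonal pattern Z and a dense A to project.
Z = (np.abs(np.subtract.outer(np.arange(4), np.arange(4))) <= 1).astype(int)
A = np.arange(16.0).reshape(4, 4)
B = project_onto_SPZ(A, Z)
```

Any other candidate in \(SP(Z)\) pays the same fixed cost \(\sum_{j \in \mathcal{Z}_{i0}} A_{ij}^2\) plus a nonnegative term on \(\mathcal{Z}_{i1}\), so \(B\) is optimal.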
Exercise 4¶
Exercise 5¶
Notice that applying lemma 4.1.9 yields

\[F(x + s) - F(x) = \int_0^1 J(x + ts)\, s \,dt.\]
A component-by-component application of (4.1.2) (see Exercise 4.4) reveals

\[y_i = f_i(x + s) - f_i(x) = \int_0^1 \nabla f_i(x + ts)^\top \mathrm{s_i} \,dt\]

where \(\mathrm{s_i}\) denotes the result of imposing the sparsity pattern of the \(i\text{th}\) row of \(F'(x) = J(x) \in SP(Z)\) on \(s\). When \(\mathrm{s_i} = 0\), the entire integral evaluates to \(0 = y_i\).
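This collapse can be checked numerically. The map `F` below, with a tridiagonal sparsity pattern, is an illustrative choice, not the book's: zeroing \(s\) on row \(i\)'s pattern leaves \(f_i\) unchanged.

```python
import numpy as np

# Illustrative map whose Jacobian lies in SP(Z) with Z_ij = 1 iff
# |i - j| <= 1: f_i depends only on x_{i-1}, x_i, x_{i+1}.
def F(x):
    y = x**2
    y[1:] += x[:-1]
    y[:-1] += x[1:]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
s = rng.standard_normal(6)
i = 2
s[[1, 2, 3]] = 0.0        # impose row i's sparsity pattern: s_i = 0
y = F(x + s) - F(x)
print(y[i])               # exactly 0: f_i never sees the perturbation
```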
Exercise 6¶
[2]:
import numpy

n = 5
A = numpy.zeros((n, n))
for i in range(n):
    for j in range(n):
        d = abs(i - j)  # distance from the main diagonal
        if d == 0:
            v = 4
        elif d == 1:
            v = 1
        else:
            v = 0
        A[i, j] = v

print(A)
print(numpy.linalg.inv(A))
[[4. 1. 0. 0. 0.]
[1. 4. 1. 0. 0.]
[0. 1. 4. 1. 0.]
[0. 0. 1. 4. 1.]
[0. 0. 0. 1. 4.]]
[[ 0.26794872 -0.07179487 0.01923077 -0.00512821 0.00128205]
[-0.07179487 0.28717949 -0.07692308 0.02051282 -0.00512821]
[ 0.01923077 -0.07692308 0.28846154 -0.07692308 0.01923077]
[-0.00512821 0.02051282 -0.07692308 0.28717949 -0.07179487]
[ 0.00128205 -0.00512821 0.01923077 -0.07179487 0.26794872]]
Exercise 7¶
The Thomas algorithm can be used to solve the tridiagonal system of \(n\) unknowns, but it is not stable in general. Since \(A\) is symmetric positive definite, another approach is to factor \(A = LL^\top\) (Cholesky decomposition) and solve the two resulting triangular systems [BOCL97].
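A minimal sketch of the Thomas algorithm, applied to the matrix of Exercise 6; the function name `thomas` and the example right-hand side are illustrative.

```python
import numpy as np

def thomas(a, b, c, d):
    # Solve a tridiagonal system in O(n). b (length n) is the main
    # diagonal, a and c (length n - 1) the sub- and super-diagonals,
    # d the right-hand side. Stable for diagonally dominant or SPD
    # matrices such as the one in Exercise 6, but not in general.
    n = b.size
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):          # forward elimination
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# The 5x5 matrix of Exercise 6: 4 on the diagonal, 1 off it.
n = 5
b = np.full(n, 4.0)
a = np.full(n - 1, 1.0)
c = np.full(n - 1, 1.0)
d = np.ones(n)
x = thomas(a, b, c, d)
```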
Exercise 8¶
Let \(f(x) = \rho(R(x))\) where \(R \colon \mathbb{R}^n \rightarrow \mathbb{R}^m\) and \(\rho \colon \mathbb{R}^m \rightarrow \mathbb{R}\).
Only \(\nabla \rho(R(x))\) is derived here because the derivation for \(\nabla^2 \rho(R(x))\) is essentially Exercise 6.17 and Exercise 6.18. See (11.3.4) for a more general derivation.
(a)¶
Recall that \(\lvert x \rvert = \sqrt{x^2}\) and, for \(x \neq 0\), \(\frac{d}{dx} \lvert x \rvert = \frac{d}{dx} \sqrt{x^2} = \frac{x}{\lvert x \rvert}\).
Let \(\rho(z) = \sum_{i = 1}^m \rho_1(z_i)\) where \(\rho_1\) is defined in (10.4.4). Applying definition 4.1.1 with the chain rule yields
(a.1)¶
(b)¶
Let \(\rho(z) = \sum_{i = 1}^m \rho_2(z_i)\) where \(\rho_2\) is defined in (10.4.5). Applying definition 4.1.1 with the chain rule yields
(b.1)¶
References
- BOCL97
Ilan Bar-On, Bruno Codenotti, and Mauro Leoncini. A fast parallel Cholesky decomposition algorithm for tridiagonal symmetric matrices. SIAM Journal on Matrix Analysis and Applications, 18(2):403–418, 1997.