Secant Methods for Systems of Nonlinear Equations

Exercise 1

Given \(s_i, y_i \in \mathbb{R}^n\) for \(i = 1, \ldots, n\) where \(s_i\) are linearly independent, \(A \in \mathbb{R}^{n \times n}\) can be uniquely determined using the secant equations

\[\begin{split}\begin{bmatrix} A_{1,1} & \cdots & A_{1, n}\\ \vdots & \ddots & \vdots\\ A_{n,1} & \cdots & A_{n, n} \end{bmatrix} \begin{bmatrix} s_1 & \cdots & s_n \end{bmatrix} &= \begin{bmatrix} y_1 & \cdots & y_n \end{bmatrix}\\ A S &= Y\\ A &= Y S^{-1}.\end{split}\]

This is a bad way to form the model because it only works when \(S\) is invertible, i.e., when the columns of \(S\) are linearly independent; in practice successive secant directions tend to be nearly linearly dependent, so \(S\) is frequently ill-conditioned or not invertible at all.
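
The construction can be sketched numerically. The following is a minimal sketch using plain NumPy arrays; the matrices are illustrative choices, not from the text.

import numpy

# Minimal sketch: recover A from n secant pairs by solving A S = Y.
# A_true and S below are illustrative choices.
rng = numpy.random.default_rng(0)
n = 3
A_true = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))      # columns s_1, ..., s_n
Y = A_true @ S                       # columns y_i = A s_i

# Solve A S = Y for A via S^T A^T = Y^T instead of forming S^{-1} explicitly.
A = numpy.linalg.solve(S.T, Y.T).T
print(numpy.allclose(A, A_true))     # True
print(numpy.linalg.cond(S))          # nearly dependent columns would make this huge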

Exercise 2

Let \(A = \begin{bmatrix} 4 & 1\\ -1 & 1 \end{bmatrix}\), \(s = \begin{bmatrix} 1\\ 0 \end{bmatrix}\), and \(y = \begin{bmatrix} 5\\ -2 \end{bmatrix}\).

One way to solve this is to invoke Lemma 8.1.1 since its assumptions are satisfied. The following derivation does not use that lemma.

Given \(M \in Q(y, s)\), its form must be \(\begin{bmatrix} 5 & a\\ -2 & b\end{bmatrix}\) where \(a, b \in \mathbb{R}\). The minimization problem becomes

\[\begin{split}\min_{M \in Q(y, s)} \left\Vert M - A \right\Vert = \min_{a, b} \left\Vert \begin{bmatrix} 1 & a - 1\\ -1 & b - 1 \end{bmatrix} \right\Vert.\end{split}\]

(a)

Minimizing over the \(l_2\) operator norm (3.1.8b) gives

\[\begin{split}\left\Vert M - A \right\Vert_2 &= \max_{\sqrt{\lambda}} \begin{bmatrix} 2 & a - b\\ a - b & (a - 1)^2 + (b - 1)^2 \end{bmatrix}\\ &= \max_{\lambda} \left\{ \frac{a^2}{2} - a + \frac{b^2}{2} - b + 2 \pm \frac{1}{2} \sqrt{\left( a^2 + b^2 \right) \left( a^2 - 4a + b^2 - 4b + 8 \right)} \right\}^{1/2}\\ &= \left( \frac{a^2}{2} - a + \frac{b^2}{2} - b + 2 + \frac{1}{2} \sqrt{\left( a^2 + b^2 \right) \left( a^2 - 4a + b^2 - 4b + 8 \right)} \right)^{1/2}\\ &= \frac{1}{\sqrt{2}} \left( (a - 1)^2 + (b - 1)^2 + 2 + \sqrt{a^2 + b^2} \sqrt{(a - 2)^2 + (b - 2)^2} \right)^{1/2}\\ &= \frac{1}{\sqrt{2}} \left( 2 (\beta - 1)^2 + 2 + 2 \sqrt{\beta^2 \left( \beta - 2 \right)^2} \right)^{1/2} \qquad & \quad & a = b = \beta\\ &= \left( (\beta - 1)^2 + 1 + \sqrt{\beta^2 \left( \beta - 2 \right)^2} \right)^{1/2}.\end{split}\]

Plotting the last equation shows that the function attains its global minimum \(\sqrt{2}\) when \(\beta \in [0, 2]\); indeed, for \(\beta \in [0, 2]\), \(\sqrt{\beta^2 (\beta - 2)^2} = \beta (2 - \beta)\) and the expression reduces to \(\sqrt{2}\).

Replacing \(M\) with \(A_+ = A + \begin{bmatrix} 1 & \alpha\\ -1 & \alpha \end{bmatrix}\) yields

\[\begin{split}\left\Vert A_+ - A \right\Vert_2 &= \max_{\sqrt{\lambda}} \begin{bmatrix} 2 & 0\\ 0 & 2 \alpha^2 \end{bmatrix}\\ &= \max_{\lambda} \left\{ \alpha^2 + 1 \pm \sqrt{(\alpha - 1)^2 (\alpha + 1)^2} \right\}^{1/2}\\ &= \left( \alpha^2 + 1 + \sqrt{(\alpha - 1)^2 (\alpha + 1)^2} \right)^{1/2}\\ &= \left( (\alpha - 1) (\alpha + 1) + 2 + \sqrt{(\alpha - 1)^2 (\alpha + 1)^2} \right)^{1/2}.\end{split}\]

Plotting the last equation shows that the function attains its global minimum \(\sqrt{2}\) when \(\alpha \in [-1, 1]\); indeed, for \(\alpha \in [-1, 1]\), \(\sqrt{(\alpha - 1)^2 (\alpha + 1)^2} = 1 - \alpha^2\) and the expression reduces to \(\sqrt{2}\).

(b)

Minimizing over the Frobenius matrix norm (3.1.6) yields

\[\left\Vert M - A \right\Vert_F = \sqrt{1^2 + (a - 1)^2 + (b - 1)^2 + (-1)^2}\]

and

\[\left\Vert A_+ - A \right\Vert_F = \sqrt{1^2 + 2 \alpha^2 + (-1)^2}\]

with both achieving a global minimum \(\sqrt{2}\) when \(a = b = 1\) and \(\alpha = 0\) respectively.
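
Both minima can be spot-checked numerically with the specific \(A\), \(s\), and \(y\) of this exercise; the sketch below uses plain NumPy arrays.

import numpy

# Numeric spot check of the minima above for this exercise's A, s, y.
A = numpy.array([[4., 1.], [-1., 1.]])
s = numpy.array([[1.], [0.]])
y = numpy.array([[5.], [-2.]])

def M(a, b):
    # general member of Q(y, s): the first column is forced to y by M s = y
    return numpy.array([[5., a], [-2., b]])

# l2 operator norm: any a = b = beta in [0, 2] attains sqrt(2)
for beta in (0.0, 1.0, 2.0):
    print(numpy.linalg.norm(M(beta, beta) - A, 2))   # ~1.41421356

# Frobenius norm: unique minimizer a = b = 1, i.e. the Broyden update A_+
A_plus = A + (y - A @ s) @ s.T / (s.T @ s)
print(numpy.linalg.norm(M(1.0, 1.0) - A, 'fro'))     # ~1.41421356
print(numpy.allclose(A_plus, M(1.0, 1.0)))           # True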

(c)

If \(F(x)\) is linear in \(x_2\), then the Frobenius matrix norm should be used because it has a single minimizer, which serves to isolate the changes in \(x_2\).

Exercise 3

Even though Broyden’s method converges to a root, the relative error with respect to \(\frac{\partial f_2}{\partial x_1}\) is huge; the others are within \([0.0, 0.5]\) with \(0.0\) applicable only to the linear function \(f_1\).

[1]:
import numpy

def broydens_update(F, A_k, x_k):
    # One quasi-Newton step: solve A_k s_k = -F(x_k), advance x, and apply
    # Broyden's rank-one update (8.1.5) to the Jacobian approximation.
    s_k = numpy.linalg.solve(A_k, -F(x_k))
    x_next = x_k + s_k
    y_k = F(x_next) - F(x_k)
    A_next = A_k + (y_k - A_k * s_k) * s_k.T / (s_k.T * s_k)[0]
    return A_next, x_next

def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0] + x[1] - 3],
                           [x[0]**2 + x[1]**2 - 9]])

def J(x):
    x = x.flat
    return numpy.asmatrix([[1, 1],
                           [2 * x[0], 2 * x[1]]])

macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)
x_k = numpy.asmatrix([[2], [7]])
A_k = J(x_k)
for i in range(16):
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _
print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))
L2-Norm(x_1 - x_0): 4.47213595499958
L2-Norm(x_2 - x_1): 2.357022603955159
L2-Norm(x_3 - x_2): 0.29462782549439437
L2-Norm(x_4 - x_3): 0.15973797767768386
L2-Norm(x_5 - x_4): 0.016359786480031936
L2-Norm(x_6 - x_5): 0.0006762158471433808
L2-Norm(x_7 - x_6): 2.714857331443653e-06
L2-Norm(x_8 - x_7): 4.344464330780998e-10

x_k: [5.32388602e-17 3.00000000e+00]

A_k:
[[1.         1.        ]
 [2.99999808 9.00000192]]

J(x_k):
[[1.0000000e+00 1.0000000e+00]
 [1.0647772e-16 6.0000000e+00]]

Exercise 4

Let \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\) for which \(f_1, \ldots, f_m\) are linear with \(m < n\).

Let \(A_0 = J(x_0)\) where \(x_0 \in \mathbb{R}^n\). As illustrated in (2.2.1), the extension of the local affine approximation to higher dimensions (5.1.1) will be exact for \(f_1, \ldots, f_m\) such that

\[f_1(x_1) = \cdots = f_m(x_1) = 0 \quad \text{where} \quad x_1 = x_0 + s_0 \quad \text{with} \quad s_0 = -A_0^{-1} F(x_0) = -J(x_0)^{-1} F(x_0).\]

The next step in Broyden’s method computes

\[\begin{split}A_1 &= A_0 + \frac{y_0 - A_0 s_0}{s_0^\top s_0} s_0^\top\\ &= J(x_0) + \frac{F(x_1)}{s_0^\top s_0} s_0^\top & \quad & y_0 = F(x_1) - F(x_0) \text{ and } A_0 s_0 = -F(x_0).\end{split}\]

By inspection, the first \(m\) rows of the outer product \(F(x_1) s_0^\top\) contain only zeros such that \((A_1)_i = (A_0)_i = J(x_0)_i\) for \(i = 1, \ldots, m\).

An equivalent view of Newton’s method is

\[(M_c)_i(x_c + s) = f_i(x_c) + \nabla f_i(x_c)^\top s\]

where \(M_c\) denotes the current model. Because the affine model is exact for the linear components (just as in the one-dimensional case), it is clear that

\[f_1(x_k) = \cdots = f_m(x_k) = 0 \text{ for } k \geq 1\]

and consequently

\[(A_k)_i = (A_0)_i \text{ for } k \geq 0 \text{ and } i = 1, \ldots, m.\]

For the finite difference case, Theorem 5.4.1 states that the \(i^\text{th}\) component has an error of \(\mathcal{O}(h_i)\) when using forward differencing (5.6.5) and \(\mathcal{O}(h_i^2)\) when using central differencing (5.6.6).
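
The row-preservation property can also be observed numerically. The sketch below reuses the system from Exercise 3, whose first component \(f_1\) is linear, and performs a few Broyden updates with plain NumPy arrays; the first row of \(A_k\) never moves (up to roundoff).

import numpy

# Reuse the system from Exercise 3: f_1 is linear, f_2 is not.
def F(x):
    return numpy.array([x[0] + x[1] - 3.0, x[0]**2 + x[1]**2 - 9.0])

def J(x):
    return numpy.array([[1.0, 1.0], [2.0 * x[0], 2.0 * x[1]]])

x = numpy.array([2.0, 7.0])
A = J(x)
first_row = A[0].copy()
for _ in range(6):
    s = numpy.linalg.solve(A, -F(x))
    y = F(x + s) - F(x)
    A = A + numpy.outer(y - A @ s, s) / (s @ s)   # Broyden update (8.1.5)
    x = x + s
print(numpy.abs(A[0] - first_row).max())          # zero up to roundoff
print(x)                                          # close to the root (0, 3)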

Exercise 5

Let \(F(x) = Jx + b\) where \(J \in \mathbb{R}^{n \times n}\), \(b \in \mathbb{R}^n\), and \(x \in \mathbb{R}^n\).

The proof is by contradiction. Assume \(x_+, x_c \in \mathbb{R}^n\) and \(x_+ \neq x_c\) with

\[\begin{split}s_c &= x_+ - x_c,\\\\ y_c &= F(x_+) - F(x_c),\\\\ J \not\in Q(y_c, s_c) &= \left\{ B \in \mathbb{R}^{n \times n} \mid B s_c = y_c \right\}.\end{split}\]

Consequently, the following must hold

\[\begin{split}y_c &= F(x_+) - F(x_c)\\ &= J x_+ + b - J x_c - b\\ &= J (x_+ - x_c)\\ &= J s_c,\end{split}\]

but this contradicts the initial assumptions. Thus \(J \in Q(y_c, s_c)\).

Exercise 6

Let the assumptions of Lemma 8.2.1 be satisfied and define

\[\begin{split}J_+ &\triangleq J(x_+),\\\\ F_+ &\triangleq F(x_+),\\\\ F_c &\triangleq F(x_c),\\\\ s_c &= x_+ - x_c,\\\\ y_c &= F_+ - F_c.\end{split}\]
\[\begin{split}A_+ &= A_c + \frac{\left( y_c - A_c s_c \right) s_c^\top}{s_c^\top s_c} & \quad & \text{(8.1.5)}\\ A_+ - J_+ &= A_c - J_+ + \frac{\left( y_c - A_c s_c \right) s_c^\top}{s_c^\top s_c}\\ &= \left( A_c - J_+ \right) \left[ I - \frac{s_c s_c^\top}{s_c^\top s_c} \right] + \frac{\left( y_c - J_+ s_c \right) s_c^\top}{s_c^\top s_c} & \quad & \text{(8.2.4)}\\ \left\Vert A_+ - J_+ \right\Vert &= \left\Vert \left( A_c - J_+ \right) \left[ I - \frac{s_c s_c^\top}{s_c^\top s_c} \right] + \frac{\left( y_c - J_+ s_c \right) s_c^\top}{s_c^\top s_c} \right\Vert\\ &\leq \left\Vert \left( A_c - J_+ \right) \left[ I - \frac{s_c s_c^\top}{s_c^\top s_c} \right] \right\Vert + \left\Vert \frac{\left( y_c - J_+ s_c \right) s_c^\top}{s_c^\top s_c} \right\Vert & \quad & \text{(3.1.1c)}\\ &\leq \left\Vert A_c - J_+ \right\Vert \left\Vert I - \frac{s_c s_c^\top}{s_c^\top s_c} \right\Vert_2 + \left\Vert \frac{\left( y_c - J_+ s_c \right) s_c^\top}{s_c^\top s_c} \right\Vert & \quad & \text{(3.1.10) and (3.1.15)}\\ &\leq \left\Vert A_c - J_c + J_c - J_+ \right\Vert + \left\Vert y_c - J_+ s_c \right\Vert_2 \left\Vert \frac{s_c}{s_c^\top s_c} \right\Vert_2 & \quad & \text{(3.1.17)}\\ &\leq \left\Vert A_c - J_c \right\Vert + \left\Vert J_c - J_+ \right\Vert + \left\Vert F_+ - F_c - J_+ s_c \right\Vert_2 \left\Vert s_c \right\Vert_2^{-1} & \quad & \text{(3.1.1c) and (3.1.2c)}\\ &\leq \left\Vert A_c - J_c \right\Vert + \gamma \left\Vert x_c - x_+ \right\Vert + \frac{\gamma}{2} \left( \left\Vert x_+ - x_+ \right\Vert_2 + \left\Vert x_c - x_+ \right\Vert_2 \right) \left\Vert s_c \right\Vert_2 \left\Vert s_c \right\Vert_2^{-1} & \quad & J(x) \in \text{Lip}_\gamma(D) \text{ and Lemma 4.1.15}\\ &\leq \left\Vert A_c - J_c \right\Vert + \frac{3 \gamma}{2} \left\Vert x_+ - x_c \right\Vert_2 & \quad & \text{(3.1.1c), (3.1.2c), and (3.1.6).}\end{split}\]

For a proof of (3.1.1c), see Exercise 3.4. Exercise 7 may be helpful towards understanding the application of (3.1.17).
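
The bounded deterioration inequality above can also be checked numerically. The sketch below uses the system from Exercise 3 with plain NumPy arrays; for its Jacobian \(\left\Vert J(u) - J(v) \right\Vert_F = 2 \left\Vert u - v \right\Vert_2\), so \(\gamma = 2\) is a valid Lipschitz constant (my choice of norm and constant for this check), and \(A_c\) is deliberately perturbed away from \(J(x_c)\).

import numpy

def F(x):
    return numpy.array([x[0] + x[1] - 3.0, x[0]**2 + x[1]**2 - 9.0])

def J(x):
    return numpy.array([[1.0, 1.0], [2.0 * x[0], 2.0 * x[1]]])

gamma = 2.0                                             # Lipschitz constant of J in the Frobenius norm
x_c = numpy.array([2.0, 7.0])
A_c = J(x_c) + numpy.array([[0.0, 0.0], [0.5, -0.5]])   # start with A_c != J(x_c)
for _ in range(5):
    s_c = numpy.linalg.solve(A_c, -F(x_c))
    x_p = x_c + s_c
    y_c = F(x_p) - F(x_c)
    A_p = A_c + numpy.outer(y_c - A_c @ s_c, s_c) / (s_c @ s_c)
    lhs = numpy.linalg.norm(A_p - J(x_p), 'fro')
    rhs = (numpy.linalg.norm(A_c - J(x_c), 'fro')
           + 1.5 * gamma * numpy.linalg.norm(x_p - x_c))
    print(lhs <= rhs)                                   # True at every step
    A_c, x_c = A_p, x_p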

Exercise 7

Suppose \(s \in \mathbb{R}^n \setminus \{ 0 \}\) and \(I \in \mathbb{R}^{n \times n}\) is the identity matrix.

In order to simplify the derivations, rewrite the original problem as

\[I - \frac{s s^\top}{s^\top s} = I - \frac{s s^\top}{\left\Vert s \right\Vert_2^2} = I - u u^\top\]

where \(u = \frac{s}{\left\Vert s \right\Vert_2}\) and \(u^\top u = 1\).

By definition, the resulting (projection) matrix is idempotent

\[\begin{split}\left( I - u u^\top \right) \left( I - u u^\top \right) &= I - 2u u^\top + u u^\top u u^\top\\ &= I - 2u u^\top + u u^\top\\ &= I - u u^\top\end{split}\]

and symmetric

\[\left( I - u u^\top \right)^\top = I - u u^\top.\]

The rank of the matrix is

\[\begin{split}\text{rank}(I - u u^\top) &= \text{tr}(I - u u^\top) & \quad & \text{the trace of an idempotent matrix is its rank}\\ &= \text{tr}(I) - \text{tr}(u u ^\top) & \quad & \text{trace is a linear mapping}\\ &= n - \sum_{i = 1}^n u_i^2\\ &= n - u^\top u\\ &= n - 1.\end{split}\]

Let \(A = I - u u^\top\). Recall that the relation between an eigenvalue \(\lambda\) and a non-zero eigenvector \(v\) for \(A\) is \(Av = \lambda v\). Since \(A\) is idempotent, the following must hold:

\[\begin{split}\lambda v = Av = A^2 v = A (Av) &= \lambda^2 v\\ \boldsymbol{0} &= \lambda (\lambda - 1) v\end{split}\]

i.e. the eigenvalues of \(A\) must be either \(0\) or \(1\).

By Theorem 3.5.6 and

\[\text{rank}(A) = n - 1 = \text{tr}(A) = \sum_{i = 1}^n \lambda_i\]

where \(\lambda_i\) are the eigenvalues of \(A\),

\[\left\Vert I - \frac{s s^\top}{s^\top s} \right\Vert_2 = \max_{1 \leq i \leq n} \left\vert \lambda_i \right\vert = 1.\]
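
These properties are easy to confirm numerically for an arbitrary nonzero \(s\); the vector below is an illustrative choice.

import numpy

# Quick numeric check of the results above.
s = numpy.array([3.0, -1.0, 2.0, 0.5])
n = s.size
P = numpy.eye(n) - numpy.outer(s, s) / (s @ s)

print(numpy.linalg.matrix_rank(P))                 # n - 1 = 3
print(numpy.linalg.norm(P, 2))                     # 1.0
print(numpy.allclose(P @ P, P))                    # idempotent
print(numpy.round(numpy.linalg.eigvalsh(P), 12))   # eigenvalues are 0 and 1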

Exercise 8

Let \(D \subseteq \mathbb{R}^n\) be an open convex set, \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\), \(J(x) \in \text{Lip}_\gamma(D)\), \(x_* \in D\), and \(J(x_*)\) be nonsingular.

Let \(\{ A_k \in \mathbb{R}^{n \times n} \}\) be a sequence of nonsingular matrices, and suppose for some \(x_0 \in D\) that the sequence of points generated by \(x_{k + 1} = x_k - A_k^{-1} F(x_k)\) remains in \(D\), satisfies \(x_k \neq x_*\) for any \(k\), and \(\lim_{k \rightarrow \infty} x_k = x_*\).

Define \(J(x_k) = J_k\), \(e_k = x_k - x_*\), \(F(x_k) = F_k\), and \(s_k \triangleq x_{k + 1} - x_k\).

A useful fact is

\[\begin{split}\lim_{k \rightarrow \infty} \left\Vert s_k \right\Vert &= \lim_{k \rightarrow \infty} \left\Vert x_{k + 1} - x_k \right\Vert\\ &= \left\Vert \lim_{k \rightarrow \infty} \left( x_{k + 1} - x_k \right) \right\Vert & \quad & \left\Vert \cdot \right\Vert \text{ is continuous}\\ &= \left\Vert x_* - x_* \right\Vert\\ &= 0.\end{split}\]

(a)

If \(\lim_{k \rightarrow \infty} \frac{\left\Vert (A_k - J_k) s_k \right\Vert}{\left\Vert s_k \right\Vert} = 0\) where \(s_k \triangleq x_{k + 1} - x_k\), then \(\{ x_k \}\) converges \(q\)-superlinearly to \(x_*\) in some norm \(\left\Vert \cdot \right\Vert\) and \(F_* = 0\).

\[\begin{split}x_{k + 1} &= x_k - A_k^{-1} F_k & \quad & \text{(8.2.15)}\\ 0 &= A_k s_k + F_k\\ 0 &= \left( A_k - J_k \right) s_k + F_k + J_k s_k\\ -F_{k + 1} &= \left( A_k - J_k \right) s_k - F_{k + 1} + F_k + J_k s_k\\ \left\Vert -F_{k + 1} \right\Vert &= \left\Vert \left( A_k - J_k \right) s_k - F_{k + 1} + F_k + J_k s_k \right\Vert\\ \left\Vert F_{k + 1} \right\Vert &\leq \left\Vert \left( A_k - J_k \right) s_k \right\Vert + \left\Vert F_{k + 1} - F_k - J_k s_k \right\Vert & \quad & \text{(3.1.1c) and (3.1.10)}\\ \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} &\leq \frac{ \left\Vert \left( A_k - J_k \right) s_k \right\Vert }{ \left\Vert s_k \right\Vert } + \frac{\gamma}{2}\left\Vert s_k \right\Vert & \quad & \text{Lemma 4.1.15}\\ \lim_{k \rightarrow \infty} \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} &\leq 0 & \quad & \text{see initial assumptions.}\end{split}\]

Thus \(F_* = \lim_{k \rightarrow \infty} F_k = 0\). The sequence \(\{ x_k \}\) converges \(q\)-superlinearly because

\[\begin{split}0 &= \lim_{k \rightarrow \infty} \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert}\\ &= \lim_{k \rightarrow \infty} \frac{ \left\Vert F_{k + 1} - F_* \right\Vert }{ \left\Vert x_{k + 1} - x_k \right\Vert }\\ &\geq \lim_{k \rightarrow \infty} \alpha \frac{ \left\Vert e_{k + 1} \right\Vert }{ \left\Vert x_{k + 1} - x_* + x_* - x_k \right\Vert } & \quad & \text{Lemma 4.1.16}\\ &\geq \lim_{k \rightarrow \infty} \alpha \frac{ \left\Vert e_{k + 1} \right\Vert }{ \left\Vert e_{k + 1} \right\Vert + \left\Vert e_k \right\Vert } & \quad & \text{(3.1.1c)}\\ &= \lim_{k \rightarrow \infty} \alpha \frac{r_k}{1 + r_k} & \quad & \text{(2.3.2)}\end{split}\]

where \(r_k \triangleq \left\Vert e_{k + 1} \right\Vert / \left\Vert e_k \right\Vert\). Since \(\alpha > 0\) and the ratio is nonnegative, \(r_k / (1 + r_k) \rightarrow 0\), which forces \(r_k \rightarrow 0\), i.e. \(q\)-superlinear convergence (2.3.2).

(b)

If \(\{ x_k \}\) converges \(q\)-superlinearly to \(x_*\) in some norm \(\left\Vert \cdot \right\Vert\) and \(F(x_*) = 0\), then \(\lim_{k \rightarrow \infty} \frac{\left\Vert (A_k - J_k) s_k \right\Vert}{\left\Vert s_k \right\Vert} = 0\) where \(s_k \triangleq x_{k + 1} - x_k\).

\[\begin{split}0 &= \lim_{k \rightarrow \infty} \frac{\left\Vert e_{k + 1} \right\Vert}{\left\Vert e_k \right\Vert} & \quad & \text{(2.3.2)}\\ &\geq \lim_{k \rightarrow \infty} \frac{ \left\Vert F_{k + 1} \right\Vert }{ \beta \left\Vert e_k \right\Vert } & \quad & \text{Lemma 4.1.16 states that } \left\Vert F_{k + 1} - F_* \right\Vert \leq \beta \left\Vert e_{k + 1} \right\Vert\\ &= \lim_{k \rightarrow \infty} \beta^{-1} \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} \frac{\left\Vert s_k \right\Vert}{\left\Vert e_k \right\Vert}.\end{split}\]

Applying Lemma 8.2.3, which gives \(\lim_{k \rightarrow \infty} \left\Vert s_k \right\Vert / \left\Vert e_k \right\Vert = 1\), shows that \(\lim_{k \rightarrow \infty} \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} = 0\). Thus

\[\begin{split}x_{k + 1} &= x_k - A_k^{-1} F_k & \quad & \text{(8.2.15)}\\ 0 &= A_k s_k + F_k\\ 0 &= \left( A_k - J_k \right) s_k + F_k + J_k s_k\\ -\left( A_k - J_k \right) s_k &= F_{k + 1} - F_{k + 1} + F_k + J_k s_k\\ \left\Vert -\left( A_k - J_k \right) s_k \right\Vert &= \left\Vert F_{k + 1} - F_{k + 1} + F_k + J_k s_k \right\Vert\\ \left\Vert \left( A_k - J_k \right) s_k \right\Vert &\leq \left\Vert F_{k + 1} \right\Vert + \left\Vert F_{k + 1} - F_k - J_k s_k \right\Vert & \quad & \text{(3.1.1c) and (3.1.10)}\\ \frac{ \left\Vert \left( A_k - J_k \right) s_k \right\Vert }{ \left\Vert s_k \right\Vert } &\leq \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} + \frac{\gamma}{2} \left\Vert s_k \right\Vert & \quad & \text{Lemma 4.1.15}\\ \lim_{k \rightarrow \infty} \frac{ \left\Vert \left( A_k - J_k \right) s_k \right\Vert }{ \left\Vert s_k \right\Vert } &\leq 0 & \quad & \text{see initial assumptions.}\end{split}\]
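
The characterization can be observed on the Broyden run from Exercise 3. The sketch below (plain NumPy arrays) prints \(\left\Vert (A_k - J_k) s_k \right\Vert_2 / \left\Vert s_k \right\Vert_2\) along the iteration; the ratio tends toward zero as the iteration converges, and the \(k = 0\) ratio is exactly zero because \(A_0 = J(x_0)\).

import numpy

def F(x):
    return numpy.array([x[0] + x[1] - 3.0, x[0]**2 + x[1]**2 - 9.0])

def J(x):
    return numpy.array([[1.0, 1.0], [2.0 * x[0], 2.0 * x[1]]])

x = numpy.array([2.0, 7.0])
A = J(x)                                            # so the k = 0 ratio is zero
for k in range(7):
    s = numpy.linalg.solve(A, -F(x))
    ratio = numpy.linalg.norm((A - J(x)) @ s) / numpy.linalg.norm(s)
    print('k = {}: ratio = {:.3e}'.format(k, ratio))
    y = F(x + s) - F(x)
    A = A + numpy.outer(y - A @ s, s) / (s @ s)
    x = x + s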

Exercise 9

The purpose of Example 8.2.6 is to illustrate that Broyden’s method can get relatively close to the true Jacobian.

This exercise is more along the lines of Lemma 8.2.7 where the relative error is much greater: in this case it is between \(39\%\) and \(620\%\).

[2]:
def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0]**2 + x[1]**2 - 2],
                           [numpy.exp(x[0] - 1) + x[1]**3 - 2]])

def J(x):
    x = x.flat
    return numpy.asmatrix([[2 * x[0], 2 * x[1]],
                           [numpy.exp(x[0] - 1), 3 * x[1]**2]])

macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)
x_k = numpy.asmatrix([[2], [3]])
A_k = J(x_k)
for i in range(16):
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _

print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))

for index, (a, j) in enumerate(zip(A_k.flat, J(x_k).flat)):
    _ = 100 * abs(a - j) / abs(j)
    print('\nRelative Error at {}: {:0.2f}%'.format(index, _))
L2-Norm(x_1 - x_0): 1.6767467315827993
L2-Norm(x_2 - x_1): 0.5245600994611522
L2-Norm(x_3 - x_2): 0.31879902339630917
L2-Norm(x_4 - x_3): 0.09733974405443968
L2-Norm(x_5 - x_4): 0.23265589552166405
L2-Norm(x_6 - x_5): 3.268722459941512
L2-Norm(x_7 - x_6): 3.1691877752357605
L2-Norm(x_8 - x_7): 4.705640972470869
L2-Norm(x_9 - x_8): 0.4705727292535645
L2-Norm(x_10 - x_9): 0.14864052229030497
L2-Norm(x_11 - x_10): 1.9639366915192304
L2-Norm(x_12 - x_11): 1.703473765729639
L2-Norm(x_13 - x_12): 0.08713054352155514
L2-Norm(x_14 - x_13): 0.030090519390857193
L2-Norm(x_15 - x_14): 0.007459734854001399
L2-Norm(x_16 - x_15): 0.0009364321451569183

x_k: [-0.71387461  1.22097779]

A_k:
[[-0.2321455   4.31364145]
 [ 1.29755602  6.22127445]]

J(x_k):
[[-1.42774922  2.44195558]
 [ 0.18016636  4.47236031]]

Relative Error at 0: 83.74%

Relative Error at 1: 76.65%

Relative Error at 2: 620.20%

Relative Error at 3: 39.10%

Exercise 10

The following extension of (2.2.1) to higher dimensions (5.1.1) will be useful in the proof that follows.

Suppose \(F(x) = Ax + b\) is an affine transformation where \(A \in \mathbb{R}^{n \times n}\) and \(F'(x) = A = J(x)\) is nonsingular. For all \(k \geq 1\), \(F(x_k) = 0\) because

\[\begin{split}x_{k + 1} &= x_k + s_k = x_k - J(x_k)^{-1} F(x_k)\\ x_1 &= x_0 - A^{-1} (A x_0 + b) = -A^{-1} b\\ x_2 &= x_1 - A^{-1} \left[ A (-A^{-1} b) + b \right] = -A^{-1} b.\end{split}\]

Let \(F(x) = \begin{bmatrix} J_1 x + b_1\\ F_2(x) \end{bmatrix}\) where \(J_1 \in \mathbb{R}^{m \times n}\), \(b_1 \in \mathbb{R}^m\), and \(F_2 \colon \mathbb{R}^n \rightarrow \mathbb{R}^{n - m}\) is nonlinear.

(a)

In Broyden’s method, the initial \(A_0\) can be calculated analytically as

\[\begin{split}A_0 = J(x) &= \frac{\partial F(x)}{\partial x^\top}\\ &= \begin{bmatrix} J_1\\ F'_2(x) \end{bmatrix} & \quad & F'_2(x) = \frac{\partial F_2(x)}{\partial x^\top}\\ &= \begin{bmatrix} A_{01}\\ A_{02} \end{bmatrix} & \quad & A_{01} = J_1 \text{ and } A_{02} = F'_2(x).\end{split}\]

For the finite difference case, Theorem 5.4.1 states that the \(i^\text{th}\) component has an error of \(\mathcal{O}(h_i)\) when using forward differencing (5.6.5) and \(\mathcal{O}(h_i^2)\) when using central differencing (5.6.6).

\[\begin{split}A_1 &= A_0 + \frac{y_0 - A_0 s_0}{s_0^\top s_0} s_0^\top\\ &= J(x_0) + \frac{F(x_1)}{s_0^\top s_0} s_0^\top & \quad y_k = F(x_{k + 1}) - F(x_k)\\ &= \begin{bmatrix} J_1\\ F'_2(x_0) \end{bmatrix} + \left\Vert s_0 \right\Vert_2^{-2} \begin{bmatrix} J_1 x_1 + b_1\\ F_2(x_1) \end{bmatrix} s_0^\top\end{split}\]

By inspection, the first \(m\) rows of the outer product \(F(x_1) s_0^\top\) contain only zeros. Thus by induction \(A_{k1} = J_1\) for all \(k \geq 0\).

(b)

Generalizing the update sequence

\[\begin{split}A_{k + 1} &= A_k + \frac{y_k - A_k s_k}{s_k^\top s_k} s_k^\top\\ &= \begin{bmatrix} J_1\\ A_{k2} \end{bmatrix} + \left\Vert s_k \right\Vert_2^{-2} \begin{bmatrix} J_1 x_{k + 1} + b_1\\ F_2(x_{k + 1}) \end{bmatrix} s_k^\top & \quad & y_k - A_k s_k = F(x_{k + 1})\end{split}\]

gives

\[A_{(k + 1)2} = A_{k2} + \left\Vert s_k \right\Vert_2^{-2} F_2(x_{k + 1}) s_k^\top.\]

Consequently,

\[\begin{split}\left( A_{k2} - A_{12} \right) J_1^\top &= \left( \sum_{i = 1}^{k - 1} \left\Vert s_i \right\Vert_2^{-2} F_2(x_{i + 1}) s_i^\top \right) J_1^\top\\ &= \sum_{i = 1}^{k - 1} \left\Vert s_i \right\Vert_2^{-2} F_2(x_{i + 1}) \left( J_1 s_i \right)^\top\\ &= \sum_{i = 1}^{k - 1} \left\Vert s_i \right\Vert_2^{-2} F_2(x_{i + 1}) \left[ J_1 \left( x_{i + 1} - x_i \right) \right]^\top\\ &= 0 & \quad & J_1 x_k + b_1 = 0 \, \forall \, k \geq 1\end{split}\]

implies that the sequence \(\{ A_{k2} \}\) will converge to the correct value \(F'_2(x_*)\) when the nonlinear solution intersects the affine solution set \(\{ x \mid J_1 x + b_1 = 0 \}\).

Exercise 11

The relative error for \(x_0 = (0.5, 0.5)^\top\) is between \(2.8\%\) and \(43.6\%\), which is not as terrible as the starting point for Exercise 9.

[3]:
def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0]**2 + x[1]**2 - 2],
                           [numpy.exp(x[0] - 1) + x[1]**3 - 2]])

def J(x):
    x = x.flat
    return numpy.asmatrix([[2 * x[0], 2 * x[1]],
                           [numpy.exp(x[0] - 1), 3 * x[1]**2]])

macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)
x_k = numpy.asmatrix([[0.5], [0.5]])
A_k = J(x_k)
for i in range(16):
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _

print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))

for index, (a, j) in enumerate(zip(A_k.flat, J(x_k).flat)):
    _ = 100 * abs(a - j) / abs(j)
    print('\nRelative Error at {}: {:0.2f}%'.format(index, _))
L2-Norm(x_1 - x_0): 2.692582403567252
L2-Norm(x_2 - x_1): 2.8572229427438764
L2-Norm(x_3 - x_2): 0.7723700761333678
L2-Norm(x_4 - x_3): 0.34589773894258735
L2-Norm(x_5 - x_4): 0.06824600043497131
L2-Norm(x_6 - x_5): 0.030690913962507314
L2-Norm(x_7 - x_6): 0.16638292726530998
L2-Norm(x_8 - x_7): 0.045234877304341184
L2-Norm(x_9 - x_8): 0.006719355304038589
L2-Norm(x_10 - x_9): 0.0004239644116932341
L2-Norm(x_11 - x_10): 5.577759379756155e-06
L2-Norm(x_12 - x_11): 6.09851312396262e-07
L2-Norm(x_13 - x_12): 2.9139809614567313e-08
L2-Norm(x_14 - x_13): 1.1194150756256751e-11

x_k: [1. 1.]

A_k:
[[1.93259629 1.94390582]
 [0.56428491 2.63765057]]

J(x_k):
[[2. 2.]
 [1. 3.]]

Relative Error at 0: 3.37%

Relative Error at 1: 2.80%

Relative Error at 2: 43.57%

Relative Error at 3: 12.08%

Exercise 12

Let \(\hat{x} = D_x x\) and \(\hat{F}(\hat{x}) = D_F F(x)\). The following are useful for the proof.

\[\hat{A}_k \hat{s}_k = -\hat{F}(\hat{x}_k) = -D_F F(x_k) = D_F A_k s_k\]
\[\hat{s}_k = \hat{x}_{k + 1} - \hat{x}_k = D_x x_{k + 1} - D_x x_k = D_x s_k\]
\[\hat{y}_k = \hat{F}(\hat{x}_{k + 1}) - \hat{F}(\hat{x}_k) = D_F F(x_{k + 1}) - D_F F(x_k) = D_F y_k\]

Since the Jacobian of \(\hat{F}\) with respect to \(\hat{x}\) is \(D_F J(x) D_x^{-1}\), the natural identification is \(\hat{A}_k = D_F A_k D_x^{-1}\).

\[\begin{split}\hat{A}_{k + 1} &= \hat{A}_k + \frac{ \left( \hat{y}_k - \hat{A}_k \hat{s}_k \right) \hat{s}_k^\top }{ \hat{s}_k^\top \hat{s}_k }\\\\ D_F A_{k + 1} D_x^{-1} &= D_F A_k D_x^{-1} + \frac{ \left( D_F y_k - D_F A_k s_k \right) s_k^\top D_x }{ s_k^\top D_x^2 s_k }\\\\ A_{k + 1} &= A_k + \frac{\left( y_k - A_k s_k \right) s_k^\top D_x^2}{s_k^\top D_x^2 s_k}\end{split}\]

where the second equality is based on the foregoing results and Exercise 7.4.
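
The equivalence can be verified numerically: performing a Broyden update in the scaled variables and mapping back agrees with the weighted update derived above. The diagonal scalings and data below are illustrative choices.

import numpy

D_x = numpy.diag([2.0, 0.5, 3.0])
D_F = numpy.diag([0.1, 4.0, 1.0])
A = numpy.array([[1.0, 2.0, 0.0],
                 [0.0, 1.0, 1.0],
                 [3.0, 0.0, 2.0]])
s = numpy.array([1.0, -1.0, 2.0])
y = numpy.array([0.5, 2.0, -1.0])

def broyden(A, s, y):
    return A + numpy.outer(y - A @ s, s) / (s @ s)

# update in the hat (scaled) variables, then undo the scaling
A_hat = D_F @ A @ numpy.linalg.inv(D_x)
A_next_hat = broyden(A_hat, D_x @ s, D_F @ y)
A_next_unscaled = numpy.linalg.inv(D_F) @ A_next_hat @ D_x

# weighted update derived above, with W = D_x^2
W = D_x @ D_x
A_next_weighted = A + numpy.outer(y - A @ s, W @ s) / (s @ W @ s)

print(numpy.allclose(A_next_unscaled, A_next_weighted))   # True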

Exercise 13

Suppose \(u, v \in \mathbb{R}^n\) and \(A \in \mathbb{R}^{n \times n}\) is nonsingular.

Recall that the matrix determinant lemma states that

\[\begin{split}\DeclareMathOperator{\det}{det} \det\left( \begin{bmatrix} I & 0\\ v^\top & 1 \end{bmatrix} \begin{bmatrix} I + uv^\top & u\\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0\\ -v^\top & 1 \end{bmatrix} \right) &= \det\left( \begin{bmatrix} I & u\\ 0 & 1 + v^\top u \end{bmatrix} \right)\\ \det\left( I + uv^\top \right) &= \det\left( 1 + v^\top u \right).\end{split}\]

(a)

\[\begin{split}\det\left( A + uv^\top \right) &= \det\left( A \left[I + A^{-1} uv^\top \right] \right)\\ &= \det(A) \det\left(I + \left( A^{-1} u \right) v^\top \right)\\ &= \det(A) \det\left(1 + v^\top A^{-1} u \right) & \quad & \text{matrix determinant lemma}\end{split}\]

(b)

Observe that \(\left( I + uv^\top \right)^{-1} = I - \frac{u v^\top}{\sigma}\) where \(\sigma \triangleq 1 + v^\top u\) because

\[\begin{split}\left( I + uv^\top \right) \left( I + uv^\top \right)^{-1} &= \left( I + uv^\top \right) \left( I - \frac{1}{\sigma} u v^\top \right)\\ &= I - \frac{uv^\top}{\sigma} + uv^\top - \frac{uv^\top uv^\top}{\sigma}\\ &= I + uv^\top - \frac{\left( 1 + v^\top u \right) uv^\top}{\sigma}\\ &= I.\end{split}\]

Applying the foregoing result gives

\[\begin{split}\left[ A + uv^\top \right]^{-1} &= \left[ A \left(I + A^{-1} uv^\top \right) \right]^{-1}\\ &= \left(I + A^{-1} uv^\top \right)^{-1} A^{-1}\\ &= \left(I - \frac{A^{-1} uv^\top}{1 + v^\top A^{-1} u} \right) A^{-1}\\ &= A^{-1} - \frac{A^{-1} uv^\top A^{-1}}{1 + v^\top A^{-1} u}.\end{split}\]
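
Both identities are easy to confirm numerically for illustrative \(A\), \(u\), and \(v\):

import numpy

A = numpy.array([[4.0, 1.0, 0.0],
                 [2.0, 3.0, 1.0],
                 [0.0, 1.0, 2.0]])
u = numpy.array([1.0, -2.0, 0.5])
v = numpy.array([0.5, 1.0, -1.0])
sigma = 1.0 + v @ numpy.linalg.solve(A, u)    # 1 + v^T A^{-1} u

# (a) det(A + u v^T) = det(A) (1 + v^T A^{-1} u)
lhs = numpy.linalg.det(A + numpy.outer(u, v))
rhs = numpy.linalg.det(A) * sigma
print(numpy.isclose(lhs, rhs))                # True

# (b) Sherman-Morrison: (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / sigma
A_inv = numpy.linalg.inv(A)
lhs = numpy.linalg.inv(A + numpy.outer(u, v))
rhs = A_inv - numpy.outer(A_inv @ u, v @ A_inv) / sigma
print(numpy.allclose(lhs, rhs))               # True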

Exercise 14

See the previous exercises for a basic implementation of Broyden’s method.

Problems that are not smooth, or whose variables and function components have widely varying magnitudes, could cause the secant approximation to fail to converge.

Exercise 15

(a)

Suppose \(B \in Q(y, s)\) such that \(B s = y\) where \(s \neq 0\) and \(B\) is nonsingular.

\[\begin{split}\left\Vert A_+^{-1} - A^{-1} \right\Vert &= \left\Vert \frac{\left( s - A^{-1} y \right) y^\top}{y^\top y} \right\Vert & \quad & \text{(8.4.2)}\\ &= \left\Vert \frac{\left( B^{-1} - A^{-1} \right) y y^\top}{y^\top y} \right\Vert\\ &\leq \left\Vert B^{-1} - A^{-1} \right\Vert \left\Vert \frac{y y^\top}{y^\top y} \right\Vert & \quad & \text{(8.1.6)}\\ &= \left\Vert B^{-1} - A^{-1} \right\Vert & \quad & \text{(8.1.7)}\end{split}\]

(b)

Suppose \(A\) is nonsingular and \(y^\top A s \neq 0\).

\[\begin{split}\left( A_+^{-1} \right)^{-1} &= \left( A^{-1} + \frac{\left( s - A^{-1} y \right) y^\top}{y^\top y} \right)^{-1} & \quad & \text{(8.4.2)}\\ &= A - \left( 1 + \frac{y^\top A \left( s - A^{-1} y \right)}{y^\top y} \right)^{-1} A \frac{\left( s - A^{-1} y \right) y^\top}{y^\top y} A & \quad & \text{Lemma 8.3.1}\\ &= A - \left( \frac{y^\top As}{y^\top y} \right)^{-1} \frac{\left( As - y \right) y^\top A}{y^\top y}\\ A_+ &= A + \frac{\left( y - As \right) y^\top A}{y^\top As}\end{split}\]
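
A quick numeric check of the update derived in (b) with illustrative data: it satisfies the secant equation \(A_+ s = y\), and its inverse matches the direct rank-one update of \(A^{-1}\) in (8.4.2).

import numpy

A = numpy.array([[3.0, 1.0], [-1.0, 2.0]])
s = numpy.array([1.0, 2.0])
y = numpy.array([2.0, -1.0])

# "bad Broyden" update from (b): A_+ = A + (y - A s) y^T A / (y^T A s)
A_plus = A + numpy.outer(y - A @ s, A.T @ y) / (y @ A @ s)
print(numpy.allclose(A_plus @ s, y))                  # secant equation holds

# inverse update (8.4.2): A_+^{-1} = A^{-1} + (s - A^{-1} y) y^T / (y^T y)
A_inv_plus = numpy.linalg.inv(A) + numpy.outer(s - numpy.linalg.solve(A, y), y) / (y @ y)
print(numpy.allclose(numpy.linalg.inv(A_plus), A_inv_plus))   # True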

Exercise 16

The bad Broyden update is strongly influenced by the scaling of a norm’s residuals. The storage requirement is twice that of the good update. It does not exploit sparsity. See [Gri12] for an excellent exposition on this matter.

Exercise 17

Suppose \(A_k \in \mathbb{R}^{n \times n}\), \(s_i, y_i \in \mathbb{R}^n\) for \(i = k - 1, k\) where \(s_{k - 1}\) and \(s_k\) are linearly independent, and \(A_k s_{k - 1} = y_{k - 1}\).

(a)

\[\begin{split}A_{k + 1} s_{k - 1} &= \left( A_k + \frac{\left( y_k - A_k s_k \right) v_k^\top}{v_k^\top s_k} \right) s_{k - 1}\\ &= A_k s_{k - 1} + \frac{\left( y_k - A_k s_k \right) v_k^\top s_{k - 1}}{v_k^\top s_k}\\ &= y_{k - 1} + \frac{\left( y_k - A_k s_k \right) v_k^\top s_{k - 1}}{v_k^\top s_k}\end{split}\]

\(v_k^\top s_{k - 1} = 0\) when \(v_k\) is defined as the result of the Gram-Schmidt orthogonalization on the linearly independent set \(\left\{ s_{k - 1}, s_k \right\}\):

\[v_k = s_k - \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} s_{k - 1}.\]

(b)

Suppose

\[v_k = s_k - \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} s_{k - 1}\]

and \(B \in Q(y_k, s_k) \cap Q(y_{k - 1}, s_{k - 1})\) such that \(B s_k = y_k\) and \(B s_{k - 1} = y_{k - 1}\).

\[\begin{split}\left\Vert A_{k + 1} - A_k \right\Vert &= \left\Vert \frac{\left( y_k - A_k s_k \right) v_k^\top}{v_k^\top s_k} \right\Vert & \quad & \text{(8.4.5)}\\ &= \left\Vert \frac{\left( B - A_k \right) s_k v_k^\top}{v_k^\top s_k} \right\Vert\\ &= \left\Vert \frac{\left( B - A_k \right) v_k v_k^\top}{v_k^\top s_k} \right\Vert & \quad & \text{(b.1)}\\ &\leq \left\Vert B - A_k \right\Vert \left\Vert \frac{v_k v_k^\top}{v_k^\top s_k} \right\Vert_2 & \quad & \text{(3.1.10), (3.1.15), (3.1.17)}\\ &= \left\Vert B - A_k \right\Vert & \quad & \text{(b.2)}\end{split}\]

Since the Frobenius norm is strictly convex and \(Q(y, s)\) is an affine subset of \(\mathbb{R}^{n \times n}\) (hence convex), (8.4.5) gives a unique solution.

(b.1)

\[\begin{split}\left( B - A_k \right) s_k &= \left( B - A_k \right) \left( v_k + \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} s_{k - 1} \right)\\ &= \left( B - A_k \right) v_k + \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} \left( B s_{k - 1} - A_k s_{k - 1} \right)\\ &= \left( B - A_k \right) v_k + \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} \left( y_{k - 1} - y_{k - 1} \right)\\ &= \left( B - A_k \right) v_k\end{split}\]

(b.2)

\[\begin{split}v_k^\top s_k &= v_k^\top \left( v_k + \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} s_{k - 1} \right)\\ &= v_k^\top v_k + \frac{s_k^\top s_{k - 1}}{s_{k - 1}^\top s_{k - 1}} v_k^\top s_{k - 1}\\ &= \left\Vert v_k \right\Vert_2^2 & \quad & \text{(a) shows that } v_k^\top s_{k - 1} = 0\end{split}\]

Define \(u_k = \frac{v_k}{\left\Vert v_k \right\Vert_2}\). Applying (3.1.17) yields

\[\left\Vert \frac{v_k v_k^\top}{v_k^\top s_k} \right\Vert_2 = \left\Vert u_k u_k^\top \right\Vert_2 = \left\Vert u_k \right\Vert_2^2 = 1.\]
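
Part (a) can be confirmed numerically with illustrative data: choosing \(v_k\) by Gram-Schmidt makes the update (8.4.5) satisfy the new secant equation while preserving the previous one.

import numpy

A_k = numpy.array([[2.0, 0.0, 1.0],
                   [1.0, 3.0, 0.0],
                   [0.0, 1.0, 1.0]])
s_prev = numpy.array([1.0, 0.0, 1.0])
s_k = numpy.array([0.0, 1.0, 1.0])
y_prev = A_k @ s_prev                      # so that A_k s_{k-1} = y_{k-1} holds
y_k = numpy.array([1.0, 2.0, -1.0])        # new secant pair (illustrative)

v_k = s_k - (s_k @ s_prev) / (s_prev @ s_prev) * s_prev
A_next = A_k + numpy.outer(y_k - A_k @ s_k, v_k) / (v_k @ s_k)

print(numpy.allclose(A_next @ s_prev, y_prev))   # old secant equation preserved
print(numpy.allclose(A_next @ s_k, y_k))         # new secant equation satisfied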

Exercise 18

Suppose for some \(m < n\), \(A_k \in \mathbb{R}^{n \times n}\), \(s_j, y_j \in \mathbb{R}^n\) for \(j = k - m, \ldots, k\), \(s_{k - m}, \ldots, s_k\) are linearly independent, and \(A_k s_i = y_i\) for \(i = k - m, \ldots, k - 1\).

(a)

\[\begin{split}A_{k + 1} s_i &= \left( A_k + \frac{\left( y_k - A_k s_k \right) v_k^\top}{v_k^\top s_k} \right) s_i\\ &= A_k s_i + \frac{\left( y_k - A_k s_k \right) v_k^\top}{v_k^\top s_k} s_i\\ &= y_i + \frac{\left( y_k - A_k s_k \right)}{v_k^\top s_k} v_k^\top s_i\end{split}\]

\(v_k^\top s_i = 0\) when \(v_k\) is defined as the result of the Gram-Schmidt orthogonalization on the linearly independent set \(\left\{ s_{k - m}, \ldots, s_k \right\}\).

\[v_k = s_k - \sum_{i = k - m}^{k - 1} \text{proj}_{v_i}(s_k) = s_k - \sum_{i = k - m}^{k - 1} \frac{s_k^\top v_i}{v_i^\top v_i} v_i\]

because

\[\begin{split}v_k^\top s_i &= v_k^\top \left( v_i + \sum_{z = k - m}^{i - 1} \text{proj}_{v_z}(s_i) \right)\\ &= v_k^\top v_i + \sum_{z = k - m}^{i - 1} \frac{s_i^\top v_z}{v_z^\top v_z} v_k^\top v_z\\ &= 0.\end{split}\]

(b)

Define \(v_k\) as in (a) and \(B \in \bigcap_{j} Q(y_j, s_j)\) such that \(B s_j = y_j\).

\[\begin{split}\left\Vert A_{k + 1} - A_k \right\Vert &= \left\Vert \frac{\left( y_k - A_k s_k \right) v_k^\top}{v_k^\top s_k} \right\Vert & \quad & \text{(8.4.5)}\\ &= \left\Vert \frac{\left( B - A_k \right) s_k v_k^\top}{v_k^\top s_k} \right\Vert\\ &= \left\Vert \frac{\left( B - A_k \right) v_k v_k^\top}{v_k^\top s_k} \right\Vert & \quad & \text{(b.1)}\\ &\leq \left\Vert B - A_k \right\Vert \left\Vert \frac{v_k v_k^\top}{v_k^\top s_k} \right\Vert_2 & \quad & \text{(3.1.10), (3.1.15), (3.1.17)}\\ &= \left\Vert B - A_k \right\Vert & \quad & \text{(b.2)}\end{split}\]

Since the Frobenius norm is strictly convex and \(Q(y, s)\) is an affine subset of \(\mathbb{R}^{n \times n}\) (hence convex), (8.4.5) gives a unique solution.

(b.1)

\[\begin{split}\left( B - A_k \right) s_k &= \left( B - A_k \right) \left( v_k + \sum_{i = k - m}^{k - 1} \text{proj}_{v_i}(s_k) \right)\\ &= \left( B - A_k \right) v_k + \sum_{i = k - m}^{k - 1} \frac{s_k^\top v_i}{v_i^\top v_i} \left( B v_i - A_k v_i \right) & \quad & \text{(b.1.1)}\\ &= \left( B - A_k \right) v_k\end{split}\]

(b.1.1)

Proof via induction:

For the base case \(v_{k - m} = s_{k - m}\),

\[\left( B - A_k \right) v_{k - m} = B s_{k - m} - A_k s_{k - m} = y_{k - m} - y_{k - m} = 0.\]

For the induction step, assume \(\left( B - A_k \right) v_z = 0\) for \(z = k - m, \ldots, k - 2\).

\[\begin{split}\left( B - A_k \right) v_{k - 1} &= \left( B - A_k \right) \left( s_{k - 1} - \sum_{z = k - m}^{(k - 1) - 1} \text{proj}_{v_z}(s_{k - 1}) \right)\\ &= B s_{k - 1} - A_k s_{k - 1} - \sum_{z = k - m}^{k - 2} \frac{s_{k - 1}^\top v_z}{v_z^\top v_z} \left( B - A_k \right) v_z\\ &= y_{k - 1} - y_{k - 1}\\ &= 0\end{split}\]

(b.2)

\[\begin{split}v_k^\top s_k &= v_k^\top \left( v_k + \sum_{i = k - m}^{k - 1} \text{proj}_{v_i}(s_k) \right)\\ &= v_k^\top v_k + \sum_{i = k - m}^{k - 1} \frac{s_k^\top v_i}{v_i^\top v_i} v_k^\top v_i\\ &= \left\Vert v_k \right\Vert_2^2 & \quad & \text{(a) shows that } v_k^\top v_i = 0\end{split}\]

Define \(u_k = \frac{v_k}{\left\Vert v_k \right\Vert_2}\). Applying (3.1.17) yields

\[\left\Vert \frac{v_k v_k^\top}{v_k^\top s_k} \right\Vert_2 = \left\Vert u_k u_k^\top \right\Vert_2 = \left\Vert u_k \right\Vert_2^2 = 1.\]
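
The same check extends to several previous steps. The sketch below uses \(m = 2\) and illustrative data, with a small Gram-Schmidt helper standing in for the projections in (a).

import numpy

def gram_schmidt(vectors):
    # orthogonalize in order, returning the v_i of (a)
    basis = []
    for s in vectors:
        v = s.copy()
        for b in basis:
            v = v - (s @ b) / (b @ b) * b
        basis.append(v)
    return basis

S_old = [numpy.array([1.0, 0.0, 0.0, 1.0]),
         numpy.array([0.0, 1.0, 1.0, 0.0])]
s_k = numpy.array([1.0, 1.0, 0.0, 0.0])

A_k = numpy.diag([2.0, 3.0, 1.0, 4.0])
Y_old = [A_k @ s for s in S_old]            # A_k s_i = y_i for the m previous pairs
y_k = numpy.array([1.0, -1.0, 2.0, 0.5])    # new secant pair (illustrative)

v_k = gram_schmidt(S_old + [s_k])[-1]
A_next = A_k + numpy.outer(y_k - A_k @ s_k, v_k) / (v_k @ s_k)

for s, y in zip(S_old, Y_old):
    print(numpy.allclose(A_next @ s, y))     # previous secant equations preserved
print(numpy.allclose(A_next @ s_k, y_k))     # new secant equation satisfied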

References

Gri12

Andreas Griewank. Broyden updating, the good and the bad! In Optimization Stories, Documenta Mathematica Extra Volume ISMP, pages 301–315, 2012.