Secant Methods for Systems of Nonlinear Equations¶
Exercise 1¶
Given \(s_i, y_i \in \mathbb{R}^n\) for \(i = 1, \ldots, n\) where \(s_i\) are linearly independent, \(A \in \mathbb{R}^{n \times n}\) can be uniquely determined using the secant equations
This is a bad way to form the model because it requires the matrix \(S\) whose columns are \(s_1, \ldots, s_n\) to be invertible, i.e., it breaks down whenever the steps are not (or are nearly not) linearly independent.
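As a quick numerical illustration (a sketch with arbitrary example data, not part of the original solution): stacking the secant equations gives \(A S = Y\), where \(S = [s_1 \cdots s_n]\) and \(Y = [y_1 \cdots y_n]\), so \(A = Y S^{-1}\) whenever \(S\) is invertible.

import numpy

n = 3
rng = numpy.random.default_rng(0)
S = rng.standard_normal((n, n))    # columns play the role of s_1, ..., s_n
A_true = rng.standard_normal((n, n))
Y = A_true @ S                     # secant data y_i = A s_i

# Solve A S = Y by transposing to S^T A^T = Y^T (avoids forming S^{-1}).
A = numpy.linalg.solve(S.T, Y.T).T
print(numpy.allclose(A, A_true))   # True when S is well conditioned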
Exercise 2¶
Let \(A = \begin{bmatrix} 4 & 1\\ -1 & 1 \end{bmatrix}\), \(s = \begin{bmatrix} 1\\ 0 \end{bmatrix}\), and \(y = \begin{bmatrix} 5\\ -2 \end{bmatrix}\).
One way to solve this is to invoke Lemma 8.1.1, since its assumptions are satisfied. The following derivation does not use that lemma.
Given \(M \in Q(y, s)\), its form must be \(\begin{bmatrix} 5 & a\\ -2 & b\end{bmatrix}\) where \(a, b \in \mathbb{R}\). The minimization problem becomes
(a)¶
Minimizing over the \(l_2\) operator norm (3.1.8b) gives
Plotting the last equation shows that the function attains its global minimum of \(\sqrt{2}\) when \(\beta \in [0, 2]\).
Replacing \(M\) with \(A_+ = A + \begin{bmatrix} 1 & \alpha\\ -1 & \alpha \end{bmatrix}\) yields
Plotting the last equation shows that the function attains its global minimum of \(\sqrt{2}\) when \(\alpha \in [-1, 1]\).
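As a quick numerical check of this claim (a sketch added here, using the \(A_+\) parameterization above): the singular values of \(A_+ - A = \begin{bmatrix} 1 & \alpha\\ -1 & \alpha \end{bmatrix}\) are \(\sqrt{2}\) and \(\sqrt{2} \left\vert \alpha \right\vert\), so the \(l_2\) operator norm is \(\sqrt{2} \max(1, \left\vert \alpha \right\vert)\), which is flat at \(\sqrt{2}\) on \(\alpha \in [-1, 1]\).

import numpy

for alpha in (-1.5, -1.0, 0.0, 0.5, 1.0, 1.5):
    E = numpy.array([[1.0, alpha],
                     [-1.0, alpha]])
    # operator 2-norm = largest singular value = sqrt(2) * max(1, |alpha|)
    print(alpha, numpy.linalg.norm(E, 2), numpy.sqrt(2) * max(1.0, abs(alpha)))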
(b)¶
Minimizing over the Frobenius matrix norm (3.1.6) yields
and
with both attaining a global minimum of \(\sqrt{2}\) at \(a = b = 1\) and \(\alpha = 0\), respectively.
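A matching numerical check for the Frobenius case (again a small added sketch): \(\left\Vert M - A \right\Vert_F = \sqrt{2 + (a - 1)^2 + (b - 1)^2}\), which is smallest, namely \(\sqrt{2}\), exactly at \(a = b = 1\).

import numpy

A = numpy.array([[4.0, 1.0], [-1.0, 1.0]])
for a, b in [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]:
    M = numpy.array([[5.0, a], [-2.0, b]])   # M is in Q(y, s): its first column is y
    print(a, b,
          numpy.linalg.norm(M - A, 'fro'),
          numpy.sqrt(2 + (a - 1)**2 + (b - 1)**2))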
(c)¶
If \(F(x)\) is linear in \(x_2\), then the Frobenius matrix norm should be used because its minimizer is unique, which isolates the changes to the \(x_2\) component.
Exercise 3¶
Even though Broyden’s method converges to a root, the relative error in the entry approximating \(\frac{\partial f_2}{\partial x_1}\) is enormous because the true value is essentially zero at the root; the relative errors of the remaining entries lie within \([0.0, 0.5]\), with \(0.0\) attained only for the linear function \(f_1\).
[1]:
import numpy


def broydens_update(F, A_k, x_k):
    # Quasi-Newton step: solve A_k s_k = -F(x_k) for the step s_k.
    s_k = numpy.linalg.solve(A_k, -F(x_k))
    x_next = x_k + s_k
    y_k = F(x_next) - F(x_k)
    # Broyden's rank-one ("good") update: A_+ = A + (y - A s) s^T / (s^T s).
    A_next = A_k + (y_k - A_k * s_k) * s_k.T / (s_k.T * s_k)[0]
    return A_next, x_next


def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0] + x[1] - 3],
                           [x[0]**2 + x[1]**2 - 9]])


def J(x):
    # Analytic Jacobian of F, used for A_0 and the final comparison.
    x = x.flat
    return numpy.asmatrix([[1, 1],
                           [2 * x[0], 2 * x[1]]])


macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)

x_k = numpy.asmatrix([[2], [7]])
A_k = J(x_k)
for i in range(16):
    # _ holds the pair (A_{k+1}, x_{k+1}); stop once the step is negligible.
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _

print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))
L2-Norm(x_1 - x_0): 4.47213595499958
L2-Norm(x_2 - x_1): 2.357022603955159
L2-Norm(x_3 - x_2): 0.29462782549439437
L2-Norm(x_4 - x_3): 0.15973797767768386
L2-Norm(x_5 - x_4): 0.016359786480031936
L2-Norm(x_6 - x_5): 0.0006762158471433808
L2-Norm(x_7 - x_6): 2.714857331443653e-06
L2-Norm(x_8 - x_7): 4.344464330780998e-10
x_k: [5.32388602e-17 3.00000000e+00]
A_k:
[[1. 1. ]
[2.99999808 9.00000192]]
J(x_k):
[[1.0000000e+00 1.0000000e+00]
[1.0647772e-16 6.0000000e+00]]
Exercise 4¶
Let \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\) be such that its components \(f_1, \ldots, f_m\) are linear, with \(m < n\).
Let \(A_0 = J(x_0)\) where \(x_0 \in \mathbb{R}^n\). As illustrated in (2.2.1), the extension of the local affine approximation to higher dimensions (5.1.1) will be exact for \(f_1, \ldots, f_m\) such that
The next step in Broyden’s method computes
By inspection, the first \(m\) rows of the outer product \(F(x_1) s_0^\top\) contain only zeros, so \((A_1)_i = (A_0)_i = J(x_0)_i\) for \(i = 1, \ldots, m\).
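This is easy to check numerically with the broydens_update, F, and J defined in the Exercise 3 cell, where \(f_1\) is the linear component: the first row of every \(A_k\) stays (to machine precision) equal to the first row of \(J(x_0)\). A small sketch reusing those definitions:

import numpy

x_k = numpy.asmatrix([[2], [7]])
A_k = J(x_k)                           # A_0 = J(x_0); first row is [1, 1]
for k in range(8):
    A_k, x_k = broydens_update(F, A_k, x_k)
    print(numpy.asarray(A_k[0]))       # first row should remain [[1. 1.]]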
An equivalent view of Newton’s method is
where \(M_c\) denotes the current model. From the intuition of the method in lower dimensions, it is clear that
and consequently
For the finite difference case, Theorem 5.4.1 states that the \(i^\text{th}\) component has an error of \(\mathcal{O}(h_i)\) when using forward differencing (5.6.5) and \(\mathcal{O}(h_i^2)\) when using central differencing (5.6.6).
Exercise 5¶
Let \(F(x) = Jx + b\) where \(J \in \mathbb{R}^{n \times n}\), \(b \in \mathbb{R}^n\), and \(x \in \mathbb{R}^n\).
The proof is by contradiction. Assume \(x_+, x_c \in \mathbb{R}^n\) and \(x_+ \neq x_c\) with
Consequently, the following must hold
but this contradicts the initial assumptions. Thus \(J \in Q(y_c, s_c)\).
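The conclusion is also easy to sanity-check numerically (a sketch with arbitrary example data): for affine \(F\), \(y_c = F(x_+) - F(x_c) = J s_c\), so \(J\) satisfies the secant equation exactly.

import numpy

rng = numpy.random.default_rng(1)
J_mat = rng.standard_normal((4, 4))      # plays the role of J
b = rng.standard_normal(4)


def F_affine(x):
    return J_mat @ x + b


x_c = rng.standard_normal(4)
x_plus = rng.standard_normal(4)
s_c = x_plus - x_c
y_c = F_affine(x_plus) - F_affine(x_c)
print(numpy.allclose(J_mat @ s_c, y_c))  # True: J is a member of Q(y_c, s_c)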
Exercise 6¶
Let the assumptions of Lemma 8.2.1 be satisfied and define
For a proof of (3.1.1c), see Exercise 3.4. Exercise 7 may be helpful for understanding the application of (3.1.17).
Exercise 7¶
Suppose \(s \in \mathbb{R}^n \setminus \{ 0 \}\) and \(I \in \mathbb{R}^{n \times n}\) is the identity matrix.
In order to simplify the derivations, rewrite the original problem as
where \(u = \frac{s}{\left\Vert s \right\Vert_2}\) and \(u^\top u = 1\).
By definition, the resulting (projection) matrix and its summands are idempotent
and symmetric
The rank of the matrix is
Let \(A = I - u u^\top\). Recall that the relation between an eigenvalue \(\lambda\) and a non-zero eigenvector \(v\) for \(A\) is \(Av = \lambda v\). Since \(A\) is idempotent, the following must hold:
i.e. the eigenvalues of \(A\) must be either \(0\) or \(1\).
By Theorem 3.5.6 and
where \(\lambda_i\) are the eigenvalues of \(A\),
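These properties (symmetry, idempotence, rank \(n - 1\), eigenvalues in \(\{0, 1\}\), and the resulting norms) are straightforward to confirm numerically; the sketch below uses an arbitrary \(s \in \mathbb{R}^5\).

import numpy

rng = numpy.random.default_rng(2)
s = rng.standard_normal(5)
u = s / numpy.linalg.norm(s)
A = numpy.eye(5) - numpy.outer(u, u)

print(numpy.allclose(A @ A, A))                  # idempotent
print(numpy.allclose(A, A.T))                    # symmetric
print(numpy.linalg.matrix_rank(A))               # rank is n - 1 = 4
print(numpy.round(numpy.linalg.eigvalsh(A), 8))  # eigenvalues are 0 and 1
print(numpy.linalg.norm(A, 2))                   # operator 2-norm is 1
print(numpy.linalg.norm(A, 'fro'))               # Frobenius norm is sqrt(n - 1) = 2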
Exercise 8¶
Let \(D \subseteq \mathbb{R}^n\) be an open convex set, \(F \colon \mathbb{R}^n \rightarrow \mathbb{R}^n\), \(J(x) \in \text{Lip}_\gamma(D)\), \(x_* \in D\), and \(J(x_*)\) be nonsingular.
Let \(\{ A_k \in \mathbb{R}^{n \times n} \}\) be a sequence of nonsingular matrices, and suppose for some \(x_0 \in D\) that the sequence of points generated by \(x_{k + 1} = x_k - A_k^{-1} F(x_k)\) remains in \(D\), satisfies \(x_k \neq x_*\) for any \(k\), and \(\lim_{k \rightarrow \infty} x_k = x_*\).
Define \(J(x_k) = J_k\), \(e_k = x_k - x_*\), \(F(x_k) = F_k\), and \(s_k \triangleq x_{k + 1} - x_k\).
A useful fact is
(a)¶
If \(\lim_{k \rightarrow \infty} \frac{\left\Vert (A_k - J_k) s_k \right\Vert}{\left\Vert s_k \right\Vert} = 0\) where \(s_k \triangleq x_{k + 1} - x_k\), then \(\{ x_k \}\) converges \(q\)-superlinearly to \(x_*\) in some norm \(\left\Vert \cdot \right\Vert\) and \(F_* = 0\).
Thus \(F_* = \lim_{k \rightarrow \infty} F_k = 0\). The sequence \(\{ x_k \}\) converges \(q\)-superlinearly because
where \(r_k \triangleq \left\Vert e_{k + 1} \right\Vert / \left\Vert e_k \right\Vert\).
(b)¶
If \(\{ x_k \}\) converges \(q\)-superlinearly to \(x_*\) in some norm \(\left\Vert \cdot \right\Vert\) and \(F(x_*) = 0\), then \(\lim_{k \rightarrow \infty} \frac{\left\Vert (A_k - J_k) s_k \right\Vert}{\left\Vert s_k \right\Vert} = 0\) where \(s_k \triangleq x_{k + 1} - x_k\).
Applying Lemma 8.2.3 shows that \(\lim_{k \rightarrow \infty} \frac{\left\Vert F_{k + 1} \right\Vert}{\left\Vert s_k \right\Vert} = 0\). Thus
Exercise 9¶
The purpose of Example 8.2.6 is to illustrate that Broyden’s method can get relatively close to the true Jacobian.
This exercise is more along the lines of Lemma 8.2.7, where the relative error is much greater: in this case it lies between \(39\%\) and \(620\%\).
[2]:
def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0]**2 + x[1]**2 - 2],
                           [numpy.exp(x[0] - 1) + x[1]**3 - 2]])


def J(x):
    # Analytic Jacobian of F.
    x = x.flat
    return numpy.asmatrix([[2 * x[0], 2 * x[1]],
                           [numpy.exp(x[0] - 1), 3 * x[1]**2]])


macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)

x_k = numpy.asmatrix([[2], [3]])
A_k = J(x_k)
for i in range(16):
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _

print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))

# Element-wise relative error of the final A_k against the true Jacobian.
for index, (a, j) in enumerate(zip(A_k.flat, J(x_k).flat)):
    _ = 100 * abs(a - j) / abs(j)
    print('\nRelative Error at {}: {:0.2f}%'.format(index, _))
L2-Norm(x_1 - x_0): 1.6767467315827993
L2-Norm(x_2 - x_1): 0.5245600994611522
L2-Norm(x_3 - x_2): 0.31879902339630917
L2-Norm(x_4 - x_3): 0.09733974405443968
L2-Norm(x_5 - x_4): 0.23265589552166405
L2-Norm(x_6 - x_5): 3.268722459941512
L2-Norm(x_7 - x_6): 3.1691877752357605
L2-Norm(x_8 - x_7): 4.705640972470869
L2-Norm(x_9 - x_8): 0.4705727292535645
L2-Norm(x_10 - x_9): 0.14864052229030497
L2-Norm(x_11 - x_10): 1.9639366915192304
L2-Norm(x_12 - x_11): 1.703473765729639
L2-Norm(x_13 - x_12): 0.08713054352155514
L2-Norm(x_14 - x_13): 0.030090519390857193
L2-Norm(x_15 - x_14): 0.007459734854001399
L2-Norm(x_16 - x_15): 0.0009364321451569183
x_k: [-0.71387461 1.22097779]
A_k:
[[-0.2321455 4.31364145]
[ 1.29755602 6.22127445]]
J(x_k):
[[-1.42774922 2.44195558]
[ 0.18016636 4.47236031]]
Relative Error at 0: 83.74%
Relative Error at 1: 76.65%
Relative Error at 2: 620.20%
Relative Error at 3: 39.10%
Exercise 10¶
The following extension of (2.2.1) to higher dimensions (5.1.1) will be useful in the proof that follows.
Suppose \(F(x) = Ax + b\) is an affine transformation where \(A \in \mathbb{R}^{n \times n}\) and \(F'(x) = A = J(x)\) is nonsingular. For all \(k \geq 1\), \(F(x_k) = 0\) because
Let \(F(x) = \begin{bmatrix} J_1 x + b_1\\ F_2(x) \end{bmatrix}\) where \(J_1 \in \mathbb{R}^{m \times n}\), \(b_1 \in \mathbb{R}^m\), and \(F_2 \colon \mathbb{R}^n \rightarrow \mathbb{R}^{n - m}\) is nonlinear.
(a)¶
In Broyden’s method, the initial \(A_0\) can be calculated analytically as
For the finite difference case, Theorem 5.4.1 states that the \(i^\text{th}\) component has an error of \(\mathcal{O}(h_i)\) when using forward differencing (5.6.5) and \(\mathcal{O}(h_i^2)\) when using central differencing (5.6.6).
By inspection, the first \(m\) rows of the outer product \(F(x_1) s_0^\top\) contain only zeros. Thus, by induction, \(A_{k1} = J_1\) for all \(k \geq 0\).
(b)¶
Generalizing the update sequence
gives
Consequently,
implies that the sequence \(\{ A_{k2} \}\) will converge to the correct value \(F'_2(x_*)\) when the solution of the nonlinear part intersects the solution set of the affine part defined by \(J_1\).
Exercise 11¶
The relative error for \(x_0 = (0.5, 0.5)^\top\) is between \(2.8\%\) and \(43.6\%\), which is not as terrible as the starting point for Exercise 9.
[3]:
def F(x):
    x = x.flat
    return numpy.asmatrix([[x[0]**2 + x[1]**2 - 2],
                           [numpy.exp(x[0] - 1) + x[1]**3 - 2]])


def J(x):
    # Analytic Jacobian of F.
    x = x.flat
    return numpy.asmatrix([[2 * x[0], 2 * x[1]],
                           [numpy.exp(x[0] - 1), 3 * x[1]**2]])


macheps = numpy.sqrt(numpy.finfo(numpy.float64).eps)

x_k = numpy.asmatrix([[0.5], [0.5]])
A_k = J(x_k)
for i in range(16):
    _ = broydens_update(F, A_k, x_k)
    norm = numpy.linalg.norm(_[1] - x_k)
    print('L2-Norm(x_{} - x_{}): {}'.format(i + 1, i, norm))
    if norm <= macheps:
        A_k, x_k = _
        break
    A_k, x_k = _

print('\nx_k: {}\n'.format(numpy.asarray(x_k).squeeze()))
print('A_k:\n{}\n'.format(A_k))
print('J(x_k):\n{}'.format(J(x_k)))

# Element-wise relative error of the final A_k against the true Jacobian.
for index, (a, j) in enumerate(zip(A_k.flat, J(x_k).flat)):
    _ = 100 * abs(a - j) / abs(j)
    print('\nRelative Error at {}: {:0.2f}%'.format(index, _))
L2-Norm(x_1 - x_0): 2.692582403567252
L2-Norm(x_2 - x_1): 2.8572229427438764
L2-Norm(x_3 - x_2): 0.7723700761333678
L2-Norm(x_4 - x_3): 0.34589773894258735
L2-Norm(x_5 - x_4): 0.06824600043497131
L2-Norm(x_6 - x_5): 0.030690913962507314
L2-Norm(x_7 - x_6): 0.16638292726530998
L2-Norm(x_8 - x_7): 0.045234877304341184
L2-Norm(x_9 - x_8): 0.006719355304038589
L2-Norm(x_10 - x_9): 0.0004239644116932341
L2-Norm(x_11 - x_10): 5.577759379756155e-06
L2-Norm(x_12 - x_11): 6.09851312396262e-07
L2-Norm(x_13 - x_12): 2.9139809614567313e-08
L2-Norm(x_14 - x_13): 1.1194150756256751e-11
x_k: [1. 1.]
A_k:
[[1.93259629 1.94390582]
[0.56428491 2.63765057]]
J(x_k):
[[2. 2.]
[1. 3.]]
Relative Error at 0: 3.37%
Relative Error at 1: 2.80%
Relative Error at 2: 43.57%
Relative Error at 3: 12.08%
Exercise 12¶
Let \(\hat{x} = D_x x\) and \(\hat{F}(\hat{x}) = D_F F(x)\). The following are useful for the proof.
Suppose \(A_k \simeq J_k\).
where the second equality is based on the foregoing results and Exercise 7.4.
Exercise 13¶
Suppose \(u, v \in \mathbb{R}^n\) and \(A \in \mathbb{R}^{n \times n}\) is nonsingular.
Recall that the matrix determinant lemma states that
(a)¶
(b)¶
Observe that \(\left( I + uv^\top \right)^{-1} = I - \frac{u v^\top}{\sigma}\), where \(\sigma \triangleq 1 + v^\top u\) is assumed to be nonzero, because
Applying the foregoing result gives
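Both facts are easy to spot-check numerically. The sketch below uses arbitrary data and assumes \(\sigma = 1 + v^\top u \neq 0\) and that \(A\) is nonsingular.

import numpy

rng = numpy.random.default_rng(4)
n = 4
u = 0.1 * rng.standard_normal(n)
v = rng.standard_normal(n)
sigma = 1.0 + v @ u

I = numpy.eye(n)
lhs = numpy.linalg.inv(I + numpy.outer(u, v))
rhs = I - numpy.outer(u, v) / sigma
print(numpy.allclose(lhs, rhs))              # (I + u v^T)^{-1} = I - u v^T / sigma

A = numpy.eye(n) + 0.1 * rng.standard_normal((n, n))
lemma_lhs = numpy.linalg.det(A + numpy.outer(u, v))
lemma_rhs = (1.0 + v @ numpy.linalg.solve(A, u)) * numpy.linalg.det(A)
print(numpy.isclose(lemma_lhs, lemma_rhs))   # matrix determinant lemma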
Exercise 14¶
See the previous exercises for a basic implementation of Broyden’s method.
Problems that are not smooth, or whose components vary over widely different magnitudes in each dimension, could cause the secant approximation to fail to converge.
Exercise 15¶
(a)¶
Suppose \(B \in Q(y, s)\) such that \(B s = y\) where \(s \neq 0\) and \(B\) is nonsingular.
(b)¶
Suppose \(A\) is nonsingular and \(y^\top A s \neq 0\).
Exercise 16¶
The bad Broyden update is strongly influenced by the scaling of the residuals \(F(x)\). The storage requirement is twice that of the good update, and it does not exploit sparsity. See [Gri12] for an excellent exposition on this matter.
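For reference, one common way to write the two rank-one updates side by side (a summary added here, not taken from [Gri12] verbatim): the good update modifies \(A \approx J\) and still requires a linear solve per iteration, while the bad update modifies the inverse approximation \(H \approx J^{-1}\) directly.

import numpy


def good_broyden(A, s, y):
    # Broyden's first ("good") update of A ~ J; satisfies A_next @ s == y.
    return A + numpy.outer(y - A @ s, s) / (s @ s)


def bad_broyden(H, s, y):
    # Broyden's second ("bad") update of H ~ J^{-1}; satisfies H_next @ y == s.
    return H + numpy.outer(s - H @ y, y) / (y @ y)


rng = numpy.random.default_rng(6)
A = numpy.eye(3) + 0.1 * rng.standard_normal((3, 3))
s, y = rng.standard_normal(3), rng.standard_normal(3)
print(numpy.allclose(good_broyden(A, s, y) @ s, y))
print(numpy.allclose(bad_broyden(numpy.linalg.inv(A), s, y) @ y, s))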
Exercise 17¶
Suppose \(A_k \in \mathbb{R}^{n \times n}\), \(s_i, y_i \in \mathbb{R}^n\) for \(i = k - 1, k\) where \(s_{k - 1}\) and \(s_k\) are linearly independent, and \(A_k s_{k - 1} = y_{k - 1}\).
(a)¶
\(v_k^\top s_{k - 1} = 0\) when \(v_k\) is defined as the result of the Gram-Schmidt orthogonalization on the linearly independent set \(\left\{ s_{k - 1}, s_k \right\}\):
(b)¶
Suppose
and \(B \in Q(y_k, s_k) \cap Q(y_{k - 1}, s_{k - 1})\) such that \(B s_k = y_k\) and \(B s_{k - 1} = y_{k - 1}\).
Since the Frobenius norm is strictly convex and \(Q(y, s)\) is an affine subset of \(\mathbb{R}^{n \times n}\) (hence convex), (8.4.5) gives a unique solution.
(b.1)¶
(b.2)¶
Define \(u_k = \frac{v_k}{\left\Vert v_k \right\Vert_2}\). Applying (3.1.17) yields
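A numerical sketch of how (a) and (b) fit together, assuming the update takes the projected form \(A_{k + 1} = A_k + \frac{(y_k - A_k s_k) v_k^\top}{v_k^\top s_k}\) (one reading of the construction above; the data below are arbitrary):

import numpy

rng = numpy.random.default_rng(5)
n = 4
s_prev = rng.standard_normal(n)
s_k = rng.standard_normal(n)
A_k = rng.standard_normal((n, n))
y_prev = A_k @ s_prev         # hypothesis: A_k already satisfies the old secant equation
y_k = rng.standard_normal(n)  # new secant data to be matched

# (a) Gram-Schmidt: remove from s_k its component along s_{k-1}
v_k = s_k - (s_k @ s_prev) / (s_prev @ s_prev) * s_prev

# (b) projected rank-one update
A_next = A_k + numpy.outer(y_k - A_k @ s_k, v_k) / (v_k @ s_k)

print(numpy.allclose(A_next @ s_prev, y_prev))   # old secant equation preserved (v_k orthogonal to s_{k-1})
print(numpy.allclose(A_next @ s_k, y_k))         # new secant equation satisfied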
Exercise 18¶
Suppose for some \(m < n\), \(A_k \in \mathbb{R}^{n \times n}\), \(s_j, y_j \in \mathbb{R}^n\) for \(j = k - m, \ldots, k\), \(s_{k - m}, \ldots, s_k\) are linearly independent, and \(A_k s_i = y_i\) for \(i = k - m, \ldots, k - 1\).
(a)¶
\(v_k^\top s_i = 0\) when \(v_k\) is defined as the result of the Gram-Schmidt orthogonalization on the linearly independent set \(\left\{ s_{k - m}, \ldots, s_k \right\}\).
because
(b)¶
Define \(v_k\) as in (a) and let \(B \in \bigcap_{j = k - m}^{k} Q(y_j, s_j)\), i.e. \(B s_j = y_j\) for \(j = k - m, \ldots, k\).
Since the Frobenius norm is strictly convex and \(Q(y, s)\) is an affine subset of \(\mathbb{R}^{n \times n}\) (hence convex), (8.4.5) gives a unique solution.
(b.1)¶
(b.1.1)¶
Proof via induction:
For the base case \(v_{k - m} = s_{k - m}\),
For the induction step, assume \(\left( B - A_k \right) v_z = 0\) for \(z = k - m, \ldots, k - 2\).
(b.2)¶
Define \(u_k = \frac{v_k}{\left\Vert v_k \right\Vert_2}\). Applying (3.1.17) yields
References
- Gri12
Andreas Griewank. Broyden updating, the good and the bad! In Optimization Stories, Documenta Mathematica, Extra Volume ISMP, pages 301–315, 2012.