Newton’s Method for Nonlinear Equations and Unconstrained Minimization¶
Exercise 1¶
[1]:
import numpy
def F(x):
    return numpy.asfarray([2 * (x[0] + x[1])**2 + (x[0] - x[1])**2 - 8,
                           5 * x[0]**2 + (x[1] - 3)**2 - 9])

def J(x):
    return numpy.asfarray([[6 * x[0] + 2 * x[1], 2 * x[0] + 6 * x[1]],
                           [10 * x[0], 2 * x[1] - 6]])

x_c = numpy.asfarray([2, 0])
for i in range(2):
    s_k = numpy.dot(numpy.linalg.inv(J(x_c)), -F(x_c))
    x_k = x_c + s_k
    print('x_{0} = {1}'.format(i, x_c))
    x_c = x_k
x_0 = [2. 0.]
x_1 = [1.31578947 1.05263158]
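As a side note, the same two steps can be computed with numpy.linalg.solve, which solves the linear system \(J(x_c) s_k = -F(x_c)\) directly instead of forming the inverse; a minimal sketch reusing F and J from the cell above:

import numpy

x_c = numpy.asfarray([2, 0])
for i in range(2):
    # Solve J(x_c) s = -F(x_c) rather than computing inv(J) explicitly.
    s_k = numpy.linalg.solve(J(x_c), -F(x_c))
    print('x_{0} = {1}'.format(i, x_c))
    x_c = x_c + s_k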
Exercise 2¶
Newton’s method fails on this problem: the Jacobian is nearly singular at the starting point (its entries are on the order of \(10^{-5}\)), so the first step overshoots enormously, and at the second iterate the exponential overflows, making the Jacobian entries infinite (its computed inverse is effectively the zero matrix) and the step NaN.
[2]:
import numpy
def F(x):
    return numpy.exp(x) - 1

def J(x):
    return numpy.asfarray([[numpy.exp(x[0]), 0],
                           [0, numpy.exp(x[1])]])

x_c = numpy.asfarray([-10, -10])
for i in range(2):
    J_k = J(x_c)
    s_k = numpy.dot(numpy.linalg.inv(J_k), -F(x_c))
    x_k = x_c + s_k
    print('Iteration {0}'.format(i))
    print('x_{0} = {1}'.format(i, x_c))
    print('s_{0} = {1}'.format(i, s_k))
    print('J_{0} =\n{1}'.format(i, J_k))
    print('x_{0} = {1}'.format(i + 1, x_k))
    x_c = x_k
Iteration 0
x_0 = [-10. -10.]
s_0 = [22025.46579481 22025.46579481]
J_0 =
[[4.53999298e-05 0.00000000e+00]
[0.00000000e+00 4.53999298e-05]]
x_1 = [22015.46579481 22015.46579481]
Iteration 1
x_1 = [22015.46579481 22015.46579481]
s_1 = [nan nan]
J_1 =
[[inf 0.]
[ 0. inf]]
x_2 = [nan nan]
Exercise 3¶
Define \(f_1(x) = x, f_2(x) = x^2 + x,\) and \(f_3(x) = e^x - 1\).
(a)¶
(b)¶
The Lipschitz constant for each derivative at the root \(x_* = 0\) is
The last inequality holds because the Bernoulli inequality gives
Note that \(e^x\) is an upper bound when \(x > 0\) and becomes a lower bound when \(x < 0\).
(c)¶
The derivative of each function is bounded on the interval \([-a, a]\), so each function is Lipschitz continuous. The bound for each derivative is
The smallest region of convergence is defined to be \(\eta = \min \left\{ \hat{\eta}, 2 \rho / \gamma \right\}\) where \(\hat{\eta}\) is defined to be the radius of the largest open interval around \(x_*\).
(d)¶
\(f_1\) will converge to \(x_* = 0\) in the interval \((-\infty, \infty)\) since it only has one root and \(\lim_{\gamma_1 \rightarrow 0^+} \frac{2 \rho_1}{\gamma_1} = \infty\).
\(f_2\) violates the assumption that \(\rho_2 > 0\) at \(x = -0.5\) causing Newton’s method to fail because the derivative is zero. Splitting the interval into \((-\infty, -0.5)\) and \((-0.5, \infty)\) sidesteps that issue. Applying Theorem 2.4.3 to each region shows that any point in those regions will converge.
\(f_3\) will converge to \(x_* = 0\) in the interval \((-\infty, \infty)\) since it only has one root in theory and \(\lim_{a \rightarrow \infty} \frac{2 \rho_3}{\gamma_3} = \lim_{a \rightarrow \infty} 2 e^{-2a} = 0\). However, the floating-point precision restricts the region to \([-6.5, 709]\).
[3]:
import numpy
def Newtons_Method(f, f_prime, x_c):
    return x_c - f(x_c) / f_prime(x_c)

f = [lambda x: x,
     lambda x: x**2 + x,
     lambda x: numpy.exp(x) - 1]

f_prime = [lambda x: 1,
           lambda x: 2 * x + 1,
           lambda x: numpy.exp(x)]

x_c = [-7.5, -0.6, 7.5]
for i in range(16):
    for j in range(len(x_c)):
        x_c[j] = Newtons_Method(f[j], f_prime[j], x_c[j])
    print('x_{0} = {1}'.format(i + 1, x_c))
x_1 = [0.0, -1.8000000000000003, 6.500553084370148]
x_2 = [0.0, -1.2461538461538462, 5.502055692264317]
x_3 = [0.0, -1.0406026962727994, 4.506134071187519]
x_4 = [0.0, -1.0015247601944897, 3.5171751329216505]
x_5 = [0.0, -1.000002317825395, 2.546858300770652]
x_6 = [0.0, -1.0000000000053721, 1.6251856616292897]
x_7 = [0.0, -1.0, 0.8220607812846256]
x_8 = [0.0, -1.0, 0.2615857370546373]
x_9 = [0.0, -1.0, 0.031415606703852794]
x_10 = [0.0, -1.0, 0.0004883429491289379]
x_11 = [0.0, -1.0, 1.192200103575106e-07]
x_12 = [0.0, -1.0, 7.157097849691172e-15]
x_13 = [0.0, -1.0, 5.16704920902205e-17]
x_14 = [0.0, -1.0, 5.16704920902205e-17]
x_15 = [0.0, -1.0, 5.16704920902205e-17]
x_16 = [0.0, -1.0, 5.16704920902205e-17]
Exercise 4¶
(a)¶
(b)¶
The Lipschitz constant on \(J(x) \in \text{Lip}_\gamma(N(x_*, a))\) is
(c)¶
The 1-norm yields the largest possible value for \(\left\Vert J(x_*)^{-1} \right\Vert\), so the region of convergence is defined as \(\epsilon = \min \left\{ a, \frac{1}{2 \beta \gamma} \right\}\) where \(0 < \left\Vert J(x_*)^{-1} \right\Vert \leq 1 \leq \beta\). Hence
(d)¶
When \((x_0)_3 = 0\), \(\epsilon = \min \left\{ a, \frac{a}{\left\vert 4 x_2 \right\vert} \right\}\). When \((x_0)_2 = (x_0)_3 = 0\), \(\epsilon = a\) because as \((x_0)_2 \rightarrow 0\) or \((x_0)_3 \rightarrow 0\), the second term \(\frac{a}{\left\vert 4 x_2 \right\vert}\) grows without bound.
Exercise 5¶
Proof by induction starts at the base case of \(x_0\), so the goal is to show \(x_1 = x_0 - J(x_0)^{-1} F(x_0)\) is well-defined i.e. \(J(x_0)\) is nonsingular.
Since \(J(x_*)^{-1}\) exists and \(\left\Vert J(x_*)^{-1} \right\Vert \leq \beta\) where \(\beta > 0\) by assumption, \(J(x_*)\) is nonsingular. Theorem 3.14 states that \(J(x_0)\) is nonsingular when \(\left\Vert J(x_*)^{-1} \left( J(x_0) - J(x_*) \right) \right\Vert < 1\) holds. Recall that \(J\) is Hölder continuous within \(N(x_*, r)\) where \(r > 0\) and \(\alpha \in (0, 1]\) such that
Picking \(\epsilon = \min \left\{ r, \left( \frac{\alpha}{(1 + \alpha) \beta \gamma} \right)^{1 / \alpha} \right\}\) satisfies Theorem 3.14 in addition to revealing the elegant inequality \(\beta \gamma \epsilon^\alpha \leq \frac{\alpha}{1 + \alpha} < 1\) that simplifies later derivations. Thus \(J(x_0)\) is nonsingular and its induced matrix norm is bounded above by
Observe that the sequence \(x_1, x_2, \ldots\) converges to \(x_*\) because
To understand why \(p = x_* - x_0\), see Exercise 4.8.
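For reference, here is a sketch of the bound used above, assuming Theorem 3.14 takes the usual Banach perturbation form, i.e. if \(\left\Vert A^{-1} (B - A) \right\Vert < 1\) then \(\left\Vert B^{-1} \right\Vert \leq \frac{\left\Vert A^{-1} \right\Vert}{1 - \left\Vert A^{-1} (B - A) \right\Vert}\):

\[
\left\Vert J(x_0)^{-1} \right\Vert
\leq \frac{\left\Vert J(x_*)^{-1} \right\Vert}{1 - \left\Vert J(x_*)^{-1} \left( J(x_0) - J(x_*) \right) \right\Vert}
\leq \frac{\beta}{1 - \beta \gamma \left\Vert x_0 - x_* \right\Vert^\alpha}
\leq \frac{\beta}{1 - \beta \gamma \epsilon^\alpha}
\leq \frac{\beta}{1 - \frac{\alpha}{1 + \alpha}}
= (1 + \alpha) \beta.
\]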
Exercise 6¶
By definition of differentiability and continuity,
is continuously differentiable \(\forall t \in \mathbb{R}\), \(\gamma, \eta \in \mathbb{R}^+_0\), and \(\beta \in \mathbb{R}^+\). Assuming \(t_0 = 0\), \(N(t_0, r)\) is satisfied \(\forall r \in (0, \infty)\).
Since \(f(t): \mathbb{R} \rightarrow \mathbb{R}\), the vector norm and induced operator norm reduce respectively to an absolute value and \(J(t) = f'(t) = \gamma t - \frac{1}{\beta}\).
Notice that \(J \in \text{Lip}_\gamma(N(t_0, r))\) because \(\left\vert f'(t) - f'(s) \right\vert = \gamma \left\vert t - s \right\vert\) for all \(t, s\).
The initial conditions for the induced operator norms
are satisfied by definition of \(f(t)\).
Define \(\gamma_\text{Rel} = \beta \gamma, \alpha = \gamma_\text{Rel} \eta\) for ease of notation. Analytically solving for the roots of the univariate quadratic function \(f(t_*) = 0\) gives
Thus \(\alpha \leq \frac{1}{2}\) is a necessary condition for \(f(t)\) to have real roots.
The following results will be useful in proving \(\{t_k\}\) converges.
Proof of \(\left\vert f'(t_{k + 1}) \right\vert \geq \left\vert \frac{f'(t_k)}{2} \right\vert\) via induction. The following shows that the base case holds:
Assume \(\left\vert f'(t_k) \right\vert \geq \left\vert \frac{f'(t_{k - 1})}{2} \right\vert\) holds. The inductive step holds because
The last step can be proven via contradiction. Assume \(0 \leq \alpha \leq \frac{1}{2}\) and
Given \(x = \gamma_\text{Rel} t_k\),
Solving for roots of the quadratic equation yields
which means \(\alpha > \frac{1}{2}\) must hold to yield a real solution, but this contradicts the assumption that \(\alpha \leq \frac{1}{2}\). Therefore
The upper bound \(\left\vert f'(t_{k + 1}) \right\vert \leq \left\vert f'(t_k) \right\vert\) can also be derived as
Solving for roots of the quartic equation yields
and
Since \(0 \leq \alpha \leq \frac{1}{2}\), the second quadratic formula is inconsequential for real roots. Therefore the sequence \(\{ t_k \}\) is well-defined within \(N(t_0, r)\) because
This inequality can be further simplified assuming \(r \geq r_0\)
Unrolling the recursion for \(t_k\) yields
The last step can be proven via induction. The base case when \(k = 0\) gives \(1 - \sqrt{1 - 2 \alpha} \leq 2 \alpha\). Let us prove this via contradiction: assume \(0 \leq \alpha \leq \frac{1}{2}\) and \(1 - \sqrt{1 - 2 \alpha} > 2 \alpha\).
shows that \(\alpha > \frac{1}{2}\) must hold to satisfy the inequality, but this contradicts the assumption that \(\alpha \leq \frac{1}{2}\). Thus the base case is true.
For the inductive step, assume \(2^{-k} \left( 1 - \sqrt{1 - 2 \alpha} \right)^{2^k} \leq \left( 2 \alpha \right)^{2^k}\) for \(0 \leq \alpha \leq \frac{1}{2}\).
By inspection, as \(k \rightarrow \infty\), \(c \rightarrow 1\), which completes the inductive step.
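A quick numerical sanity check of the majorizing sequence, assuming the standard Kantorovich majorant \(f(t) = \frac{\gamma}{2} t^2 - \frac{t}{\beta} + \frac{\eta}{\beta}\) (consistent with \(f'(t) = \gamma t - \frac{1}{\beta}\) above) and the sample values \(\beta = 1\), \(\gamma = 1\), \(\eta = 0.4\), so \(\alpha = 0.4 \leq \frac{1}{2}\):

import numpy

beta, gamma, eta = 1.0, 1.0, 0.4
alpha = beta * gamma * eta
f = lambda t: 0.5 * gamma * t**2 - t / beta + eta / beta
fp = lambda t: gamma * t - 1 / beta

t_star = (1 - numpy.sqrt(1 - 2 * alpha)) / (beta * gamma)  # smaller root of f
t_k = 0.0
for k in range(8):
    # Newton's method on the scalar majorant: t_{k+1} = t_k - f(t_k) / f'(t_k).
    t_k = t_k - f(t_k) / fp(t_k)
    print('t_{0} = {1:.15f} (t_* - t_k = {2:.2e})'.format(k + 1, t_k, t_star - t_k))

The iterates increase monotonically toward \(t_*\), which is the behavior the proof relies on.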
Exercise 7¶
\(\left\Vert J(x_n)^{-1} \right\Vert \leq \left\vert f'(t_n)^{-1} \right\vert\)¶
When \(n = 0\), Exercise 6 shows that \(\left\vert f'(t_0)^{-1} \right\vert = \beta\) and \(F\) is presumed to satisfy the assumptions of the Kantorovich theorem
For the inductive step \(n = k\), recall Theorem 3.14 states that \(J(x_k)\) is nonsingular when \(\left\Vert J(x_0)^{-1} \left( J(x_k) - J(x_0) \right) \right\Vert < 1\) holds.
Picking \(r = \frac{1 - \sqrt{1 - 2 \alpha}}{\beta \gamma}\) where \(0 \leq \alpha \leq \frac{1}{2}\) satisfies Theorem 3.14. Thus \(J(x_k)\) is nonsingular and its induced matrix norm is bounded above by
The second to last inequality is explained in Exercise 6.
\(\left\Vert x_{n + 1} - x_n \right\Vert \leq \left\vert t_{n + 1} - t_n \right\vert\)¶
When \(n = 0\), Exercise 6 shows that \(\left\Vert J(t_0)^{-1} F(t_0) \right\Vert = \eta\) and \(F\) is presumed to satisfy the assumptions of the Kantorovich theorem
The following relations will be useful for the induction.
Applying the same analysis to \(f(t_k) = \frac{\gamma}{2} (t_k - t_{k - 1})^2\) gives
The inductive step for \(n = k\) is
Exercise 8¶
As stated in the section after Theorem 5.3.1, Exercise 6 and Exercise 7 show that \(\left\Vert x_* - x_k \right\Vert\) is bounded above by \(\left\vert t_* - t_k \right\vert\); hence \(\{ x_k \}\) is a convergent sequence. Every convergent sequence is a Cauchy sequence, so \(\{ x_k \}\) is a Cauchy sequence that converges to some \(x_*\). This proof technique is known as majorization, where \(\{ t_k \}\) is said to majorize \(\{ x_k \}\).
See [Ort68] for a very concise and formal proof.
Exercise 9¶
See [Bry68] for an elegant and concise proof that enables the evaluation of nonlinear integral equations using numerical quadrature.
Exercise 10¶
The contractive mapping theorem is also known as the Banach fixed-point theorem or the contraction mapping principle.
Given \(G: D \rightarrow D\) where \(D\) is a closed subset of \(\mathbb{R}^n\), for any starting point \(x_0 \in D\), the sequence \(\{ x_k \}\) generated by \(x_{k + 1} = G(x_k)\) remains in \(D\) for \(k = 0, 1, \ldots\).
(a)¶
Observe that recursively applying
yields
where \(\alpha \in [0,1)\).
For any \(j \geq 0\),
(b)¶
Recall that a sequence \(\{ x_i \}\) is called a Cauchy sequence if \(\forall \epsilon > 0 \, \exists N \in \mathbb{N}\) such that \(\forall \, m, n \geq N\), then \(\left\Vert x_m - x_n \right\Vert < \epsilon\).
Without loss of generality, let \(m \leq n\) with \(k = n - m \geq 0\).
Hence \(\{ x_i \}\) is a Cauchy sequence as long as \(N\) satisfies
The following observations will be useful in proving that \(G\) has a fixed point \(x_*\), where a fixed point (a.k.a. fixpoint or invariant point) of a function is defined by \(G(x_*) = x_*\).
A function is said to be a contraction map if it satisfies the assumptions of the Contractive Mapping Theorem. Notice that the inequality (5.3.2) is a restricted definition of Lipschitz continuity, which means a contraction map is continuous on \(D\).
As shown in (a), \(\{ x_k \}\) converges to some \(x_*\) because \(\lim_{k \rightarrow \infty} \alpha^k = 0\). Therefore,
(c)¶
Proof by contradiction is the easiest way to show that \(x_* \in D\) is unique.
Assume that \(\exists \, y_* \in D\) such that \(y_* \neq x_*\).
Since \(\alpha \in [0, 1)\), \(\left\Vert x_* - y_* \right\Vert = 0\), but this contradicts the assumption \(y_* \neq x_*\). Thus a contraction map has a unique fixed point.
By definition
\(\{ x_k \}\) converges q-linearly to \(x_*\).
By applying the results of (b) with \(m = k\) as \(n \rightarrow \infty\), the error bound can be derived as
The last inequality holds because the norm’s continuity enables moving the limit inside.
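A small numerical illustration of these bounds, assuming the hypothetical contraction \(G(x) = \frac{1}{2} \cos(x)\) on \(D = \mathbb{R}\) (so \(\alpha = \frac{1}{2}\) since \(\left\vert G'(x) \right\vert \leq \frac{1}{2}\)):

import numpy

G = lambda x: 0.5 * numpy.cos(x)
alpha = 0.5

# Estimate the fixed point by iterating far beyond the horizon of interest.
x_star = 0.0
for _ in range(200):
    x_star = G(x_star)

x_k = 0.0
eta = abs(G(x_k) - x_k)  # eta = |x_1 - x_0|
for k in range(10):
    # Error bound from (c): |x_k - x_*| <= alpha^k / (1 - alpha) * |x_1 - x_0|.
    bound = alpha**k * eta / (1 - alpha)
    print('k = {0}: |x_k - x_*| = {1:.3e} <= {2:.3e}'.format(k, abs(x_k - x_star), bound))
    x_k = G(x_k)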
(d)¶
As shown in [Pet], the other version of the Contractive Mapping Theorem no longer assumes a self-map.
Let \(G \colon D \rightarrow \mathbb{R}^n\) where \(D\) is a closed subset of \(\mathbb{R}^n\). If for some norm \(\left\Vert \cdot \right\Vert\), there exists \(\alpha \in [0, 1)\) such that
then the rest of the Contractive Mapping Theorem still holds. The goal is to show the sequence \(\{ x_k \}\) generated by \(x_{k + 1} = G(x_k)\) remains in \(D\) for \(k = 0, 1, \ldots\) with \(x_0 \in D\).
Since \(D\) is a closed subset and the results of (c) define a closed neighborhood of radius \(\frac{\eta \alpha^k}{1 - \alpha}\) around \(x_*\), as long as each subsequent element of the sequence stays within that iteration’s neighborhood, the original assumptions of the Contractive Mapping Theorem are satisfied and the sequence will converge to \(x_*\).
Exercise 11¶
Define \(x_{i + 1} = G(x_i) = x_i - \frac{f(x_i)}{a}\) where \(f(x) = x^2 - 1\) and \(a > 1\). By inspection, the roots of \(f\) are at \(x = \pm 1\) and \(G \colon \mathbb{R} \rightarrow \mathbb{R}\), but \(x_i\) does not converge to a root for every \(x_0 \in \mathbb{R}\). The goal is to find \(D \subset \mathbb{R}\) such that \(x_0 \in D\) will converge to \(x_* = 1\) and
In order to establish the last inequality, \(x_0\) needs to satisfy
Solving the quadratic equation for \(x_0\) yields
Hence the inequality will be satisfied i.e. \(\exists \, \alpha \in [0, 1)\) such that \(\left\Vert G(x_0) - G(x_*) \right\Vert \leq \alpha \left\vert x_0 - x_* \right\vert\) when \(x_0 \in (-x_*, 2a - x_*)\).
To show \(G \colon D \rightarrow D\) holds by principle of induction, assume
and
Consequently,
Since \(a > 1\), \(\left\Vert G(x_i) - G(x_*) \right\Vert \leq \alpha \left\vert x_i - x_* \right\vert\) when \(x_{i - 1} \in (-1, a + 1)\).
Thus defining \(\delta = \min\left\{ 2, a + 1, 2a - 1 \right\}\) and applying the results of Exercise 10d proves that the sequence \(\{ x_i \}\) converges q-linearly to \(x_* = 1\) with
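A quick numerical check, assuming the sample value \(a = 4\) (so \(\delta = \min\{2, 5, 7\} = 2\) and any \(x_0 \in (-1, 3)\) should converge to \(x_* = 1\)):

a = 4.0
G = lambda x: x - (x**2 - 1) / a

for x_0 in [-0.9, 0.25, 1.5, 2.9]:
    x_i = x_0
    for _ in range(100):
        x_i = G(x_i)
    print('x_0 = {0:>5}: x_100 = {1:.15f}'.format(x_0, x_i))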
Exercise 12¶
If \(\lim_{k \rightarrow \infty} h_k = 0\), then the convergence is q-superlinear.
The proof is similar to the one in the text up to applying Lemma 4.2.1.
Since \(\lim_{k \rightarrow \infty} x_k = x_*\) and by assumption \(\lim_{k \rightarrow \infty} h_k = 0\), \(\lim_{k \rightarrow \infty} c_k = 0\) i.e. \(\{ x_k \}\) converges q-superlinearly.
Notice that the expressions (5.4.2) and (5.4.3) are equivalent via Lemma 4.1.16:
To show q-quadratic convergence, the proof consists of applying the assumption that either (5.4.2) or (5.4.3) holds.
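A scalar illustration of the effect of \(h_k\), assuming the hypothetical function \(f(x) = e^x - 2\) with root \(x_* = \ln 2\): a fixed step keeps the finite-difference Newton iteration merely q-linear, while \(h_k = 2^{-k} \rightarrow 0\) restores the faster behavior.

import numpy

f = lambda x: numpy.exp(x) - 2
x_star = numpy.log(2)

for label, step in [('fixed h_k = 0.1', lambda k: 0.1),
                    ('h_k = 2^-k     ', lambda k: 2.0**-k)]:
    x_k = 1.0
    errors = []
    for k in range(8):
        h_k = step(k)
        a_k = (f(x_k + h_k) - f(x_k)) / h_k  # forward-difference approximation of f'(x_k)
        x_k = x_k - f(x_k) / a_k
        errors.append(abs(x_k - x_star))
    print(label, ['{0:.1e}'.format(e) for e in errors])

With the fixed step the error ratios settle near a constant, whereas with the shrinking step the ratios themselves go to zero.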
Exercise 13¶
Given \(x_k = \begin{pmatrix} 10^7 & 10^{-7} \end{pmatrix}^\top\), the analytical solution for \(J(x_k)\) is
Approximating \(J(x_k)\) with (5.4.1) gives
When \(h_k = 1\), the approximation
has a relative error of
When \(h_k = 10^{-14}\), the approximation
has a relative error of
Recall that a computer with \(14\) base-\(10\) digits has a machine epsilon \(\eta \approx 10^{-14}\). The first approximation incorrectly computes \(J(x_k)_{22}\), while the second approximation experiences underflow during the evaluation of \(A(x_k)_{22}\). Setting
is a good compromise.
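The same scale issue can be reproduced on a hypothetical scalar example, \(f(x) = x^2\), evaluated at the two component magnitudes above: a fixed \(h_k = 1\) ruins the derivative at \(x = 10^{-7}\), \(h_k = 10^{-14}\) loses it entirely at \(x = 10^7\), while \(h_j = \sqrt{\text{macheps}} \, x_j\) behaves well at both.

import numpy

f = lambda x: x**2
fp = lambda x: 2 * x  # exact derivative for comparison

for x in [1e7, 1e-7]:
    for h in [1.0, 1e-14, numpy.sqrt(numpy.finfo(float).eps) * x]:
        approx = (f(x + h) - f(x)) / h
        rel_err = abs(approx - fp(x)) / abs(fp(x))
        print('x = {0:.0e}, h = {1:.1e}: relative error = {2:.1e}'.format(x, h, rel_err))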
Exercise 14¶
Let \(p \in \mathbb{R}, p \neq 0\). The step size proposed in (5.4.10) is
(a)¶
Let \(\hat{F} = f_1(x) = x^p\) and \(\gamma = f_1''(x) = p (p - 1) x^{p - 2}\). The optimal step size given in (5.4.13) is
(b)¶
Let \(\hat{F} = f_2(x) = e^{px}\) and \(\gamma = f_2''(x) = p^2 e^{px}\). The optimal step size given in (5.4.13) is
(c)¶
Both step sizes are scale independent, which is not ideal.
(d)¶
The forward finite-difference approximation to \(f_2'(x) = p e^{px}\) is
with a relative error of
Good step sizes will tend to make \(e^{ph} \rightarrow 1\), so the computed difference \(e^{ph} - 1\) will either be \(0\) or very small compared to \(e^{px}\). This means that when \(x\) is a very large positive number, either overflow occurs or the approximation will be off as described by the relative error.
Exercise 15¶
Lemma 4.3.1 asserts that a minimizer of
needs to satisfy
\(x = [1, 1]\) is one such minimizer. By definition 4.1.4, the Hessian is
Based on the results of \(f(x_0)\) and \(f(x_1)\), it does seem like a good step.
[4]:
import numpy
def f(v):
    x = v.flat
    return 0.5 * (x[0]**2 - x[1])**2 + 0.5 * (1 - x[0])**2

def fp(v):
    x = v.flat
    return numpy.asmatrix([2 * x[0]**3 - 2 * x[0] * x[1] + x[0] - 1,
                           x[1] - x[0]**2]).T

def fpp(v):
    x = v.flat
    return numpy.asmatrix([[6 * x[0]**2 - 2 * x[1] + 1, -2 * x[0]],
                           [-2 * x[0], 1]])

x_k = numpy.asmatrix([2, 2]).T
for i in range(2):
    print('x_{0}: {1}'.format(i, f(x_k)))
    x_k = x_k - numpy.linalg.inv(fpp(x_k)) * fp(x_k)
x_0: 2.5
x_1: 0.3208000000000001
Exercise 16¶
Given \(f(x) = \sin(x)\), \(f'(x) = \cos(x)\) and \(f''(x) = -\sin(x)\).
Applying Newton’s method yields
Define \(0 < \epsilon \ll 1\). When \(x_0 = -\epsilon\),
When \(x_0 = \epsilon\),
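A numerical sketch of this sensitivity, assuming \(\epsilon = 10^{-2}\) and the minimization update \(x_{k + 1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k + \frac{\cos(x_k)}{\sin(x_k)}\): the first step jumps to roughly \(\pm 1 / \epsilon\), so the two starting points settle at very different critical points of \(\sin(x)\).

import numpy

eps = 1e-2
for x_0 in [-eps, eps]:
    x_k = x_0
    for k in range(20):
        x_k = x_k - numpy.cos(x_k) / (-numpy.sin(x_k))  # x - f'(x) / f''(x)
    print('x_0 = {0:+.2f} ends at {1:.6f} ({2:+.1f} pi)'.format(x_0, x_k, x_k / numpy.pi))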
Exercise 17¶
(a)¶
The gradient and Hessian of \(f_1(x) = -x_1^2 - x_2^2\) are respectively
Since \(\nabla^2 f_1(x)\) has an eigenvalue \(\lambda = -2\) of multiplicity \(2\), it is not positive definite. This means
and
From the assumption of algorithm A5.5.1, the denominator is positive. Thus the algorithm will grow toward \(\pm \infty\) depending on \(\text{sgn}(x_0)\) unless \(x_0 = x_*\).
(b)¶
The gradient and Hessian of \(f_2(x) = x_1^2 - x_2^2\) are respectively
Since the eigenvalues of \(\nabla^2 f_2(x)\) are \(\lambda_0 = 2\) and \(\lambda_1 = -2\), it is not positive definite. This means
and
From the assumption of algorithm A5.5.1, the denominator is positive. Thus the algorithm will grow toward \(\pm \infty\) depending on \(\text{sgn}(x_0)\) unless \((x_0)_2 = 0\).
Exercise 18¶
[5]:
import numpy
def f(x):
    return 0.5 * (x[0]**2 - x[1])**2 + 0.5 * (1 - x[0])**2

def fp(v):
    x = v.flat
    return numpy.asmatrix([2 * x[0]**3 - 2 * x[0] * x[1] + x[0] - 1,
                           x[1] - x[0]**2]).T

def fpp(v):
    x = v.flat
    return numpy.asmatrix([[6 * x[0]**2 - 2 * x[1] + 1, -2 * x[0]],
                           [-2 * x[0], 1]])

def afpp(v, root_eta):
    x_c = v.flat
    h = root_eta * x_c
    e = [numpy.asfarray([1, 0]), numpy.asfarray([0, 1])]
    A = numpy.asmatrix([[0.0, 0.0], [0.0, 0.0]])
    for i in range(2):
        for j in range(2):
            A[i, j] = f(x_c + h[i] * e[i] + h[j] * e[j])
            A[i, j] -= f(x_c + h[i] * e[i])
            A[i, j] -= f(x_c + h[j] * e[j])
            A[i, j] += f(x_c)
            A[i, j] /= h[i] * h[j]
    return A

x_k = numpy.asmatrix([2.0, 2.0]).T
solutions = [[], [], []]
for s in range(3):
    x_k = numpy.asmatrix([2.0, 2.0]).T
    for i in range(8):
        if 0 == s:
            A = fpp(x_k)
        elif 1 == s:
            A = afpp(x_k, numpy.finfo(float).eps**(1/2))
        else:
            A = afpp(x_k, numpy.finfo(float).eps**(1/3))
        x_k = x_k - numpy.linalg.inv(A) * fp(x_k)
        solutions[s].append(f(x_k).flat[0])

row_layout = '{0:<25} {1:<25} {2:<25}'
print(row_layout.format('Analytic', 'macheps^(1/2)', 'macheps^(1/3)'))
for i in range(len(solutions[0])):
    print(row_layout.format(solutions[0][i], solutions[1][i], solutions[2][i]))
Analytic macheps^(1/2) macheps^(1/3)
0.3208000000000001 0.3208000000000001 0.32080752264304735
0.1522899437566909 0.1203483182966636 0.1522190630342569
0.00048098909417033226 0.012308840948738988 0.00048274254706565654
4.6037027257078434e-07 0.0005726840749779877 4.6389527426665865e-07
4.471929271276434e-15 6.465667087177092e-07 5.707975194043212e-15
3.944304526105059e-29 9.628815678752908e-13 1.369421165569959e-23
0.0 2.77706417041588e-24 2.465190328815662e-32
0.0 0.0 0.0
Exercise 19¶
Recall that the Hessian of \(f \colon \mathbb{R}^n \rightarrow \mathbb{R}\) is equal to the Jacobian of \(\nabla f\).
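As a quick numerical check of this fact, a forward-difference Jacobian of the gradient recovers the analytic Hessian up to roughly \(\sqrt{\text{macheps}}\); the gradient and Hessian formulas below mirror the Exercise 15 cell.

import numpy

grad = lambda x: numpy.array([2 * x[0]**3 - 2 * x[0] * x[1] + x[0] - 1,
                              x[1] - x[0]**2])
hess = lambda x: numpy.array([[6 * x[0]**2 - 2 * x[1] + 1, -2 * x[0]],
                              [-2 * x[0], 1.0]])

x = numpy.array([2.0, 2.0])
h = numpy.sqrt(numpy.finfo(float).eps) * x
J = numpy.empty((2, 2))
for j in range(2):
    e_j = numpy.zeros(2)
    e_j[j] = 1.0
    # Column j of the Jacobian of the gradient.
    J[:, j] = (grad(x + h[j] * e_j) - grad(x)) / h[j]

print(numpy.allclose(J, hess(x), atol=1e-6))  # should print True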
Exercise 20¶
A5.6.3, A5.6.4, and A5.6.2 use \(n + 1\), \(2n\), and \(1 + 2n + n^2 / 2\) function evaluations, respectively. The overlap between each of these algorithms is at most \(n\) function evaluations. Therefore,
and
The combined algorithm may yield answers with less accuracy and possibly lose cache locality due to more complex code.
References
- Bry68
Charles A Bryan. Approximate solutions to nonlinear integral equations. SIAM Journal on Numerical Analysis, 5(1):151–155, 1968.
- Ort68
James M Ortega. The Newton-Kantorovich theorem. The American Mathematical Monthly, 75(6):658–660, 1968.
- Pet
Tobias von Petersdorf. Fixed point iteration and contraction mapping theorem. http://terpconnect.umd.edu/~petersd/466/fixedpoint.pdf. Accessed on 2017-04-10.