{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "****************************************************\n", "Globally Convergent Modifications of Newton's Method\n", "****************************************************\n", "\n", "The Model Trust Region Approach\n", "===============================\n", "\n", "- :cite:`gay1981computing` showed that perturbing the Hessian to ensure positive\n", " semidefiniteness still yields a solution that minimizes (6.4.1).\n", "\n", "The level set of :math:`f` at value :math:`c` is the set of all points\n", ":math:`\\mathbf{x} \\in \\mathbb{R}^n` such that :math:`f(\\mathbf{x}) = c`. When\n", ":math:`n = 2`, a level set is called a level curve, contour line, or isoline.\n", "When :math:`n = 3`, it is known as the level surface or isosurface. Higher\n", "dimensions classifies them as hypersurface.\n", "\n", ":cite:`djoycedd` provides a clear derivation of why the gradient of a function\n", "is in the direction of steepest ascent (or greatest change). To see why it is\n", "normal to the surface, suppose that :math:`\\mathbf{t} \\in \\mathbb{R}^n` is\n", "tangent to the level surface of :math:`f` through :math:`\\mathbf{x}_0`. As you\n", "move in the :math:`\\mathbf{t}` direction, you are at that instant moving along\n", "the level surface, and hence the value of :math:`f` does not change. This is\n", "another of saying the directional derivative in this direction is zero.\n", "Therefore, the gradient is orthogonal to the tangent plane." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 1\n", "==========\n", "\n", "A descent direction :math:`p` of :math:`f` at :math:`x_c` satisfies\n", ":math:`\\nabla f(x_c)^\\top p < 0`.\n", "\n", "The steepest-descent direction is the solution to\n", "\n", ".. math::\n", "\n", " \\min_{p \\in \\mathbb{R}^n} \\nabla f(x)^\\top p &\\\\\n", " \\text{subject to} \\quad\n", " \\left\\Vert p \\right\\Vert &= 1.\n", "\n", "The :math:`l_2`-norm restricts the solution to be on the unit ball around\n", ":math:`x`, so\n", ":math:`p = -\\frac{\\nabla f(x)}{\\left\\Vert \\nabla f(x) \\right\\Vert_2}`\n", "is the steepest-descent direction because\n", "\n", ".. math::\n", "\n", " \\nabla f(x)^\\top p\n", " &\\leq \\left\\Vert \\nabla f(x) \\right\\Vert_2\n", " \\left\\Vert p \\right\\Vert_2\n", " & \\quad & \\text{Cauchy-Schwarz Inequality}\\\\\n", " &= \\left\\Vert \\nabla f(x) \\right\\Vert_2.\n", "\n", "Given :math:`f(x) = 3 x_1^2 + 2 x_1 x_2 + x_2^2` and\n", ":math:`\\nabla f(x) =\n", "\\begin{bmatrix} 6 x_1 + 2 x_2\\\\ 2 x_1 + 2 x_2\\end{bmatrix}`, the\n", "steepest-descent direction at :math:`x_0 = (1,1)^\\top` is\n", "\n", ".. math::\n", "\n", " p =\n", " -\\begin{pmatrix} 8\\\\ 4 \\end{pmatrix} \\frac{1}{\\sqrt{8^2 + 4^2}} =\n", " -\\frac{1}{\\sqrt{5}} \\begin{pmatrix} 2\\\\ 1 \\end{pmatrix}.\n", "\n", "The vector :math:`(1, -1)^\\top` is not a descent direction at :math:`x_0`\n", "because\n", "\n", ".. math::\n", "\n", " \\nabla f(x_c)^\\top \\begin{pmatrix} 1\\\\ -1\\end{pmatrix} =\n", " 8 + (-4) = 4 \\not< 0." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 2\n", "==========\n", "\n", "Given :math:`s = -\\lambda_k \\nabla f(x_k) = -\\lambda_k g`,\n", "\n", ".. math::\n", "\n", " f(x_k + s) =\n", " f(x_k - \\lambda_k g) =\n", " -\\lambda_k g^\\top g + \\frac{\\lambda_k^2}{2} g^\\top \\nabla^2 f(x_k) g\n", "\n", "and\n", "\n", ".. math::\n", "\n", " \\frac{\\partial f}{\\partial \\lambda_k} =\n", " -g^\\top g + \\lambda_k g^\\top \\nabla^2 f(x_k) g.\n", "\n", "By Theorem 4.3.2,\n", "\n", ".. math::\n", "\n", " \\lambda_k = \\frac{g^\\top g}{g^\\top \\nabla^2 f(x_k) g}." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 3\n", "==========\n", "\n", "Define :math:`\\hat{c} \\in (0, 1)` and\n", ":math:`\\nabla^2 f(x) =\n", "\\begin{bmatrix} \\lambda_0 & 0\\\\ 0 & \\lambda_1 \\end{bmatrix}` where\n", ":math:`x \\in \\mathbb{R}^2` and :math:`\\lambda_0 \\geq \\lambda_1 > 0`.\n", "\n", "By (6.2.4),\n", ":math:`\\hat{c} \\triangleq \\frac{\\lambda_0 - \\lambda_1}{\\lambda_0 + \\lambda_1}`.\n", "This sets\n", ":math:`\\nabla f(x) =\n", "\\begin{bmatrix} \\lambda_0 x_1\\\\ \\lambda_1 x_2 \\end{bmatrix}` and\n", ":math:`f(x) = \\frac{\\lambda_0}{2} x_1^2 + \\frac{\\lambda_1}{2} x_2^2`.\n", "Clearly, :math:`x_* = (0,0)^\\top` still holds and\n", "\n", ".. math::\n", "\n", " x_{k + 1} &= x_k - \\frac{g^\\top g}{g^\\top \\nabla^2 f(x_k) g} g\\\\\n", " &= x_k -\n", " \\frac{\n", " \\lambda_0^2 x_1^2 + \\lambda_1^2 x_2^2\n", " }{\n", " \\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2\n", " } \\begin{bmatrix} \\lambda_0 x_1\\\\ \\lambda_1 x_2 \\end{bmatrix}\\\\\n", " &= \\frac{1}{\\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2}\n", " \\begin{bmatrix}\n", " x_1 \\left( \\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2 \\right) -\n", " \\left( \\lambda_0^2 x_1^2 + \\lambda_1^2 x_2^2 \\right) \\lambda_0 x_1\\\\\n", " x_2 \\left( \\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2 \\right) -\n", " \\left( \\lambda_0^2 x_1^2 + \\lambda_1^2 x_2^2 \\right) \\lambda_1 x_2\n", " \\end{bmatrix}\\\\\n", " &= \\frac{1}{\\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2}\n", " \\begin{bmatrix}\n", " x_1 \\left( \\lambda_1^3 x_2^2 \\right) -\n", " \\left( \\lambda_1^2 x_2^2 \\right) \\lambda_0 x_1\\\\\n", " x_2 \\left( \\lambda_0^3 x_1^2 \\right) -\n", " \\left( \\lambda_0^2 x_1^2 \\right) \\lambda_1 x_2\n", " \\end{bmatrix}\\\\\n", " &= \\frac{1}{\\lambda_0^3 x_1^2 + \\lambda_1^3 x_2^2}\n", " \\begin{bmatrix}\n", " \\left( \\lambda_1^3 - \\lambda_0 \\lambda_1^2 \\right) x_1 x_2^2\\\\\n", " \\left( \\lambda_0^3 - \\lambda_0^2 \\lambda_1 \\right) x_1^2 x_2\n", " \\end{bmatrix}\n", "\n", "will converge as long as\n", "\n", ".. math::\n", "\n", " \\frac{\n", " \\left\\Vert x_{k + 1} - x_* \\right\\Vert\n", " }{\n", " \\left\\Vert x_k - x_* \\right\\Vert\n", " } =\n", " \\frac{\\left\\Vert x_{k + 1} \\right\\Vert}{\\left\\Vert x_k \\right\\Vert} \\leq\n", " \\hat{c}." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 4\n", "==========\n", "\n", "Given\n", "\n", ".. math::\n", "\n", " f(x) = x^2,\\quad\n", " x_0 = 2,\\quad\n", " \\left\\{ p_k \\right\\} = \\left\\{ \\left( -1 \\right)^{k + 1} \\right\\},\\quad\n", " \\left\\{ \\lambda_k \\right\\} = \\left\\{ 2 + \\frac{3}{2^{k + 1}} \\right\\},\n", "\n", ":math:`\\left\\{ x_k \\right\\} =\n", "\\left\\{ \\left( -1 \\right)^k \\left( 1 + 2^{-k} \\right) \\right\\}` for\n", ":math:`k \\in \\mathbb{N}_0`.\n", "\n", "This infinite sequence is not permitted by (6.3.3) because\n", "\n", ".. math::\n", "\n", " f(x_k + \\lambda_k p_k)\n", " &\\leq f(x_k) + \\alpha \\lambda_k \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\text{(6.3.3)}\\\\\n", " x_k^2 + 2 \\lambda_k x_k p_k + \\lambda_k^2 p_k^2\n", " &\\leq x_k^2 + \\alpha 2 \\lambda_k x_k p_k\\\\\n", " \\frac{\\lambda_k^2}{2} + \\lambda_k x_k p_k - \\alpha \\lambda_k x_k p_k\n", " &\\leq 0\\\\\n", " \\frac{\\lambda_k^2}{2} - \\lambda_k \\left( 1 + 2^{-k} \\right) +\n", " \\alpha \\lambda_k \\left( 1 + 2^{-k} \\right) &\\leq 0\\\\\n", " 2^{-(2k + 3)} \\left( 2^{k + 2} + 3 \\right)\n", " \\left( \\alpha 2^{k + 2} + 4 \\alpha - 1 \\right) &\\leq 0\\\\\n", " \\alpha &\\leq \\left( 2^{k + 2} + 4 \\right)^{-1}\n", "\n", "where the last inequality holds :math:`\\forall k \\in \\mathbb{N}_0` only when\n", ":math:`\\alpha \\leq 0`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 5\n", "==========\n", "\n", "Given\n", "\n", ".. math::\n", "\n", " f(x) = x^2,\\quad\n", " x_0 = 2,\\quad\n", " \\left\\{ p_k \\right\\} = \\left\\{ -1 \\right\\},\\quad\n", " \\left\\{ \\lambda_k \\right\\} = \\left\\{ 2^{-k + 1} \\right\\},\n", "\n", ":math:`\\left\\{ x_k \\right\\} = \\left\\{ 1 + 2^{-k} \\right\\}` for\n", ":math:`k \\in \\mathbb{N}_0`.\n", "\n", "The infinite sequence is permitted by (6.3.3) when\n", ":math:`\\alpha \\leq \\frac{1}{2}` because\n", "\n", ".. math::\n", "\n", " f(x_k + \\lambda_k p_k)\n", " &\\leq f(x_k) + \\alpha \\lambda_k \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\text{(6.3.3)}\\\\\n", " x_k^2 + 2 \\lambda_k x_k p_k + \\lambda_k^2 p_k^2\n", " &\\leq x_k^2 + \\alpha 2 \\lambda_k x_k p_k\\\\\n", " \\frac{\\lambda_k^2}{2} + \\lambda_k x_k p_k - \\alpha \\lambda_k x_k p_k\n", " &\\leq 0\\\\\n", " \\frac{\\lambda_k^2}{2} - \\lambda_k \\left( 1 + 2^{-k} \\right) +\n", " \\alpha \\lambda_k \\left( 1 + 2^{-k} \\right) &\\leq 0\\\\\n", " 2^{1 - 2k} \\left( \\alpha 2^k - 2^k + \\alpha \\right) &\\leq 0\\\\\n", " \\alpha &\\leq \\frac{2^k}{2^k + 1}\\\\\n", " &= \\left( 1 + 2^{-k} \\right)^{-1}.\n", "\n", "The infinite sequence is prohibited by (6.3.4) because\n", "\n", ".. math::\n", "\n", " \\nabla f(x_k + \\lambda_k p_k)^\\top p_k\n", " &\\geq \\beta \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\text{(6.3.4)}\\\\\n", " 2 \\left( x_k + \\lambda_k p_k \\right)^\\top p_k &\\geq 2 \\beta x_k^\\top p_k\\\\\n", " \\beta x_k &\\geq x_k - \\lambda_k\\\\\n", " \\beta &\\geq \\frac{1 - 2^{-k}}{1 + 2^{-k}}\n", "\n", "where the last inequality holds :math:`\\forall k \\in \\mathbb{N}_0` when\n", ":math:`\\beta \\geq 1`, but :math:`\\beta \\in (\\alpha, 1)`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 6\n", "==========\n", "\n", "Let :math:`f(x)` be a positive definite quadratic such that\n", "\n", ".. math::\n", "\n", " f(x) &= \\frac{1}{2} x^\\top A x + b^\\top x + c\\\\\n", " \\nabla f(x) &= A x + b\\\\\n", " \\nabla^2 f(x) &= A\n", "\n", "with :math:`x \\in \\mathbb{R}^n` and :math:`A \\succ 0`.\n", "\n", "Recall that a full Newton step at :math:`x_k \\in \\mathbb{R}^n` means\n", ":math:`\\lambda = 1` and :math:`p = -H_k^{-1} \\nabla f(x_k)` where\n", ":math:`H_k = \\nabla^2 f(x_k) + \\mu_k I` is positive definite with\n", ":math:`\\mu_k = 0` if :math:`\\nabla^2 f(x_k)` is positive definite. Applying a\n", "full Newton step to :math:`f(x)` yields\n", "\n", ".. math::\n", "\n", " f(x_k + \\lambda p)\n", " &= \\frac{1}{2} (x_k + p)^\\top \\nabla^2 f(x_k) (x_k + p) +\n", " b^\\top (x_k + p) + c\\\\\n", " &= f(x_k) + \\frac{1}{2} p^\\top \\nabla^2 f(x_k) p +\n", " x_k^\\top \\nabla^2 f(x_k) p + b^\\top p\n", " & \\quad & \\text{definition of } f(x)\\\\\n", " &= f(x_k) + \\frac{1}{2} p^\\top \\nabla^2 f(x_k) p + \\nabla f(x_k)^\\top p\n", " & \\quad & \\text{definition of } \\nabla f(x)\\\\\n", " &= f(x_k) +\n", " \\frac{1}{2} \\left( -H_k^{-1} \\nabla f(x_k) \\right)^\\top\n", " \\nabla^2 f(x_k) \\left( -H_k^{-1} \\nabla f(x_k) \\right) +\n", " \\nabla f(x_k)^\\top \\left( -H_k^{-1} \\nabla f(x_k) \\right)\\\\\n", " &= f(x_k) -\n", " \\frac{1}{2} \\nabla f(x_k)^\\top \\nabla^2 f(x_k)^{-1} \\nabla f(x_k)\n", " & \\quad & \\text{Hessian is symmetric and }\n", " \\left( A^{-1} \\right)^\\top = \\left( A^\\top \\right)^{-1}\\\\\n", " &= f(x_k) + \\frac{1}{2} \\nabla f(x_k)^\\top p\\\\\n", " &\\leq f(x_k) + \\alpha \\nabla f(x_k)^\\top p,\n", "\n", "where :math:`\\alpha \\leq \\frac{1}{2}`, which satisfies condition (6.3.3).\n", "\n", "Condition (6.3.4) is also satisfied when :math:`\\beta \\in (0, 1)` because\n", "\n", ".. math::\n", "\n", " \\nabla f(x_k + \\lambda p) p\n", " &= \\left[ \\nabla^2 f(x_k) (x_k + p) + b \\right]^\\top p\n", " & \\quad & \\text{definition of } \\nabla f(x)\\\\\n", " &= x_k^\\top \\nabla^2 f(x_k)^\\top p +\n", " b^\\top p + p^\\top \\nabla^2 f(x_k)^\\top p\\\\\n", " &= \\nabla f(x_k)^\\top p + p^\\top \\nabla^2 f(x_k) p\\\\\n", " &\\geq \\beta \\nabla f(x_k)^\\top p\n", " & \\quad & \\text{Hessian is assumed to be positive definite.}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7\n", "==========\n", "\n", "Let (6.3.4) be replaced with\n", ":math:`f(x_{k + 1}) \\geq f(x_k) + \\beta \\nabla f(x_k)^\\top (x_{k + 1} - x_k)`\n", "where :math:`\\beta \\in (\\alpha, 1)`.\n", "\n", "The proof for Theorem 6.3.2 still proceeds as in (6.3.5). Observe that\n", "\n", ".. math::\n", "\n", " f(\\hat{x}) &= f(x_k) + \\alpha \\hat{\\lambda} \\nabla f(x_k)^\\top p_k\\\\\n", " f(\\hat{x}) - f(x_k) &= \\alpha \\hat{\\lambda} \\nabla f(x_k)^\\top p_k\\\\\n", " &> \\beta \\hat{\\lambda} \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\alpha < \\beta \\text{ and } \\nabla f(x_k)^\\top p_k < 0\\\\\n", " f(\\hat{x}) &> f(x_k) + \\beta \\nabla f(x_k)^\\top (\\hat{x} - x_k).\n", "\n", "Thus, by the continuity of :math:`\\nabla f`, Theorem 6.3.2 holds as long as\n", ":math:`\\lambda_k \\in (\\beta \\hat{\\lambda}, \\alpha \\hat{\\lambda})`.\n", "\n", "The proof for Theorem 6.3.3 is the same except for the direct usage of\n", "condition (6.3.4). That condition needs to be derived as\n", "\n", ".. math::\n", "\n", " f(x_{k + 1}) &\\geq f(x_k) + \\beta \\nabla f(x_k)^\\top (x_{k + 1} - x_k)\\\\\n", " \\alpha \\nabla f(x_k)^\\top p_k\n", " &\\geq \\beta \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\text{(6.3.3)}\\\\\n", " \\nabla f(x_k + \\bar{\\lambda} p_k)^\\top p_k\n", " &\\geq \\beta \\nabla f(x_k)^\\top p_k\n", " & \\quad & \\text{(6.3.7)}\\\\\n", " \\nabla f(x_k + \\lambda_k p_k) \\lambda_k p_k\n", " &\\geq \\beta \\nabla f(x_k)^\\top \\lambda_k p_k\n", " & \\quad & \\text{pick } \\lambda_k \\in (\\lambda_l, \\lambda_r)\n", " \\text{ about } \\bar{\\lambda}\\\\\n", " \\nabla f(x_{k + 1}) s_k &\\geq \\beta \\nabla f(x_k)^\\top s_k.\n", "\n", "This derivation can also be used as a transition into the part of Theorem\n", "6.3.4's proof that uses condition (6.3.4)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8\n", "==========\n", "\n", "Given\n", "\n", ".. math::\n", "\n", " x_c = \\begin{pmatrix} 1\\\\ 1 \\end{pmatrix}^\\top,\\quad\n", " f(x) = x_1^4 + x_2^2,\n", "\n", ":math:`\\nabla f(x) = \\begin{bmatrix} 4 x_1^3\\\\ 2 x_2 \\end{bmatrix}` and\n", ":math:`\\nabla^2 f(x) = \\begin{bmatrix} 12 x_1^2 & 0\\\\ 0 & 2 \\end{bmatrix}`.\n", "\n", "Since the Hessian is positive definite,\n", "\n", ".. math::\n", "\n", " p_c =\n", " -H_k^{-1} \\nabla f(x_c) =\n", " -\\nabla^2 f(x_c)^{-1} \\nabla f(x_c) =\n", " \\begin{bmatrix} -\\frac{1}{3} & -1 \\end{bmatrix}^\\top.\n", "\n", "(a)\n", "---\n", "\n", "Performing a full Newton step (:math:`\\lambda = 1`) gives\n", "\n", ".. math::\n", "\n", " x_+ =\n", " x_c + \\lambda p_c =\n", " \\begin{bmatrix} \\frac{2}{3} & 0 \\end{bmatrix}^\\top.\n", "\n", "(b)\n", "---\n", "\n", "Given :math:`\\alpha = 10^{-4}` and :math:`\\lambda_k = 1`, the backtracking\n", "line-search gives the same :math:`x_+` as in (a) because\n", "\n", ".. math::\n", "\n", " f(x_k + \\lambda_k p_k) &< f(x_k) + \\alpha \\lambda_k \\nabla f(x_k)^\\top p_k\\\\\n", " \\left( \\frac{2}{3} \\right)^4 + 0^2\n", " &< \\left( 1^4 + 1^2 \\right) + 10^{-4} (1) \\left( -\\frac{4}{3} - 2 \\right)\\\\\n", " 0.1975 &< 1.9996\n", "\n", "shows that condition (6.3.3) is satisfied.\n", "\n", "(c)\n", "---\n", "\n", "The only thing I can think of is to use (6.3.17) yielding\n", ":math:`\\lambda = 1.08871`, which makes :math:`x_+` closer to the minimizer than\n", "in (b)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 9\n", "==========\n", "\n", "Given :math:`x_c = (1, 1)^\\top` and :math:`f(x) = \\frac{1}{2} x_1^2 + x_2^2`,\n", "\n", ".. math::\n", "\n", " \\nabla f(x) = \\begin{bmatrix} x_1\\\\ 2 x_2 \\end{bmatrix}\n", " \\quad \\text{and} \\quad\n", " \\nabla^2 f(x) = \\begin{bmatrix} 1 & 0\\\\ 0 & 2 \\end{bmatrix}.\n", "\n", "The unconstrained solution to (6.4.1) is\n", "\n", ".. math::\n", "\n", " \\frac{\\partial m_c}{\\partial x} &= 0\\\\\n", " \\nabla f(x_c)^\\top + H_c s &= 0\\\\\n", " s &= -H_c^{-1} \\nabla f(x_c)\\\\\n", " &= -\\begin{bmatrix} 1 & 0\\\\ 0 & 0.5 \\end{bmatrix}\n", " \\begin{bmatrix} 1\\\\ 2 \\end{bmatrix}\\\\\n", " &= -\\begin{bmatrix} 1 & 1 \\end{bmatrix}^\\top.\n", "\n", "When :math:`\\delta = 2`, then the unconstrained solution is valid.\n", "\n", "When :math:`\\delta = \\frac{5}{6}`, (6.4.2) needs to be invoked:\n", "\n", ".. math::\n", "\n", " \\left\\Vert -\\left( H_c + \\mu I \\right)^{-1} \\nabla f(x_c) \\right\\Vert_2\n", " &\\leq \\delta\\\\\n", " \\left\\Vert\n", " -\\begin{bmatrix} (1 + \\mu)^{-1} & 0\\\\ 0 & (2 + \\mu)^{-1} \\end{bmatrix}\n", " \\begin{bmatrix} 1\\\\ 2 \\end{bmatrix}\n", " \\right\\Vert_2 &\\leq \\delta\\\\\n", " \\left( 1 + \\mu \\right)^{-2} + 2^2 \\left( 2 + \\mu \\right)^{-2} &\\leq \\delta\\\\\n", " \\mu &\\geq 0.7798." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 10\n", "===========\n", "\n", "Given :math:`f(x) = x_1^2 + 2 x_2^2 + 3 x_3^2`,\n", "\n", ".. math::\n", "\n", " \\nabla f(x) = \\begin{bmatrix} 2 x_1\\\\ 4 x_2\\\\ 6 x_3 \\end{bmatrix}\n", " \\quad \\text{and} \\quad\n", " \\nabla^2 f(x) =\n", " \\begin{bmatrix} 2 & 0 & 0\\\\ 0 & 4 & 0\\\\ 0 & 0 & 6 \\end{bmatrix}.\n", "\n", "When :math:`x_c = (1, 1, 1)^\\top`,\n", "\n", ".. math::\n", "\n", " \\nabla f(x_c) = (2, 4, 6)^\\top\n", " \\quad \\text{and} \\quad\n", " H_c^{-1} \\nabla f(x_c) = (1, 1, 1)^\\top.\n", "\n", "Let :math:`\\mu_0 = 1`, :math:`\\mu_1 = 2`, and :math:`\\mu_2 = 3`.\n", "\n", ".. math::\n", "\n", " s(\\mu_0) = -(H_c + \\mu_0 I)^{-1} \\nabla f(x_c) =\n", " -(\\frac{2}{3}, \\frac{4}{5}, \\frac{6}{7})^\\top\\\\\n", " s(\\mu_1) = -(H_c + \\mu_1 I)^{-1} \\nabla f(x_c) =\n", " -(\\frac{2}{4}, \\frac{4}{6}, \\frac{6}{8})^\\top\\\\\n", " s(\\mu_2) = -(H_c + \\mu_2 I)^{-1} \\nabla f(x_c) =\n", " -(\\frac{2}{5}, \\frac{4}{7}, \\frac{6}{9})^\\top.\n", "\n", "As shown below, the volume of the tetrahedron defined by these points is not\n", "zero and none of the singular values are zero." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy\n", "\n", "M = numpy.ones((4,4))\n", "M[0,:3] = numpy.asfarray([1, 1, 1])\n", "M[1,:3] = -numpy.asfarray([2/3, 4/5, 6/7])\n", "M[2,:3] = -numpy.asfarray([2/4, 4/6, 6/8])\n", "M[3,:3] = -numpy.asfarray([2/5, 4/7, 6/9])\n", "print('Volume of Tetrahedron = {0}'.format(numpy.linalg.det(M)))\n", "print('Singular Values = {0}'.format(numpy.linalg.svd(M[:,:3])[1]))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 11\n", "===========\n", "\n", "Given :math:`H \\in \\mathbb{R}^{n \\times n}` is symmetric and positive\n", "definite, it has an eigendecomposition of\n", ":math:`H = Q \\Lambda Q^\\top = \\sum_{i = 1}^n \\lambda_i v_i v_i^\\top`.\n", "\n", "Let :math:`g = \\sum_{j = 1}^n \\alpha_j v_j` and :math:`\\mu \\geq 0`.\n", "\n", ".. math::\n", "\n", " \\left( H + \\mu I \\right)^{-1} g\n", " &= \\left( Q \\Lambda Q^\\top + Q \\text{M} Q^\\top \\right)^{-1} g\n", " & \\quad & \\text{M is a diagonal matrix of } \\mu\\\\\n", " &= \\sum_{j = 1}^n \\sum_{i = 1}^n\n", " \\frac{\\alpha_j}{\\lambda_i + \\mu} v_i v_i^\\top v_j\\\\\n", " &= \\sum_{j = 1}^n \\frac{\\alpha_j}{\\lambda_j + \\mu} v_j\n", " & \\quad & \\text{eigenvectors form an orthonormal basis}\n", "\n", "and\n", "\n", ".. math::\n", "\n", " \\left\\Vert \\left( H + \\mu I \\right)^{-1} g \\right\\Vert_2^2\n", " &= \\left\\Vert\n", " \\sum_{j = 1}^n \\frac{\\alpha_j}{\\lambda_j + \\mu} v_j\n", " \\right\\Vert_2^2\\\\\n", " &= \\left( \\sum_{j = 1}^n \\frac{\\alpha_j}{\\lambda_j + \\mu} v_j \\right)^\\top\n", " \\left( \\sum_{j = 1}^n \\frac{\\alpha_j}{\\lambda_j + \\mu} v_j \\right)\\\\\n", " &= \\sum_{j = 1}^n \\sum_{i = 1}^n \\frac{\\alpha_j}{\\lambda_j + \\mu}\n", " \\frac{\\alpha_i}{\\lambda_i + \\mu} v_j^\\top v_i\\\\\n", " &= \\sum_{j = 1}^n \\left( \\frac{\\alpha_j}{\\lambda_j + \\mu} \\right)^2\n", " & \\quad & \\text{definition of orthonormal basis}\\\\\n", " \\left\\Vert s(\\mu) \\right\\Vert_2\n", " &= \\left[\n", " \\sum_{j = 1}^n \\left( \\frac{\\alpha_j}{\\lambda_j + \\mu} \\right)^2\n", " \\right]^{1/2}.\n", "\n", "This relates to :math:`m_c(\\mu)` in (6.4.5) in the sense that\n", ":math:`g = -\\nabla f(x_c)`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 12\n", "===========\n", "\n", "Let :math:`s(\\mu) = (H + \\mu I)^{-1} g` for :math:`\\mu \\geq 0` and\n", ":math:`\\eta(\\mu) = \\left\\Vert s(\\mu) \\right\\Vert`.\n", "\n", ".. math::\n", "\n", " \\frac{d}{d \\mu} \\eta(\\mu)\n", " &= \\frac{d}{d \\mu}\n", " \\left[\n", " \\sum_{j = 1}^n \\left( \\frac{\\alpha_j}{\\lambda_j + \\mu} \\right)^2\n", " \\right]^{1/2}\\\\\n", " &= \\frac{1}{2} \\left\\Vert s(\\mu) \\right\\Vert_2^{-1}\n", " \\left[ \\frac{d}{d \\mu}\n", " \\sum_{j = 1}^n \\left( \\frac{\\alpha_j}{\\lambda_j + \\mu} \\right)^2\n", " \\right]\n", " & \\quad & \\text{chain rule}\\\\\n", " &= \\frac{1}{2} \\left\\Vert s(\\mu) \\right\\Vert_2^{-1}\n", " \\left[\n", " \\sum_{j = 1}^n 2 \\frac{\\alpha_j}{\\lambda_j + \\mu}\n", " \\frac{d \\alpha_j \\left( \\lambda_j + \\mu \\right)^{-1}}{d \\mu}\n", " \\right]\n", " & \\quad & \\text{linearity of differentiation and chain rule}\\\\\n", " &= \\left\\Vert s(\\mu) \\right\\Vert_2^{-1}\n", " \\left[\n", " \\sum_{j = 1}^n -\\frac{\\alpha_j^2}{\\left( \\lambda_j + \\mu \\right)^3}\n", " \\right]\n", " & \\quad & \\text{linearity of differentiation and chain rule}\\\\\n", " &= -\\left\\Vert s(\\mu) \\right\\Vert_2^{-1}\n", " \\left[ g^\\top Q \\left( \\Lambda + \\text{M} \\right)^{-3} Q^\\top g \\right]\n", " & \\quad & s(\\mu)^\\top s(\\mu)\\\\\n", " &= -\\frac{\n", " s(\\mu)^\\top \\left( H + \\mu I \\right)^{-1} s(\\mu)\n", " }{\n", " \\left\\Vert s(\\mu) \\right\\Vert_2\n", " }\n", " & \\quad & Q Q^\\top = Q^\\top Q = I." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 13\n", "===========\n", "\n", "Given :math:`f(x) = \\frac{1}{2} x_1^2 + x_2^2`,\n", "\n", ".. math::\n", "\n", " \\nabla f(x) = (x_1, 2 x_2)^\\top\n", " \\quad \\text{and} \\quad\n", " \\nabla^2 f(x) = \\begin{bmatrix} 1 & 0\\\\ 0 & 2 \\end{bmatrix}.\n", "\n", "Let :math:`x_0 = (1,1)^\\top`, :math:`g = \\nabla f(x_0) = (1, 2)^\\top`, and\n", ":math:`H = \\nabla^2 f(x_0) = \\begin{bmatrix} 1 & 0\\\\ 0 & 2 \\end{bmatrix}`.\n", "\n", "When :math:`\\delta = 1`, the Cauchy point is outside the region and does not\n", "trigger the estimation of the parameterized line unlike when\n", ":math:`\\delta = 1.25`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy\n", "\n", "def double_dogleg_step(x, g, H, delta):\n", " s_N = -numpy.linalg.inv(H) * g\n", "\n", " #return the Newton point via Newton step\n", " if (s_N.T * s_N).flat[0] <= delta**2:\n", " return x + s_N\n", "\n", " #compute Cauchy point\n", " lambda_star = (g.T * g / (g.T * H * g)).flat[0]\n", " C_P = x - lambda_star * g\n", "\n", " #compute Newton point\n", " s_CP = -lambda_star * g\n", "\n", " _ = (g.T * H * g).flat[0] * (g.T * numpy.linalg.inv(H) * g).flat[0]\n", " gamma = (g.T * g).flat[0]**2 / _\n", " eta = 0.8 * gamma + 0.2\n", " N = x + eta * s_N\n", "\n", " s_hat_N = eta * s_N\n", "\n", " a = ((s_hat_N - s_CP).T * (s_hat_N - s_CP)).flat[0]\n", " b = 2 * ((s_hat_N - s_CP).T * s_CP).flat[0]\n", " c = (s_CP.T * s_CP).flat[0] - delta**2\n", "\n", " if delta <= numpy.linalg.norm(g) * lambda_star:\n", " #C.P. exceeds delta-region\n", " _ = x - delta / numpy.linalg.norm(g) * g\n", " else:\n", " #C.P. is within delta-region, so solve for parameterized line\n", " lambda_c = max((-b + numpy.sqrt(b**2 - 4 * a * c)) / (2 * a),\n", " (-b - numpy.sqrt(b**2 - 4 * a * c)) / (2 * a))\n", " _ = x + s_CP + lambda_c * (s_hat_N - s_CP)\n", "\n", " fig = plt.figure(figsize=[16,8])\n", " plt.plot([x.flat[0], C_P.flat[0]], [x.flat[1], C_P.flat[1]],\n", " marker='o', linestyle='--', color='r', label='CP')\n", " plt.plot([x.flat[0], N.flat[0]], [x.flat[1], N.flat[1]],\n", " marker=r'$\\checkmark$', linestyle=':', color='b', label='N')\n", " plt.plot([C_P.flat[0], N.flat[0]], [C_P.flat[1], N.flat[1]],\n", " marker=r'$\\bowtie$', linestyle='-', color='g')\n", " plt.plot([_.flat[0]], [_.flat[1]],\n", " marker=r'$\\lambda$', color='k', label='x', markersize=10)\n", " circ = numpy.linspace(0, 2 * numpy.pi, 128)\n", " plt.plot(x.flat[0] + delta * numpy.cos(circ),\n", " x.flat[1] + delta * numpy.sin(circ))\n", " plt.axis('equal')\n", " plt.title(r'Double Dogleg Curve $\\delta = {0}$'.format(delta))\n", " plt.legend()\n", " plt.show()\n", " return _\n", "\n", "def fp(x):\n", " _ = x.flat\n", " return numpy.asmatrix([_[0], 2 * _[1]]).T\n", "\n", "def fpp(x):\n", " return numpy.asmatrix([[1, 0], [0, 2]])\n", "\n", "x_0 = numpy.asmatrix([1, 1]).T\n", "g = fp(x_0)\n", "H = fpp(x_0)\n", "for delta in [1, 5 / 4]:\n", " _ = double_dogleg_step(x_0, g, H, delta)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 14\n", "===========\n", "\n", ".. math::\n", "\n", " \\left( g^\\top g \\right)^2\n", " &= \\left( u^\\top v \\right)^2\n", " & \\quad & \\text{change of variables with }\n", " u = H^{1/2} g \\text{ and } v = H^{-1/2} g\\\\\n", " &\\leq \\left( u^\\top u \\right) \\left( v^\\top v \\right)\n", " & \\quad & \\text{Cauchy-Schwarz inequality}\\\\\n", " &= \\left( g^\\top H g \\right) \\left( g^\\top H^{-1} g \\right)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 15\n", "===========\n", "\n", "Let :math:`f(x_1, x_2) = x_1^4 + x_1^2 + x_2^2`, then\n", ":math:`\\nabla f(x) = (4 x_1^3 + 2 x_1, 2 x_2)^\\top` and\n", ":math:`H = \\nabla^2 f(x) =\n", "\\begin{bmatrix} 12 x_1^2 + 2 & 0\\\\ 0 & 2 \\end{bmatrix}`.\n", "\n", "With the update of :math:`\\delta_c = 1` and :math:`x_c = (0.666, 0.665)^\\top`,\n", ":math:`\\nabla f(x_c) = (2.514, 1.33)^\\top` and\n", ":math:`H = \\begin{bmatrix} 7.323 & 0\\\\ 0 & 2 \\end{bmatrix}`.\n", "\n", "Based on lemma 6.4.1, performing a Newton step yields\n", ":math:`s_c^N = -H^{-1} \\nabla f(x_c) = (-0.343, -0.665)^\\top`.\n", "\n", "Since :math:`\\left\\Vert s_c^N \\right\\Vert_2 = 0.74 \\leq \\delta_c`, the solution\n", "for (6.4.2) is :math:`s(0) = s_c^N`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 16\n", "===========\n", "\n", "Let :math:`F(x) = (x_1, 2 x_2)^\\top` and :math:`x_0 = (1, 1)^\\top`, then\n", ":math:`J(x_0) = \\begin{bmatrix} 1 & 0\\\\ 0 & 2 \\end{bmatrix}`.\n", "\n", "The steepest-descent direction for (6.5.3) is\n", "\n", ".. math::\n", "\n", " s = -\\frac{\\nabla f(x_0) }{\\left\\Vert \\nabla f(x_0) \\right\\Vert_2} =\n", " -\\frac{\n", " J(x_0)^\\top F(x_c)\n", " }{\n", " \\left\\Vert \\nabla J(x_0)^\\top F(x_c) \\right\\Vert_2\n", " } =\n", " \\begin{bmatrix} -0.323\\\\ -1.291 \\end{bmatrix}.\n", "\n", "Applying (6.2.4) shows that\n", ":math:`c \\triangleq \\frac{2 - 1}{2 + 1} = \\frac{1}{3}`. Therefore, the\n", "convergence rate will be slower than linear." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def F(x):\n", " x = x.flat\n", " return numpy.asmatrix([x[0], x[1] * 2]).T\n", "\n", "def J(x):\n", " return numpy.asmatrix([[1, 0], [0, 2]])\n", "\n", "x_c = numpy.asmatrix([1, 1]).T\n", "for i in range(8):\n", " top = J(x_c).T * F(x_c)\n", " s = -top / numpy.sqrt(top.T * top)\n", " x_c = x_c + s\n", " print('Iteration {0}: {1}'.format(i, numpy.asarray(x_c).squeeze()))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-6.17:\n", "\n", "Exercise 17\n", "===========\n", "\n", "The following facts are useful in the proof that follows.\n", "\n", "Given a matrix :math:`A \\in \\mathbb{R}^{m \\times n}` of rank :math:`r`, the\n", "nullspace :math:`\\mathbf{N}(A)` is the space of all vectors :math:`x` such that\n", ":math:`Ax = 0`. Likewise, the left nullspace is the space of all vectors\n", ":math:`y` such that :math:`A^\\top y = 0` or equivalently :math:`y^\\top A = 0`.\n", "The dimension of the column space :math:`\\mathbf{C}(A)` is :math:`r` and the\n", "dimension of the row space :math:`\\mathbf{C}(A^\\top)` is also :math:`r`. The\n", "dimension of the nullspace :math:`\\mathbf{N}(A)` is :math:`n - r` and the\n", "dimension of the left nullspace :math:`\\mathbf{N}(A^\\top)` is :math:`m - r`.\n", "\n", "Let :math:`f(x) = \\frac{1}{2} F(x)^\\top F(x)` where\n", ":math:`F \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^n` and each component\n", ":math:`f_i` for :math:`i = 1 \\ldots, n` is continuously differentiable.\n", "\n", "Recall from definition 4.1.7 that\n", "\n", ".. math::\n", "\n", " F'(x) = \\nabla F(x)^\\top =\n", " \\begin{bmatrix}\n", " \\frac{\\partial}{\\partial x_1}\\\\\n", " \\vdots\\\\\n", " \\frac{\\partial}{\\partial x_n}\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " f_1(x) & \\cdots & f_n(x)\n", " \\end{bmatrix} =\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_1(x)}{\\partial x_n}\\\\\n", " \\vdots & \\ddots & \\vdots\\\\\n", " \\frac{\\partial f_n(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_n}\\\\\n", " \\end{bmatrix} =\n", " J(x).\n", "\n", "Since :math:`F(\\hat{x}) \\neq 0`, by the definition of local minimizer\n", "(Lemma 4.3.1),\n", "\n", ".. math::\n", "\n", " \\nabla f(x_c)\n", " &= \\frac{d}{dx} \\sum_{i = 1}^n \\frac{1}{2} \\left( f_i(x_c) \\right)^2\\\\\n", " &= \\frac{1}{2} \\sum_{i = 1}^n \\frac{d}{dx} \\left( f_i(x_c) \\right)^2\\\\\n", " &= \\frac{1}{2} \\sum_{i = 1}^n\n", " \\begin{bmatrix}\n", " \\frac{\\partial}{\\partial x_1} \\left( f_i(x_c) \\right)^2\\\\\n", " \\vdots\\\\\n", " \\frac{\\partial}{\\partial x_n} \\left( f_i(x_c) \\right)^2\n", " \\end{bmatrix}\\\\\n", " &= \\frac{1}{2} \\sum_{i = 1}^n\n", " \\begin{bmatrix}\n", " 2 f_i(x_c) \\frac{\\partial f_i(x_c)}{\\partial x_1}\\\\\n", " \\vdots\\\\\n", " 2 f_i(x_c) \\frac{\\partial f_i(x_c)}{\\partial x_n}\\\\\n", " \\end{bmatrix}\\\\\n", " &= \\sum_{i = 1}^n \\nabla f_i(x_c) f_i(x_c)\\\\\n", " &= J(x_c)^\\top F(x_c)\n", "\n", "implies that\n", "\n", ".. math::\n", "\n", " \\nabla f(x_c) = \\boldsymbol{0} = J(\\hat{x})^\\top F(\\hat{x}) =\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_1}\\\\\n", " \\vdots & \\ddots & \\vdots\\\\\n", " \\frac{\\partial f_1(x)}{\\partial x_n} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_n}\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " f_1(x)\\\\\n", " \\vdots\\\\\n", " f_n(x)\n", " \\end{bmatrix}.\n", "\n", "By the fundamental theorem of linear algebra, the corank of :math:`J(\\hat{x})`\n", "is not zero. Since :math:`J` is a square matrix, its corank is equal to its\n", "nullity. Thus, :math:`J(\\hat{x})` is singular.\n", "\n", "As shown in Figure 6.5.1, when :math:`J(\\hat{x})` is singular,\n", ":math:`\\hat{x}` could also be a global minimizer." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-6.18:\n", "\n", "Exercise 18\n", "===========\n", "\n", "A useful fact to observe is\n", "\n", ".. math::\n", "\n", " J(x)^\\top J(x) =\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_1}\\\\\n", " \\vdots & \\ddots & \\vdots\\\\\n", " \\frac{\\partial f_1(x)}{\\partial x_n} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_n}\\\\\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_1(x)}{\\partial x_n}\\\\\n", " \\vdots & \\ddots & \\vdots\\\\\n", " \\frac{\\partial f_n(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_n(x)}{\\partial x_n}\\\\\n", " \\end{bmatrix}.\n", "\n", "To show that the model holds, we need to compute the gradient (see\n", ":ref:`Exercise 17 `) and Hessian of the\n", "second-order Taylor series.\n", "\n", ".. math::\n", "\n", " \\nabla^2 f(x_c) &= \\sum_{i = 1}^n \\frac{d}{dx} f_i(x_c) \\nabla f_i(x_c)\\\\\n", " &= \\sum_{i = 1}^n f_i(x_c) \\frac{d \\nabla f_i(x_c)}{dx} +\n", " \\frac{d f_i(x_c)}{dx} \\nabla f_i(x_c)^\\top\\\\\n", " &= \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c) +\n", " \\frac{d f_i(x_c)}{dx} \\nabla f_i(x_c)^\\top\n", " & \\quad & \\text{(4.1.5)}\\\\\n", " &= \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c) +\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_i(x_c)}{\\partial x_1}\\\\\n", " \\vdots\\\\\n", " \\frac{\\partial f_i(x_c)}{\\partial x_n}\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " \\frac{\\partial f_i(x_c)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial f_i(x_c)}{\\partial x_n}\n", " \\end{bmatrix}\\\\\n", " &= J(x_c)^\\top J(x_c) + \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c).\n", "\n", "Compared to (6.5.4), this model requires the extra term\n", ":math:`\\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c)`. The Newton step for\n", "minimizing :math:`m_c(x)` is\n", "\n", ".. math::\n", "\n", " \\frac{d}{ds} m_c(x_c + s)\n", " &= \\frac{d}{ds} \\left(\n", " \\frac{1}{2} F(x_c)^\\top F(x_c) +\n", " \\left[ J(x_c)^\\top F(x_c) \\right]^\\top s +\n", " \\frac{1}{2} s^\\top \\left[\n", " J(x_c)^\\top J(x_c) + \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c)\n", " \\right] s\n", " \\right)\\\\\n", " 0 &= J(x_c)^\\top F(x_c) +\n", " \\left[\n", " J(x_c)^\\top J(x_c) + \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c)\n", " \\right] s\\\\\n", " -J(x_c)^\\top F(x_c)\n", " &= \\left[\n", " J(x_c)^\\top J(x_c) + \\sum_{i = 1}^n f_i(x_c) \\nabla^2 f_i(x_c)\n", " \\right] s,\n", "\n", "which is the same as the Newton step for :math:`F(x) = 0` if :math:`J` is\n", "nonsingular and the extra term evaluates to zero. Compared to (6.5.5)\n", "and the section after example 6.5.1, the attractiveness of this approach is\n", "that it does not need to arbitrarily perturb the model when :math:`J` is\n", "singular as long as the overall sum is nonsingular." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 19\n", "===========\n", "\n", "The second term of the original :math:`x_0`, which is very close to\n", ":math:`\\frac{e}{6} \\cong 0.45`, would make :math:`J(x_0)` singular." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy\n", "\n", "def J(x):\n", " x = x.flat\n", " return numpy.asmatrix([[2 * x[0], 2 * x[1]],\n", " [numpy.e**(x[0] - 1), 3 * x[1]**2]])\n", "\n", "x = numpy.asmatrix([2, numpy.e / 6]).T\n", "print('Singular Values of J(x_0): {0}'.format(numpy.linalg.svd(J(x))[1]))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 20\n", "===========\n", "\n", "If the approximation is the one in :ref:`Exercise 18 `,\n", "then the double dogleg strategy would return the Newton point.\n", "\n", "If the approximation was using (6.5.4), the strategy would compute the point\n", "along the line connecting the Cauchy point and the Newton point." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def F(x, i=None):\n", " x = x.flat\n", " if i is not None:\n", " if 0 == i:\n", " return x[0]**2 + x[1]**2 - 2\n", " elif 1 == i:\n", " _ = numpy.e**(x[0] - 1)\n", " return _ + x[1]**3 - 2\n", "\n", " _ = numpy.e**(x[0] - 1)\n", " return numpy.asmatrix([x[0]**2 + x[1]**2 - 2,\n", " _ + x[1]**3 - 2]).T\n", "\n", "def Fpp(x, i):\n", " x = x.flat\n", " if 0 == i:\n", " return numpy.asmatrix([[2, 0], [0, 2]]).T\n", " elif 1 == i:\n", " _ = numpy.e**(x[0] - 1)\n", " return numpy.asmatrix([[_, 0], [0, 6 * x[1]]]).T\n", " raise Exception('Invalid dimension {0}'.format(i))\n", "\n", "def J(x):\n", " x = x.flat\n", " _ = numpy.e**(x[0] - 1)\n", " return numpy.asmatrix([[2 * x[0], 2 * x[1]],\n", " [_, 3 * x[1]**2]])\n", "\n", "def f(x):\n", " return (0.5 * F(x).T * F(x)).flat[0]\n", "\n", "def fp(x):\n", " return J(x).T * F(x)\n", "\n", "def fpp(x):\n", " return J(x).T * J(x) + F(x, 0) * Fpp(x, 0) + F(x, 1) * Fpp(x, 1)\n", "\n", "x_0 = numpy.asmatrix([2, 0.5]).T\n", "g = fp(x_0)\n", "H = fpp(x_0)\n", "delta_c = 0.5\n", "_ = double_dogleg_step(x_0, g, H, delta_c)\n", "print('x_+ = {}'.format(numpy.asarray(_).squeeze()))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-6.21:\n", "\n", "Exercise 21\n", "===========\n", "\n", "Recall that :math:`m_c(x_c + s) = f(x_c) + \\nabla f(x_c)^\\top s +\n", "\\frac{1}{2} s^\\top H_c s`. The residual in nonlinear :doc:`conjugate gradient `\n", "is :math:`-\\nabla f(x_c) = s^{\\text{C.P}}`. Nonlinear CG and the dogleg step\n", "proceed with computing the step size that minimizes\n", ":math:`f(x_c - \\lambda \\nabla f(x_c))`.\n", "\n", "When\n", "\n", ".. math::\n", "\n", " \\hat{N} =\n", " x_+^N =\n", " x_c - H_c^{-1} \\nabla f(x_c) = x_c + s^N,\n", "\n", ":math:`\\eta = 1` must be true by definition.\n", "\n", "By definition of linear subspace, the parameterized line segment\n", "\n", ".. math::\n", "\n", " x_+(\\lambda)\n", " &= x_c + s^{\\text{C.P}} + \\lambda (s^{N} - s^{\\text{C.P}})\\\\\n", " &= x_c + (1 - \\lambda) s^{\\text{C.P}} + \\lambda s^{N}\n", "\n", "occupies the convex space spanned by the steepest-descent and Newton directions.\n", "In nonlinear CG, the conjugate directions are sums of scaled residuals defining\n", "a linear subspace. Q.E.D. because any linear subspace is a convex subspace." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 22\n", "===========\n", "\n", "Notice that the analysis of :ref:`Exercise 21 `\n", "is applicable only when :math:`\\hat{N} = x_+^N`. One way to alleviate this\n", "criticism is to use the conjugate direction as a solution i.e. combine dogleg\n", "with conjugate gradient. See :cite:`steihaug1983conjugate` for more details.\n", "At the time of reading (2016) this paper, the steps taken in the proof and\n", "the accompanying algorithm are reasonable (i.e. no mysterious jumps in\n", "reasoning) but the reader needs to be willing to work out the details step\n", "by step." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 23\n", "===========\n", "\n", "Let :math:`J \\in \\mathbb{R}^{n \\times n}` be singular. By Theorem 3.6.4, there\n", "exists a decomposition\n", ":math:`J = U D V^\\top = \\sum_{i = 1}^r \\sigma_i u_i v_i^\\top` where\n", ":math:`r < n` is the rank of :math:`J` with :math:`U^\\top U = I`,\n", ":math:`V^\\top V = I`, and :math:`D^\\top D = \\Lambda`.\n", "\n", "(a)\n", "---\n", "\n", "For finite square matrices :math:`AB = I`,\n", "\n", ".. math::\n", "\n", " \\boldsymbol{0}_{n, n} = B - B = B - B I = B - B (AB) = (I - BA) B\n", "\n", "i.e. :math:`I = BA` implies :math:`A` and :math:`B` are both orthogonal\n", "matrices.\n", "\n", ":math:`\\alpha =\n", "\\left(n \\epsilon \\right)^{1/2} \\left\\Vert J^\\top J \\right\\Vert_1 > 0`\n", "unless :math:`J` is the null matrix because\n", "\n", ".. math::\n", "\n", " \\begin{array}{rcl}\n", " n^{-1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " \\leq& \\left\\Vert J^\\top J \\right\\Vert_2\n", " &\\leq n^{1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " & \\quad & \\text{(3.1.13)}\\\\\n", " n^{-1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " \\leq&\n", " \\left(\n", " \\max_{\\lambda} \\left\\{ \\left( J^\\top J \\right)^\\top J^\\top J \\right\\}\n", " \\right)^{1/2}\n", " &\\leq n^{1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " & \\quad & \\text{(3.1.8b)}\\\\\n", " n^{-1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " \\leq& \\lambda_1\n", " &\\leq n^{1/2} \\left\\Vert J^\\top J \\right\\Vert_1\n", " \\end{array}\n", "\n", "where :math:`\\lambda_1 \\geq \\cdots \\geq \\lambda_r` and\n", ":math:`\\lambda_{r + 1} = \\cdots = \\lambda_{n} = 0`.\n", "\n", "Applying Theorem 3.5.7 to :math:`A = J^\\top J + \\alpha I` gives\n", "\n", ".. math::\n", "\n", " A &= V \\Lambda V^\\top + \\alpha V V^\\top\n", " & \\quad & \\text{orthogonal matrix}\\\\\n", " &= V \\left( \\Lambda + \\alpha I \\right) V^\\top\n", "\n", "which affirms that :math:`J^\\top J + \\alpha I` is nonsingular while\n", ":math:`J^\\top J` is singular.\n", "\n", "The condition number in the induced :math:`l_2` matrix norm is\n", "\n", ".. math::\n", "\n", " \\kappa_2(A)\n", " &= \\left\\Vert A \\right\\Vert_2 \\left\\Vert A^{-1} \\right\\Vert_2\n", " & \\quad & \\text{definition of condition number page 53}\\\\\n", " &= \\left( \\max_{\\lambda} \\left\\{ A^\\top A \\right\\} \\right)^{1/2}\n", " \\left( \\min_{\\lambda} \\left\\{ A^\\top A \\right\\} \\right)^{-1/2}\n", " & \\quad & \\text{(3.1.8b), (3.1.18)}\\\\\n", " &= \\frac{\\lambda_1 + \\alpha}{\\alpha}\\\\\n", " &= \\frac{\\lambda_1}{\n", " \\left(n \\epsilon \\right)^{1/2} \\left\\Vert J^\\top J \\right\\Vert_1} + 1.\n", "\n", "See :ref:`Exercise 3.6 ` for a proof on (3.1.8b).\n", "\n", "The foregoing derivations imply\n", "\n", ".. math::\n", "\n", " \\begin{array}{rcl}\n", " n^{-1/2} + \\left( n \\epsilon \\right)^{1/2}\n", " \\leq& \\left( n \\epsilon \\right)^{1/2} \\kappa_2(A)\n", " &\\leq n^{1/2} + \\left( n \\epsilon \\right)^{1/2}\\\\\n", " \\frac{1}{n \\epsilon^{1/2}}\n", " \\leq& \\kappa_2 \\left( J^\\top J + \\left( n \\epsilon \\right)^{1/2}\n", " \\left\\Vert J^\\top J \\right\\Vert_1 I \\right) - 1\n", " &\\leq \\frac{1}{\\epsilon^{1/2}}.\n", " \\end{array}\n", "\n", "(b)\n", "---\n", "\n", "The case :math:`\\kappa_2(J) \\geq \\epsilon^{-1/2}` implies that\n", ":math:`J` is nonsingular because\n", "\n", ".. math::\n", "\n", " \\kappa_2(J) &= \\left\\Vert J \\right\\Vert_2 \\left\\Vert J^{-1} \\right\\Vert_2\\\\\n", " &= \\left( \\max_\\lambda J^\\top J \\right)^{1/2}\n", " \\left( \\min_\\lambda J^\\top J \\right)^{-1/2}\\\\\n", " &= \\left( \\frac{\\lambda_1}{\\lambda_n} \\right)^{1/2}\\\\\n", " &= \\frac{\\sigma_1}{\\sigma_n}.\n", "\n", "Applying (3.1.13) yields\n", "\n", ".. math::\n", "\n", " \\begin{array}{rcl}\n", " n^{-1/2} \\left\\Vert J \\right\\Vert_1\n", " \\leq& \\sigma_1\n", " &\\leq n^{1/2} \\left\\Vert J \\right\\Vert_1\\\\\n", " n^{-1/2} \\left\\Vert J \\right\\Vert_1\n", " \\leq& \\kappa_2(J) \\sigma_n\n", " &\\leq n^{1/2} \\left\\Vert J \\right\\Vert_1.\n", " \\end{array}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 24\n", "===========\n", "\n", "Let :math:`F \\in \\mathbb{R}^n` be nonzero and\n", ":math:`J \\in \\mathbb{R}^{n \\times n}` be singular.\n", "\n", "By Theorem 3.6.4, there exists a decomposition\n", ":math:`J = U D V^\\top = \\sum_{i = 1}^r \\sigma_i u_i v_i^\\top` where\n", ":math:`r < n` is the rank of :math:`J` with :math:`U` and :math:`V` being\n", "orthogonal matrices and :math:`D^\\top D = \\Lambda`.\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", " \\lim_{\\alpha \\rightarrow 0} \\left( J^\\top J + \\alpha I \\right)^{-1} J^\\top\n", " &= \\lim_{\\alpha \\rightarrow 0}\n", " \\left( V \\left( \\Lambda + \\alpha I \\right) V^\\top \\right)^{-1}\n", " V D U^\\top\\\\\n", " &= \\lim_{\\alpha \\rightarrow 0}\n", " V \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top\\\\\n", " &= \\sum_{i = 1}^r \\lim_{\\alpha \\rightarrow 0}\n", " \\frac{\\sigma_i}{\\sigma_i^2 + \\alpha} v_i u_i^\\top\n", " & \\quad & D_{ii} = 0\\, \\forall i > r\\\\\n", " &= V D^+ U^\\top.\n", "\n", "(b)\n", "---\n", "\n", "Recall that the null space :math:`\\mathbf{N}(J)` is the space of all vectors\n", ":math:`w` such that :math:`Jw = U D V^\\top w = 0`. A useful observation is\n", "\n", ".. math::\n", "\n", " I^+ = D D^+ = \\text{diag}(d_1, d_2, \\ldots, d_r, d_{r + 1}, \\ldots, d_n)\n", "\n", "where :math:`d_i = 1` when :math:`i \\leq r` and zero otherwise.\n", "\n", ".. math::\n", "\n", " w^\\top \\left( J^\\top J + \\alpha I \\right) J^\\top v\n", " &= w^\\top V \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top v\\\\\n", " &= w^\\top V I^+ \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top v\n", " & \\quad & \\text{(a)}\\\\\n", " &= w^\\top V D U^\\top U D^+ \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top v\n", " & \\quad & U^\\top U = I\\\\\n", " &= w^\\top J^\\top U D^+ \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top v\\\\\n", " &= \\vec{0}^\\top U D^+ \\left( \\Lambda + \\alpha I \\right)^{-1} D U^\\top v\\\\\n", " &= 0\n", "\n", ".. math::\n", "\n", " w^\\top J^+ v &= w^\\top V D^+ U^\\top v\\\\\n", " &= w^\\top V I^+ D^+ U^\\top v\\\\\n", " &= w^\\top V D U^\\top U D^+ D^+ U^\\top v\\\\\n", " &= w^\\top J^\\top U D^+ D^+ U^\\top v\\\\\n", " &= \\vec{0}^\\top U D^+ D^+ U^\\top v\\\\\n", " &= 0." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 25\n", "===========\n", "\n", "See :cite:`shultz1985family` for a nice proof on the convergence of line search\n", "and trust region algorithms." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. rubric:: References\n", "\n", ".. bibliography:: chapter-06.bib" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }