{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "***********************\n", "Nonlinear Least Squares\n", "***********************\n", "\n", "Gauss-Newton-Type Methods\n", "=========================\n", "\n", "- Advantages and disadvantages of Gauss-Newton method\n", "\n", " - See Table 10.2.3.\n", " - Each step is a descent direction.\n", "\n", "- Damped Gauss-Newton\n", "\n", " - Combine line-search with Gauss-Newton to make it locally convergent on most\n", " large-residual or very nonlinear problems.\n", " - Still requires Jacobian to be full column rank.\n", "\n", "- Levenberg-Marquardt\n", "\n", " - Combine Gauss-Newton with trust region to ensure inverse is well-defined\n", " even if Jacobian is not full column rank.\n", " - Proven to be globally convergent.\n", " - Each step is within the vicinity of steepest descent direction.\n", " - Often superior to damped Gauss-Newton.\n", " - Theorem 10.2.6 suggests that this algorithm may still be slowly\n", " convergent on large residual or very nonlinear problems.\n", "\n", " - In practice, More's scaled trust region is very successful." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-10.1:\n", "\n", "Exercise 1\n", "==========\n", "\n", "Define\n", "\n", ".. math::\n", "\n", " \\begin{gather*}\n", " R(x) \\colon \\mathbb{R}^{n = 4} \\mapsto \\mathbb{R}^{m = 20},\\\\\n", " r_i(x) = x_1 + x_2 \\exp[-(t_i + x_3)^2 / x_4] - y_i, \\text{and}\\\\\n", " f(x) = \\frac{1}{2} R(x)^\\top R(x) = \\frac{1}{2} \\sum_{i = 1}^m r_i(x)^2.\n", " \\end{gather*}\n", "\n", ":math:`J(x)`\n", "------------\n", "\n", "See :ref:`Exercise 6.17 ` for more details.\n", "\n", ".. 
math::\n", "\n", " J(x) = R'(x)\n", " &= \\begin{bmatrix}\n", " \\frac{\\partial r_1(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial r_1(x)}{\\partial x_n}\\\\\n", " \\vdots & \\ddots & \\vdots\\\\\n", " \\frac{\\partial r_m(x)}{\\partial x_1} & \\cdots &\n", " \\frac{\\partial r_m(x)}{\\partial x_n}\\\\\n", " \\end{bmatrix}\n", " & \\quad & \\text{Exercise 6.17}\\\\\n", " &= \\begin{bmatrix}\n", " 1 & \\exp[-(t_1 + x_3)^2 x_4^{-1}] &\n", " -2 x_2 (t_1 + x_3) x_4^{-1} \\exp[-(t_1 + x_3)^2 x_4^{-1}] &\n", " x_2 (t_1 + x_3)^2 x_4^{-2} \\exp[-(t_1 + x_3)^2 x_4^{-1}]\\\\\n", " \\vdots & \\vdots & \\vdots & \\vdots\\\\\n", " 1 & \\exp[-(t_{20} + x_3)^2 x_4^{-1}] &\n", " -2 x_2 (t_{20} + x_3) x_4^{-1} \\exp[-(t_{20} + x_3)^2 x_4^{-1}] &\n", " x_2 (t_{20} + x_3)^2 x_4^{-2} \\exp[-(t_{20} + x_3)^2 x_4^{-1}]\n", " \\end{bmatrix}\n", "\n", ":math:`\\nabla f(x)`\n", "-------------------\n", "\n", "See :ref:`Exercise 6.17 ` for more details.\n", "\n", ".. math::\n", "\n", " \\nabla f(x) = \\sum_{i = 1}^m r_i(x) \\nabla r_i(x)\n", " &= J(x)^\\top R(x) \\qquad \\text{Exercise 6.17}\\\\\n", " &= \\begin{bmatrix}\n", " 1 & \\cdots & 1\\\\\n", " \\exp[-(t_1 + x_3)^2 x_4^{-1}] & \\cdots &\n", " \\exp[-(t_{20} + x_3)^2 x_4^{-1}]\\\\\n", " -2 x_2 (t_1 + x_3) x_4^{-1} \\exp[-(t_1 + x_3)^2 x_4^{-1}] & \\cdots &\n", " -2 x_2 (t_{20} + x_3) x_4^{-1} \\exp[-(t_{20} + x_3)^2 x_4^{-1}]\\\\\n", " x_2 (t_1 + x_3)^2 x_4^{-2} \\exp[-(t_1 + x_3)^2 x_4^{-1}] & \\cdots &\n", " x_2 (t_{20} + x_3)^2 x_4^{-2} \\exp[-(t_{20} + x_3)^2 x_4^{-1}]\n", " \\end{bmatrix} R(x)\\\\\n", " &= \\begin{bmatrix}\n", " \\sum_{i = 1}^m r_i(x)\\\\\n", " \\sum_{i = 1}^m r_i(x) \\exp[-(t_i + x_3)^2 x_4^{-1}]\\\\\n", " -\\sum_{i = 1}^m r_i(x) 2 x_2 (t_i + x_3) x_4^{-1}\n", " \\exp[-(t_i + x_3)^2 x_4^{-1}]\\\\\n", " \\sum_{i = 1}^m r_i(x) x_2 (t_i + x_3)^2 x_4^{-2}\n", " \\exp[-(t_i + x_3)^2 x_4^{-1}]\n", " \\end{bmatrix}\n", "\n", ":math:`S(x)`\n", "------------\n", "\n", "See :ref:`Exercise 6.18 ` for more 
details.\n", "\n", ".. math::\n", "\n", " S(x) &= \\sum_{i = 1}^m r_i(x) \\nabla^2 r_i(x)\n", " & \\quad & \\text{Exercise 6.18}\\\\\n", " &= \\sum_{i = 1}^m r_i(x) \\nabla\n", " \\begin{bmatrix}\n", " 1\\\\\n", " \\exp[-(t_i + x_3)^2 x_4^{-1}]\\\\\n", " -2 x_2 (t_i + x_3) x_4^{-1} \\exp[-(t_i + x_3)^2 x_4^{-1}]\\\\\n", " x_2 (t_i + x_3)^2 x_4^{-2} \\exp[-(t_i + x_3)^2 x_4^{-1}]\n", " \\end{bmatrix}\\\\\n", " &= \\sum_{i = 1}^m r_i(x)\n", " \\begin{bmatrix}\n", " 0 & 0 & 0 & 0\\\\\n", " \\star & 0 & -2 (t_i + x_3) x_4^{-1} \\exp[-(t_i + x_3)^2 x_4^{-1}] &\n", " (t_i + x_3)^2 x_4^{-2} \\exp[-(t_i + x_3)^2 x_4^{-1}]\\\\\n", " \\star & \\star & \\ast & \\ast\\\\\n", " \\star & \\star & \\star & \\ast\n", " \\end{bmatrix}\n", "\n", "where each :math:`\\star` takes the same value as its transposed entry and\n", "each :math:`\\ast` is left for the reader to complete.\n", "\n", ":math:`\\nabla^2 f(x)`\n", "---------------------\n", "\n", "See :ref:`Exercise 6.18 ` for more details.\n", "\n", ".. math::\n", "\n", " \\nabla^2 f(x) = J(x)^\\top J(x) + S(x)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 2\n", "==========\n", "\n", "Recall that :math:`\\lambda > \\sigma \\geq 0` and\n", ":math:`c \\in (1, \\lambda / \\sigma)`.\n", "\n", "The constant in (10.2.6) has a range of\n", "\n", ".. math::\n", "\n", " \\frac{c \\sigma + \\lambda}{2 \\lambda} \\in\n", " \\left( \\frac{\\sigma + \\lambda}{2 \\lambda}, 1 \\right).\n", "\n", "That constant's original form was\n", "\n", ".. math::\n", "\n", " c \\frac{\\sigma}{\\lambda} + \\frac{\\lambda - c \\sigma}{2 \\lambda},\n", "\n", "which is reduced to the proposed :math:`c \\frac{\\sigma}{\\lambda} = 1` when\n", ":math:`c = \\lambda / \\sigma`.\n", "\n", "This is what I could understand from the problem description. Otherwise,\n", "(10.2.6) is violated because\n", "\n", ".. 
math::\n", "\n", " c \\frac{\\sigma}{\\lambda} \\in \\left( \\frac{\\sigma}{\\lambda}, 1 \\right)\n", "\n", "except when :math:`\\alpha = 0` in (10.2.5)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 3\n", "==========\n", "\n", ".. math::\n", "\n", " \\left[ J(x) - J(x_*) \\right]^\\top R(x_*)\n", " &= J(x)^\\top R(x_*)\n", " & \\quad & \\text{Theorem 10.2.1 and } J(x_*)^\\top R(x_*) = 0\\\\\n", " &= \\sum_{i = 1}^m r_i(x_*) \\nabla r_i(x)\n", " & \\quad & \\text{Exercise 1}\\\\\n", " &= \\sum_{i = 1}^m r_i(x_*) \\left[\n", " \\nabla r_i(x_*) + \\nabla^2 r_i(x_*) (x - x_*) +\n", " \\mathcal{O}(\\left\\Vert x - x_* \\right\\Vert_2^2)\n", " \\right]\n", " & \\quad & \\text{first-order Taylor expansion}\\\\\n", " &= S(x_*) (x - x_*) + J(x_*)^\\top R(x_*) +\n", " \\sum_{i = 1}^m r_i(x_*) \\mathcal{O}(\\left\\Vert x - x_* \\right\\Vert_2^2)\n", " & \\quad & \\text{Exercise 1}\\\\\n", " &= S(x_*) (x - x_*) + \\mathcal{O}(\\left\\Vert x - x_* \\right\\Vert_2^2)\n", "\n", "See :ref:`Exercise 1 ` and\n", ":cite:`binegarhodte,follandhodtfsv` for more details." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. 
_dennis1996numerical-ex-10.4:\n", "\n", "Exercise 4\n", "==========" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy\n", "\n", "def newton(x):\n", " S_c = S(x)\n", " J_c = J(x)\n", " g = J_c.T * R(x)\n", " s = -numpy.linalg.inv(J_c.T * J_c + S_c) * g\n", " # return a 1x1 matrix; the caller extracts the scalar\n", " return x + s\n", "\n", "def gauss_newton(x):\n", " J_c = J(x)\n", " g = J_c.T * R(x)\n", " s = -numpy.linalg.inv(J_c.T * J_c) * g\n", " # return a 1x1 matrix; the caller extracts the scalar\n", " return x + s\n", "\n", "def secant_1d(x, a):\n", " J_c = J(x)\n", " g = J_c.T * R(x)\n", " x_next = (x - numpy.linalg.inv(a) * g).item(0)\n", " a_next = (J(x_next).T * R(x_next) - g) / (x_next - x)\n", " return x_next, a_next\n", "\n", "def secant_newton(x, A):\n", " J_c = J(x)\n", " g = J_c.T * R(x)\n", " _ = J_c.T * J_c + A\n", " x_next = (x - numpy.linalg.inv(_) * g).item(0)\n", " A_next = ((J(x_next) - J_c).T * R(x_next)) / (x_next - x)\n", " return x_next, A_next\n", "\n", "t = numpy.asarray([1.0, 2.0, 3.0])\n", "\n", "def J(x):\n", " return numpy.asmatrix([[t[0] * numpy.exp(t[0] * x)],\n", " [t[1] * numpy.exp(t[1] * x)],\n", " [t[2] * numpy.exp(t[2] * x)]])\n", "\n", "def R(x):\n", " return numpy.asmatrix([[numpy.exp(t[0] * x) - y[0]],\n", " [numpy.exp(t[1] * x) - y[1]],\n", " [numpy.exp(t[2] * x) - y[2]]])\n", "\n", "def S(x):\n", " _ = 0.0\n", " for i, r_i in enumerate(R(x)):\n", " _ += r_i.item(0) * t[i]**2 * numpy.exp(t[i] * x)\n", " return numpy.asmatrix([[_]])\n", "\n", "def f(x):\n", " return 0.5 * R(x).T * R(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = numpy.asarray([2.0, 4.0, 8.0])\n", "x_0 = 1\n", "\n", "print('Gauss-Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = gauss_newton(x_c)\n", " x_c = _.item(0)\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", 
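"# With y = (2, 4, 8) the model exp(x * t_i) fits the data exactly at\n", "# x = ln 2, so this is a zero-residual problem: R(x_*) = 0 and S(x_*) = 0.\n", "# Theorem 10.2.1 with sigma = 0 then explains why the Gauss-Newton iterates\n", "# above keep pace with the Newton iterates below.\n", "\n",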
"print('Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = newton(x_c)\n", " x_c = _.item(0)\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", "print('Secant-1D')\n", "x_c = x_0\n", "a_c = J(x_0).T * J(x_0) + S(x_0)\n", "for i in range(8):\n", " _ = secant_1d(x_c, a_c)\n", " x_c = _[0]\n", " a_c = _[1]\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", "print('Secant-Newton')\n", "x_c = x_0\n", "A_c = J(x_0).T * J(x_0) * 0\n", "for i in range(8):\n", " _ = secant_newton(x_c, A_c)\n", " x_c = _[0]\n", " A_c = _[1]\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-10.5:\n", "\n", "Exercise 5\n", "==========" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = numpy.asarray([2.0, 4.0, -8.0])\n", "x_0 = -0.7\n", "\n", "print('Gauss-Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = gauss_newton(x_c)\n", " x_c = _.item(0)\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", "print('Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = newton(x_c)\n", " x_c = _.item(0)\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", "print('Secant-1D')\n", "x_c = x_0\n", "a_c = J(x_0).T * J(x_0) + S(x_0)\n", "for i in range(8):\n", " _ = secant_1d(x_c, a_c)\n", " x_c = _[0]\n", " a_c = _[1]\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "\n", "print('Secant-Newton')\n", "x_c = x_0\n", "A_c = J(x_0).T * J(x_0) * 0\n", "for i in range(8):\n", " _ = secant_newton(x_c, A_c)\n", " x_c = _[0]\n", " A_c = _[1]\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = numpy.asarray([2.0, 4.0, 
10.0])\n", "x_0 = 1.0\n", "\n", "print('Secant-Newton')\n", "x_c = x_0\n", "A_c = J(x_0).T * J(x_0) * 0\n", "for i in range(7):\n", " _ = secant_newton(x_c, A_c)\n", " x_c = _[0]\n", " A_c = _[1]\n", " print(i + 1, x_c, f(x_c).item(0), (J(x_c).T * R(x_c)).item(0))\n", "print(S(x_c).item(0))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 6\n", "==========\n", "\n", ".. math::\n", "\n", " S_G - S_N &= (x_+^G - x_c) - (x_+^N - x_c)\\\\\n", " &= -(J_c^\\top J_c)^{-1} J_c^\\top R_c +\n", " (J_c^\\top J_c + S_c)^{-1} J_c^\\top R_c\\\\\n", " &= -(J_c^\\top J_c)^{-1}\n", " \\left[\n", " I - J_c^\\top J_c (J_c^\\top J_c + S_c)^{-1}\n", " \\right] J_c^\\top R_c\\\\\n", " &= -(J_c^\\top J_c)^{-1}\n", " \\left[\n", " -(J_c^\\top J_c + S_c) + J_c^\\top J_c\n", " \\right] S_N\\\\\n", " &= (J_c^\\top J_c)^{-1} S_c S_N\n", "\n", "Let :math:`\\nabla f(x_c) = \\nabla f_c = J_c^\\top R_c`.\n", "\n", ".. math::\n", "\n", " x_+^G - x_* &= x_c - x_* - (J_c^\\top J_c)^{-1} J_c^\\top R_c\\\\\n", " &= -(J_c^\\top J_c)^{-1} \\left[\n", " \\nabla f_c + J_c^\\top J_c (x_* - x_c)\n", " \\right]\\\\\n", " &= -(J_c^\\top J_c)^{-1} \\left[\n", " \\nabla f_c - S_c (x_* - x_c) +\n", " (J_c^\\top J_c + S_c) (x_* - x_c)\n", " \\right]\\\\\n", " &= (J_c^\\top J_c)^{-1} \\left[\n", " S_c (x_* - x_c) + \\nabla f_* - \\nabla f_c -\n", " (J_c^\\top J_c + S_c) (x_* - x_c)\n", " \\right]\n", " & \\quad & \\nabla f_* = J_*^\\top R_* = 0 \\text{ by Lemma 4.3.1}\\\\\n", " \\left\\Vert x_+^G - x_* \\right\\Vert_2\n", " &= \\left\\Vert\n", " (J_c^\\top J_c)^{-1} \\left[\n", " S_c (x_* - x_c) + \\nabla f_* - \\nabla f_c -\n", " (J_c^\\top J_c + S_c) (x_* - x_c)\n", " \\right]\n", " \\right\\Vert_2\\\\\n", " &\\leq \\left\\Vert (J_c^\\top J_c)^{-1} \\right\\Vert_2\n", " \\left\\Vert S_c \\right\\Vert_2 \\left\\Vert x_c - x_* \\right\\Vert_2 +\n", " \\left\\Vert (J_c^\\top J_c)^{-1} \\right\\Vert_2\n", " \\left\\Vert\n", " \\nabla f_* - \\nabla f_c - 
(J_c^\\top J_c + S_c) (x_* - x_c)\n", " \\right\\Vert_2\n", " & \\quad & \\text{(3.1.10) and Exercise 3.4}\\\\\n", " &\\leq \\left\\Vert (J_c^\\top J_c)^{-1} \\right\\Vert_2\n", " \\left\\Vert S_c \\right\\Vert_2 \\left\\Vert x_c - x_* \\right\\Vert_2 +\n", " \\frac{\\gamma}{2} \\left\\Vert (J_c^\\top J_c)^{-1} \\right\\Vert_2\n", " \\left\\Vert x_* - x_c \\right\\Vert_2^2\n", " & \\quad & \\text{Lemma 4.1.12}\\\\\n", " &\\leq \\left\\Vert (J_c^\\top J_c)^{-1} \\right\\Vert_2\n", " \\left\\Vert S_c \\right\\Vert_2 \\left\\Vert x_c - x_* \\right\\Vert_2 +\n", " \\mathcal{O}\\left( \\left\\Vert x_c - x_* \\right\\Vert_2^2 \\right)\n", " & \\quad & \\text{Exercise 3.4}\n", "\n", "In the case of :math:`n = 1`, applying (3.1.2c) gives (10.5.1).\n", "See :ref:`Exercise 3.4 ` for more details." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7\n", "==========\n", "\n", "Let :math:`x_0 = x_* + \\delta` where :math:`\\delta > 0`. When\n", ":math:`\\lvert \\delta \\rvert` is sufficiently small,\n", "\n", ".. math::\n", "\n", " x_1^G &= x_0 - (J_0^\\top J_0)^{-1} J_0^\\top R_0\n", " & \\quad & \\text{by definition of Gauss-Newton}\\\\\n", " \\left\\Vert x_1^G - x_0 \\right\\Vert_2\n", " &\\leq \\alpha \\lvert \\delta \\rvert\n", " & \\quad & \\text{Lemma 4.2.1}.\n", "\n", "Hence :math:`x_1 \\cong x_0 - \\alpha \\delta`. Invoking (10.5.1) yields\n", "\n", ".. 
math::\n", "\n", " (x_+^G - x_*) - \\left( S_c (J_c^\\top J_c)^{-1} \\right) (x_c - x_*)\n", " &= \\mathcal{O}(\\left\\vert x_c - x_* \\right\\vert^2)\\\\\n", " x_+^G - x_*\n", " &= \\left( S_c (J_c^\\top J_c)^{-1} \\right) (x_c - x_*) +\n", " \\mathcal{O}(\\left\\vert x_c - x_* \\right\\vert^2)\\\\\n", " x_1 - x_*\n", " &= \\delta \\left[\n", " \\left( S_0 (J_0^\\top J_0)^{-1} \\right) +\n", " \\mathcal{O}(\\delta)\n", " \\right]\\\\\n", " \\left\\vert \\frac{x_1 - x_*}{\\delta} \\right\\vert\n", " &= \\left\\vert\n", " \\left( S_0 (J_0^\\top J_0)^{-1} \\right) + \\mathcal{O}(\\delta)\n", " \\right\\vert." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "delta = 0.0001\n", "\n", "y = numpy.asarray([2.0, 4.0, -4.0])\n", "x_star = -0.3719287\n", "J_star = J(x_star)\n", "S_star = S(x_star)\n", "print(J_star.T * J_star, S_star)\n", "\n", "x_0 = x_star + delta\n", "e = abs((gauss_newton(x_0) - x_0) / delta)\n", "print(abs(x_0 - e * delta - x_star) / (x_0 - x_star))\n", "\n", "\n", "y = numpy.asarray([2.0, 4.0, -8.0])\n", "x_star = -0.7914863\n", "J_star = J(x_star)\n", "S_star = S(x_star)\n", "print(J_star.T * J_star, S_star)\n", "\n", "x_0 = x_star + delta\n", "e = abs((gauss_newton(x_0) - x_0) / delta)\n", "print(abs(x_0 - e * delta - x_star) / (x_0 - x_star))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8\n", "==========\n", "\n", "The nonlinearity measure for the latter configuration is twice that of the\n", "former's settings. This could explain why both methods converged to the\n", "wrong result.\n", "\n", "The Gauss-Newton method converges faster in both settings compared to Newton's\n", "method. In the latter configuration, the Hessian of Newton's method shrinks\n", "towards the zero matrix as the magnitude of the gradient goes to zero.\n", "Gauss-Newton did not experience that despite having a faster shrinking gradient." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def J(x):\n", " x = x.flat\n", " return numpy.asmatrix([[numpy.exp(x[0] + x[1] * t[0]), t[0] * numpy.exp(x[0] + x[1] * t[0])],\n", " [numpy.exp(x[0] + x[1] * t[1]), t[1] * numpy.exp(x[0] + x[1] * t[1])],\n", " [numpy.exp(x[0] + x[1] * t[2]), t[2] * numpy.exp(x[0] + x[1] * t[2])],\n", " [numpy.exp(x[0] + x[1] * t[3]), t[3] * numpy.exp(x[0] + x[1] * t[3])]])\n", "\n", "def R(x):\n", " x = x.flat\n", " return numpy.asmatrix([[numpy.exp(x[0] + x[1] * t[0]) - y[0]],\n", " [numpy.exp(x[0] + x[1] * t[1]) - y[1]],\n", " [numpy.exp(x[0] + x[1] * t[2]) - y[2]],\n", " [numpy.exp(x[0] + x[1] * t[3]) - y[3]]])\n", "\n", "def S(x):\n", " _ = numpy.zeros((len(x), len(x)))\n", " for i, r_i in enumerate(R(x)):\n", " a = numpy.exp(x[0] + x[1] * t[i]).item(0)\n", " b = t[i] * numpy.exp(x[0] + x[1] * t[i]).item(0)\n", " c = t[i]**2 * numpy.exp(x[0] + x[1] * t[i]).item(0)\n", " _ += r_i.item(0) * numpy.asmatrix([[a, b],\n", " [b, c]])\n", " return _\n", "\n", "def f(x):\n", " return 0.5 * R(x).T * R(x)\n", "\n", "def fp(x):\n", " return J(x).T * R(x)\n", "\n", "def fpp(x):\n", " return J(x).T * J(x) + S(x)\n", "\n", "t = numpy.asarray([-2.0, -1.0, 0.0, 1.0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = numpy.asarray([0.5, 1.0, 2.0, 4.0])\n", "x_0 = numpy.asmatrix([[1.0],\n", " [1.0]])\n", "\n", "print('Gauss-Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = gauss_newton(x_c)\n", " x_c = _\n", " print(i + 1, numpy.asarray(x_c).squeeze(),\n", " numpy.asarray(f(x_c)).squeeze(),\n", " numpy.linalg.norm(fp(x_c)), numpy.linalg.norm(fpp(x_c)))\n", "\n", "print('Newton')\n", "x_c = x_0\n", "for i in range(8):\n", " _ = newton(x_c)\n", " x_c = _\n", " print(i + 1, numpy.asarray(x_c).squeeze(),\n", " numpy.asarray(f(x_c)).squeeze(),\n", " numpy.linalg.norm(fp(x_c)), numpy.linalg.norm(fpp(x_c)))" ] 
}, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = numpy.asarray([5.0, 1.0, 2.0, -4.0])\n", "x_0 = numpy.asmatrix([[1.0],\n", " [1.0]])\n", "\n", "print('Gauss-Newton')\n", "x_c = x_0\n", "for i in range(16):\n", " _ = gauss_newton(x_c)\n", " x_c = _\n", " print(i + 1, numpy.asarray(x_c).squeeze(),\n", " numpy.asarray(f(x_c)).squeeze(),\n", " numpy.linalg.norm(fp(x_c)), numpy.linalg.norm(fpp(x_c)))\n", "\n", "print('Newton')\n", "x_c = x_0\n", "for i in range(16):\n", " _ = newton(x_c)\n", " x_c = _\n", " print(i + 1, numpy.asarray(x_c).squeeze(),\n", " numpy.asarray(f(x_c)).squeeze(),\n", " numpy.linalg.norm(fp(x_c)), numpy.linalg.norm(fpp(x_c)))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 9\n", "==========\n", "\n", "There is no value in implementing this at the moment since optimization packages\n", "provide this and the previous exercises have implemented Gauss-Newton and\n", "line-search independently.\n", "\n", "The section after (10.2.12) mentioned that the damped Gauss-Newton took ten\n", "times more iterations than Newton's method for Example 10.2.4. Since the\n", "symmetric secant method in Example 9.3.2 took twice as long as Newton's method,\n", "I hypothesize the damped Gauss-Newton will be slower than the secant\n", "approximation." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 10\n", "===========\n", "\n", "Let :math:`s = x_+ - x_c \\in \\mathbb{R}^n`.\n", "\n", ".. 
math::\n", "\n", " \\min_{x_+} \\left\\Vert R(x_c) + J(x_c) s \\right\\Vert_2\n", " &\\equiv \\min_{x_+} \\left(\n", " R(x_c) + J(x_c) s \\right)^\\top \\left( R(x_c) + J(x_c) s\n", " \\right)\n", " & \\quad & \\text{square root is monotonically increasing}\\\\\n", " &\\equiv \\min_{x_+} \\frac{1}{2} \\left(\n", " R(x_c)^\\top R(x_c) + 2 R(x_c)^\\top J(x_c) s +\n", " s^\\top J(x_c)^\\top J(x_c) s\n", " \\right)\n", " & \\quad & \\text{minimum point is not affected by positive scaling}\\\\\n", " &= \\min_{x_+}\n", " f(x_c) + \\nabla f(x_c)^\\top s + \\frac{1}{2} s^\\top J(x_c)^\\top J(x_c) s\n", " & \\quad & \\text{see Exercise 1}\n", "\n", "subject to :math:`\\left\\Vert s \\right\\Vert_2 \\leq \\delta_c`.\n", "See :ref:`Exercise 1 ` for more details.\n", "\n", "(10.2.13) now matches the problem format of (6.4.1), so applying Lemma\n", "6.4.1 gives (10.2.14)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 11\n", "===========\n", "\n", "Let\n", "\n", ".. math::\n", "\n", " \\begin{gather*}\n", " x_+, x_c \\in \\mathbb{R}^n,\\\\\n", " f_c = f(x_c) \\in \\mathbb{R},\\\\\n", " J_c = J(x_c) \\in \\mathbb{R}^{m \\times n},\\\\\n", " R_c = R(x_c) \\in \\mathbb{R}^m, \\text{ and}\\\\\n", " s = x_+ - x_c = -\\left( J_c^\\top J_c + \\mu_c I_n \\right)^{-1} J_c^\\top R_c.\n", " \\end{gather*}\n", "\n", "Assume that either :math:`J_c^\\top J_c` is positive definite or\n", ":math:`\\mu_c \\geq 0` is picked such that :math:`J_c^\\top J_c + \\mu_c I_n` is\n", "positive definite.\n", "\n", ".. 
math::\n", "\n", " \\nabla f_c^\\top s\n", " &= \\left[ R_c^\\top J_c \\right]\n", " \\left[ -\\left( J_c^\\top J_c + \\mu_c I \\right)^{-1} J_c^\\top R_c \\right]\n", " & \\quad & \\text{Exercise 1}\\\\\n", " &= -\\left( R_c^\\top J_c \\right)\n", " \\left( J_c^\\top J_c + \\mu_c I \\right)^{-1}\n", " \\left( J_c^\\top R_c \\right)\\\\\n", " &< 0\n", "\n", "satisfies the definition of a descent direction (6.2.3).\n", "See :ref:`Exercise 1 ` for more details." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 12\n", "===========\n", "\n", "Let :math:`R \\in \\mathbb{R}^m`, :math:`J \\in \\mathbb{R}^{m \\times n}`, and\n", ":math:`s \\in \\mathbb{R}^n`.\n", "\n", "Denote :math:`A = \\begin{bmatrix} J^\\top & \\mu^{1 / 2} I \\end{bmatrix}^\\top \\in\n", "\\mathbb{R}^{(m + n) \\times n}` and :math:`b =\n", "\\begin{bmatrix} R^\\top & 0 \\end{bmatrix}^\\top \\in \\mathbb{R}^{m + n}`.\n", "\n", ".. math::\n", "\n", " \\min_s \\left\\Vert As + b \\right\\Vert\n", " &\\equiv \\min_s\n", " \\frac{1}{2} \\left( As + b \\right)^\\top \\left( As + b \\right)\\\\\n", " &= \\min_s \\frac{1}{2} s^\\top A^\\top A s + b^\\top As + \\frac{1}{2} b^\\top b\n", "\n", "The solution to this unconstrained minimization problem is\n", "\n", ".. math::\n", "\n", " 0 &= \\frac{\\partial}{\\partial s} \\frac{1}{2} s^\\top A^\\top As +\n", " b^\\top As + \\frac{1}{2} b^\\top b\\\\\n", " &= A^\\top As + A^\\top b\\\\\n", " s &= -\\left( A^\\top A \\right)^{-1} A^\\top b\\\\\n", " &= -\\left( J^\\top J + \\mu I \\right)^{-1} J^\\top R." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 13\n", "===========\n", "\n", "Let :math:`J_* \\triangleq J(x_*)`, :math:`J_k \\triangleq J(x_k)`,\n", ":math:`R_* \\triangleq R(x_*)`, and :math:`R_k \\triangleq R(x_k)`.\n", "\n", "(a)\n", "---\n", "\n", "According to Theorem 3.1.4, the Levenberg-Marquardt method is well-defined if\n", "\n", ".. 
math::\n", "\n", " \\left\\Vert\n", " \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1}\n", " \\left( J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right)\n", " \\right\\Vert_2\n", " &\\leq \\left\\Vert \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1} \\right\\Vert_2\n", " \\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2\n", " & \\quad & \\text{(3.1.10)}\\\\\n", " &\\leq \\frac{\n", " \\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2\n", " }{\\lambda + b}\n", " & \\quad & \\text{(a.1)}\\\\\n", " &< 1,\n", "\n", "which implies that\n", ":math:`\\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2 <\n", "\\lambda + b`.\n", "\n", "Applying (3.2.20) yields\n", "\n", ".. math::\n", "\n", " \\left\\Vert\n", " \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1}\n", " \\right\\Vert_2\n", " &\\leq \\frac{\n", " \\left\\Vert \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1} \\right\\Vert_2\n", " }{\n", " 1 -\n", " \\left\\Vert\n", " \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1}\n", " \\left( J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right)\n", " \\right\\Vert_2\n", " }\\\\\n", " &\\leq \\frac{1}{\n", " \\lambda + b -\n", " \\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2\n", " }.\n", "\n", "By inspection, the lower bound on the inverse is attained when\n", ":math:`\\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2 = 0`.\n", "\n", "In order to make subsequent derivations more elegant, it is beneficial to\n", "obtain an upper bound for the inverse using the assumption that\n", ":math:`\\lambda > \\sigma \\geq 0`:\n", "\n", ".. 
math::\n", "\n", " \\left\\Vert\n", " \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1}\n", " \\right\\Vert_2 \\leq \\frac{c}{\\lambda + b}\n", " \\quad \\text{where} \\quad\n", " c \\in \\left( 1, \\frac{\\lambda + b}{\\sigma + b} \\right)\n", " \\text{ for } x_k \\in N(x_*, \\epsilon_1).\n", "\n", "Since the conditions of Theorem 10.2.1 are satisfied,\n", ":math:`\\left\\Vert J(x) \\right\\Vert_2 \\leq \\alpha \\, \\forall x \\in D`. Applying\n", "the results of :ref:`Exercise 3.4 `,\n", ":ref:`Exercise 3.8 `, and the proof techniques of\n", "Theorem 10.2.1 gives\n", "\n", ".. math::\n", "\n", " x_{k + 1} - x_*\n", " &= x_k - x_* - \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1} J_k^\\top R_k\\\\\n", " &= -\\left( J_k^\\top J_k + \\mu_k I \\right)^{-1}\n", " \\left[\n", " J_k^\\top R_k + (J_k^\\top J_k + \\mu_k I) (x_* - x_k)\n", " \\right]\\\\\n", " \\left\\Vert x_{k + 1} - x_* \\right\\Vert_2\n", " &= \\left\\Vert\n", " -\\left( J_k^\\top J_k + \\mu_k I \\right)^{-1}\n", " \\left[\n", " J_k^\\top R_* - J_k^\\top \\left( R_* - R_k - J_k (x_* - x_k) \\right) +\n", " \\mu_k (x_* - x_k)\n", " \\right]\n", " \\right\\Vert_2\\\\\n", " &\\leq \\left\\Vert \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1} \\right\\Vert_2\n", " \\left[\n", " \\left\\Vert J_k^\\top R_* \\right\\Vert_2 +\n", " \\left\\Vert J_k^\\top \\right\\Vert_2\n", " \\left\\Vert R_* - R_k - J_k (x_* - x_k) \\right\\Vert_2 +\n", " \\left\\Vert \\mu_k (x_* - x_k) \\right\\Vert_2\n", " \\right]\n", " & \\quad & \\text{(3.1.10) and Exercise 3.4}\\\\\n", " &\\leq \\frac{c}{\\lambda + b}\n", " \\left[\n", " (\\sigma + b) \\left\\Vert x_k - x_* \\right\\Vert_2 +\n", " \\frac{\\alpha \\gamma}{2} \\left\\Vert x_k - x_* \\right\\Vert_2^2\n", " \\right]\n", " & \\quad & \\text{(a.1), (10.2.4), (4.1.12), and Exercise 3.8}.\n", "\n", "Picking\n", "\n", ".. 
math::\n", "\n", " \\epsilon =\n", " \\min \\left\\{\n", " \\epsilon_1, \\frac{(\\lambda + b) - c(\\sigma + b)}{c \\alpha \\gamma}\n", " \\right\\}\n", "\n", "simplifies the preceding result to\n", "\n", ".. math::\n", "\n", " \\left\\Vert x_{k + 1} - x_* \\right\\Vert_2\n", " &\\leq \\left\\Vert x_k - x_* \\right\\Vert_2\n", " \\left[\n", " \\frac{c (\\sigma + b)}{\\lambda + b} +\n", " \\frac{c \\alpha \\gamma}{2 (\\lambda + b)}\n", " \\left\\Vert x_k - x_* \\right\\Vert_2\n", " \\right]\\\\\n", " &\\leq \\left\\Vert x_k - x_* \\right\\Vert_2\n", " \\left[\n", " \\frac{c (\\sigma + b)}{\\lambda + b} +\n", " \\frac{(\\lambda + b) - c(\\sigma + b)}{2 (\\lambda + b)}\n", " \\right]\\\\\n", " &\\leq \\left\\Vert x_k - x_* \\right\\Vert_2\n", " \\frac{c (\\sigma + b) + (\\lambda + b)}{2 (\\lambda + b)}\\\\\n", " &< \\left\\Vert x_k - x_* \\right\\Vert_2.\n", "\n", "(b)\n", "---\n", "\n", "When :math:`R_* = 0`, :math:`\\sigma` in (10.2.4) can be set to zero.\n", "Substituting this into the results of (a) shows that :math:`\\{ x_k \\}` converges\n", "to :math:`x_*`.\n", "\n", "When :math:`\\mu_k = \\mathcal{O}(\\left\\Vert J_k^\\top R_k \\right\\Vert_2)`,\n", "\n", ".. math::\n", "\n", " \\left\\Vert \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1} \\right\\Vert_2\n", " &= \\left\\Vert \\left(\n", " J_*^\\top J_* + \\mathcal{O}(\\left\\Vert J_*^\\top R_* \\right\\Vert_2) I\n", " \\right)^{-1} \\right\\Vert_2\\\\\n", " &= \\left\\Vert \\left(\n", " J_*^\\top J_*\n", " \\right)^{-1} \\right\\Vert_2\\\\\n", " &\\leq \\frac{1}{\\lambda}\n", " & \\quad & \\text{(a.1)}\n", "\n", "and consequently\n", "\n", ".. math::\n", "\n", " \\left\\Vert\n", " \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1}\n", " \\right\\Vert_2 \\leq \\frac{c}{\\lambda}\n", " \\quad \\text{where} \\quad\n", " c \\in \\left( 1, \\frac{\\lambda}{\\sigma} \\right)\n", " \\text{ for } x_k \\in N(x_*, \\epsilon_1).\n", "\n", "Applying the same procedures as in (a) gives\n", "\n", ".. 
math::\n", "\n", " \\left\\Vert x_{k + 1} - x_* \\right\\Vert_2\n", " &\\leq \\left\\Vert \\left( J_k^\\top J_k + \\mu_k I \\right)^{-1} \\right\\Vert_2\n", " \\left[\n", " \\left\\Vert J_k^\\top R_* \\right\\Vert_2 +\n", " \\left\\Vert J_k^\\top \\right\\Vert_2\n", " \\left\\Vert R_* - R_k - J_k (x_* - x_k) \\right\\Vert_2 +\n", " \\left\\Vert \\mu_k (x_* - x_k) \\right\\Vert_2\n", " \\right]\\\\\n", " &\\leq \\frac{c}{\\lambda}\n", " \\left[\n", " \\mathcal{O}(\\left\\Vert J_k^\\top R_k \\right\\Vert_2)\n", " \\left\\Vert x_k - x_* \\right\\Vert_2 +\n", " \\frac{\\alpha \\gamma}{2} \\left\\Vert x_k - x_* \\right\\Vert_2^2\n", " \\right].\n", "\n", "Picking :math:`\\epsilon = \\min \\{ \\epsilon_1, \\epsilon_2 \\}` so that (b.1) is\n", "satisfied, and noting that\n", ":math:`\\left\\Vert J_k^\\top R_k \\right\\Vert_2 =\n", "\\mathcal{O}(\\left\\Vert x_k - x_* \\right\\Vert_2)` because\n", ":math:`\\nabla f(x_*) = J_*^\\top R_* = 0`, makes the first term\n", ":math:`\\mathcal{O}(\\left\\Vert x_k - x_* \\right\\Vert_2^2)` and proves that the\n", "convergence rate is q-quadratic.\n", "\n", "(a.1)\n", "^^^^^\n", "\n", "Assuming :math:`\\lambda > 0` is the smallest eigenvalue of\n", ":math:`J_*^\\top J_*`, i.e., its spectral decomposition exists,\n", "\n", ".. 
math::\n", "\n", " \\left\\Vert \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1} \\right\\Vert_2\n", " &= \\left(\n", " \\min_{v \\in \\mathbb{R}^n \\setminus 0}\n", " \\frac{\n", " \\left\\Vert (J_*^\\top J_* + \\mu_* I) v \\right\\Vert_2\n", " }{\n", " \\left\\Vert v \\right\\Vert_2\n", " }\n", " \\right)^{-1}\n", " & \\quad & \\text{(3.1.18)}\\\\\n", " &\\leq \\left(\n", " \\min_{v \\in \\mathbb{R}^n \\setminus 0}\n", " \\frac{\n", " \\left\\Vert J_*^\\top J_* v \\right\\Vert_2 +\n", " \\mu_* \\left\\Vert v \\right\\Vert_2\n", " }{\n", " \\left\\Vert v \\right\\Vert_2\n", " }\n", " \\right)^{-1}\n", " & \\quad & \\text{(3.1.1c) and (3.1.1b)}\\\\\n", " &\\leq \\left(\n", " \\min_{v \\in \\mathbb{R}^n \\setminus 0}\n", " \\frac{\n", " \\left\\Vert J_*^\\top J_* v \\right\\Vert_2\n", " }{\n", " \\left\\Vert v \\right\\Vert_2\n", " } + b\n", " \\right)^{-1}\n", " & \\quad & \\{ \\mu_k \\} < b \\text{ and } b > 0\\\\\n", " &\\leq \\frac{1}{\\lambda + b}\n", " & \\quad & \\text{Exercise 3.6}.\n", "\n", "See :ref:`Exercise 3.6 ` for more details on the\n", "last inequality relation.\n", "\n", "(b.1)\n", "^^^^^\n", "\n", ".. math::\n", "\n", " \\left\\Vert\n", " \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1}\n", " \\left( J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right)\n", " \\right\\Vert_2\n", " &\\leq \\left\\Vert \\left( J_*^\\top J_* + \\mu_* I \\right)^{-1} \\right\\Vert_2\n", " \\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2\\\\\n", " &\\leq \\frac{\n", " \\left\\Vert J_k^\\top J_k + \\mu_k I - J_*^\\top J_* \\right\\Vert_2\n", " }{\\lambda}\\\\\n", " &\\leq \\frac{\n", " \\left\\Vert J_k^\\top J_k - J_*^\\top J_* \\right\\Vert_2 +\n", " \\mathcal{O}(\\left\\Vert J_k^\\top R_k \\right\\Vert_2)\n", " }{\\lambda}\\\\\n", " &\\leq 1\n", "\n", "implies that\n", "\n", ".. math::\n", "\n", " \\mathcal{O}(\\left\\Vert J_k^\\top R_k \\right\\Vert_2) \\leq\n", " \\lambda - \\left\\Vert J_k^\\top J_k - J_*^\\top J_* \\right\\Vert_2." 
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 14\n", "===========\n", "\n", "See :ref:`Exercise 4 ` and\n", ":ref:`Exercise 5 ` for code. The suggested\n", "secant method converges slower than (10.3.4) in\n", ":ref:`Exercise 4 ` but wins\n", "out in :ref:`Exercise 5 `.\n", "\n", "The suggested secant method did not beat Newton's method in terms of\n", "convergence rate. (10.3.4) beats Newton's method when Gauss-Newton dominated." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 15\n", "===========\n", "\n", "See :ref:`Exercise 5 ` for code." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 16\n", "===========\n", "\n", "Define :math:`S = \\{ T T^\\top \\mid T T^\\top \\in Q(y_c, s_c) \\}`.\n", "\n", ".. math::\n", "\n", " A_+ - A_c\n", " &= \\frac{\n", " (y^\\#_c - A_c s_c) {y^\\#_c}^\\top +\n", " y^\\#_c (y^\\#_c - A_c s_c)^\\top\n", " }{\n", " {y^\\#_c}^\\top s_c\n", " } -\n", " \\frac{\n", " (y^\\#_c - A_c s_c)^\\top s_c y^\\#_c {y^\\#_c}^\\top\n", " }{\n", " ({y^\\#_c}^\\top s_c)^2\n", " }\n", " & \\quad & \\text{(9.3.4) applied to (10.3.5)}\\\\\n", " &= \\frac{\n", " (y^\\#_c - A_c s_c) s_c^\\top A_+^\\top +\n", " A_+ s_c (y^\\#_c - A_c s_c)^\\top\n", " }{\n", " s_c^\\top A_+^\\top s_c\n", " } -\n", " \\frac{\n", " (A_+ s_c - A_c s_c)^\\top s_c A_+ s_c s_c^\\top A_+^\\top\n", " }{\n", " (s_c^\\top A_+^\\top s_c)^2\n", " }\n", " & \\quad & A_+ s_c = y^\\#_c\\\\\n", " &= \\frac{\n", " (y_c^\\# - A_c s_c) y_c^\\top + y_c (y_c^\\# - A_c s_c)^\\top\n", " }{\n", " y_c^\\top s_c\n", " } -\n", " \\frac{\n", " (y_c^\\# - A_c s_c)^\\top s_c y_c y_c^\\top\n", " }{\n", " (y_c^\\top s_c)^2\n", " }\n", " & \\quad & A_+ \\in Q(y^\\#_c, s_c) \\cap S" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 17\n", "===========\n", "\n", "Let :math:`x \\in 
\\mathbb{R}^n`, :math:`R \\in \\mathbb{R}^m`, and\n", ":math:`J \\in \\mathbb{R}^{m \\times n}` have full column rank, i.e.,\n", ":math:`J^\\top J` is nonsingular.\n", "\n", "Combining (a) and (b) shows that the solution to the linear least-squares\n", "problem is equivalent to the Euclidean projection of :math:`R` onto the\n", "linear subspace spanned by the columns of :math:`J`.\n", "\n", ".. math::\n", "\n", "    R^\\top J (J^\\top J)^{-1} J^\\top R\n", "    &= \\left\\Vert R \\right\\Vert_2\n", "        \\left\\Vert J (J^\\top J)^{-1} J^\\top R \\right\\Vert_2 \\cos(\\phi)\n", "    & \\quad & \\text{definition of dot product}\\\\\n", "    \\cos(\\phi)\n", "    &= \\frac{\n", "        R^\\top J (J^\\top J)^{-1} J^\\top R\n", "    }{\n", "        \\left\\Vert R \\right\\Vert_2\n", "        \\left\\Vert J (J^\\top J)^{-1} J^\\top R \\right\\Vert_2\n", "    }\n", "\n", "(a)\n", "---\n", "\n", "The Euclidean projection of a point :math:`x_0 \\in \\mathbb{R}^n` on a set\n", ":math:`\\mathcal{S} \\subseteq \\mathbb{R}^n` is a point that achieves the smallest\n", "Euclidean distance from :math:`x_0` to the set. Symbolically this means\n", "\n", ".. math::\n", "\n", "    \\min_{x \\in \\mathcal{S}} \\left\\Vert x - x_0 \\right\\Vert_2.\n", "\n", "(b)\n", "---\n", "\n", "Given\n", "\n", ".. math::\n", "\n", "    \\min_x \\left\\Vert Jx - R \\right\\Vert_2^2\n", "    &= \\min_x (Jx - R)^\\top (Jx - R)\n", "    & \\quad & \\text{(3.1.2c)}\\\\\n", "    &= \\min_x x^\\top J^\\top J x - 2 R^\\top J x + R^\\top R,\n", "\n", "the minimizer satisfies\n", "\n", ".. math::\n", "\n", "    0 &= \\frac{\\partial}{\\partial x}\n", "        \\left( x^\\top J^\\top J x - 2 R^\\top J x + R^\\top R \\right)\\\\\n", "    &= 2 J^\\top J x - 2 J^\\top R\n", "    & \\quad & \\text{Matrix Cookbook (81) and (69)}\\\\\n", "    x &= (J^\\top J)^{-1} J^\\top R."
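, "\n", "As a numerical sanity check of (b) (a sketch with hypothetical random data, not\n", "part of the original exercise), the normal-equations solution matches\n", "``numpy.linalg.lstsq``, and the residual of :math:`Jx` is orthogonal to the\n", "columns of :math:`J`, confirming that :math:`Jx` is the Euclidean projection of\n", ":math:`R` onto their span:\n", "\n", ".. code:: python\n", "\n", "   import numpy as np\n", "\n", "   rng = np.random.default_rng(0)\n", "   J = rng.standard_normal((20, 4))   # full column rank with probability one\n", "   R = rng.standard_normal(20)\n", "\n", "   x = np.linalg.solve(J.T @ J, J.T @ R)     # (J^T J)^{-1} J^T R\n", "   assert np.allclose(x, np.linalg.lstsq(J, R, rcond=None)[0])\n", "   assert np.allclose(J.T @ (J @ x - R), 0)  # J x - R is orthogonal to span(J)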
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18\n", "===========\n", "\n", "Let :math:`\\alpha \\in \\mathbb{R}`, :math:`s \\in \\mathbb{R}^n`, and\n", ":math:`M \\in \\mathbb{R}^{n \\times n}`.\n", "Suppose :math:`\\alpha > 0` and :math:`M` is positive definite, i.e., it admits\n", "a Cholesky factorization :math:`M = L L^\\top`.\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", "    \\max_v \\phi(v) =\n", "    \\max_v \\frac{\\alpha v^\\top s}{(v^\\top M v)^{1 / 2}}\n", "    &= \\max_v \\frac{\\alpha v^\\top s}{\\left\\Vert L^\\top v \\right\\Vert_2}\\\\\n", "    &\\leq \\max_v\n", "        \\frac{\n", "            \\alpha \\left\\Vert v \\right\\Vert_2 \\left\\Vert s \\right\\Vert_2\n", "        }{\n", "            \\left\\Vert L^\\top v \\right\\Vert_2\n", "        }\n", "    & \\quad & \\text{Cauchy-Schwarz Inequality}\\\\\n", "    &\\leq \\alpha \\left\\Vert s \\right\\Vert_2 \\left\\Vert L^{-\\top} \\right\\Vert_2\n", "    & \\quad & \\text{(3.1.18)}\\\\\n", "    &\\leq \\alpha \\left\\Vert s \\right\\Vert_2 \\left\\Vert L^{-\\top} \\right\\Vert_F\n", "    & \\quad & \\text{(3.1.12)}\n", "\n", "The maximum value is :math:`\\alpha \\left( s^\\top M^{-1} s \\right)^{1 / 2}`,\n", "attained at :math:`v = M^{-1} s`: applying Cauchy-Schwarz as\n", ":math:`v^\\top s = (L^\\top v)^\\top (L^{-1} s)\n", "\\leq \\left\\Vert L^\\top v \\right\\Vert_2 \\left\\Vert L^{-1} s \\right\\Vert_2`\n", "makes the bound tight, and (a.1) shows this value is consistent with the\n", "looser bound above.\n", "\n", "(a.1)\n", "^^^^^\n", "\n", ".. math::\n", "\n", "    \\DeclareMathOperator{\\tr}{tr}\n", "    \\alpha \\left( s^\\top M^{-1} s \\right)^{1 / 2}\n", "    &= \\alpha \\left\\Vert L^{-1} s \\right\\Vert_2\n", "    & \\quad & \\text{(3.1.2c)}\\\\\n", "    &\\leq \\alpha \\left\\Vert L^{-1} \\right\\Vert_F \\left\\Vert s \\right\\Vert_2\n", "    & \\quad & \\text{(3.1.16)}\\\\\n", "    &= \\alpha \\left[ \\tr(L^{-\\top} L^{-1}) \\right]^{1 / 2}\n", "        \\left\\Vert s \\right\\Vert_2\n", "    & \\quad & \\text{(3.1.14)}\\\\\n", "    &= \\alpha \\left[ \\tr(L^{-1} L^{-\\top}) \\right]^{1 / 2}\n", "        \\left\\Vert s \\right\\Vert_2\n", "    & \\quad & \\tr(AB) = \\tr(BA)\\\\\n", "    &= \\alpha \\left\\Vert L^{-\\top} \\right\\Vert_F \\left\\Vert s \\right\\Vert_2\n", "\n", "(b)\n", "---\n", "\n", "If :math:`M = (J^\\top J)^{-1}` and :math:`s = -M J^\\top R`, then\n", "\n", ".. math::\n", "\n", "    \\max_v \\phi(v)\n", "    &\\leq \\alpha \\left( s^\\top M^{-1} s \\right)^{1 / 2}\n", "    & \\quad & \\text{(10.5.2) and (a)}\\\\\n", "    &\\leq \\alpha \\left( R^\\top J M^\\top J^\\top J M J^\\top R \\right)^{1 / 2}\\\\\n", "    &\\leq \\alpha \\left\\Vert J M J^\\top R \\right\\Vert_2.\n", "\n", "Let :math:`H \\in \\mathbb{R}^{n \\times n}` be :math:`\\nabla^2 f(x)` or an\n", "approximation of it. If :math:`M = H^{-1}` and :math:`s = -M J^\\top R`, then\n", "\n", ".. math::\n", "\n", "    \\max_v \\phi(v)\n", "    &\\leq \\alpha \\left( s^\\top M^{-1} s \\right)^{1 / 2}\n", "    & \\quad & \\text{(10.5.2) and (a)}\\\\\n", "    &\\leq \\alpha \\left( R^\\top J H^{-\\top} J^\\top R \\right)^{1 / 2}.\n", "\n", "It is interesting to note that (10.4.2) can be written in the form of (10.5.2)\n", "where\n", "\n", ".. math::\n", "\n", "    \\begin{gather*}\n", "    M = (J^\\top J)^{-1},\\\\\n", "    s = -M J^\\top R,\\\\\n", "    v = -J^\\top R = -\\nabla f(x), \\text{ and}\\\\\n", "    \\alpha = \\left\\Vert R \\right\\Vert_2^{-1}.\n", "    \\end{gather*}\n", "\n", "Applying the results in (a) shows that (10.5.4) is an upper bound for (10.4.2)."
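, "\n", "A numerical sanity check of (a) and (a.1) (a sketch with hypothetical random\n", "data, not part of the original exercise): :math:`\\phi` attains the value\n", ":math:`\\alpha (s^\\top M^{-1} s)^{1 / 2}` at :math:`v = M^{-1} s`, no random\n", "direction exceeds it, and\n", ":math:`\\Vert L^{-1} \\Vert_F = \\Vert L^{-\\top} \\Vert_F`:\n", "\n", ".. code:: python\n", "\n", "   import numpy as np\n", "\n", "   rng = np.random.default_rng(0)\n", "   B = rng.standard_normal((5, 5))\n", "   M = B @ B.T + 5 * np.eye(5)      # positive definite\n", "   L = np.linalg.cholesky(M)        # M = L L^T\n", "   s = rng.standard_normal(5)\n", "   alpha = 2.0\n", "\n", "   L_inv = np.linalg.inv(L)\n", "   assert np.isclose(np.linalg.norm(L_inv), np.linalg.norm(L_inv.T))\n", "\n", "   def phi(v):\n", "       return alpha * (v @ s) / np.sqrt(v @ M @ v)\n", "\n", "   v_star = np.linalg.solve(M, s)   # maximizer M^{-1} s\n", "   peak = alpha * np.sqrt(s @ v_star)\n", "   assert np.isclose(phi(v_star), peak)\n", "   assert all(phi(rng.standard_normal(5)) <= peak + 1e-9 for _ in range(100))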
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 19\n", "===========\n", "\n", "Define :math:`R \\colon \\mathbb{R}^n \\mapsto \\mathbb{R}^m` with :math:`m > n`.\n", "Let :math:`x = \\begin{bmatrix} u\\\\ v \\end{bmatrix}`,\n", ":math:`u \\in \\mathbb{R}^{n - p}`, :math:`v \\in \\mathbb{R}^{p}` with\n", ":math:`0 < p < n`, and :math:`R(x) = N(v) u - b` where\n", ":math:`N \\colon \\mathbb{R}^p \\rightarrow \\mathbb{R}^{m \\times (n - p)}` and\n", ":math:`b \\in \\mathbb{R}^m`.\n", "\n", ".. math::\n", "\n", "    f(x) &= \\frac{1}{2} R(x)^\\top R(x)\\\\\n", "    &= \\frac{1}{2} (N(v) u - b)^\\top (N(v) u - b)\\\\\n", "    &= \\frac{1}{2} \\left[\n", "        u^\\top N(v)^\\top N(v) u - u^\\top N(v)^\\top b -\n", "        b^\\top N(v) u + b^\\top b\n", "    \\right]\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", "    0 &= \\frac{\\partial }{\\partial u} f(x)\\\\\n", "    &= N(v)^\\top N(v) u - N(v)^\\top b\n", "    & \\quad & \\text{Matrix Cookbook (81) and (69)}\\\\\n", "    (V D^\\top U^\\top) (U D V^\\top) u\n", "    &= V D^\\top U^\\top b\n", "    & \\quad & \\text{Theorem 3.6.4}\\\\\n", "    u &= V D^+ U^\\top b\n", "    & \\quad & \\text{(3.6.6)}\\\\\n", "    &= N(v)^+ b\n", "    & \\quad & \\text{Theorem 3.6.5}\n", "\n", "(b)\n", "---\n", "\n", ".. math::\n", "\n", "    \\min_x f(x)\n", "    &= \\min_v \\frac{1}{2} \\left(\n", "        b^\\top {N(v)^+}^\\top N(v)^\\top N(v) N(v)^+ b -\n", "        b^\\top {N(v)^+}^\\top N(v)^\\top b - b^\\top N(v) N(v)^+ b + b^\\top b\n", "    \\right)\n", "    & \\quad & \\text{(a)}\\\\\n", "    &= \\min_v \\frac{1}{2} \\left\\Vert\n", "        N(v) N(v)^+ b - b\n", "    \\right\\Vert_2^2\n", "    & \\quad & \\text{(3.1.2c)}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 20\n", "===========\n", "\n", "The answer to this problem may be located in the section after (3.5) of\n", ":cite:`golub1973differentiation` and in this book's Chapter 5. Unfortunately,\n", "my current (2015) knowledge does not allow me to comprehend this 100%; however,\n", "I do feel that it is within my reach to fully dissect this succinct and lucid\n", "paper. The first item to understand is Fr\u00e9chet derivatives.\n", "\n", "Two things I learned from reading this are:\n", "\n", "- The rank of the matrix must be locally constant at the point where the\n", "  derivative is calculated; otherwise, the pseudo-inverse is not a continuous\n", "  function.\n", "- Based on the convergence results, the variable projection via the modified\n", "  Marquardt method should definitely be used when appropriate." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 21\n", "===========\n", "\n", "There are existing Levenberg-Marquardt implementations (e.g. Ceres, Eigen), so a\n", "more interesting venture is to implement the secant approximation. However,\n", "such a thing is not of immediate interest at the moment." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. rubric:: References\n", "\n", ".. bibliography:: chapter-10.bib" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }