{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"******************************\n",
"Scaling, Stopping, and Testing\n",
"******************************\n",
"\n",
"Scaling\n",
"=======\n",
"\n",
"- In real-world problems, the variables may have different scales.\n",
"\n",
"  - The preferred solution is to choose units of the variable space so that\n",
"    each component of :math:`x` will have roughly the same magnitude.\n",
"  - Rescaling the independent variables (e.g. :math:`Dx` and :math:`D F(x)`)\n",
"    does not affect the Newton step, but it does affect the steepest-descent\n",
"    direction and hence the trust region step.\n",
"\n",
"    - The intuition behind this is that the Newton step goes to the lowest\n",
"      point of a quadratic model, which is unaffected by a change in units,\n",
"      while steepest descent makes a unit step without regard to the\n",
"      relative lengths of each variable direction.\n",
"\n",
"  - Dynamic rescaling of the variables may be needed for some problems.\n",
"\n",
"Stopping Criteria\n",
"=================\n",
"\n",
"- When picking a stopping criterion, use the relative rate of change with a\n",
"  user-specified maximum limit so that the test is scale independent, as in\n",
"  (7.2.5) and (7.2.7)." ] }, { "cell_type": "raw", "metadata": { "collapsed": true, "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 1\n",
"==========\n",
"\n",
"Let :math:`a = 10^6` and rewrite the original minimization problem as\n",
"\n",
".. math::\n",
"\n",
"    \\min_{x \\in \\mathbb{R}^2} f(x) =\n",
"    \\min_{x \\in \\mathbb{R}^2}\n",
"    \\left( x_1 - a \\right)^4 +\n",
"    \\left( x_1 - a \\right)^2 +\n",
"    \\left( x_2 - a^{-1} \\right)^4 +\n",
"    \\left( x_2 - a^{-1} \\right)^2.\n",
"\n",
"The algorithms discussed so far need to compute the gradient\n",
"\n",
".. 
math::\n",
"\n",
"    \\DeclareMathOperator{\\typ}{typ}\n",
"    \\nabla f(x) =\n",
"    \\begin{bmatrix}\n",
"        4 \\left( x_1 - a \\right)^3 + 2 \\left( x_1 - a \\right)\\\\\n",
"        4 \\left( x_2 - a^{-1} \\right)^3 + 2 \\left( x_2 - a^{-1} \\right)\\\\\n",
"    \\end{bmatrix}.\n",
"\n",
"By inspection, :math:`\\left\\Vert x_+ - x_c \\right\\Vert` will be predominantly\n",
"influenced by :math:`x_1`, and :math:`f(x)` is minimized when :math:`x_1 = a`\n",
"and :math:`x_2 = a^{-1}`.\n",
"\n",
"Let :math:`\\typ x_1 = a` and :math:`\\typ x_2 = a^{-1}`. The following\n",
"experiments illustrate that both the Newton step and steepest descent (6.2.5)\n",
"converge within 40 iterations without any scaling; furthermore, they do so\n",
"without any globally convergent modifications (e.g. backtracking line search,\n",
"locally constrained optimal hook step, double dogleg step).\n",
"\n",
"As shown in :ref:`Exercise 3 <dennis1996numerical-ex-7.3>` and\n",
":ref:`Exercise 4 <dennis1996numerical-ex-7.4>`, no scaling is needed for the\n",
"Newton step, and in this case steepest descent does not need it either.\n",
"\n",
"A simpler way to handle the minimization is to solve it for :math:`x_1` and\n",
":math:`x_2` individually, i.e. to split up the problem."
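The split-problem remark above can be sketched as two independent one-dimensional Newton iterations, since each term of :math:`f` depends on only one coordinate. A minimal sketch, where the `newton_1d` helper is illustrative and not part of the text:

```python
a = 1e6

def newton_1d(grad, hess, t0, iterations=50):
    # Newton's method for one-dimensional minimization: t <- t - g(t) / h(t).
    t = t0
    for _ in range(iterations):
        t -= grad(t) / hess(t)
    return t

# f(x) decouples into (x_1 - a)^4 + (x_1 - a)^2 and
# (x_2 - 1/a)^4 + (x_2 - 1/a)^2, so each coordinate is solved on its own.
x1 = newton_1d(lambda t: 4 * (t - a)**3 + 2 * (t - a),
               lambda t: 12 * (t - a)**2 + 2, t0=0.0)
x2 = newton_1d(lambda t: 4 * (t - 1 / a)**3 + 2 * (t - 1 / a),
               lambda t: 12 * (t - 1 / a)**2 + 2, t0=0.0)
print(x1, x2)
```

Because the subproblems are one-dimensional, the disparate scales of :math:`x_1` and :math:`x_2` never interact, which is why no scaling is needed.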
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"import numpy\n",
"\n",
"a = 10**6\n",
"\n",
"def f(x):\n",
"    x = x.flat\n",
"    return (x[0] - a)**4 + (x[0] - a)**2 + (x[1] - 1 / a)**4 + (x[1] - 1 / a)**2\n",
"\n",
"def fp(x):\n",
"    #gradient of f\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[4 * (x[0] - a)**3 + 2 * (x[0] - a)],\n",
"                           [4 * (x[1] - 1 / a)**3 + 2 * (x[1] - 1 / a)]])\n",
"\n",
"def fpp(x):\n",
"    #Hessian of f\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[12 * (x[0] - a)**2 + 2, 0],\n",
"                           [0, 12 * (x[1] - 1 / a)**2 + 2]])\n",
"\n",
"x_c = numpy.asmatrix([[0.0],[0.0]])\n",
"for i in range(40):\n",
"    H = fpp(x_c)\n",
"    g = fp(x_c)\n",
"    #Newton step: s = -numpy.linalg.inv(H) * g\n",
"    #steepest-descent step with an exact line search on the quadratic model\n",
"    s = -(g.T * g).flat[0] / (g.T * H * g).flat[0] * g\n",
"    x_c = x_c + s\n",
"    if i % 5 == 0:\n",
"        print('Iteration {}: {}'.format(i, numpy.asarray(x_c).squeeze()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"def f(x):\n",
"    #Rosenbrock function\n",
"    x = x.flat\n",
"    return 100 * (x[0]**2 - x[1])**2 + (1 - x[0])**2\n",
"\n",
"def fp(x):\n",
"    #gradient of the Rosenbrock function\n",
"    x = x.flat\n",
"    a = 2 * 100 * (x[0]**2 - x[1])\n",
"    return numpy.asmatrix([[a * 2 * x[0] + 2 * (1 - x[0]) * (-1)],\n",
"                           [a * (-1)]])\n",
"\n",
"def fpp(x):\n",
"    #Hessian of the Rosenbrock function; note that d2f/dx2dx2 = 200\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],\n",
"                           [-400 * x[0], 200]])\n",
"\n",
"alpha = 1.0\n",
"x_c = numpy.asmatrix([[-1.2 / alpha],[alpha]])\n",
"T = numpy.asmatrix([[1 / alpha, 0],[0, alpha]])\n",
"for i in range(128):\n",
"    H = fpp(x_c)\n",
"    g = fp(x_c)\n",
"    #Newton step\n",
"    s = -numpy.linalg.inv(H) * g\n",
"    #steepest descent: s = -g / numpy.linalg.norm(g)\n",
"    #exact line search: s = -(g.T * g).flat[0] / (g.T * H * g).flat[0] * g\n",
"    x_c = x_c + s\n",
"    if i % 12 == 0:\n",
"        print('Iteration {}: {}'.format(i, numpy.asarray(x_c).squeeze()))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. 
_dennis1996numerical-ex-7.2:\n",
"\n",
"Exercise 2\n",
"==========\n",
"\n",
"Let :math:`f \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}`, and let\n",
":math:`T \\in \\mathbb{R}^{n \\times n}` be nonsingular. For any\n",
":math:`x \\in \\mathbb{R}^n`, define\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x} = Tx\n",
"    \\quad \\text{and} \\quad\n",
"    \\hat{f}(\\hat{x}) = f(T^{-1} \\hat{x}) = f(x).\n",
"\n",
"(a)\n",
"---\n",
"\n",
"Applying definition 4.1.1,\n",
"\n",
".. math::\n",
"\n",
"    \\nabla f(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial f}{\\partial x_1}(x)\\\\\n",
"        \\vdots\\\\\n",
"        \\frac{\\partial f}{\\partial x_n}(x)\\\\\n",
"    \\end{bmatrix} =\n",
"    \\frac{\\partial f(x)}{\\partial x},\n",
"\n",
"yields\n",
"\n",
".. math::\n",
"\n",
"    \\nabla \\hat{f}(\\hat{x})\n",
"    &= \\frac{\\partial \\hat{f}(\\hat{x})}{\\partial \\hat{x}}\\\\\n",
"    &= \\frac{\\partial f(x)}{\\partial \\hat{x}}\n",
"    & \\quad & \\hat{f}(\\hat{x}) = f(x)\\\\\n",
"    &= \\frac{\\partial x}{\\partial \\hat{x}} \\frac{\\partial f(x)}{\\partial x}\n",
"    & \\quad & \\text{denominator layout chain rule}\\\\\n",
"    &= \\frac{\\partial T^{-1} \\hat{x}}{\\partial \\hat{x}} \\nabla f(x)\n",
"    & \\quad & \\hat{x} = Tx\\\\\n",
"    &= T^{-\\top} \\nabla f(x)\n",
"    & \\quad & \\text{denominator layout vector-by-vector differentiation.}\n",
"\n",
"(b)\n",
"---\n",
"\n",
"Applying definition 4.1.4,\n",
"\n",
".. math::\n",
"\n",
"    \\nabla^2 f(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_1^2} & \\cdots &\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_1 \\partial x_n}\\\\\n",
"        \\vdots & \\ddots & \\vdots\\\\\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_n \\partial x_1} & \\cdots &\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_n^2}\\\\\n",
"    \\end{bmatrix} =\n",
"    \\nabla \\left[ \\nabla f(x) \\right]^\\top =\n",
"    \\frac{\\partial^2 f(x)}{\\partial x \\partial x^\\top},\n",
"\n",
"yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\nabla^2 \\hat{f}(\\hat{x})\n",
"    &= \\frac{\\partial}{\\partial \\hat{x}} \\left(\n",
"    T^{-\\top} \\frac{\\partial f(x)}{\\partial x}\n",
"    \\right)^\\top\n",
"    & \\quad & \\text{(a)}\\\\\n",
"    &= \\left(\n",
"    \\frac{\\partial}{\\partial \\hat{x}}\n",
"    \\frac{\\partial f(x)}{\\partial x^\\top}\n",
"    \\right) T^{-1}\\\\\n",
"    &= \\left(\n",
"    \\frac{\\partial x}{\\partial \\hat{x}}\n",
"    \\frac{\\partial^2 f(x)}{\\partial x \\partial x^\\top}\n",
"    \\right) T^{-1}\n",
"    & \\quad & \\text{denominator layout chain rule}\\\\\n",
"    &= \\frac{\\partial T^{-1} \\hat{x}}{\\partial \\hat{x}} \\nabla^2 f(x) T^{-1}\n",
"    & \\quad & \\hat{x} = Tx\\\\\n",
"    &= T^{-\\top} \\nabla^2 f(x) T^{-1}\n",
"    & \\quad & \\text{denominator layout vector-by-vector differentiation.}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. _dennis1996numerical-ex-7.3:\n",
"\n",
"Exercise 3\n",
"==========\n",
"\n",
"Let :math:`D \\in \\mathbb{R}^{n \\times n}` denote a positive diagonal matrix, and\n",
"let :math:`m_c(x_c + s) = f(x_c) + g^\\top s + \\frac{1}{2} s^\\top H s` where\n",
":math:`g = \\nabla f(x_c) \\in \\mathbb{R}^n` and\n",
":math:`H \\equiv \\nabla^2 f(x_c) \\in S^n_{++}`.\n",
"\n",
"Recall that Lemma 6.4.1 solves\n",
"\n",
".. math::\n",
"\n",
"    \\underset{s \\in \\mathbb{R}^n}{\\text{minimize}} \\quad m_c(x_c + s) &\\\\\n",
"    \\text{subject to} \\quad \\left\\Vert s \\right\\Vert_2 &\\leq \\delta_c\n",
"\n",
"where :math:`s(\\mu) = -\\left( H + \\mu I \\right)^{-1} g` is the solution for\n",
"an appropriate :math:`\\mu \\geq 0`. In order to utilize Lemma 6.4.1 to minimize\n",
"the same function but constrained to\n",
":math:`\\left\\Vert Ds \\right\\Vert_2 \\leq \\delta_c`, a change of variable is\n",
"needed.\n",
"\n",
"Define :math:`\\hat{x}_c = Dx_c`; applying the results from\n",
":ref:`Exercise 2 <dennis1996numerical-ex-7.2>` with :math:`T = D` yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\underset{\\hat{s} \\in \\mathbb{R}^n}{\\text{minimize}} \\quad\n",
"    \\hat{m}_c(\\hat{x}_c + \\hat{s})\n",
"    &= f(x_c) + g^\\top D^{-1} \\hat{s} +\n",
"    \\frac{1}{2} \\hat{s}^\\top D^{-1} H D^{-1} \\hat{s}\\\\\n",
"    \\text{subject to} \\quad \\left\\Vert \\hat{s} \\right\\Vert_2 &\\leq \\delta_c.\n",
"\n",
"By inspection, :math:`\\hat{s} = Ds` because the minimization is carried out in\n",
"the scaled space. Transforming the solution back gives\n",
"\n",
".. math::\n",
"\n",
"    \\hat{s}(\\mu) &= -\\left( D^{-1} H D^{-1} + \\mu I \\right)^{-1} D^{-1} g\\\\\n",
"    H D^{-1} \\hat{s}(\\mu) + \\mu D \\hat{s}(\\mu) &= -g\\\\\n",
"    \\left( H + \\mu D^2 \\right) s(\\mu) &= -g\\\\\n",
"    s(\\mu) &= -\\left( H + \\mu D^2 \\right)^{-1} g." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. _dennis1996numerical-ex-7.4:\n",
"\n",
"Exercise 4\n",
"==========\n",
"\n",
"Let :math:`F \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^n`, and let\n",
":math:`T_1, T_2 \\in \\mathbb{R}^{n \\times n}` be nonsingular. For any\n",
":math:`x \\in \\mathbb{R}^n`, define\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x} = T_1 x\n",
"    \\quad \\text{and} \\quad\n",
"    \\hat{F}(\\hat{x}) = T_2 F(T_1^{-1} \\hat{x}) = T_2 F(x).\n",
"\n",
"Applying definition 4.1.7,\n",
"\n",
".. math::\n",
"\n",
"    F'(x) = J(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n",
"        \\frac{\\partial f_1(x)}{\\partial x_n}\\\\\n",
"        \\vdots & \\ddots & \\vdots\\\\\n",
"        \\frac{\\partial f_n(x)}{\\partial x_1} & \\cdots &\n",
"        \\frac{\\partial f_n(x)}{\\partial x_n}\n",
"    \\end{bmatrix} =\n",
"    \\begin{bmatrix}\n",
"        \\nabla f_1(x)^\\top\\\\\n",
"        \\vdots\\\\\n",
"        \\nabla f_n(x)^\\top\n",
"    \\end{bmatrix} =\n",
"    F(x) \\nabla^\\top =\n",
"    \\frac{\\partial F(x)}{\\partial x^\\top},\n",
"\n",
"yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\hat{J}(\\hat{x})\n",
"    &= \\frac{\\partial \\hat{F}(\\hat{x})}{\\partial \\hat{x}^\\top}\\\\\n",
"    &= \\frac{\\partial T_2 F(x)}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\hat{F}(\\hat{x}) = T_2 F(x)\\\\\n",
"    &= \\frac{\\partial T_2 F(x)}{\\partial x^\\top}\n",
"    \\frac{\\partial x^\\top}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\text{numerator layout chain rule}\\\\\n",
"    &= T_2 J(x) \\frac{\\partial \\hat{x}^\\top T_1^{-\\top}}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\hat{x} = T_1 x\\\\\n",
"    &= T_2 J(x) T_1^{-1}\n",
"    & \\quad & \\text{numerator layout vector-by-vector differentiation.}\n",
"\n",
"Using (6.5.2) to transform the Newton step for :math:`\\hat{F}` back to the\n",
"original variable space gives\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x}_+ &= \\hat{x}_c - \\hat{J}(\\hat{x}_c)^{-1} \\hat{F}(\\hat{x}_c)\\\\\n",
"    T_1 x_+ &= T_1 x_c - \\left( T_2 J(x_c) T_1^{-1} \\right)^{-1} T_2 F(x_c)\\\\\n",
"    T_1 x_+ &= T_1 x_c - T_1 J(x_c)^{-1} F(x_c)\\\\\n",
"    x_+ &= x_c - J(x_c)^{-1} F(x_c),\n",
"\n",
"which shows that scaling does not affect the Newton step. However, whenever a\n",
"norm is used, scaling should still be applied to avoid ignoring small\n",
"components." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 5\n",
"==========\n",
"\n",
"The only situation I can come up with is when the optimization problem involves\n",
"some kind of direct dependency between the variables, e.g. a term like\n",
":math:`(x - y)`. Any dynamic scaling strategy would be specific to a problem\n",
"domain and hence not robust." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 6\n",
"==========\n",
"\n",
"According to Theorem 4.3.2, the critical point is a saddle point when the\n",
"Hessian :math:`H` is indefinite and a maximum if :math:`H \\prec 0`.\n",
"\n",
"See :cite:`benzi2005numerical` and Saddle Point Theorems for an overview of how\n",
"to proceed in minimization algorithms."
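The classification above can be checked numerically by inspecting the eigenvalues of the Hessian at a critical point. A minimal sketch; `classify_critical_point` and the example Hessians are illustrative, not from the text:

```python
import numpy as np

def classify_critical_point(H, tol=1e-12):
    # Classify a critical point from the eigenvalues of its symmetric Hessian:
    # all positive -> minimum, all negative -> maximum, mixed signs -> saddle.
    w = np.linalg.eigvalsh(H)
    if np.all(w > tol):
        return 'minimum'
    if np.all(w < -tol):
        return 'maximum'
    if np.any(w > tol) and np.any(w < -tol):
        return 'saddle point'
    return 'indeterminate'  # a numerically zero eigenvalue: test is inconclusive

print(classify_critical_point(np.diag([1.0, 2.0])))   # minimum
print(classify_critical_point(np.diag([1.0, -2.0])))  # saddle point
```

Eigenvalue signs, rather than an explicit Cholesky factorization, are used here because the eigenvalues also reveal the near-singular (indeterminate) case.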
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 7\n",
"==========\n",
"\n",
"There is no significant gain in knowledge from implementing this solution." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. rubric:: References\n",
"\n",
".. bibliography:: chapter-07.bib" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }