{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"******************************\n",
"Scaling, Stopping, and Testing\n",
"******************************\n",
"\n",
"Scaling\n",
"=======\n",
"\n",
"- In real-world problems, the variables may have different scales.\n",
"\n",
"  - The preferred solution is to choose units of the variable space so that\n",
"    each component of :math:`x` will have roughly the same magnitude.\n",
"  - Rescaling the independent variables (e.g. :math:`Dx` and :math:`D F(x)`)\n",
"    does not affect the Newton step, but it does affect the steepest-descent\n",
"    direction and hence the trust region step.\n",
"\n",
"    - The intuition behind this is that the Newton step goes to the lowest\n",
"      point of a quadratic model, which is unaffected by a change in units,\n",
"      while steepest descent makes a unit step without regard to the\n",
"      relative lengths of each variable direction.\n",
"\n",
"  - Dynamic rescaling of the variables may be needed for some problems.\n",
"\n",
"Stopping Criteria\n",
"=================\n",
"\n",
"- When picking a stopping criterion, use the relative rate of change with a\n",
"  user-specified maximum limit so that the test is scale independent, as in\n",
"  (7.2.5) and (7.2.7)." ] }, { "cell_type": "raw", "metadata": { "collapsed": true, "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 1\n",
"==========\n",
"\n",
"Let :math:`a = 10^6` and rewrite the original minimization problem as\n",
"\n",
".. math::\n",
"\n",
"    \\min_{x \\in \\mathbb{R}^2} f(x) =\n",
"    \\min_{x \\in \\mathbb{R}^2}\n",
"    \\left( x_1 - a \\right)^4 +\n",
"    \\left( x_1 - a \\right)^2 +\n",
"    \\left( x_2 - a^{-1} \\right)^4 +\n",
"    \\left( x_2 - a^{-1} \\right)^2.\n",
"\n",
"The algorithms discussed so far need to compute the gradient\n",
"\n",
".. 
math::\n",
"\n",
"    \\DeclareMathOperator{\\typ}{typ}\n",
"    \\nabla f(x) =\n",
"    \\begin{bmatrix}\n",
"        4 \\left( x_1 - a \\right)^3 + 2 \\left( x_1 - a \\right)\\\\\n",
"        4 \\left( x_2 - a^{-1} \\right)^3 + 2 \\left( x_2 - a^{-1} \\right)\\\\\n",
"    \\end{bmatrix}.\n",
"\n",
"By inspection, :math:`\\left\\Vert x_+ - x_c \\right\\Vert` will be predominantly\n",
"influenced by :math:`x_1`, and :math:`f(x)` is minimized when :math:`x_1 = a`\n",
"and :math:`x_2 = a^{-1}`.\n",
"\n",
"Let :math:`\\typ x_1 = a` and :math:`\\typ x_2 = a^{-1}`. The following\n",
"experiments illustrate that both the Newton step and steepest descent (6.2.5)\n",
"converge within 40 iterations without any scaling; furthermore, they do so\n",
"without any globally convergent modifications (e.g. backtracking line search,\n",
"locally constrained optimal hook step, double dogleg step).\n",
"\n",
"As shown in :ref:`Exercise 3 <dennis1996numerical-ex-7.3>` and\n",
":ref:`Exercise 4 <dennis1996numerical-ex-7.4>`, no scaling is needed for the\n",
"Newton step, and in this case steepest descent does not need it either.\n",
"\n",
"A simpler way to handle the minimization is to solve it for :math:`x_1` and\n",
":math:`x_2` individually, i.e. to split up the problem."
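The split-problem remark above can be sketched as two independent one-dimensional Newton iterations, since each term of :math:`f` depends on only one coordinate. A minimal sketch, where the `newton_1d` helper is illustrative and not part of the text:

```python
a = 1e6

def newton_1d(grad, hess, t0, iterations=50):
    # Newton's method for one-dimensional minimization: t <- t - g(t) / h(t).
    t = t0
    for _ in range(iterations):
        t -= grad(t) / hess(t)
    return t

# f(x) decouples into (x_1 - a)^4 + (x_1 - a)^2 and
# (x_2 - 1/a)^4 + (x_2 - 1/a)^2, so each coordinate is solved on its own.
x1 = newton_1d(lambda t: 4 * (t - a)**3 + 2 * (t - a),
               lambda t: 12 * (t - a)**2 + 2, t0=0.0)
x2 = newton_1d(lambda t: 4 * (t - 1 / a)**3 + 2 * (t - 1 / a),
               lambda t: 12 * (t - 1 / a)**2 + 2, t0=0.0)
print(x1, x2)
```

Because the subproblems are one-dimensional, the disparate scales of :math:`x_1` and :math:`x_2` never interact, which is why no scaling is needed.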
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"import numpy\n",
"\n",
"a = 10**6\n",
"\n",
"def f(x):\n",
"    x = x.flat\n",
"    return (x[0] - a)**4 + (x[0] - a)**2 + (x[1] - 1 / a)**4 + (x[1] - 1 / a)**2\n",
"\n",
"def fp(x):\n",
"    #gradient of f\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[4 * (x[0] - a)**3 + 2 * (x[0] - a)],\n",
"                           [4 * (x[1] - 1 / a)**3 + 2 * (x[1] - 1 / a)]])\n",
"\n",
"def fpp(x):\n",
"    #Hessian of f\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[12 * (x[0] - a)**2 + 2, 0],\n",
"                           [0, 12 * (x[1] - 1 / a)**2 + 2]])\n",
"\n",
"x_c = numpy.asmatrix([[0.0],[0.0]])\n",
"for i in range(40):\n",
"    H = fpp(x_c)\n",
"    g = fp(x_c)\n",
"    #Newton step: s = -numpy.linalg.inv(H) * g\n",
"    #steepest-descent step with an exact line search on the quadratic model\n",
"    s = -(g.T * g).flat[0] / (g.T * H * g).flat[0] * g\n",
"    x_c = x_c + s\n",
"    if i % 5 == 0:\n",
"        print('Iteration {}: {}'.format(i, numpy.asarray(x_c).squeeze()))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"def f(x):\n",
"    #Rosenbrock function\n",
"    x = x.flat\n",
"    return 100 * (x[0]**2 - x[1])**2 + (1 - x[0])**2\n",
"\n",
"def fp(x):\n",
"    #gradient of the Rosenbrock function\n",
"    x = x.flat\n",
"    a = 2 * 100 * (x[0]**2 - x[1])\n",
"    return numpy.asmatrix([[a * 2 * x[0] + 2 * (1 - x[0]) * (-1)],\n",
"                           [a * (-1)]])\n",
"\n",
"def fpp(x):\n",
"    #Hessian of the Rosenbrock function; note that d2f/dx2dx2 = 200\n",
"    x = x.flat\n",
"    return numpy.asmatrix([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],\n",
"                           [-400 * x[0], 200]])\n",
"\n",
"alpha = 1.0\n",
"x_c = numpy.asmatrix([[-1.2 / alpha],[alpha]])\n",
"T = numpy.asmatrix([[1 / alpha, 0],[0, alpha]])\n",
"for i in range(128):\n",
"    H = fpp(x_c)\n",
"    g = fp(x_c)\n",
"    #Newton step\n",
"    s = -numpy.linalg.inv(H) * g\n",
"    #steepest descent: s = -g / numpy.linalg.norm(g)\n",
"    #exact line search: s = -(g.T * g).flat[0] / (g.T * H * g).flat[0] * g\n",
"    x_c = x_c + s\n",
"    if i % 12 == 0:\n",
"        print('Iteration {}: {}'.format(i, numpy.asarray(x_c).squeeze()))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. 
_dennis1996numerical-ex-7.2:\n",
"\n",
"Exercise 2\n",
"==========\n",
"\n",
"Let :math:`f \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}`, and let\n",
":math:`T \\in \\mathbb{R}^{n \\times n}` be nonsingular. For any\n",
":math:`x \\in \\mathbb{R}^n`, define\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x} = Tx\n",
"    \\quad \\text{and} \\quad\n",
"    \\hat{f}(\\hat{x}) = f(T^{-1} \\hat{x}) = f(x).\n",
"\n",
"(a)\n",
"---\n",
"\n",
"Applying definition 4.1.1,\n",
"\n",
".. math::\n",
"\n",
"    \\nabla f(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial f}{\\partial x_1}(x)\\\\\n",
"        \\vdots\\\\\n",
"        \\frac{\\partial f}{\\partial x_n}(x)\\\\\n",
"    \\end{bmatrix} =\n",
"    \\frac{\\partial f(x)}{\\partial x},\n",
"\n",
"yields\n",
"\n",
".. math::\n",
"\n",
"    \\nabla \\hat{f}(\\hat{x})\n",
"    &= \\frac{\\partial \\hat{f}(\\hat{x})}{\\partial \\hat{x}}\\\\\n",
"    &= \\frac{\\partial f(x)}{\\partial \\hat{x}}\n",
"    & \\quad & \\hat{f}(\\hat{x}) = f(x)\\\\\n",
"    &= \\frac{\\partial x}{\\partial \\hat{x}} \\frac{\\partial f(x)}{\\partial x}\n",
"    & \\quad & \\text{denominator layout chain rule}\\\\\n",
"    &= \\frac{\\partial T^{-1} \\hat{x}}{\\partial \\hat{x}} \\nabla f(x)\n",
"    & \\quad & \\hat{x} = Tx\\\\\n",
"    &= T^{-\\top} \\nabla f(x)\n",
"    & \\quad & \\text{denominator layout vector-by-vector differentiation.}\n",
"\n",
"(b)\n",
"---\n",
"\n",
"Applying definition 4.1.4,\n",
"\n",
".. math::\n",
"\n",
"    \\nabla^2 f(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_1^2} & \\cdots &\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_1 \\partial x_n}\\\\\n",
"        \\vdots & \\ddots & \\vdots\\\\\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_n \\partial x_1} & \\cdots &\n",
"        \\frac{\\partial^2 f(x)}{\\partial x_n^2}\\\\\n",
"    \\end{bmatrix} =\n",
"    \\nabla \\left[ \\nabla f(x) \\right]^\\top =\n",
"    \\frac{\\partial^2 f(x)}{\\partial x \\partial x^\\top},\n",
"\n",
"yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\nabla^2 \\hat{f}(\\hat{x})\n",
"    &= \\frac{\\partial}{\\partial \\hat{x}} \\left(\n",
"    T^{-\\top} \\frac{\\partial f(x)}{\\partial x}\n",
"    \\right)^\\top\n",
"    & \\quad & \\text{(a)}\\\\\n",
"    &= \\left(\n",
"    \\frac{\\partial}{\\partial \\hat{x}}\n",
"    \\frac{\\partial f(x)}{\\partial x^\\top}\n",
"    \\right) T^{-1}\\\\\n",
"    &= \\left(\n",
"    \\frac{\\partial x}{\\partial \\hat{x}}\n",
"    \\frac{\\partial^2 f(x)}{\\partial x \\partial x^\\top}\n",
"    \\right) T^{-1}\n",
"    & \\quad & \\text{denominator layout chain rule}\\\\\n",
"    &= \\frac{\\partial T^{-1} \\hat{x}}{\\partial \\hat{x}} \\nabla^2 f(x) T^{-1}\n",
"    & \\quad & \\hat{x} = Tx\\\\\n",
"    &= T^{-\\top} \\nabla^2 f(x) T^{-1}\n",
"    & \\quad & \\text{denominator layout vector-by-vector differentiation.}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. _dennis1996numerical-ex-7.3:\n",
"\n",
"Exercise 3\n",
"==========\n",
"\n",
"Let :math:`D \\in \\mathbb{R}^{n \\times n}` denote a positive diagonal matrix, and\n",
"let :math:`m_c(x_c + s) = f(x_c) + g^\\top s + \\frac{1}{2} s^\\top H s` where\n",
":math:`g = \\nabla f(x_c) \\in \\mathbb{R}^n` and\n",
":math:`H \\equiv \\nabla^2 f(x_c) \\in S^n_{++}`.\n",
"\n",
"Recall that Lemma 6.4.1 solves\n",
"\n",
".. math::\n",
"\n",
"    \\underset{s \\in \\mathbb{R}^n}{\\text{minimize}} \\quad m_c(x_c + s) &\\\\\n",
"    \\text{subject to} \\quad \\left\\Vert s \\right\\Vert_2 &\\leq \\delta_c\n",
"\n",
"where :math:`s(\\mu) = -\\left( H + \\mu I \\right)^{-1} g` is the solution for\n",
"an appropriate :math:`\\mu \\geq 0`. In order to utilize Lemma 6.4.1 to minimize\n",
"the same function but constrained to\n",
":math:`\\left\\Vert Ds \\right\\Vert_2 \\leq \\delta_c`, a change of variable is\n",
"needed.\n",
"\n",
"Define :math:`\\hat{x}_c = Dx_c`; applying the results from\n",
":ref:`Exercise 2 <dennis1996numerical-ex-7.2>` with :math:`T = D` yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\underset{\\hat{s} \\in \\mathbb{R}^n}{\\text{minimize}} \\quad\n",
"    \\hat{m}_c(\\hat{x}_c + \\hat{s})\n",
"    &= f(x_c) + g^\\top D^{-1} \\hat{s} +\n",
"    \\frac{1}{2} \\hat{s}^\\top D^{-1} H D^{-1} \\hat{s}\\\\\n",
"    \\text{subject to} \\quad \\left\\Vert \\hat{s} \\right\\Vert_2 &\\leq \\delta_c.\n",
"\n",
"By inspection, :math:`\\hat{s} = Ds` because the minimization is carried out in\n",
"the scaled space. Transforming the solution back gives\n",
"\n",
".. math::\n",
"\n",
"    \\hat{s}(\\mu) &= -\\left( D^{-1} H D^{-1} + \\mu I \\right)^{-1} D^{-1} g\\\\\n",
"    H D^{-1} \\hat{s}(\\mu) + \\mu D \\hat{s}(\\mu) &= -g\\\\\n",
"    \\left( H + \\mu D^2 \\right) s(\\mu) &= -g\\\\\n",
"    s(\\mu) &= -\\left( H + \\mu D^2 \\right)^{-1} g." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. _dennis1996numerical-ex-7.4:\n",
"\n",
"Exercise 4\n",
"==========\n",
"\n",
"Let :math:`F \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^n`, and let\n",
":math:`T_1, T_2 \\in \\mathbb{R}^{n \\times n}` be nonsingular. For any\n",
":math:`x \\in \\mathbb{R}^n`, define\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x} = T_1 x\n",
"    \\quad \\text{and} \\quad\n",
"    \\hat{F}(\\hat{x}) = T_2 F(T_1^{-1} \\hat{x}) = T_2 F(x).\n",
"\n",
"Applying definition 4.1.7,\n",
"\n",
".. math::\n",
"\n",
"    F'(x) = J(x) =\n",
"    \\begin{bmatrix}\n",
"        \\frac{\\partial f_1(x)}{\\partial x_1} & \\cdots &\n",
"        \\frac{\\partial f_1(x)}{\\partial x_n}\\\\\n",
"        \\vdots & \\ddots & \\vdots\\\\\n",
"        \\frac{\\partial f_n(x)}{\\partial x_1} & \\cdots &\n",
"        \\frac{\\partial f_n(x)}{\\partial x_n}\n",
"    \\end{bmatrix} =\n",
"    \\begin{bmatrix}\n",
"        \\nabla f_1(x)^\\top\\\\\n",
"        \\vdots\\\\\n",
"        \\nabla f_n(x)^\\top\n",
"    \\end{bmatrix} =\n",
"    F(x) \\nabla^\\top =\n",
"    \\frac{\\partial F(x)}{\\partial x^\\top},\n",
"\n",
"yields\n",
"\n",
".. 
math::\n",
"\n",
"    \\hat{J}(\\hat{x})\n",
"    &= \\frac{\\partial \\hat{F}(\\hat{x})}{\\partial \\hat{x}^\\top}\\\\\n",
"    &= \\frac{\\partial T_2 F(x)}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\hat{F}(\\hat{x}) = T_2 F(x)\\\\\n",
"    &= \\frac{\\partial T_2 F(x)}{\\partial x^\\top}\n",
"    \\frac{\\partial x^\\top}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\text{numerator layout chain rule}\\\\\n",
"    &= T_2 J(x) \\frac{\\partial \\hat{x}^\\top T_1^{-\\top}}{\\partial \\hat{x}^\\top}\n",
"    & \\quad & \\hat{x} = T_1 x\\\\\n",
"    &= T_2 J(x) T_1^{-1}\n",
"    & \\quad & \\text{numerator layout vector-by-vector differentiation.}\n",
"\n",
"Using (6.5.2) to transform the Newton step for :math:`\\hat{F}` back to the\n",
"original variable space gives\n",
"\n",
".. math::\n",
"\n",
"    \\hat{x}_+ &= \\hat{x}_c - \\hat{J}(\\hat{x}_c)^{-1} \\hat{F}(\\hat{x}_c)\\\\\n",
"    T_1 x_+ &= T_1 x_c - \\left( T_2 J(x_c) T_1^{-1} \\right)^{-1} T_2 F(x_c)\\\\\n",
"    T_1 x_+ &= T_1 x_c - T_1 J(x_c)^{-1} F(x_c)\\\\\n",
"    x_+ &= x_c - J(x_c)^{-1} F(x_c),\n",
"\n",
"which shows that scaling does not affect the Newton step. However, whenever a\n",
"norm is used, scaling should still be applied to avoid ignoring small\n",
"components." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 5\n",
"==========\n",
"\n",
"The only situation I can come up with is when the optimization problem involves\n",
"some kind of direct dependency between the variables, e.g. a term like\n",
":math:`(x - y)`. Any dynamic scaling strategy would be specific to a problem\n",
"domain and hence not robust." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 6\n",
"==========\n",
"\n",
"According to Theorem 4.3.2, the critical point is a saddle point when the\n",
"Hessian :math:`H` is indefinite and a maximum if :math:`H \\prec 0`.\n",
"\n",
"See :cite:`benzi2005numerical` and Saddle Point Theorems for an overview of how\n",
"to proceed in minimization algorithms."
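The classification above can be checked numerically by inspecting the eigenvalues of the Hessian at a critical point. A minimal sketch; `classify_critical_point` and the example Hessians are illustrative, not from the text:

```python
import numpy as np

def classify_critical_point(H, tol=1e-12):
    # Classify a critical point from the eigenvalues of its symmetric Hessian:
    # all positive -> minimum, all negative -> maximum, mixed signs -> saddle.
    w = np.linalg.eigvalsh(H)
    if np.all(w > tol):
        return 'minimum'
    if np.all(w < -tol):
        return 'maximum'
    if np.any(w > tol) and np.any(w < -tol):
        return 'saddle point'
    return 'indeterminate'  # a numerically zero eigenvalue: test is inconclusive

print(classify_critical_point(np.diag([1.0, 2.0])))   # minimum
print(classify_critical_point(np.diag([1.0, -2.0])))  # saddle point
```

Eigenvalue signs, rather than an explicit Cholesky factorization, are used here because the eigenvalues also reveal the near-singular (indeterminate) case.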
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
"Exercise 7\n",
"==========\n",
"\n",
"There is no significant gain in knowledge from implementing this solution." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [
".. rubric:: References\n",
"\n",
".. bibliography:: chapter-07.bib" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }