{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "*********************************\n", "Multivariable Calculus Background\n", "*********************************\n", "\n", "Derivatives and Multivariable Models\n", "====================================\n", "\n", "- Important results from single-variable calculus (e.g. mean value theorem) hold\n", " for functions of multiple variables because they involve the value of a\n", " real-valued function at two points :math:`x, \\bar{x} \\in \\mathbb{R}^n`, and\n", " the line connecting these points can be parameterized using one variable (e.g.\n", " gradient, Hessian).\n", "- There is no mean value theorem for continuously differentiable vector-valued\n", " functions (e.g. Jacobian)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 1\n", "==========\n", "\n", ".. math::\n", "\n", " F(x) =\n", " \\begin{bmatrix}\n", " x_1^2 + x_2^3 - x_3^4\\\\\n", " x_1 x_2 x_3\\\\\n", " 2 x_1 x_2 - 3 x_2 x_3 + x_1 x_3\\\\\n", " \\end{bmatrix}\n", "\n", ".. math::\n", "\n", " J(x) =\n", " \\begin{bmatrix}\n", " 2 x_1 & 3 x_2^2 & - 4 x_3^3\\\\\n", " x_2 x_3 & x_2 x_3 & x_1 x_2\\\\\n", " 2 x_2 + x_3 & 2 x_1 - 3 x_3 & -3 x_2 + x_1\\\\\n", " \\end{bmatrix}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 2\n", "==========\n", "\n", "Let :math:`f(x) = x_1^3 x_2^4 + \\frac{x_2}{x_1}`.\n", "\n", ".. math::\n", "\n", " \\nabla f(x) =\n", " \\begin{bmatrix}\n", " \\frac{\\partial f}{\\partial x_1}(x)\\\\\n", " \\frac{\\partial f}{\\partial x_2}(x)\\\\\n", " \\end{bmatrix} =\n", " \\begin{bmatrix}\n", " 3 x_1^2 x_2^4 - x_2 (x_1)^{-2}\\\\\n", " 4 x_1^3 x_2^3 + (x_1)^{-1}\n", " \\end{bmatrix}\n", "\n", ".. math::\n", "\n", " \\nabla^2 f(x) =\n", " \\begin{bmatrix}\n", " \\frac{\\partial^2 f}{\\partial^2 x_1}(x) &\n", " \\frac{\\partial^2 f}{\\partial x_1 \\partial x_2}(x)\\\\\n", " \\frac{\\partial^2 f}{\\partial x_2 \\partial x_1}(x) &\n", " \\frac{\\partial^2 f}{\\partial^2 x_2}(x)\\\\\n", " \\end{bmatrix} =\n", " \\begin{bmatrix}\n", " 6 x_1 x_2^4 + 2 x_2 (x_1)^{-3} &\n", " 12 x_1^2 x_2^3 - (x_1)^{-2}\\\\\n", " 12 x_1^2 x_2^3 - (x_1)^{-2} &\n", " 12 x_1^3 x_2^2\n", " \\end{bmatrix}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 3\n", "==========\n", "\n", "Define :math:`x(t) = x + tp` and :math:`g(t) = f(x(t)) = f(x + tp)` where\n", ":math:`t \\in (0, 1)`.\n", "\n", "For :math:`0 \\leq \\alpha \\leq 1`, applying the chain rule twice yields\n", "\n", ".. math::\n", "\n", " \\frac{d g(t)}{d t}[\\alpha]\n", " &= \\sum_{j = 1}^n \\frac{\\partial f(x(t))}{\\partial x(t)_j}[x(\\alpha)]\n", " \\frac{d x(t)_j}{d t}[\\alpha]\n", " & \\quad & \\text{definition of total differential}\\\\\n", " &= \\sum_{j = 1}^n \\frac{\\partial}{\\partial x_j} f[x(\\alpha)] p_j\\\\\n", " &= \\nabla f[x + \\alpha p] \\cdot p\n", "\n", "and\n", "\n", ".. math::\n", "\n", " \\frac{d^2 g}{d t^2}[\\alpha]\n", " &= \\frac{d}{dt}\n", " \\left(\n", " \\sum_{j = 1}^n \\frac{\\partial}{\\partial x_j} f[x(\\alpha)] p_j\n", " \\right)\\\\\n", " &= \\sum_{i = 1}^n \\sum_{j = 1}^n\n", " \\frac{\\partial^2}{\\partial x_i \\partial x_j} f[x(\\alpha)] p_i p_j\n", " & \\quad & \\text{definition of total differential}\\\\\n", " &= p^\\top \\nabla^2 f[x(\\alpha)] p.\n", "\n", "Substituting :math:`\\alpha = 0` yields\n", ":math:`\\frac{\\partial^2 f}{\\partial p^2} = p^\\top \\nabla^2 f(x) p`.\n", "\n", "Applying the first fundamental theorem of calculus twice followed by the mean\n", "value theorem for functions of one variable yields\n", "\n", ".. math::\n", "\n", " g(1) &= g(0) + \\int_0^1 g'(t_1) dt_1\\\\\n", " &= g(0) + \\int_0^1 \\left( g'(0) + \\int_0^{t_1} g''(t_2) dt_2 \\right) dt_1\n", " & \\quad & \\text{since } g'(t_1) - g'(0) = \\int_0^{t_1} g''(t_2) dt_2\\\\\n", " &= g(0) + \\int_0^1 g'(0) dt_1 + \\int_0^1 dt_1 \\int_0^{t_1} g''(t_2) dt_2\\\\\n", " &= g(0) + g'(0) (1 - 0) + \\int_0^1 g''(t_2) dt_2 \\int_{t_2}^1 dt_1\n", " & \\quad & \\text{changed order of integration in the double integral}\\\\\n", " &= g(0) + g'(0) + \\int_0^1 (1 - t_2) g''(t_2) dt_2\\\\\n", " &= g(0) + g'(0) + g''(\\xi) \\int_0^1 (1 - t_2) dt_2\n", " & \\quad & \\text{Mean Value Theorem with } \\xi \\in (0,1)\\\\\n", " f(x + p)\n", " &= f(x) + \\nabla f(x)^\\top p + \\frac{1}{2} p^\\top \\nabla^2 f[x(\\xi)] p." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-4.4:\n", "\n", "Exercise 4\n", "==========\n", "\n", "Invoking Lemma 4.2.1 for each component function in\n", ":math:`\\left\\{ f_i \\right\\}_{i = 1}^n` yields\n", "\n", ".. math::\n", "\n", " \\lim_{t \\rightarrow 0} \\frac{f_i(x + tp) - f_i(x)}{t}\n", " &= \\nabla f_i(x)^\\top p\\\\\n", " \\lim_{t \\rightarrow 0} \\frac{f_i(x + tp) - f_i(x)}{t} - \\nabla f_i(x)^\\top p\n", " &= 0\\\\\n", " \\left\\vert \\lim_{t \\rightarrow 0}\n", " \\frac{f_i(x + tp) - f_i(x)}{t} - \\nabla f_i(x)^\\top p\n", " \\right\\vert &= 0\\\\\n", " \\lim_{t \\rightarrow 0}\n", " \\left\\vert\n", " \\frac{f_i(x + tp) - f_i(x)}{t} - \\nabla f_i(x)^\\top p\n", " \\right\\vert\n", " &= 0 & \\quad & \\lvert \\cdot \\rvert \\text{ is continuous.}\n", "\n", "Since :math:`F` is assumed to be continuously differentiable in an open set\n", ":math:`D` containing :math:`x`, the consequent property\n", ":math:`F'(x) = \\nabla F(x)^\\top = J(x)` allows the previous statement to be\n", "written more compactly as\n", "\n", ".. math::\n", "\n", " \\lim_{t \\rightarrow 0} \\frac{F(x + tp) - F(x)}{t}\n", " &= \\nabla F(x)^\\top p\\\\\n", " &= J(x) p\\\\\n", " \\lim_{t \\rightarrow 0} \\frac{F(x + tp) - F(x)}{t} - J(x) p &= \\vec{0}\\\\\n", " \\left\\Vert\n", " \\lim_{t \\rightarrow 0} \\frac{F(x + tp) - F(x) - J(x)tp}{t}\n", " \\right\\Vert &= \\left\\Vert 0 \\right\\Vert\\\\\n", " \\lim_{t \\rightarrow 0} \\frac{\\lVert F(x + tp) - F(x) - J(x)tp \\rVert}{t}\n", " &= 0 & \\quad & \\lVert \\cdot \\rVert \\text{ is continuous.}\n", "\n", "Thus :math:`J(x) = \\hat{J}(x)`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 5\n", "==========\n", "\n", "A useful fact to note is that if :math:`I \\subset \\mathbb{R}` is an interval and\n", ":math:`f \\colon I \\mapsto \\mathbb{R}` is differentiable on :math:`I`, then\n", ":math:`f` is Lipschitz on :math:`I` if and only if :math:`f'` is bounded on\n", ":math:`I`.\n", "\n", "Clearly if :math:`f` is Lipschitz on :math:`I`, then\n", ":math:`\\lvert f'(x) \\rvert \\leq \\gamma` (4.1.11).\n", "\n", "Suppose :math:`\\lvert f'(x) \\rvert \\leq M` for all :math:`x \\in I` and\n", ":math:`M \\in \\mathbb{R}`. Let :math:`x` and :math:`y` be any two distinct\n", "elements of :math:`I`. Applying the Mean Value Theorem gives\n", "\n", ".. math::\n", "\n", " f'(w) &= \\frac{f(x) - f(y)}{x - y}\\\\\n", " \\lvert f'(w) \\rvert &= \\left\\vert \\frac{f(x) - f(y)}{x - y} \\right\\vert\\\\\n", " M \\lvert x - y \\rvert &\\geq \\left\\vert f(x) - f(y) \\right\\vert\n", "\n", "where :math:`w \\in (x, y)`. Thus, :math:`f` is Lipschitz on :math:`I` with\n", ":math:`\\gamma = M`.\n", "\n", "In this exercise, :math:`f(x) = x \\sin x^{-1}`, so\n", ":math:`f'(x) = \\sin(x^{-1}) - \\frac{\\cos x^{-1}}{x}` and\n", ":math:`I \\in (-\\infty, \\infty)`. Since :math:`f(x = 0) = 0` and\n", ":math:`x^{-1}` becomes unbounded only when :math:`x = 0`, :math:`f'(x)` is\n", "bounded by a finite value over :math:`I`. Therefore, :math:`f(x)` is Lipschitz\n", "continuous.\n", "\n", "However, :math:`f` is not differentiable at :math:`x = 0` because the limit does\n", "not exist:\n", "\n", ".. math::\n", "\n", " f'(0)\n", " &= \\lim_{\\epsilon \\rightarrow 0^+} \\frac{f(0 + \\epsilon) - f(0)}{\\epsilon}\\\\\n", " &= \\lim_{\\epsilon \\rightarrow 0^+}\n", " \\frac{\\epsilon \\sin(\\epsilon^{-1})}{\\epsilon}\\\\\n", " &= \\lim_{\\epsilon \\rightarrow 0^+} \\sin \\frac{1}{\\epsilon}\\\\\n", " &\\neq \\lim_{\\epsilon \\rightarrow 0^-} \\sin \\frac{1}{\\epsilon}." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 6\n", "==========\n", "\n", "Let :math:`g(t) = f(x + tp)` where `t \\in (0, 1)` and\n", ":math:`f \\colon \\mathbb{R}^n \\mapsto \\mathbb{R}`.\n", "\n", ".. math::\n", "\n", " g(1) - g(0) &= \\int_0^1 g'(t_1) dt_1\\\\\n", " g(1) - g(0) - g'(0) &= \\int_0^1 g'(t_1) - g'(0) dt_1\\\\\n", " &= \\int_0^1 \\int_0^{t_1} g''(t_2) dt_2 dt_1\n", " & \\quad & g'(t_1) - g'(0) = \\int_0^{t_1} g''(t_2) dt_2\\\\\n", " g(1) - g(0) - g'(0) - \\int_0^1 \\int_0^{t_1} g''(0) dt_2 dt_1\n", " &= \\int_0^1 \\int_0^{t_1} g''(t_2) dt_2 dt_1 -\n", " \\int_0^1 \\int_0^{t_1} g''(0) dt_2 dt_1\\\\\n", " g(1) - g(0) - g'(0) - \\frac{1}{2} g''(0)\n", " &= \\int_0^1 \\int_0^{t_1} g''(t_2) - g''(0) dt_2 dt_1\\\\\n", " \\left\\Vert g(1) - g(0) - g'(0) - \\frac{1}{2} g''(0) \\right\\Vert\n", " &= \\left\\Vert\n", " \\int_0^1 \\int_0^{t_1} g''(t_2) - g''(0) dt_2 dt_1\n", " \\right\\Vert\\\\\n", " \\left\\vert g(1) - g(0) - g'(0) - \\frac{1}{2} g''(0) \\right\\vert\n", " &\\leq \\int_0^1 \\int_0^{t_1}\n", " \\left\\Vert g''(t_2) - g''(0) \\right\\Vert dt_2 dt_1\n", " & \\quad & \\text{double integral application of Lemma 4.1.10}\\\\\n", " \\left\\vert\n", " f(x + p) - f(x) - \\nabla f'(x)^\\top p - \\frac{1}{2} p^\\top \\nabla^2 f(x) p\n", " \\right\\vert\n", " &= \\int_0^1 \\int_0^{t_1} \\left\\Vert\n", " p^\\top \\left[ \\nabla^2 f(x + t_2 p) - \\nabla^2 f(x) \\right] p\n", " \\right\\Vert dt_2 dt_1\\\\\n", " &= \\int_0^1 \\int_0^{t_1} \\left\\Vert\n", " \\sum_i \\sigma_i p^\\top u_i v_i^\\top p\n", " \\right\\Vert dt_2 dt_1\n", " & \\quad & \\text{apply Theorem 3.6.4 to Hessian difference}\\\\\n", " &= \\int_0^1 \\int_0^{t_1} \\left\\Vert\n", " \\sum_i \\sigma_i \\sum_j \\sum_k p_j u_{ij} v_{ik} p_k\n", " \\right\\Vert dt_2 dt_1\\\\\n", " &\\leq \\left\\Vert p \\right\\Vert^2_\\infty \\int_0^1 \\int_0^{t_1}\n", " \\left\\vert\n", " \\sum_i \\sigma_i \\sum_j \\sum_k u_{ij} v_{ik}\n", " \\right\\vert dt_2 dt_1\\\\\n", " &\\leq \\left\\Vert p \\right\\Vert^2_\\infty \\int_0^1 \\int_0^{t_1} n\n", " \\left\\Vert\n", " \\nabla^2 f(x + t_2 p) - \\nabla^2 f(x)\n", " \\right\\Vert_\\infty dt_2 dt_1\\\\\n", " &\\leq \\left\\Vert p \\right\\Vert^2 \\int_0^1 \\int_0^{t_1} n\n", " \\left\\Vert \\nabla^2 f(x + t_2 p) - \\nabla^2 f(x) \\right\\Vert dt_2 dt_1\n", " & \\quad & \\text{(3.1.5)}\\\\\n", " &\\leq \\left\\Vert p \\right\\Vert^2 \\int_0^1 \\int_0^{t_1} \\gamma\n", " \\left\\Vert t_2 p \\right\\Vert dt_2 dt_1\n", " & \\quad & \\text{(4.1.11)}\\\\\n", " &= \\gamma \\left\\Vert p \\right\\Vert^3 \\int_0^1 \\int_0^{t_1} t_2 dt_2 dt_1\\\\\n", " &= \\frac{\\gamma}{6} \\left\\Vert p \\right\\Vert^3\n", "\n", "The additional factor of :math:`n` is rolled into :math:`\\gamma`. Since the\n", "definition of Lipschitz continuity only requires a norm to be defined over\n", ":math:`\\mathbb{R}^{m \\times n}`, there does not seem to be any reason against\n", "using a matrix norm. The book focuses on :math:`p`-norms where\n", ":math:`p \\geq 1`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-4.7:\n", "\n", "Exercise 7\n", "==========\n", "\n", ".. math::\n", "\n", " F(v) - F(u) &= \\int_u^v F'(z) dz & \\quad & \\text{(4.1.8)}\\\\\n", " &= \\int_0^1 J(u + t(v - u)) (v - u) dt\n", " & \\quad & \\text{change of variables where }\n", " z = u + t(v - u) \\text{ and } dz = (v - u) dt\\\\\n", " F(v) - F(u) - J(x) (v - u)\n", " &= \\int_0^1 J(u + t(v - u)) (v - u) dt - J(x) (v - u)\\\\\n", " \\left\\Vert F(v) - F(u) - J(x) (v - u) \\right\\Vert\n", " &= \\left\\Vert\n", " \\int_0^1 \\left[ J(u + t(v - u)) - J(x) \\right] (v - u) dt\n", " \\right\\Vert\\\\\n", " &\\leq \\int_0^1 \\left\\Vert\n", " \\left[ J(u + t(v - u)) - J(x) \\right] (v - u)\n", " \\right\\Vert dt\n", " & \\quad & \\text{(4.1.9)}\\\\\n", " &\\leq \\int_0^1 \\left\\Vert J(u + t(v - u)) - J(x) \\right\\Vert\n", " \\left\\Vert v - u \\right\\Vert dt\\\\\n", " &\\leq \\left\\Vert v - u \\right\\Vert\n", " \\int_0^1 \\gamma \\left\\Vert u + t(v - u) - x \\right\\Vert dt\n", " & \\quad & \\text{(4.1.11)}\\\\\n", " &= \\gamma \\left\\Vert v - u \\right\\Vert \\int_0^1\n", " \\left\\Vert u + t(v - u) - \\left[ tx + (1 - t)x \\right] \\right\\Vert dt\\\\\n", " &= \\gamma \\left\\Vert v - u \\right\\Vert \\int_0^1\n", " \\left\\Vert tv - tx + (1 - t)u - (1 - t)x \\right\\Vert dt\\\\\n", " &\\leq \\gamma \\left\\Vert v - u \\right\\Vert\n", " \\int_0^1 t \\left\\Vert v - x \\right\\Vert +\n", " (1 - t) \\left\\Vert u - x \\right\\Vert dt\\\\\n", " &= \\gamma \\left\\Vert v - u \\right\\Vert\n", " \\frac{\\left\\Vert v - x \\right\\Vert + \\left\\Vert u - x \\right\\Vert}{2}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-4.8:\n", "\n", "Exercise 8\n", "==========\n", "\n", ".. math::\n", "\n", " F(x + p) - F(x) &= \\int_0^1 J(x + tp) p dt\\\\\n", " F(x + p) - F(x) - J(x) p &= \\int_0^1 J(x + tp) p dt - J(x) p\\\\\n", " \\left\\Vert F(x + p) - F(x) - J(x) p \\right\\Vert\n", " &= \\left\\Vert \\int_0^1 \\left[ J(x + tp) - J(x) \\right] p dt \\right\\Vert\\\\\n", " &\\leq \\int_0^1 \\left\\Vert \\left[ J(x + tp) - J(x) \\right] p \\right\\Vert dt\n", " & \\quad & \\text{(4.1.9)}\\\\\n", " &\\leq \\int_0^1 \\left\\Vert J(x + tp) - J(x) \\right\\Vert\n", " \\left\\Vert p \\right\\Vert dt\n", " & \\quad & \\text{Theorem 3.1.3}\\\\\n", " &\\leq \\left\\Vert p \\right\\Vert\n", " \\int_0^1 \\gamma \\left\\Vert tp \\right\\Vert^\\alpha dt\n", " & \\quad &\n", " \\text{Lipschitz Continuous is a subset of Holder Continuous}\\\\\n", " &= \\gamma \\left\\Vert p \\right\\Vert^{1 + \\alpha} \\int_0^1 t^\\alpha dt\\\\\n", " &= \\left\\Vert p \\right\\Vert^{1 + \\alpha} \\frac{\\gamma}{1 + \\alpha}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 9\n", "==========\n", "\n", ".. math::\n", "\n", " F(x) &= \\begin{pmatrix} 3x_1^2 - 2x_2\\\\ x_2^3 - \\frac{1}{x_1} \\end{pmatrix}\\\\\n", " J(x) &= \\begin{pmatrix} 6x_1 & -2\\\\ \\frac{1}{x_1^2} & 3x_2^2 \\end{pmatrix}\n", "\n", "When :math:`x = \\begin{pmatrix} 1 & 1 \\end{pmatrix}^\\top`,\n", "\n", ".. math::\n", "\n", " J(x) = \\begin{pmatrix} 6 & -2\\\\ 1 & 3 \\end{pmatrix}.\n", "\n", "Using the forward difference approximation where :math:`h = 0.01` yields\n", "\n", ".. math::\n", "\n", " F(x) = \\begin{pmatrix} 1\\\\ 0 \\end{pmatrix}\n", " \\quad & \\quad\n", " F(x + h e_1) = \\begin{pmatrix} 1.0603\\\\ 0.0099 \\end{pmatrix}\n", " \\quad & \\quad\n", " F(x + h e_2) = \\begin{pmatrix} 0.98\\\\ 0.0303 \\end{pmatrix}\n", "\n", "resulting in :math:`A = \\begin{pmatrix} 6.03 & -2\\\\ 0.99 & 3.03 \\end{pmatrix}`.\n", "\n", "The accuracy of the approximation using an :math:`\\mathcal{L}_1`-norm is\n", ":math:`0.04`, which makes :math:`\\gamma \\geq 8`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 10\n", "===========\n", "\n", "Define\n", "\n", ".. math::\n", "\n", " \\alpha =\n", " f(x + he_i) - f(x) - h \\nabla f(x)_i - \\frac{h^2}{2} \\nabla^2 f(x)_{ii} +\n", " \\mathcal{O}(h^2)\n", "\n", "and\n", "\n", ".. math::\n", "\n", " \\beta =\n", " f(x - he_i) - f(x) + h \\nabla f(x)_i - \\frac{h^2}{2} \\nabla^2 f(x)_{ii} +\n", " \\mathcal{O}(h^2)\n", "\n", "where :math:`\\left\\vert \\cdot \\right\\vert \\leq \\frac{\\gamma h^3}{6}` (4.1.13).\n", "\n", ".. math::\n", "\n", " \\alpha + \\beta\n", " &= f(x + he_i) - 2f(x) + f(x - he_i) - h^2 \\nabla^2 f(x)_{ii}\\\\\n", " \\left\\vert \\alpha + \\beta \\right\\vert\n", " &= \\left\\vert\n", " f(x + he_i) - 2f(x) + f(x - he_i) - h^2 \\nabla^2 f(x)_{ii}\n", " \\right\\vert\\\\\n", " &\\leq \\frac{\\gamma h^3}{3} \\qquad \\text{triangle inequality}\n", "\n", "Thus, :math:`\\left| A_{ii} - \\nabla^2 f(x)_{ii} \\right| \\leq \\frac{\\gamma h}{3}`\n", "where :math:`A_{ii} = \\frac{f(x + he_i) - 2f(x) + f(x - he_i)}{h^2}`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-4.11:\n", "\n", "Exercise 11\n", "===========\n", "\n", "Substituting :math:`p = he_i + he_j, p = he_i - he_j, p = -he_i + he_j,` and\n", ":math:`p = -he_i - he_j` into Lemma 4.1.14 yields\n", "\n", ".. math::\n", "\n", " \\alpha\n", " &= f(x + he_i + he_j) - f(x) - h \\nabla f(x)_i - h \\nabla f(x)_j -\n", " \\frac{h^2}{2} \\left[\n", " \\nabla^2 f(x)_{ii} + 2 \\nabla^2 f(x)_{ij} + \\nabla^2 f(x)_{jj}\n", " \\right] + \\mathcal{O}(h^2)\\\\\n", " \\beta\n", " &= f(x + he_i - he_j) - f(x) - h \\nabla f(x)_i + h \\nabla f(x)_j -\n", " \\frac{h^2}{2} \\left[\n", " \\nabla^2 f(x)_{ii} - 2 \\nabla^2 f(x)_{ij} + \\nabla^2 f(x)_{jj}\n", " \\right] + \\mathcal{O}(h^2)\\\\\n", " \\eta\n", " &= f(x - he_i + he_j) - f(x) + h \\nabla f(x)_i - h \\nabla f(x)_j -\n", " \\frac{h^2}{2} \\left[\n", " \\nabla^2 f(x)_{ii} - 2 \\nabla^2 f(x)_{ij} + \\nabla^2 f(x)_{jj}\n", " \\right] + \\mathcal{O}(h^2)\\\\\n", " \\nu\n", " &= f(x - he_i - he_j) - f(x) + h \\nabla f(x)_i + h \\nabla f(x)_j -\n", " \\frac{h^2}{2} \\left[\n", " \\nabla^2 f(x)_{ii} + 2 \\nabla^2 f(x)_{ij} + \\nabla^2 f(x)_{jj}\n", " \\right] + \\mathcal{O}(h^2)\n", "\n", "By the triangle inequality,\n", "\n", ".. math::\n", "\n", " \\left| \\alpha - \\beta - \\eta + \\nu \\right|\n", " &\\leq \\left| \\alpha \\right| + \\left| \\beta \\right| +\n", " \\left| \\eta \\right| + \\left| \\nu \\right|\\\\\n", " &\\leq \\frac{\\gamma h^3}{6} \\left[\n", " \\left\\Vert e_i + e_j \\right\\Vert^3 +\n", " \\left\\Vert e_i - e_j \\right\\Vert^3 +\n", " \\left\\Vert -e_i + e_j \\right\\Vert^3 +\n", " \\left\\Vert -e_i - e_j \\right\\Vert^3\n", " \\right]\\\\\n", " &\\leq \\frac{16}{3} \\gamma h^3.\n", "\n", "Solving for :math:`\\nabla^2 f(x)_{ij}` yields\n", "\n", ".. math::\n", "\n", " \\left| \\alpha - \\beta - \\eta + \\nu \\right|\n", " &= \\left|\n", " f(x + he_i + he_j) - f(x + he_i - he_j) - f(x - he_i + he_j) +\n", " f(x - he_i - he_j) - 4h^2 \\nabla^2 f(x)_{ij}\n", " \\right|\\\\\n", " &\\leq \\frac{16}{3} \\gamma h^3.\n", "\n", "Hence\n", ":math:`\\left| A_{ij} - \\nabla^2 f(x)_{ij} \\right| \\leq \\frac{4}{3} \\gamma h`\n", "where\n", "\n", ".. math::\n", "\n", " A_{ij} =\n", " \\frac{f(x + he_i + he_j) - f(x + he_i - he_j) -\n", " f(x - he_i + he_j) + f(x - he_i - he_j)}{4h^2}.\n", "\n", "The number of function evaluations are still the same (unless :math:`f(x)` is\n", "already computed), but this new approximation reduces the upper bound by\n", ":math:`20\\%`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _dennis1996numerical-ex-4.12:\n", "\n", "Exercise 12\n", "===========\n", "\n", "Define :math:`A, M \\in \\mathbb{R}^{n \\times n}` where :math:`M = M^\\top` and\n", ":math:`\\hat{A} = \\frac{A + A^\\top}{2}`.\n", "\n", ".. math::\n", "\n", " \\left\\Vert M - \\hat{A} \\right\\Vert_F^2\n", " &= \\text{tr}\\left(\n", " \\left( M - \\hat{A} \\right)^\\top \\left( M - \\hat{A} \\right)\n", " \\right)\\\\\n", " &= \\text{tr}\\left(M^\\top M\\right) - \\text{tr}\\left(M^\\top \\hat{A}\\right) -\n", " \\text{tr}\\left(\\hat{A}^\\top M\\right) +\n", " \\text{tr}\\left(\\hat{A}^\\top \\hat{A}\\right)\n", " & \\quad & \\text{trace operator is a linear mapping}\\\\\n", " &= \\text{tr}\\left(M^\\top M\\right) - 2\\text{tr}\\left(M^\\top \\hat{A}\\right) +\n", " \\text{tr}\\left(\\hat{A}^\\top \\hat{A}\\right)\n", " & \\quad & \\text{trace equivalence under transpose}\\\\\n", " &= \\text{tr}\\left(M^\\top M\\right) - \\text{tr}\\left(M^\\top A\\right) -\n", " \\text{tr}\\left(M^\\top A^\\top\\right) +\n", " \\text{tr}\\left(\\hat{A}^\\top \\hat{A}\\right)\\\\\n", " &= \\text{tr}\\left(M^\\top M\\right) - \\text{tr}\\left(M^\\top A\\right) -\n", " \\text{tr}\\left(A^\\top M\\right) +\n", " \\text{tr}\\left(\\hat{A}^\\top \\hat{A}\\right)\n", " & \\quad & \\text{trace is invariant under cyclic permutations}\\\\\n", " &= \\text{tr}\\left(M^\\top M\\right) - 2 \\text{tr}\\left(M^\\top A\\right) +\n", " \\frac{1}{4} \\left\\Vert A + A^\\top \\right\\Vert_F^2\\\\\n", " &\\leq \\text{tr}\\left(M^\\top M\\right) - 2 \\text{tr}\\left(M^\\top A\\right) +\n", " \\left\\Vert A \\right\\Vert_F^2\n", " & \\quad & \\text{triangle inequality applied to matrix norm}\\\\\n", " &\\leq \\text{tr}\\left(M^\\top M\\right) - 2 \\text{tr}\\left(M^\\top A\\right) +\n", " \\text{tr}\\left(A^\\top A\\right)\\\\\n", " &= \\left\\Vert M - A \\right\\Vert_F^2\\\\\n", " \\left\\Vert M - \\hat{A} \\right\\Vert_F &\\leq \\left\\Vert M - A \\right\\Vert_F" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 13\n", "===========\n", "\n", "Proof by contradiction; assume that :math:`x` is a local minimizer of :math:`f`\n", "(i.e. :math:`\\nabla^2 f(x)` is not indefinite) and :math:`\\nabla^2 f(x)` is\n", "negative definite (i.e.\n", ":math:`\\forall p \\in \\mathbb{R}^n, p^\\top \\nabla^2 f(x) p < 0`).\n", "\n", "Since the Hessian is a real symmetric square matrix, its eigendecomposition\n", "exists as\n", "\n", ".. math::\n", "\n", " \\nabla^2 f(x) = Q \\Lambda Q^\\top = \\sum_{i = 1}^n \\lambda_i q_i q_i^\\top.\n", "\n", "Pick :math:`p = c q_i` where :math:`q_i` could be any of the eigenvectors that\n", "has a corresponding negative eigenvalue and :math:`c \\in \\mathbb{R} \\setminus 0`\n", "such that :math:`x + p \\in D`.\n", "\n", "By the continuity of :math:`\\nabla^2 f`, there exists an open convex subset\n", ":math:`D` such that for every :math:`0 \\leq t \\leq 1, x + tp \\in D`. Applying\n", "Lemma :math:`4.1.5` for some :math:`z \\in [x, x + p]` yields\n", "\n", ".. math::\n", "\n", " f(x + p) &= f(x) + \\nabla f(x)^\\top p + \\frac{1}{2} p^\\top \\nabla^2 f(x) p\\\\\n", " &= f(x) + \\frac{1}{2} p^\\top \\nabla^2 f(x) p\\\\\n", " &= f(x) + \\frac{1}{2} c^2 \\lambda_i\\\\\n", " f(x + p) &< f(x)\n", "\n", "which is a contradiction because :math:`x` is a local minimizer of :math:`f`.\n", "Thus, :math:`\\nabla^2 f(x)` is positive semidefinite.\n", "\n", "This proof can be generalized to functions that are Lipschitz continuous. If\n", ":math:`\\nabla^2 f \\in \\text{Lip}_\\gamma(D)` at :math:`x`, then there exists an\n", "open convex set :math:`D \\subset \\mathbb{R}^n` with :math:`x \\in D`, which\n", "matches the assumptions of Theorem 4.3.2." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 14\n", "===========\n", "\n", "If :math:`x_*` is a local minimizer of :math:`f`, Theorem 4.3.2 states that\n", ":math:`\\nabla^2 f(x_*)` must be positive semidefinite. Since\n", ":math:`\\nabla^2 f(x_*)` is nonsingular, none of its eigenvalues are zero, which\n", "means each of them must be positive in this case. This is equivalent to being\n", "positive definite.\n", "\n", "If :math:`\\nabla^2 f(x_*)` is positive definite, Theorem 4.3.2 states that\n", ":math:`x_*` is a local minimizer of :math:`f`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 15\n", "===========\n", "\n", "If :math:`f` has a unique minimizer :math:`x_*`, then\n", ":math:`\\nabla f(x_*) = 0 = Ax_* + b`. Since :math:`Ax_* = -b` has a unique\n", "solution, it is equivalent to saying :math:`A` has full rank and is therefore\n", "invertible. According to Theorem 4.3.2, :math:`\\nabla^2 f(x) = A` must be\n", "positive semidefinite. Hence when :math:`x_*` is the unique minimizer,\n", ":math:`A` must be positive definite in order to yield :math:`x_* = -A^{-1} b`.\n", "\n", "If :math:`A` is positive definite, Theorem 4.3.2 states that :math:`x` is a\n", "local minimizer when :math:`\\nabla f(x) = Ax + b = 0`. Setting\n", ":math:`x = -A^{-1} b` satisfies that condition. The fundamental theorem of\n", "Linear Algebra states that :math:`x` is a unique solution because the nullspace\n", "of :math:`A` is zero." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 16\n", "===========\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator*{\\argmin}{arg\\,min}\n", " \\argmin_{x \\in \\mathbb{R}^n} f(x)\n", " &= \\argmin_{x \\in \\mathbb{R}^n} \\left\\Vert Ax - b \\right\\Vert_2\\\\\n", " &= \\argmin_{x \\in \\mathbb{R}^n} \\left\\Vert Ax - b \\right\\Vert_2^2\\\\\n", " &= \\argmin_{x \\in \\mathbb{R}^n} x^\\top A^\\top Ax - 2 b^\\top Ax + b^\\top b\n", "\n", "The necessary conditions for :math:`x_*` to be a local minimizer of :math:`f`\n", "are :math:`\\nabla f(x_*) = 0` and :math:`\\nabla^2 f(x_*)` is positive\n", "semidefinite. The sufficient condition is that :math:`\\nabla^2 f(x_*)` must be\n", "positive definite. Solving for the gradient and the Hessian yields\n", "\n", ".. math::\n", "\n", " \\nabla f(x) = 2 x^\\top A^\\top A - 2 b^\\top A\n", " \\quad \\text{and} \\quad\n", " \\nabla^2 f(x) = 2 A^\\top A.\n", "\n", "Solving for :math:`x_*` indeed satisfies :math:`A^\\top A x_* = A^\\top b`;\n", "according to Theorem 4.3.2, the Hessian is positive semidefinite, so Theorem\n", "3.6.5 is needed to compute the pseudo-inverse of :math:`A`." ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }