{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "*******************************************\n", "Methods for Problems with Special Structure\n", "*******************************************\n", "\n", "Sparse Secant Methods\n", "=====================\n", "\n", "- Sparse symmetric secant updates do not necessarily preserve positive\n", " definiteness.\n", "\n", " - Experiments with these updates did not yield good results.\n", "\n", "- Feasible solutions are either the finite-difference technique of Section\n", " 11.1 or conjugate gradient.\n", "\n", " - Conjugate gradient are inefficient for small problems, but have proven to\n", " be a leading contender for large problems." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 1\n", "==========\n", "\n", "From definition :math:`4.1.7`, :math:`J(x) = F'(x) \\in \\mathbb{R}^{n \\times n}`\n", "because :math:`F \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^n`.\n", "\n", "For a banded matrix,\n", ":math:`J(x)_{ij} = F'(x)_{ij} = \\frac{\\partial f_i}{\\partial x_j}(x) = 0` if\n", ":math:`\\left\\vert i - j \\right\\vert > m` and\n", ":math:`m < \\lfloor \\frac{n}{2} \\rfloor`.\n", "\n", "As shown in (4.2.1), evaluating a finite difference along parameter\n", ":math:`j`'s dimension (:math:`h_j e_j`) is equivalent to approximating the\n", ":math:`j\\text{th}` column of :math:`J(x)`.\n", "\n", "Generalizing the technique in Section 11.1 gives\n", "\n", ".. math::\n", "\n", " \\{ j \\mid j = k \\bmod (1 + 2m) \\},\n", "\n", "which means :math:`(1 + 2m)` additional evaluations of :math:`F(x)` are required\n", "at each iteration to approximate :math:`J(x)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def printSparseMatrix(n, m):\n", " m = min(m, n / 2)\n", " for i in range(n):\n", " for j in range(n):\n", " if abs(i - j) > m:\n", " print(' 0', end='')\n", " else:\n", " print(' 1', end='')\n", " print('\\n')\n", "\n", "printSparseMatrix(8, 3)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 2\n", "==========\n", "\n", ":math:`\\Gamma = \\varnothing` when :math:`J(x)` is not symmetric because each\n", "column will have at least two non-zero elements. Thus this technique will\n", "require the full five evaluations of :math:`F(x)` in addition to :math:`F(x_c)`\n", "to approximate :math:`J(x)`.\n", "\n", "When :math:`J(x)` is symmetric, :math:`F(x_c + h_5 e_5)` could be evaluated to\n", "indirectly approximate the bottom row of :math:`J(x)`. The left four columns of\n", ":math:`J(x)` can be evaluated simultaneously using\n", ":math:`w = F(x_c + \\sum_{j = 1}^4 h_j e_j)` while ignoring the result `w_5`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 3\n", "==========\n", "\n", "Let :math:`A_{i.}, B_{i.} \\in \\mathbb{R}^n` denote the :math:`i\\text{th}` row of\n", "each matrix and :math:`\\mathrm{b_i} = B_{i.}` be defined by (11.2.6). This\n", "minimization problem is (11.2.4) with one fewer constraint:\n", "\n", ".. math::\n", "\n", " \\left\\Vert B - A \\right\\Vert_F^2\n", " &= \\sum_{i = 1}^n \\left\\Vert B_{i.} - A_{i.} \\right\\Vert_2^2\\\\\n", " &= \\sum_{i = 1}^n \\sum_{j = 1}^n (\\mathrm{b_i}_j - A_{ij})^2\\\\\n", " &= \\sum_{i = 1}^n \\left(\n", " \\sum_{k \\in \\mathcal{Z}_{i1}} (\\mathrm{b_i}_k - A_{ik})^2 +\n", " \\sum_{l \\in \\mathcal{Z}_{i0}} A_{il}^2\n", " \\right)\n", "\n", "where :math:`\\mathcal{Z}_{i1} = \\{ j \\mid Z_{ij} = 1 \\}` and\n", ":math:`\\mathcal{Z}_{i0} = \\{ j \\mid Z_{ij} = 0 \\}`. By inspection, this is\n", "minimized when :math:`B = P_Z(A)`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 4\n", "==========\n", "\n", ".. math::\n", "\n", " \\left\\Vert A_{+_{i.}} - A_{i.} \\right\\Vert_2\n", " &= \\left\\Vert\n", " (\\mathrm{s_i}^\\top \\mathrm{s_i})^+ (y - As)_i \\mathrm{s_i}^\\top\n", " \\right\\Vert_2\n", " & \\quad & \\text{(11.2.9)}\\\\\n", " &= \\left\\Vert (y - As)_i \\right\\Vert_2\n", " \\left\\Vert\n", " (\\mathrm{s_i}^\\top \\mathrm{s_i})^+ \\mathrm{s_i}\n", " \\right\\Vert_2\n", " & \\quad & \\text{(3.1.17)}\\\\\n", " &= \\left\\Vert (B_{i.} - A_{i.})^\\top \\mathrm{s_i} \\right\\Vert_2\n", " \\left\\Vert\n", " (\\mathrm{s_i}^\\top \\mathrm{s_i})^+ \\mathrm{s_i}\n", " \\right\\Vert_2\n", " & \\quad & \\text{(11.2.8)}\\\\\n", " &\\leq \\left\\Vert (B_{i.} - A_{i.}) \\right\\Vert_2\n", " \\left\\Vert \\mathrm{s_i} \\right\\Vert_2\n", " \\left\\Vert\n", " (\\mathrm{s_i}^\\top \\mathrm{s_i})^+ \\mathrm{s_i}\n", " \\right\\Vert_2\n", " & \\quad & \\text{Cauchy-Schwarz inequality}\\\\\n", " &= \\left\\Vert (B_{i.} - A_{i.}) \\right\\Vert_2\n", " (\\mathrm{s_i}^\\top \\mathrm{s_i})^+\n", " \\left\\Vert \\mathrm{s_i} \\right\\Vert_2^2\n", " & \\quad & \\text{(3.1.1b)}\\\\\n", " &= \\left\\Vert (B_{i.} - A_{i.}) \\right\\Vert_2\n", " & \\quad & \\text{(3.6.6)}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 5\n", "==========\n", "\n", "Notice that applying lemma 4.1.9 yields\n", "\n", ".. math::\n", "\n", " y = F(x_+) - F(x) = F(x + s) - F(x) = \\int_0^1 J(x + ts) s dt.\n", "\n", "A component-by-component application of (4.1.2) (see\n", ":ref:`Exercise 4.4 `) reveals\n", "\n", ".. math::\n", "\n", " y_i =\n", " F(x_+)_i - F(x)_i =\n", " F(x + s)_i - F(x)_i =\n", " \\int_0^1 \\nabla f_i(x + ts)^\\top \\mathrm{s_i} dt\n", "\n", "where :math:`\\mathrm{s_i}` denotes the result of imposing on :math:`s` on the\n", "sparsity pattern of the :math:`i\\text{th}` row of\n", ":math:`F'(x) = J(x) \\in SP(Z)`. When :math:`\\mathrm{s_i} = 0`, the entire\n", "integral evaluates to :math:`0 = y_i`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 6\n", "==========" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "import numpy\n", "\n", "n = 5\n", "A = numpy.zeros((n, n))\n", "for i in range(n):\n", " for j in range(n):\n", " v = 4\n", " _ = abs(i - j)\n", " if _ == 1:\n", " v = 1\n", " elif _ > 1:\n", " v = 0\n", " A[i,j] = v\n", "\n", "print(A)\n", "print(numpy.linalg.inv(A))" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7\n", "==========\n", "\n", "The Thomas algorithm can be used to solve the tridiagonal system of :math:`n`\n", "unknowns, but it's not stable in general. Since :math:`A` is symmetric positive\n", "definite, another approach is to rewrite :math:`A = LL^\\top` (Cholesky\n", "decomposition) and compute the resulting quantities :cite:`bar1997fast`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8\n", "==========\n", "\n", "Let :math:`f(x) = \\rho(R(x))` where\n", ":math:`R \\colon \\mathbb{R}^n \\rightarrow \\mathbb{R}^m` and\n", ":math:`\\rho \\colon \\mathbb{R}^m \\rightarrow \\mathbb{R}`.\n", "\n", "Only :math:`\\nabla \\rho(R(x))` was derived because the derivation for\n", ":math:`\\nabla^2 \\rho(R(x))` is essentially\n", ":ref:`Exercise 6.17 ` and\n", ":ref:`Exercise 6.18 `. See (11.3.4) for a more\n", "generalized derivation.\n", "\n", "(a)\n", "---\n", "\n", "Recall that :math:`\\lvert x \\rvert = \\sqrt{x^2}` and\n", ":math:`\\frac{d}{dx} \\lvert x \\rvert = \\frac{d}{dx} \\sqrt{x^2} =\n", "\\frac{x}{\\lvert x \\rvert}`.\n", "\n", "Let :math:`\\rho(z) = \\sum_{i = 1}^m \\rho_1(z_i)` where :math:`\\rho_1` is defined\n", "as (10.4.4). Applying definition 4.1.1 with the chain rule yields\n", "\n", ".. math::\n", "\n", " \\nabla \\rho(z = R(x)) &= \\frac{d}{dx} \\sum_{i = 1}^m \\rho_1(z_i)\\\\\n", " &= \\sum_{i = 1}^m\n", " \\begin{bmatrix}\n", " \\frac{\\partial}{\\partial x_1} \\rho_1(z_i)\\\\\n", " \\vdots\\\\\n", " \\frac{\\partial}{\\partial x_n} \\rho_1(z_i)\\\\\n", " \\end{bmatrix}\\\\\n", " &= \\sum_{i = 1}^m\n", " \\rho_1'(z_i)\n", " \\begin{bmatrix}\n", " \\frac{\\partial z_i}{\\partial x_1}\\\\\n", " \\vdots\\\\\n", " \\frac{\\partial z_i}{\\partial x_n}\\\\\n", " \\end{bmatrix}\n", " & \\quad & \\text{(a.1)}\\\\\n", " &= \\sum_{i = 1}^m \\rho_1'(z_i) \\nabla z_i.\n", "\n", "(a.1)\n", "^^^^^\n", "\n", ".. math::\n", "\n", " \\frac{\\partial}{\\partial x_j} \\rho_1(z_i)\n", " &= \\begin{cases}\n", " \\frac{\\partial}{\\partial x_j} \\frac{z_i^2}{2}, &\n", " \\text{$|z_i| \\leq k$},\\\\\n", " \\frac{\\partial}{\\partial x_j} \\left( k |z_i| - \\frac{k^2}{2} \\right), &\n", " \\text{$|z_i| > k$}.\n", " \\end{cases}\\\\\n", " &= \\begin{cases}\n", " z_i \\frac{\\partial z_i}{\\partial x_j}, & \\text{$|z_i| \\leq k$},\\\\\n", " k \\frac{z_i}{|z_i|} \\frac{\\partial z_i}{\\partial x_j}, &\n", " \\text{$|z_i| > k$}.\n", " \\end{cases}\n", " & \\quad & \\frac{\\partial}{\\partial x_j} =\n", " \\frac{\\partial}{\\partial z_i}\n", " \\frac{\\partial z_i}{\\partial x_j}\\\\\n", " &= \\rho_1'(z_i) \\frac{\\partial z_i}{\\partial x_j}\n", "\n", "(b)\n", "---\n", "\n", "Let :math:`\\rho(z) = \\sum_{i = 1}^m \\rho_2(z_i)` where :math:`\\rho_2` is defined\n", "as (10.4.5). Applying definition 4.1.1 with the chain rule yields\n", "\n", ".. math::\n", "\n", " \\nabla \\rho(z = R(x)) &= \\frac{d}{dx} \\sum_{i = 1}^m \\rho_2(z_i)\\\\\n", " &= \\sum_{i = 1}^m\n", " \\rho_2'(z_i) \\nabla z_i\n", " & \\quad & \\text{(a) and (b.1)}\n", "\n", "(b.1)\n", "^^^^^\n", "\n", ".. math::\n", "\n", " \\frac{\\partial}{\\partial x_j} \\rho_2(z_i)\n", " &= \\begin{cases}\n", " \\frac{\\partial}{\\partial x_j} \\frac{k^2}{6}\n", " \\left(\n", " 1 - \\left( 1 - \\left( \\frac{z_i}{k} \\right)^2 \\right)^3\n", " \\right), & \\text{$|z_i| \\leq k$},\\\\\n", " \\frac{\\partial}{\\partial x_j} \\frac{k^2}{6}, & \\text{$|z_i| > k$}.\n", " \\end{cases}\\\\\n", " &= \\begin{cases}\n", " z_i \\left( 1 - \\left( \\frac{z_i}{k} \\right)^2 \\right)^2\n", " \\frac{\\partial z_i}{\\partial x_j}, & \\text{$|z_i| \\leq k$},\\\\\n", " 0, & \\text{$|z_i| > k$}.\n", " \\end{cases}\n", " & \\quad & \\frac{\\partial}{\\partial x_j} =\n", " \\frac{\\partial}{\\partial z_i}\n", " \\frac{\\partial z_i}{\\partial x_j}\\\\\n", " &= \\rho_2'(z_i) \\frac{\\partial z_i}{\\partial x_j}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. rubric:: References\n", "\n", ".. bibliography:: chapter-11.bib" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }