{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "*****************\n", "Regression Models\n", "*****************\n", "\n", "Non-linear Regression\n", "=====================\n", "\n", "- A radial basis function is any spherically symmetric function, i.e. its value\n", " depends only on the distance from a center point.\n", "- The Bayesian approach to nonlinear regression (8.24) is rarely used in\n", " practice because it requires computing :math:`z_i^\\top z_j`, where the\n", " transformed vectors :math:`z_i` could have infinite length.\n", "- Mercer's theorem guarantees that a valid kernel corresponds to an inner\n", " product in some transformed space, so :math:`z` never has to be computed\n", " explicitly.\n", "\n", " - A kernel function :math:`k[x_i, x_j]` is valid when its arguments lie in\n", " a measurable space and the kernel is positive semi-definite.\n", "\n", " - See (8.26), (8.27), and (8.28) for examples of linear, polynomial, and RBF\n", " kernels.\n", "\n", " - Sums and products of valid kernels are also valid kernels.\n", "\n", "- The kernel trick consists of choosing a kernel function :math:`k[x_i, x_j]` to\n", " replace :math:`f[x_i]^\\top f[x_j]` without ever evaluating\n", " :math:`f[\\bullet]` explicitly." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.1\n", "============\n", "\n", "Assuming a linear relation between the world state and the data, let\n", ":math:`\\boldsymbol{\\theta} = \\{ \\alpha, \\beta, \\boldsymbol{\\phi} \\}` and\n", ":math:`\\alpha = \\beta = \\boldsymbol{\\phi}^\\top \\mathbf{x}` where\n", ":math:`\\alpha, \\beta > 0` and :math:`\\boldsymbol{\\phi} \\in \\mathbb{R}^{D + 1}`.\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\GamDist}{Gam}\n", " Pr(w \\mid \\mathbf{x}, \\boldsymbol{\\theta})\n", " &= \\GamDist_w[\\alpha, \\beta]\\\\\n", " &= \\frac{\\beta^\\alpha}{\\Gamma[\\alpha]} \\exp[-\\beta w] w^{\\alpha - 1}\n", " & \\quad & \\text{(7.23)}\n", "\n", "In the maximum likelihood approach, the goal is\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator*{\\argmax}{arg\\,max}\n", " \\hat{\\boldsymbol{\\theta}}\n", " &= \\argmax_{\\boldsymbol{\\theta}}\n", " Pr(w \\mid \\mathbf{x}, \\boldsymbol{\\theta})\\\\\n", " &= \\argmax_{\\boldsymbol{\\theta}} L\n", " & \\quad & L = \\log Pr(w \\mid \\mathbf{x}, \\boldsymbol{\\theta})\\\\\n", " &= \\argmax_{\\boldsymbol{\\theta}}\n", " \\alpha \\log(\\beta) - \\log(\\Gamma[\\alpha]) -\n", " \\beta w + (\\alpha - 1) \\log(w)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.2\n", "============\n", "\n", "Assuming a linear relation between the world state and the data, let\n", ":math:`\\boldsymbol{\\theta} = \\{ \\mu, \\sigma, \\nu, \\boldsymbol{\\phi} \\}` and\n", ":math:`\\mu = \\boldsymbol{\\phi}^\\top \\mathbf{x}` where\n", ":math:`\\sigma, \\nu > 0` and :math:`\\boldsymbol{\\phi} \\in \\mathbb{R}^{D + 1}`.\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\StudDist}{Stud}\n", " \\DeclareMathOperator{\\NormDist}{Norm}\n", " Pr(w \\mid \\mathbf{x}, \\boldsymbol{\\theta})\n", " &= \\StudDist_w\\left[ \\mu, \\sigma^2, \\nu \\right]\\\\\n", " &= \\int \\NormDist_w\\left[ \\mu, \\sigma^2 / h \\right]\n", " \\GamDist_h[\\nu / 2, \\nu / 2] dh\n", " & \\quad & \\text{(7.24)}\n", "\n", "The EM algorithm can be used to fit :math:`\\boldsymbol{\\theta}` in the maximum\n", "likelihood approach by treating :math:`h` as a hidden variable." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.3\n", "============\n", "\n", ".. 
math::\n", "\n", " L &= \\log Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\theta})\\\\\n", " &= \\log \\NormDist_\\mathbf{w}\\left[\n", " \\mathbf{X}^\\top \\boldsymbol{\\phi}, \\sigma^2 \\mathbf{I}_I\n", " \\right]\\\\\n", " &= -\\frac{I}{2} \\log(2 \\pi) - \\frac{I}{2} \\log\\left( \\sigma^2 \\right) -\n", " \\frac{\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)^\\top\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)\n", " }{\n", " 2 \\sigma^2\n", " }\n", "\n", "where :math:`I` is the number of training examples, since\n", ":math:`\\mathbf{w} \\in \\mathbb{R}^I`.\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\sigma}\n", " &= -\\frac{I}{\\sigma} +\n", " \\sigma^{-3}\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)^\\top\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)\\\\\n", " 0 &= -I \\sigma^2 +\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)^\\top\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)\\\\\n", " \\sigma^2\n", " &= I^{-1}\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)^\\top\n", " \\left( \\mathbf{w} - \\mathbf{X}^\\top \\boldsymbol{\\phi} \\right)\n", "\n", "(b)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\phi}}\n", " &= -\\frac{1}{2 \\sigma^2} \\left[\n", " \\mathbf{X}\n", " \\left( \\mathbf{X}^\\top \\boldsymbol{\\phi} - \\mathbf{w} \\right) +\n", " \\mathbf{X}\n", " \\left( \\mathbf{X}^\\top \\boldsymbol{\\phi} - \\mathbf{w} \\right)\n", " \\right]\n", " & \\quad & \\text{(C.32)}\\\\\n", " 0 &= -\\mathbf{X} \\mathbf{X}^\\top \\boldsymbol{\\phi} + \\mathbf{X} \\mathbf{w}\\\\\n", " \\boldsymbol{\\phi} &= (\\mathbf{X} \\mathbf{X}^\\top)^{-1} \\mathbf{X} \\mathbf{w}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.4\n", "============\n", "\n", ".. 
math::\n", "\n", " Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X}, \\mathbf{w})\n", " &= \\frac{\n", " Pr(\\boldsymbol{\\phi}, \\mathbf{X}, \\mathbf{w})\n", " }{\n", " Pr(\\mathbf{X}, \\mathbf{w})\n", " }\\\\\n", " &= \\frac{\n", " Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\phi})\n", " Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X})\n", " Pr(\\mathbf{X})\n", " }{\n", " Pr(\\mathbf{w} \\mid \\mathbf{X}) Pr(\\mathbf{X})\n", " }\\\\\n", " &= \\frac{\n", " Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\phi})\n", " Pr(\\boldsymbol{\\phi})\n", " }{\n", " Pr(\\mathbf{w} \\mid \\mathbf{X})\n", " }\n", "\n", "The foregoing implies that (8.8) assumes\n", ":math:`Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X}) = Pr(\\boldsymbol{\\phi})` i.e.\n", "the data does not affect the prior on the parameters (8.7).\n", "\n", "Furthermore, (8.9) assumes that :math:`\\sigma^2` is known, so\n", ":math:`Pr(\\mathbf{w} \\mid \\mathbf{X},\n", "\\boldsymbol{\\theta} = \\left\\{ \\boldsymbol{\\phi}, \\sigma^2 \\right\\}) =\n", "Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\phi})`.\n", "\n", ".. 
math::\n", "\n", " Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X}, \\mathbf{w})\n", " &= \\frac{1}{Pr(\\mathbf{w} \\mid \\mathbf{X})}\n", " \\NormDist_{\\mathbf{w}}\\left[\n", " \\mathbf{X}^\\top \\boldsymbol{\\phi}, \\sigma^2 \\mathbf{I}_I\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\boldsymbol{0}, \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right]\\\\\n", " &= \\frac{\\kappa_1}{Pr(\\mathbf{w} \\mid \\mathbf{X})}\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1}\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\boldsymbol{0}, \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right]\n", " & \\quad & \\text{(a)}\\\\\n", " &= \\frac{\\kappa_1 \\kappa_2}{Pr(\\mathbf{w} \\mid \\mathbf{X})}\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\frac{1}{\\sigma^2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w},\n", " \\mathbf{A}^{-1}\n", " \\right]\n", " & \\quad & \\text{(b)}\\\\\n", " &= \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\frac{1}{\\sigma^2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w},\n", " \\mathbf{A}^{-1}\n", " \\right]\n", " & \\quad & \\text{(c)}\n", "\n", "where\n", "\n", ".. math::\n", "\n", " \\mathbf{A} =\n", " \\frac{1}{\\sigma^2} \\mathbf{X} \\mathbf{X}^\\top +\n", " \\frac{1}{\\sigma_p^2} \\mathbf{I}_{D + 1}.\n", "\n", "(a)\n", "---\n", "\n", "See :ref:`Exercise 5.10 ` for more details.\n", "\n", ".. 
math::\n", "\n", " \\NormDist_{\\mathbf{w}}\\left[\n", " \\mathbf{X}^\\top \\boldsymbol{\\phi}, \\sigma^2 \\mathbf{I}_I\n", " \\right]\n", " &= \\kappa_1 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\sigma^{-2} \\right)^{-1}\n", " \\sigma^{-2} \\mathbf{X} \\mathbf{w},\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\sigma^{-2} \\right)^{-1}\n", " \\right]\\\\\n", " &= \\kappa_1 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1}\n", " \\right]\n", "\n", "where\n", "\n", ".. math::\n", "\n", " \\kappa_1\n", " &= (2 \\pi)^{(D + 1 - I) / 2}\n", " \\frac{\n", " \\left\\vert\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1}\n", " \\right\\vert^{1 / 2}\n", " }{\n", " \\left\\vert \\sigma^2 \\mathbf{I}_I \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " \\mathbf{w}^\\top\n", " \\left(\n", " \\sigma^{-2} \\mathbf{I}_I -\n", " \\sigma^{-2} \\mathbf{I}_I \\mathbf{X}^\\top\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X}\n", " \\sigma^{-2} \\mathbf{I}_I\n", " \\right)\n", " \\mathbf{w}\n", " \\right]^{-0.5}\\\\\n", " &= \\frac{\n", " (2 \\pi)^{(D + 1 - I) / 2} \\sigma^{D + 1 - I}\n", " }{\n", " \\left\\vert \\mathbf{X} \\mathbf{X}^\\top \\right\\vert^{1/2}\n", " }\n", " \\exp\\left[\n", " \\mathbf{w}^\\top\n", " \\left(\n", " \\sigma^{-2} \\mathbf{I}_I -\n", " \\sigma^{-2} \\mathbf{I}_I \\mathbf{X}^\\top\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X}\n", " \\sigma^{-2} \\mathbf{I}_I\n", " \\right)\n", " \\mathbf{w}\n", " \\right]^{-0.5}\\\\\n", " &= \\frac{\n", " (2 \\pi)^{(D + 1 - I) / 2} \\sigma^{D + 1 - I}\n", " }{\n", " \\left\\vert \\mathbf{X} \\mathbf{X}^\\top \\right\\vert^{1/2}\n", " }\n", " \\exp\\left[\n", " \\mathbf{w}^\\top\n", " \\left(\n", " \\sigma^{-2} \\mathbf{I}_I -\n", " \\sigma^{-2} \\mathbf{X}^\\top\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} 
\\mathbf{X}\n",
" \\right)\n", " \\mathbf{w}\n", " \\right]^{-0.5}\n", "\n", "This step assumes :math:`\\mathbf{X} \\mathbf{X}^\\top` is invertible, i.e.\n", ":math:`\\mathbf{X}` has full row rank :math:`D + 1`, which requires\n", ":math:`I \\geq D + 1`.\n", "\n", "(b)\n", "---\n", "\n", "See :ref:`Exercise 5.7 ` and\n", ":ref:`Exercise 5.9 ` for more details.\n", "\n", ".. math::\n", "\n", " & \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1}\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\boldsymbol{0}, \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right]\\\\\n", " &= \\kappa_2 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left(\n", " \\frac{1}{\\sigma^2} \\mathbf{X} \\mathbf{X}^\\top +\n", " \\frac{1}{\\sigma_p^2} \\mathbf{I}_{D + 1}\n", " \\right)^{-1}\n", " \\sigma^{-2} \\mathbf{X} \\mathbf{X}^\\top\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\left(\n", " \\frac{1}{\\sigma^2} \\mathbf{X} \\mathbf{X}^\\top +\n", " \\frac{1}{\\sigma_p^2} \\mathbf{I}_{D + 1}\n", " \\right)^{-1}\n", " \\right]\\\\\n", " &= \\kappa_2 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\frac{1}{\\sigma^2}\n", " \\left(\n", " \\frac{1}{\\sigma^2} \\mathbf{X} \\mathbf{X}^\\top +\n", " \\frac{1}{\\sigma_p^2} \\mathbf{I}_{D + 1}\n", " \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\left(\n", " \\frac{1}{\\sigma^2} \\mathbf{X} \\mathbf{X}^\\top +\n", " \\frac{1}{\\sigma_p^2} \\mathbf{I}_{D + 1}\n", " \\right)^{-1}\n", " \\right]\n", "\n", "where\n", "\n", ".. 
math::\n", "\n", " \\kappa_2\n", " &= \\NormDist_{\\boldsymbol{0}}\\left[\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w},\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} +\n", " \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right]\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{(D + 1) / 2}\n", " \\left\\vert\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} +\n", " \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " \\left(\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w}\n", " \\right)^\\top\n", " \\left(\n", " \\sigma^2 \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} +\n", " \\sigma_p^2 \\mathbf{I}_{D + 1}\n", " \\right)^{-1}\n", " \\left(\n", " \\left( \\mathbf{X} \\mathbf{X}^\\top \\right)^{-1} \\mathbf{X} \\mathbf{w}\n", " \\right)\n", " \\right]^{-0.5}\n", "\n", "(c)\n", "---\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{w} \\mid \\mathbf{X})\n", " &= \\int Pr(\\boldsymbol{\\phi}, \\mathbf{w} \\mid \\mathbf{X})\n", " d\\boldsymbol{\\phi}\\\\\n", " &= \\int Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\phi})\n", " Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X}) d\\boldsymbol{\\phi}\\\\\n", " &= \\int Pr(\\mathbf{w} \\mid \\mathbf{X}, \\boldsymbol{\\phi})\n", " Pr(\\boldsymbol{\\phi}) d\\boldsymbol{\\phi}\\\\\n", " &= \\int \\kappa_1 \\kappa_2\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\frac{1}{\\sigma^2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w},\n", " \\mathbf{A}^{-1}\n", " \\right] d\\boldsymbol{\\phi}\\\\\n", " &= \\kappa_1 \\kappa_2" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.5\n", "============\n", "\n", "Let :math:`w^\\ast, \\sigma \\in \\mathbb{R}`;\n", ":math:`\\mathbf{x}^\\ast, \\boldsymbol{\\phi} \\in \\mathbb{R}^{D + 1}`;\n", ":math:`\\mathbf{A} \\in \\mathbb{R}^{(D + 1) \\times (D + 1)}`;\n", ":math:`\\mathbf{X} \\in \\mathbb{R}^{(D + 1) 
\\times I}`;\n", ":math:`\\mathbf{w} \\in \\mathbb{R}^I`.\n", "\n", ".. math::\n", "\n", " Pr(w^\\ast \\mid \\mathbf{x}^\\ast, \\mathbf{X}, \\mathbf{w})\n", " &= \\int Pr(\\boldsymbol{\\phi}, w^\\ast \\mid\n", " \\mathbf{x}^\\ast, \\mathbf{X}, \\mathbf{w}) d\\boldsymbol{\\phi}\\\\\n", " &= \\int Pr(w^\\ast \\mid \\mathbf{x}^\\ast, \\boldsymbol{\\phi})\n", " Pr(\\boldsymbol{\\phi} \\mid \\mathbf{X}, \\mathbf{w}) d\\boldsymbol{\\phi}\n", " & \\quad & \\text{conditional independence}\\\\\n", " &= \\int\n", " \\NormDist_{w^\\ast}\\left[\n", " \\boldsymbol{\\phi}^\\top \\mathbf{x}^\\ast, \\sigma^2\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}, \\mathbf{A}^{-1}\n", " \\right] d\\boldsymbol{\\phi}\n", " & \\quad & \\text{(8.2) and (8.10)}\\\\\n", " &= \\int \\kappa_1\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast,\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}, \\mathbf{A}^{-1}\n", " \\right] d\\boldsymbol{\\phi}\n", " & \\quad & \\text{(a)}\\\\\n", " &= \\int \\kappa_1 \\kappa_2\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\boldsymbol{\\Sigma} \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast w^\\ast +\n", " \\sigma^{-2} \\mathbf{X} \\mathbf{w}\n", " \\right),\n", " \\boldsymbol{\\Sigma}\n", " \\right] d\\boldsymbol{\\phi}\n", " & \\quad & \\text{(b)}\\\\\n", " &= \\kappa_1 \\kappa_2\\\\\n", " &= \\NormDist_{w^\\ast}\\left[\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w},\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right]\n", " & \\quad & \\text{(c)}\n", "\n", "(a)\n", "---\n", "\n", "See :ref:`Exercise 5.10 ` for more details.\n", "\n", ".. 
math::\n", "\n", " \\NormDist_{w^\\ast}\\left[\n", " {\\mathbf{x}^\\ast}^\\top \\boldsymbol{\\phi}^, \\sigma^2\n", " \\right]\n", " &= \\kappa_1 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast \\sigma^{-2},\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right]\\\\\n", " &= \\kappa_1 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast,\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right]\n", "\n", "where\n", "\n", ".. math::\n", "\n", " \\kappa_1\n", " &= (2 \\pi)^{D / 2} \\frac{\n", " \\left\\vert\n", " \\left(\n", " \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\right)^{-1}\n", " \\right\\vert^{1 / 2}\n", " }{\n", " \\left\\vert \\sigma^2 \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " w^\\ast\n", " \\left(\n", " \\sigma^{-2} -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\left(\n", " \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\right)^{-1} \\mathbf{x}^\\ast \\sigma^{-2}\n", " \\right)\n", " w^\\ast\n", " \\right]^{-0.5}\\\\\n", " &= (2 \\pi)^{D / 2} \\sigma^{-1} \\left\\vert\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right\\vert^{1 / 2}\n", " \\exp\\left[\n", " w^\\ast\n", " \\left(\n", " \\sigma^{-2} -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast\n", " \\right)\n", " w^\\ast\n", " \\right]^{-0.5}\n", "\n", "(b)\n", "---\n", "\n", "See :ref:`Exercise 5.7 ` and\n", ":ref:`Exercise 5.9 ` for more details.\n", "\n", ".. 
math::\n", "\n", " & \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast,\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right]\n", " \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}, \\mathbf{A}^{-1}\n", " \\right]\\\\\n", " &= \\kappa_2 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top + \\mathbf{A}\n", " \\right)^{-1}\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast +\n", " \\mathbf{A} \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " \\right),\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top + \\mathbf{A}\n", " \\right)^{-1}\n", " \\right]\\\\\n", " &= \\kappa_2 \\NormDist_{\\boldsymbol{\\phi}}\\left[\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top + \\mathbf{A}\n", " \\right)^{-1}\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast w^\\ast +\n", " \\sigma^{-2} \\mathbf{X} \\mathbf{w}\n", " \\right),\n", " \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top + \\mathbf{A}\n", " \\right)^{-1}\n", " \\right]\n", "\n", "where\n", "\n", ".. 
math::\n", "\n", " \\kappa_2\n", " &= \\NormDist_{\\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}}\\left[\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast,\n", " \\left(\n", " \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\right)^{-1} + \\mathbf{A}^{-1}\n", " \\right]\\\\\n", " &= \\frac{\n", " \\exp\\left[\n", " \\left(\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}'\n", " \\left(\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast\n", " \\right)\n", " \\right]^{-1 / 2}\n", " }{\n", " \\left\\vert\n", " 2 \\pi \\left(\n", " \\sigma^2\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1} +\n", " \\mathbf{A}^{-1}\n", " \\right)\n", " \\right\\vert^{1 / 2}\n", " }\n", " & \\quad & \\text{(b.1)}\n", "\n", "(b.1)\n", "-----\n", "\n", "According to Sherman-Morrison Identity in Matrix Cookbook (160),\n", "\n", ".. math::\n", "\n", " \\boldsymbol{\\Sigma}\n", " &= \\left(\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top + \\mathbf{A}\n", " \\right)^{-1}\\\\\n", " &= \\mathbf{A}^{-1} -\n", " \\frac{\n", " \\sigma^{-2} \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " }{\n", " 1 + \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\mathbf{A}^{-1} -\n", " \\frac{\n", " \\mathbf{A}^{-1} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }.\n", "\n", "According to Searle Set of Identities in Matrix Cookbook (163),\n", "\n", ".. 
math::\n", "\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1} +\n", " \\mathbf{A}^{-1}\n", " &= \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{A}^{-1}\\\\\n", " &= \\mathbf{A}^{-1}\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1}.\n", "\n", "Furthermore, applying (165) gives\n", "\n", ".. math::\n", "\n", " \\boldsymbol{\\Sigma}'\n", " &= \\mathbf{A} \\boldsymbol{\\Sigma}\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)\\\\\n", " &= \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)\n", " \\boldsymbol{\\Sigma} \\mathbf{A}\\\\\n", " &= \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top -\n", " \\frac{\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }.\n", "\n", "(c)\n", "---\n", "\n", ".. math::\n", "\n", " \\kappa_1 \\kappa_2 =\n", " \\frac{\n", " \\exp\\left[\n", " \\left(\n", " w^\\ast -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{X} \\mathbf{w}\n", " \\right)^\\top\n", " \\left(\n", " {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast + \\sigma^2\n", " \\right)^{-1}\n", " \\left(\n", " w^\\ast -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{X} \\mathbf{w}\n", " \\right)\n", " \\right]^{-0.5}\n", " }{\n", " \\left\\vert\n", " 2 \\pi \\left(\n", " {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast + \\sigma^2\n", " \\right)\n", " \\right\\vert^{1 / 2}\n", " }\n", " \\qquad \\text{(c.1) and (c.2)}\n", "\n", "(c.1)\n", "-----\n", "\n", ".. 
math::\n", "\n", " & \\frac{\n", " (2 \\pi)^{D / 2}\n", " \\left\\vert\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\right\\vert^{1 / 2}\n", " }{\n", " \\left\\vert \\sigma^2 \\right\\vert^{1 / 2}\n", " }\n", " \\frac{1}{\n", " (2 \\pi)^{(D + 1) / 2}\n", " \\left\\vert\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1} +\n", " \\mathbf{A}^{-1}\n", " \\right\\vert^{1 / 2}\n", " }\\\\\n", " &= \\left[\n", " (2 \\pi)^{1 / 2}\n", " \\left\\vert \\sigma^2 \\right\\vert^{1 / 2}\n", " \\left\\vert\n", " \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\right\\vert^{1 / 2}\n", " \\left\\vert\n", " \\sigma^2 \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1} +\n", " \\mathbf{A}^{-1}\n", " \\right\\vert^{1 / 2}\n", " \\right]^{-1}\\\\\n", " &= \\left[\n", " (2 \\pi)\n", " \\left\\vert \\sigma^2 \\right\\vert\n", " \\left\\vert\n", " \\mathbf{I}_{D + 1} +\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\sigma^{-2} \\mathbf{A}^{-1}\n", " \\right\\vert\n", " \\right]^{-1 / 2}\\\\\n", " &= \\left[\n", " (2 \\pi)\n", " \\left\\vert \\sigma^2 \\right\\vert\n", " \\left(\n", " 1 +\n", " {\\mathbf{x}^\\ast}^\\top \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right)\n", " \\right]^{-1 / 2}\n", " & \\quad & \\text{Matrix determinant lemma and Matrix Cookbook (24)}\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{1 / 2}\n", " \\left(\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right)^{1 / 2}\n", " }\n", "\n", "(c.2)\n", "-----\n", "\n", ".. 
math::\n", "\n", " & \\exp\\left[\n", " {w^\\ast}^2\n", " \\left(\n", " \\sigma^{-2} -\n", " \\sigma^{-2}\n", " {\\mathbf{x}^\\ast}^\\top\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast\n", " \\right)\n", " \\right]^{-1 / 2}\n", " \\exp\\left[\n", " \\left(\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}'\n", " \\left(\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast\n", " \\right)\n", " \\right]^{-1 / 2}\\\\\n", " &= \\exp\\left[\n", " \\left(\n", " w^\\ast -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{X} \\mathbf{w}\n", " \\right)^\\top\n", " \\left(\n", " {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast + \\sigma^2\n", " \\right)^{-1}\n", " \\left(\n", " w^\\ast -\n", " \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{X} \\mathbf{w}\n", " \\right)\n", " \\right]^{-0.5}\n", "\n", ".. 
math::\n", "\n", " \\text{(1)} & \\quad &\n", " {w^\\ast}^2 \\sigma^{-2} -\n", " {w^\\ast}^2 \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top\n", " \\left(\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\right)^{-1} \\mathbf{x}^\\ast\\\\\\\\\n", " \\text{(2)} & \\quad &\n", " \\sigma^{-4} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\left( \\mathbf{x}^\\ast \\sigma^{-2} {\\mathbf{x}^\\ast}^\\top \\right)\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " \\sigma^{-4} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\frac{\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\\\\\\\\\n", " \\text{(3)} & \\quad &\n", " -\\left(\n", " \\sigma^{-4} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast w^\\ast -\n", " \\sigma^{-2} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\frac{\n", " \\sigma^{-2} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\n", " \\mathbf{x}^\\ast w^\\ast\n", " \\right)\\\\\\\\\n", " \\text{(4)} & \\quad &\n", " -\\left(\n", " w^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\sigma^{-4} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w} -\n", " w^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\frac{\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\n", " \\sigma^{-2} \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " \\right)\\\\\\\\\n", " \\text{(5)} & \\quad &\n", " w^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\sigma^{-2} \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)^{-1}\n", " \\mathbf{x}^\\ast w^\\ast -\n", " w^\\ast {\\mathbf{x}^\\ast}^\\top\n", " 
\\frac{\n", " \\sigma^{-2} \\mathbf{A}^{-1}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\n", " \\mathbf{x}^\\ast w^\\ast\n", "\n", ".. math::\n", "\n", " \\text{(1) + (5)}\n", " &= \\frac{\n", " {w^\\ast}^2 \\sigma^{-2}\n", " \\left(\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right) -\n", " {w^\\ast}^2 \\sigma^{-2}\n", " {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\frac{\n", " {w^\\ast}^2\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\\\\\\\\\n", " \\text{(2)}\n", " &= \\frac{\n", " \\sigma^{-6} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\left( \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top \\right)\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " \\left(\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right) -\n", " \\sigma^{-6} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\frac{\n", " \\sigma^{-4} \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1}\n", " \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\\\\\\\\\n", " \\text{(4)}\n", " &= \\frac{\n", " -\\sigma^{-4} w^\\ast {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{X}\n", " \\mathbf{w}\n", " \\left(\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " \\right) +\n", " \\sigma^{-4} w^\\ast {\\mathbf{x}^\\ast}^\\top\n", " 
\\mathbf{A}^{-1} \\mathbf{x}^\\ast {\\mathbf{x}^\\ast}^\\top\n", " \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\frac{\n", " -\\sigma^{-2} w^\\ast\n", " {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{X} \\mathbf{w}\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\frac{\n", " -\\sigma^{-2} w^\\ast\n", " \\mathbf{w}^\\top \\mathbf{X}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }{\n", " \\sigma^2 + {\\mathbf{x}^\\ast}^\\top \\mathbf{A}^{-1} \\mathbf{x}^\\ast\n", " }\\\\\n", " &= \\text{(3)}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.6\n", "============\n", "\n", ".. math::\n", "\n", " \\mathbf{A}^{-1}\n", " &= \\left(\n", " \\sigma^{-2} \\mathbf{X} \\mathbf{X}^\\top + \\sigma_p^{-2} \\mathbf{I}_D\n", " \\right)^{-1}\\\\\n", " &= \\left(\n", " \\sigma_p^{-2} \\mathbf{I}_D +\n", " \\mathbf{X} \\left( \\sigma^{-2} \\mathbf{I}_I \\right) \\mathbf{X}^\\top\n", " \\right)^{-1}\\\\\n", " &= \\sigma_p^2 \\mathbf{I}_D -\n", " \\sigma_p^2 \\mathbf{I}_D \\mathbf{X}\n", " \\left(\n", " \\mathbf{X}^\\top \\sigma_p^2 \\mathbf{I}_D \\mathbf{X} +\n", " \\sigma^2 \\mathbf{I}_I\n", " \\right)^{-1}\n", " \\mathbf{X}^\\top \\sigma_p^2 \\mathbf{I}_D\\\\\n", " &= \\sigma_p^2 \\mathbf{I}_D -\n", " \\sigma_p^4 \\mathbf{X} \\sigma_p^{-2}\n", " \\left(\n", " \\mathbf{X}^\\top \\mathbf{I}_D \\mathbf{X} +\n", " \\frac{\\sigma^2}{\\sigma_p^2} \\mathbf{I}_I\n", " \\right)^{-1}\n", " \\mathbf{X}^\\top\\\\\n", " &= \\sigma_p^2 \\mathbf{I}_D -\n", " \\sigma_p^2 \\mathbf{X}\n", " \\left(\n", " \\mathbf{X}^\\top \\mathbf{I}_D \\mathbf{X} +\n", " \\frac{\\sigma^2}{\\sigma_p^2} \\mathbf{I}_I\n", " \\right)^{-1}\n", " \\mathbf{X}^\\top" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.7\n", "============\n", 
"\n", "Recall that\n", "\n", ".. math::\n", "\n", " \\frac{\\partial}{\\partial \\sigma} \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right) = 2 \\sigma \\mathbf{I}.\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\tr}{\\mathrm{tr}}\n", " 0 &= \\frac{\\partial}{\\partial \\sigma}\n", " \\log Pr(\\mathbf{w} \\mid \\mathbf{X}, \\sigma^2)\\\\\n", " &= \\frac{\\partial}{\\partial \\sigma} \\log\n", " \\NormDist_{\\mathbf{w}}\\left[\n", " \\boldsymbol{0},\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}_I\n", " \\right]\\\\\n", " &= \\frac{\\partial}{\\partial \\sigma} \\left(\n", " -\\frac{I}{2} \\log[2 \\pi] -\n", " \\frac{1}{2} \\log\\left\\vert\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right\\vert -\n", " \\frac{1}{2} \\mathbf{w}^\\top\n", " \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-1} \\mathbf{w}\n", " \\right)\\\\\n", " &= -\\frac{1}{2} \\tr\\left[\n", " \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-1}\n", " 2 \\sigma \\mathbf{I}\n", " \\right] -\n", " \\frac{1}{2} \\mathbf{w}^\\top\n", " \\left[\n", " -\\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-1}\n", " 2 \\sigma \\mathbf{I}\n", " \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-1}\n", " \\right] \\mathbf{w}\n", " & \\quad & \\text{(C.36) and (C.39)}\\\\\n", " &= \\mathbf{w}^\\top\n", " \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-2} \\mathbf{w} -\n", " \\tr\\left[\n", " \\left(\n", " \\sigma_p^2 \\mathbf{X}^\\top \\mathbf{X} + \\sigma^2 \\mathbf{I}\n", " \\right)^{-1}\n", " \\right]" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.8\n", "============\n", "\n", ".. 
math::\n", "\n", " q(\\phi) = \\max_h\n", " \\NormDist_{\\phi}\\left[ 0, h^{-1} \\right] \\GamDist_h[\\nu / 2, \\nu / 2]\n", "\n", "Because the logarithm is monotonic, the maximizing :math:`h > 0` also maximizes\n", ":math:`L = \\log \\NormDist_{\\phi}\\left[ 0, h^{-1} \\right] \\GamDist_h[\\nu / 2, \\nu / 2]`,\n", "so it is found by setting :math:`\\partial L / \\partial h` to zero:\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial h}\n", " &= \\frac{\\partial}{\\partial h} \\log\\left[\n", " \\frac{1}{\\sqrt{2 \\pi h^{-1}}}\n", " \\exp\\left( -0.5 \\phi^2 h\\right)\n", " \\frac{\\left( \\frac{\\nu}{2} \\right)^{\\nu / 2}}{\\Gamma[\\nu / 2]}\n", " \\exp\\left( -\\frac{\\nu}{2} h \\right) h^{\\nu / 2 - 1}\n", " \\right]\\\\\n", " 0 &= \\frac{\\partial}{\\partial h} \\left[\n", " \\frac{1}{2} \\log(h) - \\frac{1}{2} \\log(2 \\pi) +\n", " \\left( -0.5 \\phi^2 h \\right) +\n", " \\log\\left(\n", " \\frac{\\left( \\frac{\\nu}{2} \\right)^{\\nu / 2}}{\\Gamma[\\nu / 2]}\n", " \\right) +\n", " \\left( -0.5 \\nu h \\right) +\n", " \\left( \\frac{\\nu}{2} - 1 \\right) \\log(h)\n", " \\right]\\\\\n", " &= \\frac{1}{2h} - \\frac{1}{2} \\phi^2 - \\frac{1}{2} \\nu +\n", " \\left( \\frac{\\nu}{2} - 1 \\right) h^{-1}\\\\\n", " h^{-1} \\frac{\\nu - 1}{2} &= \\frac{\\phi^2 + \\nu}{2}\\\\\n", " h &= \\frac{\\nu - 1}{\\phi^2 + \\nu}.\n", "\n", "This stationary point is positive only when :math:`\\nu > 1`; in that case\n", ":math:`\\partial^2 L / \\partial h^2 = -(\\nu - 1) / (2 h^2) < 0`, so it is the\n", "maximum."
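, "\n", "\n", "As a rough numerical sanity check of this result (a sketch added for\n", "illustration, not part of the original solution), the cell below evaluates the\n", "logarithm of the objective on a dense grid of :math:`h > 0` and compares the\n", "grid maximizer with :math:`h = (\\nu - 1) / (\\phi^2 + \\nu)`. The helper\n", "``log_joint``, the grid, :math:`\\nu = 3`, and the test values of :math:`\\phi`\n", "are arbitrary, hypothetical choices." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy\n", "import scipy.special\n", "\n", "def log_joint(h, phi, nu):\n", "    # Hypothetical helper: log Norm_phi[0, 1 / h] + log Gam_h[nu / 2, nu / 2].\n", "    log_normal = 0.5 * numpy.log(h / (2 * numpy.pi)) - 0.5 * phi**2 * h\n", "    log_gamma = (0.5 * nu * numpy.log(0.5 * nu) - scipy.special.gammaln(0.5 * nu)\n", "                 - 0.5 * nu * h + (0.5 * nu - 1) * numpy.log(h))\n", "    return log_normal + log_gamma\n", "\n", "nu = 3.0  # arbitrary choice with nu > 1\n", "h_grid = numpy.linspace(1e-4, 5.0, 200001)  # dense grid over h > 0\n", "for phi in [0.0, 0.5, 1.0, 3.0]:  # arbitrary test values\n", "    h_numeric = h_grid[numpy.argmax(log_joint(h_grid, phi, nu))]\n", "    h_closed = (nu - 1) / (phi**2 + nu)\n", "    print(phi, h_numeric, h_closed)\n", "    assert abs(h_numeric - h_closed) < 1e-3"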
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy\n", "import scipy.special\n", "\n", "def q(phi, nu):\n", "    # Maximizing h from Exercise 8.8 (positive only when nu > 1).\n", "    h = (nu - 1) / (phi**2 + nu)\n", "    # Normal density Norm_phi[0, 1 / h].\n", "    d_normal = 1 / numpy.sqrt(2 * numpy.pi / h) * numpy.exp(-.5 * phi**2 * h)\n", "    # Gamma density Gam_h[nu / 2, nu / 2].\n", "    gamma_const = numpy.power(nu / 2, nu / 2) / scipy.special.gamma(nu / 2)\n", "    d_gamma = gamma_const * numpy.exp(-.5 * nu * h) * numpy.power(h, nu / 2 - 1)\n", "    return d_normal * d_gamma\n", "\n", "nu = 2.0\n", "phi = numpy.linspace(-10, 10, 100)\n", "plt.plot(phi, q(phi, nu))\n", "plt.title(r'$q(\\phi)$ where $\\nu = 2$')\n", "plt.xlabel(r'$\\phi$')\n", "plt.show()\n", "\n", "grid = numpy.linspace(-10, 10, 100)\n", "phi_1, phi_2 = numpy.meshgrid(grid, grid)\n", "plt.imshow(q(phi_1, nu) * q(phi_2, nu), extent=[-10, 10, -10, 10], origin='lower')\n", "plt.title(r'$q(\\phi_1) q(\\phi_2)$ where $\\nu = 2$')\n", "plt.xlabel(r'$\\phi_1$')\n", "plt.ylabel(r'$\\phi_2$')\n", "plt.show()" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.9\n", "============\n", "\n", "Replace :math:`x` with the nonlinear transformation\n", ":math:`z = \\begin{bmatrix} 1 & x & x^2 & x^3 \\end{bmatrix}^\\top` as in (8.19).\n", "\n", "This enables reuse of the linear model since (8.17) is linear in :math:`z`.\n", "\n", "In the maximum likelihood learning algorithm (8.6), :math:`X` is replaced with\n", ":math:`Z`, whose columns contain the transformed vectors\n", ":math:`\\left\\{ z_i \\right\\}_{i = 1}^I`.\n", "\n", "The Bayesian linear regression inference algorithm (8.14) can be reused by\n", "replacing :math:`x` and :math:`X` with :math:`z` and :math:`Z`, respectively." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.10\n", "=============\n", "\n", "Dual linear regression should only be used when :math:`I < D`. The dual\n", "formulation inverts an :math:`I \\times I` matrix, whereas the primal\n", "formulation inverts a :math:`D \\times D` matrix, so for :math:`I \\geq D` the\n", "dual form gives no saving and makes the computation more expensive as\n", ":math:`I` grows."
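, "\n", "\n", "As a rough illustration of Exercises 8.6 and 8.10 (a sketch on hypothetical\n", "random data; the sizes :math:`D = 500` and :math:`I = 20` and both variances\n", "are arbitrary choices), the cell below computes :math:`\\mathbf{A}^{-1}` from\n", "Exercise 8.6 once by inverting the :math:`D \\times D` matrix directly and once\n", "through the identity derived there, which only inverts an :math:`I \\times I`\n", "matrix. The two results should agree up to floating-point error." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy\n", "\n", "# Hypothetical synthetic data with far more dimensions than data points (I < D).\n", "D, I = 500, 20\n", "sigma2, sigma_p2 = 0.5, 2.0  # arbitrary noise and prior variances\n", "X = numpy.random.RandomState(0).randn(D, I)  # columns are the data points x_i\n", "\n", "# Primal form: invert the D x D matrix A directly.\n", "A = X.dot(X.T) / sigma2 + numpy.eye(D) / sigma_p2\n", "A_inv_primal = numpy.linalg.inv(A)\n", "\n", "# Exercise 8.6 form: only the I x I matrix K is inverted.\n", "K = X.T.dot(X) + (sigma2 / sigma_p2) * numpy.eye(I)\n", "A_inv_dual = sigma_p2 * numpy.eye(D) - sigma_p2 * X.dot(numpy.linalg.inv(K)).dot(X.T)\n", "\n", "print(numpy.allclose(A_inv_primal, A_inv_dual))"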
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 8.11\n", "=============\n", "\n", "Up to an additive constant, the dual log-likelihood is\n", ":math:`L = -\\frac{1}{2 \\sigma^2} \\left( w - X^\\top X \\psi \\right)^\\top \\left( w - X^\\top X \\psi \\right)`.\n", "Setting its gradient to zero and assuming :math:`X^\\top X` is invertible (which\n", "requires :math:`I \\leq D`) gives\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\psi}\n", " &= -\\frac{1}{2 \\sigma^2} \\frac{\\partial}{\\partial \\psi}\n", " \\left(\n", " w^\\top w - w^\\top X^\\top X \\psi - \\psi^\\top X^\\top X w +\n", " \\psi^\\top X^\\top X X^\\top X \\psi\n", " \\right)\\\\\n", " 0 &= -\\frac{1}{2 \\sigma^2} \\left(\n", " -2 X^\\top X w + 2 X^\\top X X^\\top X \\psi\n", " \\right)\n", " & \\quad & \\text{(C.27), (C.28), (C.33)}\\\\\n", " X^\\top X X^\\top X \\psi &= X^\\top X w\\\\\n", " X^\\top X \\psi &= w\\\\\n", " \\psi &= \\left( X^\\top X \\right)^{-1} w" ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }