{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "*****************************\n", "Models for Style and Identity\n", "*****************************\n", "\n", "Bayesian model selection is a valid way to compare models with different numbers\n", "of parameters as long as they are marginalized out of the final solution." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.1\n", "=============\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\NormDist}{Norm}\n", " Pr(\\mathbf{h}_i \\mid \\mathbf{x}_{i \\cdot})\n", " &= \\frac{\n", " Pr(\\mathbf{h}_i, \\mathbf{x}_{i1}, \\mathbf{x}_{i2}, \\ldots,\n", " \\mathbf{x}_{iJ})\n", " }{\n", " \\int Pr(\\mathbf{x}_{i1}, \\mathbf{x}_{i2}, \\ldots, \\mathbf{x}_{iJ},\n", " \\mathbf{h}_i) d\\mathbf{h}_i\n", " }\n", " & \\quad & \\text{(2.1), (2.4) and }\n", " \\mathbf{x}_{i \\cdot} = \\{ \\mathbf{x}_{ij} \\}_{j = 1}^J\\\\\n", " &= \\frac{\n", " \\prod_{j = 1}^J Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i)\n", " \\cdot\n", " Pr(\\mathbf{h}_i)\n", " }{\n", " \\int Pr(\\mathbf{h}_i) \\prod_{j = 1}^J\n", " Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i) d\\mathbf{h}_i\n", " }\n", " & \\quad & \\text{(2.6), (2.10)}\\\\\n", " &= \\frac{\n", " \\kappa \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{\\mu}', \\boldsymbol{\\Sigma}'\n", " \\right]\n", " }{\n", " \\kappa \\int \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{\\mu}', \\boldsymbol{\\Sigma}'\n", " \\right] d\\mathbf{h}_i\n", " }\n", " & \\quad & \\text{(a)}\\\\\n", " &= \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{\\mu}', \\boldsymbol{\\Sigma}'\n", " \\right]\n", " & \\quad & \\text{multivariate normal distribution integrates to one}\n", "\n", "(a)\n", "---\n", "\n", ".. 
math::\n", "\n", " & \\prod_{j = 1}^J Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i)\n", " \\cdot Pr(\\mathbf{h}_i)\\\\\n", " &= \\prod_{j = 1}^J\n", " \\NormDist_{\\mathbf{x}_{ij}}\\left[\n", " \\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\mathbf{h}_i,\n", " \\boldsymbol{\\Sigma}\n", " \\right] \\cdot\n", " \\NormDist_{\\mathbf{h}_i}\\left[ \\boldsymbol{0}, \\mathbf{I} \\right]\n", " & \\quad & \\text{(18.7)}\\\\\n", " &= \\frac{1}{\n", " (2\\pi)^{JD / 2} \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 2}\n", " }\n", " \\exp\\left[\n", " \\sum_{j = 1}^J\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\right]^{-0.5}\n", " \\frac{1}{(2\\pi)^{k / 2}}\n", " \\exp\\left[ \\mathbf{h}_i^\\top \\mathbf{h}_i \\right]^{-0.5}\\\\\n", " &= \\frac{\n", " \\exp\\left[\n", " \\mathbf{h}_i^\\top \\mathbf{h}_i +\n", " \\sum_{j = 1}^J\n", " \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij} -\n", " \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij} +\n", " \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij} +\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right]^{-0.5}\n", " }{\n", " (2\\pi)^{(JD + k) / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 
2}\n", " }\\\\\n", " &= \\frac{\n", " \\exp\\left[\n", " \\mathbf{h}_i^\\top \\mathbf{h}_i +\n", " J \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " \\left(\n", " 2 \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\sum_{j = 1}^J \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\right) +\n", " \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\right]^{-0.5}\n", " }{\n", " (2\\pi)^{(JD + k) / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 2}\n", " }\\\\\n", " &= \\frac{\n", " \\exp\\left[\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Sigma}'^{-1} \\mathbf{h}_i -\n", " \\left(\n", " 2 \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\sum_{j = 1}^J \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\right) +\n", " \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\right]^{-0.5}\n", " }{\n", " (2\\pi)^{(JD + k) / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 2}\n", " }\n", " & \\quad & \\boldsymbol{\\Sigma}' =\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\\\\\n", " &= \\frac{\n", " \\exp\\left[\n", " \\left( \\mathbf{h}_i - \\boldsymbol{\\mu}' \\right)^\\top\n", " \\boldsymbol{\\Sigma}'^{-1}\n", " \\left( \\mathbf{h}_i - \\boldsymbol{\\mu}' \\right) -\n", " \\boldsymbol{\\mu}'^\\top \\boldsymbol{\\Sigma}'^{-1} \\boldsymbol{\\mu}' +\n", " \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " 
\\right]^{-0.5}\n", " }{\n", " (2\\pi)^{(JD + k) / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 2}\n", " }\n", " & \\quad & \\boldsymbol{\\mu}' =\n", " \\boldsymbol{\\Sigma}' \\boldsymbol{\\Phi}^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\\\\\n", " &= \\kappa \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{\\mu}', \\boldsymbol{\\Sigma}'\n", " \\right]\n", "\n", "where\n", "\n", ".. math::\n", "\n", " \\kappa =\n", " \\frac{\n", " \\left\\vert \\boldsymbol{\\Sigma}' \\right\\vert^{1 / 2}\n", " }{\n", " (2\\pi)^{JD / 2} \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{J / 2}\n", " }\n", " \\exp\\left[\n", " -\\boldsymbol{\\mu}'^\\top \\boldsymbol{\\Sigma}'^{-1} \\boldsymbol{\\mu}' +\n", " \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\right]^{-0.5}." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.2\n", "=============\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator*{\\argmax}{arg\\,max}\n", " \\DeclareMathOperator{\\E}{\\mathrm{E}}\n", " \\hat{\\theta}\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " \\sum_{i = 1}^I\n", " \\int q_i(\\mathbf{h}_i)\n", " \\log Pr(\\mathbf{x}_{i\\cdot}, \\mathbf{h}_i \\mid \\theta)\n", " d\\mathbf{h}_i\n", " & \\quad & \\text{(7.51)}\\\\\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " \\sum_{i = 1}^I\n", " \\int q_i(\\mathbf{h}_i)\n", " \\left[\n", " \\log Pr(\\mathbf{x}_{i\\cdot} \\mid \\mathbf{h}_i, \\theta) +\n", " \\log Pr(\\mathbf{h}_i \\mid \\theta)\n", " \\right] d\\mathbf{h}_i\\\\\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " \\sum_{i = 1}^I\n", " \\E\\left[\n", " \\log Pr(\\mathbf{x}_{i\\cdot} \\mid \\mathbf{h}_i, \\theta)\n", " \\right] +\n", " \\E\\left[ \\log Pr(\\mathbf{h}_i \\mid \\theta) \\right]\\\\\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " \\E\\left[ \\log Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i, \\theta) \\right]\n", " & \\quad & Pr(\\mathbf{h}_i \\mid \\theta) \\perp \\theta\\\\\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " -\\frac{1}{2} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " D \\log(2\\pi) +\n", " \\log \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert +\n", " \\E\\left[\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\right]\\\\\n", " &= \\argmax_\\boldsymbol{\\theta}\n", " -\\frac{1}{2} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " D \\log(2\\pi) + \\log\\left\\vert \\boldsymbol{\\Sigma} \\right\\vert +\n", " \\E\\left[\n", " \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij} +\n", " \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi}
\\mathbf{h}_i +\n", " 2 \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " 2 \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " 2 \\mathbf{x}_{ij}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i \n", " \\right]\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\mu}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\sum_{j = 1}^J \\E\\left[\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij}\n", " \\right]\n", " & \\quad & \\text{(C.33), (C.27), (C.28)}\\\\\n", " 0 &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i] +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij}\n", " & \\quad & \\E[] \\text{ is a linear operator}\\\\\n", " &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\left(\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\sum_{k = 1}^J \\left( \\mathbf{x}_{ik} - \\boldsymbol{\\mu} \\right)\n", " \\right) +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij}\n", " & \\quad & \\text{(18.10)}\\\\\n", " &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " J \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right) \\mathbf{x}_{ij} -\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " J 
\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right) \\boldsymbol{\\mu}\n", " & \\quad & \\text{(a.1)}\\\\\n", " &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " \\left(\n", " \\boldsymbol{\\Sigma} + J \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right)^{-1} \\mathbf{x}_{ij} -\n", " \\left(\n", " \\boldsymbol{\\Sigma} + J \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right)^{-1} \\boldsymbol{\\mu}\n", " & \\quad & \\text{Sherman-Morrison-Woodbury formula}\\\\\n", " \\boldsymbol{\\mu}\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J \\mathbf{x}_{ij}\n", "\n", "(a.1)\n", "-----\n", "\n", "Notice that\n", "\n", ".. math::\n", "\n", " \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\sum_{k = 1}^J \\mathbf{x}_{ik} =\n", " \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left(\n", " J \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\left(\n", " J \\mathbf{x}_{ij} - J \\mathbf{x}_{ij} + \\sum_{k = 1}^J \\mathbf{x}_{ik}\n", " \\right)\n", "\n", "and\n", "\n", ".. 
math::\n", "\n", " \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " J \\mathbf{x}_{ij} - \\sum_{k = 1}^J \\mathbf{x}_{ik}\n", " &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " J \\mathbf{x}_{ij} -\n", " \\left(\n", " \\mathbf{x}_{i1} + \\mathbf{x}_{i2} + \\cdots + \\mathbf{x}_{iJ}\n", " \\right)\\\\\n", " &= \\sum_{i = 1}^I\n", " \\left( J \\sum_{j = 1}^J \\mathbf{x}_{ij} \\right) -\n", " J \\left(\n", " \\mathbf{x}_{i1} + \\mathbf{x}_{i2} + \\cdots + \\mathbf{x}_{iJ}\n", " \\right)\\\\\n", " &= 0.\n", "\n", "(b)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\Phi}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\sum_{j = 1}^J \\E\\left[\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\mathbf{h}_i \\mathbf{h}_i^\\top +\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} \\mathbf{h}_i^\\top -\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij} \\mathbf{h}_i^\\top\n", " \\right]\n", " & \\quad & \\text{(C.34), (C.29)}\\\\\n", " 0 &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right] -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu}\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right] +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_{ij}\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right]\n", " & \\quad & \\E[] \\text{ is a linear operator}\\\\\n", " \\boldsymbol{\\Phi} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right]\n", " &= \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu} \\right)\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right]\\\\\n", " \\boldsymbol{\\Phi}\n", " &= \\left(\n", " \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right]\n", " \\right)\n", " \\left(\n", " \\sum_{i = 1}^I J \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right]\n", " \\right)^{-1}\n", "\n", "(c)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator{\\diag}{\\mathrm{diag}}\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\Sigma}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\sum_{j = 1}^J \\E\\left[\n", " \\boldsymbol{\\Sigma}^{-\\top} -\n", " \\boldsymbol{\\Sigma}^{-\\top}\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-\\top}\n", " \\right]\n", " & \\quad & \\text{(C.38) and Matrix Cookbook (61)}\\\\\n", " \\boldsymbol{\\Sigma}\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J \\E\\left[\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\left(\n", " \\mathbf{x}_{ij} - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\right]\n", " & \\quad & \\E[] \\text{ is a linear operator}\\\\\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})^\\top +\n", " \\E\\left[\n", " 2 \\boldsymbol{\\Phi} \\mathbf{h}_i \\boldsymbol{\\mu}^\\top -\n", " 2 \\boldsymbol{\\Phi} \\mathbf{h}_i \\mathbf{x}_{ij}^\\top +\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\right]\\\\\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})^\\top -\n", " 2 \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu}\\right)^\\top +\n", " \\boldsymbol{\\Phi}\n", " \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right]\n", " \\boldsymbol{\\Phi}^\\top\\\\\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})^\\top -\n", " \\boldsymbol{\\Phi} 
\\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu}\\right)^\\top\n", " & \\quad & \\text{substitute in results from (b)}\\\\\n", " &= \\frac{1}{IJ} \\sum_{i = 1}^I \\sum_{j = 1}^J \\diag\\left[\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_{ij} - \\boldsymbol{\\mu})^\\top -\n", " \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_{ij} - \\boldsymbol{\\mu}\\right)^\\top\n", " \\right]\n", " & \\quad & \\boldsymbol{\\Sigma} \\text{ diagonal constraint}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.3\n", "=============\n", "\n", "As shown in (17.25),\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{x}_{ij})\n", " &= \\int Pr(\\mathbf{x}_{ij}, \\mathbf{h}_i) d\\mathbf{h}_i\\\\\n", " &= \\int Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i) Pr(\\mathbf{h}_i)\n", " d\\mathbf{h}_i\\\\\n", " &= \\NormDist_{\\mathbf{x}_{ij}}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\sigma \\mathbf{I}\n", " \\right]\n", "\n", "where\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{h}_i)\n", " &= \\NormDist_{\\mathbf{h}_i}\\left[\\boldsymbol{0}, \\mathbf{I} \\right],\n", " \\\\\\\\\n", " Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i)\n", " &= \\NormDist_{\\mathbf{x}_{ij}}\\left[\n", " \\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\mathbf{h}_i,\n", " \\sigma \\mathbf{I}\n", " \\right].\n", "\n", "This model closely resembles PPCA. Instead of stacking landmark\n", "points, the subspace identity model uses images with varied lighting and poses.\n", "Since the dimensionality of the data is very high compared to the number of\n", "training examples, (17.29) is a more efficient way of estimating\n", ":math:`\\boldsymbol{\\Phi}` and :math:`\\sigma`. Furthermore,\n", ":math:`\\boldsymbol{\\mu}` can still be estimated using (17.26)."
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.4\n", "=============\n", "\n", "Face clustering can be viewed as a set partition problem. The number of ways a\n", "set of :math:`n` elements can be partitioned into non-empty subsets is called a\n", "Bell number\n", "\n", ".. math::\n", "\n", " B_n = \\sum_{k = 0}^{n - 1} {n - 1 \\choose k} B_k\n", "\n", "where :math:`B_0 = B_1 = 1`." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.5\n", "=============\n", "\n", "Suppose :math:`Pr(\\mathbf{x}_1, \\mathbf{x}_2 \\mid w)` is defined as (18.17) and\n", "(18.18); see :ref:`Exercise 7.8 ` for the derivation.\n", "\n", "The marginal and conditional distributions can be derived using (5.11) and\n", "(5.13) respectively. It is interesting to note that any combination of these\n", "four equations will result in exactly\n", ":math:`\\{ Pr(\\mathbf{x}_1), Pr(\\mathbf{x}_2),\n", "Pr(\\mathbf{x}_1 \\mid \\mathbf{x}_2), Pr(\\mathbf{x}_2 \\mid \\mathbf{x}_1) \\}`.\n", "\n", "When the prior is uniform over the world state, (18.12) can be evaluated using\n", "the foregoing expressions." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.6\n", "=============\n", "\n", "The marginalization technique used to combine the t-distribution with factor\n", "analyzers in :cite:`khan2004robust` could be applied to the subspace identity\n", "model to make it robust to outliers:\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{x}_{ij})\n", " &= \\iint Pr(\\mathbf{x}_{ij}, \\mathbf{h}_i, u_i) d\\mathbf{h}_i du_i\\\\\n", " &= \\iint Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i, u_i)\n", " Pr(\\mathbf{h}_i \\mid u_i)\n", " Pr(u_i) d\\mathbf{h}_i du_i\n", "\n", "where\n", "\n", "..
math::\n", "\n", " \\DeclareMathOperator{\\GamDist}{Gam}\n", " Pr(u_i) &= \\GamDist_{u_i}[\\nu / 2, \\nu / 2],\n", " \\\\\\\\\\\\\n", " Pr(\\mathbf{h}_i \\mid u_i)\n", " &= \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{0}, \\frac{\\mathbf{I}}{u_i}\n", " \\right],\n", " \\\\\\\\\\\\\n", " Pr(\\mathbf{x}_{ij} \\mid \\mathbf{h}_i, u_i)\n", " &= \\NormDist_{\\mathbf{x}_{ij}}\\left[\n", " \\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\mathbf{h}_i,\n", " \\frac{\\boldsymbol{\\Sigma}}{u_i}\n", " \\right]." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.7\n", "=============\n", "\n", "Imitating (18.17), define\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{x}_\\delta \\mid w) =\n", " \\NormDist_{\\mathbf{x}_\\delta}\\left[\n", " \\boldsymbol{\\mu}_w,\n", " \\boldsymbol{\\Phi}_w \\boldsymbol{\\Phi}^\\top_w + \\boldsymbol{\\Sigma}_w\n", " \\right]\n", "\n", "and\n", "\n", ".. math::\n", "\n", " Pr(w \\mid \\mathbf{x}_\\delta) =\n", " \\frac{Pr(\\mathbf{x}_\\delta \\mid w) Pr(w)}{Pr(\\mathbf{x}_\\delta)}.\n", "\n", "See :cite:`moghaddam2000bayesian` for a discussion of this model's\n", "disadvantages." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.8\n", "=============\n", "\n", "Combining (18.20) and (18.30) yields\n", "\n", ".. 
math::\n", "\n", " \\mathbf{x}_{ijs}\n", " &= \\boldsymbol{\\mu}_s + \\boldsymbol{\\Phi}_s \\mathbf{h}_i +\n", " \\boldsymbol{\\Psi}_s \\mathbf{s}_{ij} + \\boldsymbol{\\epsilon}_{ijs}\\\\\n", " &= \\boldsymbol{\\mu}_s +\n", " \\begin{bmatrix} \\boldsymbol{\\Phi}_s & \\boldsymbol{\\Psi}_s \\end{bmatrix}\n", " \\begin{bmatrix} \\mathbf{h}_i\\\\ \\mathbf{s}_{ij} \\end{bmatrix} +\n", " \\boldsymbol{\\epsilon}_{ijs}\\\\\n", " &= \\boldsymbol{\\mu}_s + \\boldsymbol{\\Phi}'_s \\mathbf{h}_{ij} +\n", " \\boldsymbol{\\epsilon}_{ijs}\n", "\n", "where :math:`\\boldsymbol{\\mu}_s` is the mean vector associated with the\n", ":math:`s\\text{th}` style, :math:`\\boldsymbol{\\Phi}_s` describes the\n", "between-individual variation associated with the :math:`s\\text{th}` style,\n", ":math:`\\boldsymbol{\\Psi}_s` describes the within-individual variation associated\n", "with the :math:`s\\text{th}` style, and the diagonal noise term\n", ":math:`\\boldsymbol{\\epsilon}_{ijs}` explains any remaining variation associated\n", "with the :math:`s\\text{th}` style.\n", "\n", "The generative model can be written in probabilistic terms as a combination of\n", "(18.21) and (18.31):\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator{\\CatDist}{Cat}\n", " Pr(s) &= \\CatDist_s\\left[\\boldsymbol{\\lambda}\\right]\\\\\n", " Pr(\\mathbf{h}_i)\n", " &= \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{0}, \\mathbf{I}_{D_{\\mathbf{h}_i}}\n", " \\right]\\\\\n", " Pr(\\mathbf{s}_{ij})\n", " &= \\NormDist_{\\mathbf{s}_{ij}}\\left[\n", " \\boldsymbol{0}, \\mathbf{I}_{D_{\\mathbf{s}_{ij}}}\n", " \\right]\\\\\n", " Pr(\\mathbf{x}_{ijs} \\mid \\mathbf{h}_i, \\mathbf{s}_{ij}, s)\n", " &= \\NormDist_{\\mathbf{x}_{ijs}}\\left[\n", " \\boldsymbol{\\mu}_s + \\boldsymbol{\\Phi}_s \\mathbf{h}_i +\n", " \\boldsymbol{\\Psi}_s \\mathbf{s}_{ij},\n", " \\boldsymbol{\\Sigma}_s\n", " \\right]\n", "\n", "where :math:`\\boldsymbol{\\lambda}` describes the probability of observing data\n", "in each style.\n", "\n", "The entire model can be reshaped into the standard factor analysis model\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{h}_{ij})\n", " &= \\NormDist_{\\mathbf{h}_{ij}}\\left[\n", " \\boldsymbol{0}, \\mathbf{I}_{D_{\\mathbf{h}_i} + D_{\\mathbf{s}_{ij}}}\n", " \\right]\\\\\n", " Pr(\\mathbf{x}_{ijs} \\mid \\mathbf{h}_{ij}, s)\n", " &= \\NormDist_{\\mathbf{x}_{ijs}}\\left[\n", " \\boldsymbol{\\mu}_s + \\boldsymbol{\\Phi}'_s \\mathbf{h}_{ij},\n", " \\boldsymbol{\\Sigma}_s\n", " \\right].\n", "\n", "Marginalizing over the hidden variables gives\n", "\n", ".. 
math::\n", "\n", " Pr(\\mathbf{x}_{ijs})\n", " &= \\sum_{s = 1}^S\n", " \\iint Pr(\\mathbf{x}_{ijs}, \\mathbf{h}_i, \\mathbf{s}_{ij}, s)\n", " d\\mathbf{h}_i d\\mathbf{s}_{ij}\\\\\n", " &= \\sum_{s = 1}^S\n", " \\iint Pr(\\mathbf{x}_{ijs} \\mid \\mathbf{h}_i, \\mathbf{s}_{ij}, s)\n", " Pr(\\mathbf{h}_i) Pr(\\mathbf{s}_{ij}) Pr(s)\n", " d\\mathbf{h}_i d\\mathbf{s}_{ij}\n", " & \\quad & \\text{the priors are independent of each other}\\\\\n", " &= \\sum_{s = 1}^S\n", " \\int Pr(\\mathbf{x}_{ijs} \\mid \\mathbf{h}_{ij}, s)\n", " Pr(\\mathbf{h}_{ij}) Pr(s) d\\mathbf{h}_{ij}\\\\\n", " &= \\sum_{s = 1}^S\n", " \\lambda_s \\NormDist_{\\mathbf{x}_{ijs}}\\left[\n", " \\boldsymbol{\\mu}_s,\n", " {\\boldsymbol{\\Phi}'}_s {\\boldsymbol{\\Phi}'}_s^\\top +\n", " \\boldsymbol{\\Sigma}_s\n", " \\right]\n", " & \\quad & \\text{Exercise 7.8}\\\\\n", " &= \\sum_{s = 1}^S\n", " \\lambda_s \\NormDist_{\\mathbf{x}_{ijs}}\\left[\n", " \\boldsymbol{\\mu}_s,\n", " \\boldsymbol{\\Phi}_s \\boldsymbol{\\Phi}_s^\\top +\n", " \\boldsymbol{\\Psi}_s \\boldsymbol{\\Psi}_s^\\top +\n", " \\boldsymbol{\\Sigma}_s\n", " \\right]\n", "\n", "which is essentially the non-linear identity model (18.27). See\n", ":ref:`Exercise 7.8 ` for more derivation details.\n", "The compound generative equation for :math:`\\mathbf{x}_{i\\cdot\\cdot}` is\n", "\n", ".. 
math::\n", "\n", " \\begin{bmatrix}\n", " \\mathbf{x}_{i11}\\\\ \\vdots\\\\ \\mathbf{x}_{i1S}\\\\\n", " \\vdots\\\\\n", " \\mathbf{x}_{iJ1}\\\\ \\vdots\\\\ \\mathbf{x}_{iJS}\n", " \\end{bmatrix}\n", " &= \\begin{bmatrix}\n", " \\boldsymbol{\\mu}_1\\\\ \\vdots\\\\ \\boldsymbol{\\mu}_S\\\\\n", " \\vdots\\\\\n", " \\boldsymbol{\\mu}_1\\\\ \\vdots\\\\ \\boldsymbol{\\mu}_S\n", " \\end{bmatrix} +\n", " \\begin{bmatrix}\n", " \\boldsymbol{\\Phi}_1 & \\boldsymbol{\\Psi}_1 &\n", " \\boldsymbol{0} & \\cdots & \\boldsymbol{0}\\\\\n", " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots\\\\\n", " \\boldsymbol{\\Phi}_S & \\boldsymbol{\\Psi}_S &\n", " \\boldsymbol{0} & \\cdots & \\boldsymbol{0}\\\\\n", " \\vdots & \\vdots & \\vdots & \\vdots & \\vdots\\\\\n", " \\boldsymbol{\\Phi}_1 & \\boldsymbol{0} & \\boldsymbol{0} &\n", " \\cdots & \\boldsymbol{\\Psi}_1\\\\\n", " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots\\\\\n", " \\boldsymbol{\\Phi}_S & \\boldsymbol{0} & \\boldsymbol{0} &\n", " \\cdots & \\boldsymbol{\\Psi}_S\\\\\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " \\mathbf{h}_{i}\\\\ \\mathbf{s}_{i1}\\\\ \\vdots\\\\ \\mathbf{s}_{iJ}\n", " \\end{bmatrix} +\n", " \\begin{bmatrix}\n", " \\boldsymbol{\\epsilon}_{i11}\\\\ \\vdots\\\\ \\boldsymbol{\\epsilon}_{i1S}\\\\\n", " \\vdots\\\\\n", " \\boldsymbol{\\epsilon}_{iJ1}\\\\ \\vdots\\\\ \\boldsymbol{\\epsilon}_{iJS}\n", " \\end{bmatrix}\n", "\n", "Equations (18.9) and (18.10) can be reused to compute the E-step; the M-step\n", "can be updated according to (18.35). If :math:`\\boldsymbol{\\lambda}` is unknown\n", "and needs to be estimated, one can proceed as in\n", ":cite:`ghahramani1996algorithm`. The inference procedures in Sections 18.2.2\n", "and 18.4.2 are still applicable." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 18.9\n", "=============\n", "\n", "The posterior distribution of some observed data :math:`\\mathbf{x}` over style\n", ":math:`s` can be computed using\n", "\n", "..
math::\n", "\n", " Pr(s \\mid \\mathbf{x}) =\n", " \\frac{Pr(\\mathbf{x} \\mid s) Pr(s)}{Pr(\\mathbf{x})}\n", "\n", "where the numerator and denominator are defined as (18.32), (18.36), and\n", "(18.37).\n", "\n", "Now that the two examples each have their own distribution, arbitrary distance\n", "or divergence measures (e.g. :math:`L_\\infty`-norm, KL divergence) can be\n", "applied to determine whether the styles match." ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }