{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "*******************************\n", "Modeling Complex Data Densities\n", "*******************************\n", "\n", "Hidden Variables\n", "================\n", "\n", "Hidden (latent) variables describe the target density :math:`Pr(x)` as the\n", "marginalization of :math:`Pr(x, h)` such that\n", "\n", ".. math::\n", "\n", " Pr(x \\mid \\theta) = \\int Pr(x, h \\mid \\theta) dh.\n", "\n", "The joint density :math:`Pr(x, h)` can be chosen to be much simpler than the\n", "target density :math:`Pr(x)` while producing complex marginal distributions\n", "when integrated.\n", "\n", "Maximizing the left-hand side directly is a constrained nonlinear optimization\n", "problem (7.7). The right-hand side can instead be maximized using EM, which can\n", "lead to a closed-form solution (7.8).\n", "\n", "Expectation Maximization\n", "========================\n", "\n", "Figure 7.5 is a beautiful illustration of the EM algorithm.\n", "\n", "Factor Analysis\n", "===============\n", "\n", "The factor analyzer describes a linear subspace with a full covariance model as\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\NormDist}{Norm}\n", " Pr(\\mathbf{x}) =\n", " \\NormDist_{\\mathbf{x}}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}\n", " \\right].\n", "\n", "Probabilistic principal component analysis is an instance of the factor\n", "analyzer where :math:`\\boldsymbol{\\Sigma}` is constrained to be spherical (a\n", "multiple of the identity). It has slightly fewer parameters and can be fit in\n", "closed form." 
] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.1\n", "============\n", "\n", "The world state :math:`w \\in \\{0, 1\\}` is a discrete label indicating whether\n", "the orange is ripe or not.\n", "\n", "The observed data :math:`\\mathbf{x} \\in \\mathbb{R}^3` describes the averaged RGB\n", "color of a segmented orange.\n", "\n", "Suppose the given labeled training data\n", ":math:`\\left\\{ \\mathbf{x}_i, w_i \\right\\}_{i = 1}^I` describes the multimodal\n", "colorspace of ripe and unripe oranges. Since the pixels have been\n", "post-processed, outliers can be ignored. The input dimension is low, so there is\n", "no need to use factor analysis.\n", "\n", "One way to model this is to separately fit a mixture of Gaussians for\n", "each discrete label as\n", "\n", ".. math::\n", "\n", " Pr(\\mathbf{x} \\mid w) =\n", " \\sum_{k = 1}^K \\lambda_{wk}\n", " \\NormDist_{\\mathbf{x}}\\left[\n", " \\boldsymbol{\\mu}_{wk}, \\boldsymbol{\\Sigma}_{wk}\n", " \\right].\n", "\n", "There is no prior knowledge about whether oranges are ripe or unripe\n", "when the system is used, hence :math:`Pr(w = 0) = Pr(w = 1) = 0.5`.\n", "\n", "Applying Bayes' rule to compute the posterior over :math:`w` gives\n", "\n", ".. math::\n", "\n", " Pr\\left( w^\\ast = 1 \\mid \\mathbf{x}^\\ast \\right) =\n", " \\frac{\n", " Pr\\left( w^\\ast = 1, \\mathbf{x}^\\ast \\right)\n", " }{\n", " Pr\\left( \\mathbf{x}^\\ast \\right)\n", " }." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.2\n", "============\n", "\n", "Erroneous training labels can be viewed as outliers, so a mixture of multivariate\n", "t-distributions can be used instead." ] }, { "cell_type": "raw", "metadata": { "collapsed": true, "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.3\n", "============\n", "\n", "Rewriting (7.18) with Lagrange multipliers gives\n", "\n", ".. 
math::\n", "\n", " L &= \\sum_{i = 1}^I \\sum_{k = 1}^K r_{ik} \\log\\left(\n", " \\lambda_k \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu}_k, \\boldsymbol{\\Sigma}_k\n", " \\right]\n", " \\right) +\n", " \\nu \\left( \\sum_{k = 1}^K \\lambda_k - 1 \\right)\\\\\n", " &= \\sum_{i = 1}^I \\sum_{k = 1}^K r_{ik}\n", " \\left(\n", " \\log \\lambda_k -\n", " \\frac{D}{2} \\log 2 \\pi -\n", " \\frac{1}{2} \\log \\lvert \\boldsymbol{\\Sigma}_k \\rvert -\n", " \\frac{1}{2}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)^\\top\n", " \\boldsymbol{\\Sigma}_k^{-1} (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)\n", " \\right) +\n", " \\nu \\left( \\sum_{k = 1}^K \\lambda_k - 1 \\right).\n", "\n", "(a)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\lambda_k}\n", " &= \\sum_{i = 1}^I\n", " \\frac{\\partial}{\\partial \\lambda_k} r_{ik} \\log \\lambda_k +\n", " \\nu\\\\\n", " 0 &= \\sum_{i = 1}^I \\frac{r_{ik}}{\\lambda_k} + \\nu\n", " & \\quad & r_{ik} = q_i(h_i) \\text{ is a constant in the M-step}\\\\\n", " \\lambda_k &= -\\frac{\\sum_{i = 1}^I r_{ik}}{\\nu}\\\\\n", " &= \\frac{\\sum_{i = 1}^I r_{ik}}{\\sum_{j = 1}^K \\sum_{i = 1}^I r_{ij}}\n", "\n", "since\n", "\n", ".. math::\n", "\n", " 0 &= \\sum_{i = 1}^I r_{ik} + \\nu \\lambda_k\\\\\n", " -\\nu \\sum_{j = 1}^K \\lambda_j &= \\sum_{j = 1}^K \\sum_{i = 1}^I r_{ij}\\\\\n", " \\nu &= -\\sum_{j = 1}^K \\sum_{i = 1}^I r_{ij}\n", " & \\quad & \\sum_k \\lambda_k = 1.\n", "\n", "(b)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\mu}_k}\n", " &= \\sum_{i = 1}^I\n", " \\frac{\\partial}{\\partial \\boldsymbol{\\mu}_k}\n", " \\frac{-r_{ik}}{2}\n", " \\left(\n", " \\mathbf{x}_i^\\top \\boldsymbol{\\Sigma}_k^{-1} \\mathbf{x}_i -\n", " 2 \\mathbf{x}_i^\\top \\boldsymbol{\\Sigma}_k^{-1} \\boldsymbol{\\mu}_k +\n", " \\boldsymbol{\\mu}_k^\\top \\boldsymbol{\\Sigma}_k^{-1} \\boldsymbol{\\mu}_k\n", " \\right)\\\\\n", " \\boldsymbol{0}\n", " &= \\sum_{i = 1}^I\n", " \\frac{r_{ik}}{2}\n", " \\left(\n", " 2 \\boldsymbol{\\Sigma}_k^{-1} \\mathbf{x}_i -\n", " 2 \\boldsymbol{\\Sigma}_k^{-1} \\boldsymbol{\\mu}_k\n", " \\right)\n", " & \\quad & \\text{(C.28) and (C.33)}\\\\\n", " \\boldsymbol{\\Sigma}_k^{-1} \\boldsymbol{\\mu}_k \\sum_{i = 1}^I r_{ik}\n", " &= \\boldsymbol{\\Sigma}_k^{-1} \\sum_{i = 1}^I r_{ik} \\mathbf{x}_i\\\\\n", " \\boldsymbol{\\mu}_k\n", " &= \\frac{\n", " \\sum_{i = 1}^I r_{ik} \\mathbf{x}_i\n", " }{\n", " \\sum_{i = 1}^I r_{ik}\n", " }\n", "\n", "(c)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\Sigma}_k}\n", " &= \\sum_{i = 1}^I\n", " \\frac{\\partial}{\\partial \\boldsymbol{\\Sigma}_k} \\frac{-r_{ik}}{2}\n", " \\left(\n", " \\log \\left\\vert \\boldsymbol{\\Sigma}_k \\right\\vert +\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)^\\top\n", " \\boldsymbol{\\Sigma}_k^{-1} (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)\n", " \\right)\\\\\n", " \\boldsymbol{0}\n", " &= \\sum_{i = 1}^I\n", " \\frac{r_{ik}}{2}\n", " \\left[\n", " -\\boldsymbol{\\Sigma}_k^{-\\top} +\n", " \\boldsymbol{\\Sigma}_k^{-\\top}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)^\\top\n", " \\boldsymbol{\\Sigma}_k^{-\\top}\n", " \\right]\n", " & \\quad & \\text{(C.38) and Matrix Cookbook Section 2.2}\\\\\n", " \\boldsymbol{\\Sigma}_k^{-1} \\sum_{i = 1}^I r_{ik}\n", " &= \\boldsymbol{\\Sigma}_k^{-1}\n", " \\left(\n", " \\sum_{i = 1}^I r_{ik}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)^\\top\n", " \\right)\n", " \\boldsymbol{\\Sigma}_k^{-1}\\\\\n", " \\boldsymbol{\\Sigma}_k\n", " &= \\frac{\n", " \\sum_{i = 1}^I r_{ik}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}_k)^\\top\n", " }{\n", " \\sum_{i = 1}^I r_{ik}\n", " }" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.4\n", "============\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\BetaDist}{Beta}\n", " Pr(x \\mid \\boldsymbol{\\theta}) =\n", " \\sum_{k = 1}^K \\lambda_k \\BetaDist_x[\\alpha_k, \\beta_k]\n", "\n", "is a mixture of beta distributions where :math:`x \\in [0, 1]` is univariate and\n", ":math:`\\boldsymbol{\\theta} = \\{ \\alpha_k, \\beta_k, \\lambda_k \\}_{k = 1}^K`.\n", "\n", "Define a discrete hidden variable :math:`h \\in \\{1 \\ldots K\\}` such that\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator{\\CatDist}{Cat}\n", " Pr(x \\mid h, \\boldsymbol{\\theta})\n", " &= \\BetaDist_x[\\alpha_h, \\beta_h]\\\\\n", " Pr(h \\mid \\boldsymbol{\\theta}) &= \\CatDist_h[\\boldsymbol{\\lambda}]\n", "\n", "where :math:`\\boldsymbol{\\lambda} = \\{ \\lambda_1, \\ldots, \\lambda_K \\}` are the\n", "parameters of the categorical distribution.\n", "\n", "The original density can be expressed as the marginalization of the previous\n", "terms:\n", "\n", ".. math::\n", "\n", " Pr(x \\mid \\boldsymbol{\\theta})\n", " &= \\sum_{h} Pr(x, h \\mid \\boldsymbol{\\theta})\\\\\n", " &= \\sum_{h}\n", " Pr(x \\mid h, \\boldsymbol{\\theta})\n", " Pr(h \\mid \\boldsymbol{\\theta})\\\\\n", " &= \\sum_{k = 1}^K \\lambda_k \\BetaDist_x[\\alpha_k, \\beta_k].\n", "\n", "In the E-step, the goal is to compute the probability that the\n", ":math:`k\\text{th}` beta distribution was responsible for the\n", ":math:`i\\text{th}` data point:\n", "\n", ".. math::\n", "\n", " q_i(h_i = k)\n", " &= Pr(h_i = k \\mid x_i, \\boldsymbol{\\theta})\n", " & \\quad & \\text{(7.11)}\\\\\n", " &= \\frac{\n", " Pr(h_i = k, x_i \\mid \\boldsymbol{\\theta})\n", " }{\n", " Pr(x_i \\mid \\boldsymbol{\\theta})\n", " }\\\\\n", " &= \\frac{\n", " Pr(x_i \\mid h_i = k, \\boldsymbol{\\theta})\n", " Pr(h_i = k \\mid \\boldsymbol{\\theta})\n", " }{\n", " Pr(x_i \\mid \\boldsymbol{\\theta})\n", " }\\\\\n", " &= \\frac{\n", " \\lambda_k \\BetaDist_{x_i}[\\alpha_k, \\beta_k]\n", " }{\n", " \\sum_{j = 1}^K \\lambda_j \\BetaDist_{x_i}[\\alpha_j, \\beta_j]\n", " }\\\\\n", " &= r_{ik}.\n", "\n", "In the M-step, the goal is to maximize the lower bound with\n", "respect to :math:`\\boldsymbol{\\theta}` such that\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator*{\\argmax}{arg\\,max}\n", " \\hat{\\boldsymbol{\\theta}}\n", " &= \\argmax_{\\boldsymbol{\\theta}}\n", " \\sum_{i = 1}^I \\sum_{k = 1}^K\n", " q_i(h_i = k)\n", " \\log Pr(x_i, h_i = k \\mid \\boldsymbol{\\theta})\n", " & \\quad & \\text{(7.12)}\\\\\n", " &= \\argmax_{\\boldsymbol{\\theta}}\n", " \\sum_{i = 1}^I \\sum_{k = 1}^K\n", " r_{ik} \\log\\left( \\lambda_k \\BetaDist_{x_i}[\\alpha_k, \\beta_k] \\right)." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.5\n", "============\n", "\n", ".. math::\n", "\n", " \\DeclareMathOperator{\\GamDist}{Gam}\n", " \\DeclareMathOperator{\\StudDist}{Stud}\n", " Pr(x) &= \\int Pr(x, h) dh\\\\\n", " &= \\int Pr(x \\mid h) Pr(h) dh\\\\\n", " &= \\int_0^\\infty\n", " \\NormDist_x\\left[ \\mu, \\sigma^2 / h \\right]\n", " \\GamDist_h[\\nu / 2, \\nu / 2] dh\n", " & \\quad & \\text{Gamma distribution is defined on } (0, \\infty)\\\\\n", " &= \\int_0^\\infty\n", " \\frac{\n", " h^{1/2}\n", " }{\n", " \\sqrt{2 \\pi \\sigma^2}\n", " }\n", " \\exp\\left[\n", " -\\frac{(x - \\mu)^2}{2 \\sigma^2} h\n", " \\right]\n", " \\frac{\n", " \\left( \\frac{\\nu}{2} \\right)^{\\nu / 2}\n", " }{\n", " \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\exp\\left[ -\\frac{\\nu}{2} h \\right] h^{\\nu / 2 - 1} dh\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2}\n", " }{\n", " 2^{\\nu / 2} \\sqrt{2 \\pi \\sigma^2} \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\int_0^\\infty \\exp\\left[\n", " -\\left(\n", " \\frac{\\nu}{2} + \\frac{(x - \\mu)^2}{2 \\sigma^2}\n", " \\right) h\n", " \\right] h^{(\\nu - 1) / 2} dh\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2}\n", " }{\n", " 2^{\\nu / 2} \\sqrt{2 \\pi \\sigma^2} \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\left(\n", " \\frac{\\nu}{2} + \\frac{(x - \\mu)^2}{2 \\sigma^2}\n", " \\right)^{-(\\nu + 1) / 2}\n", " \\Gamma\\left[ \\frac{\\nu + 1}{2} \\right]\n", " & \\quad & \\int_0^\\infty x^n \\exp\\left[ -a x^b \\right] dx =\n", " 
b^{-1} a^{-(n + 1) / b} \\Gamma\\left[ \\frac{n + 1}{b} \\right]\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2}\n", " }{\n", " \\sqrt{\\pi \\sigma^2} \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\left(\n", " \\nu + \\frac{(x - \\mu)^2}{\\sigma^2}\n", " \\right)^{-(\\nu + 1) / 2}\n", " \\Gamma\\left[ \\frac{\\nu + 1}{2} \\right]\\\\\n", " &= \\frac{1}{\n", " \\sqrt{\\nu \\pi \\sigma^2} \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\left(\n", " 1 + \\frac{(x - \\mu)^2}{\\nu \\sigma^2}\n", " \\right)^{-(\\nu + 1) / 2}\n", " \\Gamma\\left[ \\frac{\\nu + 1}{2} \\right]\\\\\n", " &= \\frac{\n", " \\Gamma\\left[ \\frac{\\nu + 1}{2} \\right]\n", " }{\n", " \\sqrt{\\nu \\pi \\sigma^2}\n", " \\Gamma\\left[ \\frac{\\nu}{2} \\right]\n", " }\n", " \\left(\n", " 1 + \\frac{(x - \\mu)^2}{\\nu \\sigma^2}\n", " \\right)^{-\\frac{\\nu + 1}{2}}\\\\\n", " &= \\StudDist_x\\left[ \\mu, \\sigma^2, \\nu \\right]." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.6\n", "============\n", "\n", ".. math::\n", "\n", " \\frac{\\partial}{\\partial z} \\GamDist_z[\\alpha, \\beta]\n", " &= \\frac{\\partial}{\\partial z}\n", " \\frac{\\beta^\\alpha}{\\Gamma[\\alpha]}\n", " \\exp[-\\beta z] z^{\\alpha - 1}\\\\\n", " 0 &= \\frac{\\beta^\\alpha}{\\Gamma[\\alpha]}\n", " \\left(\n", " -\\beta \\exp[-\\beta z] z^{\\alpha - 1} +\n", " (\\alpha - 1) \\exp[-\\beta z] z^{\\alpha - 2}\n", " \\right)\\\\\n", " \\beta z^{\\alpha - 1} &= (\\alpha - 1) z^{\\alpha - 2}\\\\\n", " z &= \\frac{\\alpha - 1}{\\beta}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.7\n", "============\n", "\n", ".. 
math::\n", "\n", " & \\NormDist_{\\mathbf{x}}[\\boldsymbol{\\mu}, \\boldsymbol{\\Sigma} / h]\n", " \\GamDist_{h}[\\nu / 2, \\nu / 2]\\\\\n", " &= \\frac{\n", " h^{D / 2}\n", " }{\n", " (2 \\pi)^{D / 2} \\lvert \\boldsymbol{\\Sigma} \\rvert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " -\\frac{\n", " (\\mathbf{x} - \\boldsymbol{\\mu})^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " (\\mathbf{x} - \\boldsymbol{\\mu})\n", " }{2} h\n", " \\right]\n", " \\frac{\\nu^{\\nu / 2}}{2^{\\nu / 2} \\Gamma[\\nu / 2]}\n", " \\exp\\left[ -\\frac{\\nu}{2} h \\right] h^{\\nu / 2 - 1}\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2}\n", " }{\n", " (2 \\pi)^{D / 2}\n", " \\lvert \\boldsymbol{\\Sigma} \\rvert^{1 / 2}\n", " \\Gamma[\\nu / 2]\n", " 2^{\\nu / 2}\n", " }\n", " \\exp\\left[\n", " -\\frac{\n", " (\\mathbf{x} - \\boldsymbol{\\mu})^\\top\n", " \\boldsymbol{\\Sigma}^{-1} (\\mathbf{x} - \\boldsymbol{\\mu}) +\n", " \\nu\n", " }{2} h\n", " \\right]\n", " h^{(\\nu + D) / 2 - 1}\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2}\n", " }{\n", " (2 \\pi)^{D / 2}\n", " \\lvert \\boldsymbol{\\Sigma} \\rvert^{1 / 2}\n", " \\Gamma[\\nu / 2]\n", " 2^{\\nu / 2}\n", " }\n", " \\exp[-\\beta h] h^{\\alpha - 1}\\\\\n", " &= \\kappa \\GamDist_h[\\alpha, \\beta]\n", "\n", "where\n", "\n", ".. 
math::\n", "\n", " \\alpha &= \\frac{\\nu + D}{2}\\\\\\\\\n", " \\beta &= \\frac{\n", " (\\mathbf{x} - \\boldsymbol{\\mu})^\\top\n", " \\boldsymbol{\\Sigma}^{-1} (\\mathbf{x} - \\boldsymbol{\\mu}) +\n", " \\nu\n", " }{2}\\\\\\\\\n", " \\kappa\n", " &= \\frac{\n", " \\nu^{\\nu / 2} \\Gamma[\\alpha]\n", " }{\n", " (2 \\pi)^{D / 2} \\lvert \\boldsymbol{\\Sigma} \\rvert^{1 / 2}\n", " \\Gamma[\\nu / 2] 2^{\\nu / 2} \\beta^\\alpha\n", " }\\\\\n", " &= \\frac{\n", " \\nu^{\\nu / 2} \\Gamma[\\alpha]\n", " }{\n", " (2 \\pi)^{D / 2} \\lvert \\boldsymbol{\\Sigma} \\rvert^{1 / 2}\n", " \\Gamma[\\nu / 2] 2^{\\nu / 2}\n", " }\n", " \\left( \\frac{\\nu}{2} \\right)^{-\\alpha}\n", " \\left(\n", " \\frac{\n", " (\\mathbf{x} - \\boldsymbol{\\mu})^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " (\\mathbf{x} - \\boldsymbol{\\mu})\n", " }{\\nu} + 1\n", " \\right)^{-\\alpha}\\\\\n", " &= \\StudDist_{\\mathbf{x}}[\\boldsymbol{\\mu}, \\boldsymbol{\\Sigma}, \\nu]." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _prince2012computer-ex-7.8:\n", "\n", "Exercise 7.8\n", "============\n", "\n", "Let :math:`\\mathbf{x}_i =\n", "\\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\mathbf{h}_i + \\boldsymbol{\\epsilon}_i`\n", "where\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator{\\E}{\\mathrm{E}}\n", " \\DeclareMathOperator{\\Cov}{\\mathrm{Cov}}\n", " \\begin{gather*}\n", " \\E[\\mathbf{h}_i] = \\E[\\boldsymbol{\\epsilon}_i] = \\boldsymbol{0}\\\\\n", " \\Cov(\\mathbf{h}_i, \\mathbf{h}_i) =\n", " \\E\\left[\n", " (\\mathbf{h}_i - \\E[\\mathbf{h}_i])\n", " (\\mathbf{h}_i - \\E[\\mathbf{h}_i])^\\top\n", " \\right] =\n", " \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right] = \\mathbf{I}\\\\\n", " \\Cov(\\boldsymbol{\\epsilon}_i, \\boldsymbol{\\epsilon}_i) =\n", " \\E\\left[\n", " (\\boldsymbol{\\epsilon}_i - \\E[\\boldsymbol{\\epsilon}_i])\n", " (\\boldsymbol{\\epsilon}_i - \\E[\\boldsymbol{\\epsilon}_i])^\\top\n", " \\right] =\n", " \\E\\left[ \\boldsymbol{\\epsilon}_i \\boldsymbol{\\epsilon}_i^\\top \\right] =\n", " \\boldsymbol{\\Sigma}\\\\\n", " \\Cov(\\mathbf{h}_i, \\boldsymbol{\\epsilon}_i) =\n", " \\E\\left[\n", " (\\mathbf{h}_i - \\E[\\mathbf{h}_i])\n", " (\\boldsymbol{\\epsilon}_i - \\E[\\boldsymbol{\\epsilon}_i])^\\top\n", " \\right] =\n", " \\E\\left[ \\mathbf{h}_i \\boldsymbol{\\epsilon}_i^\\top \\right] =\n", " \\boldsymbol{0}\\\\\n", " \\Cov(\\boldsymbol{\\epsilon}_i, \\mathbf{h}_i) =\n", " \\Cov(\\mathbf{h}_i, \\boldsymbol{\\epsilon}_i)^\\top = \\boldsymbol{0}\n", " \\end{gather*}\n", "\n", "(1)\n", "---\n", "\n", ".. math::\n", "\n", " \\E[\\mathbf{x}_i]\n", " &= \\E[\\boldsymbol{\\mu}] +\n", " \\E[\\boldsymbol{\\Phi} \\mathbf{h}_i] +\n", " \\E[\\boldsymbol{\\epsilon}_i]\n", " & \\quad & \\text{(2.16)}\\\\\n", " &= \\boldsymbol{\\mu} +\n", " \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i] +\n", " \\boldsymbol{0}\n", " & \\quad & \\text{(2.14), (2.15), (2.16)}\\\\\n", " &= \\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\boldsymbol{0}\\\\\n", " &= \\boldsymbol{\\mu}\n", "\n", "(2)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\E\\left[\n", " (\\mathbf{x}_i - \\E[\\mathbf{x}_i])\n", " (\\mathbf{x}_i - \\E[\\mathbf{x}_i])^\\top\n", " \\right]\n", " &= \\E\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\right]\\\\\n", " &= \\E\\left[\n", " (\\boldsymbol{\\Phi} \\mathbf{h}_i + \\boldsymbol{\\epsilon}_i)\n", " (\\boldsymbol{\\Phi} \\mathbf{h}_i + \\boldsymbol{\\epsilon}_i)^\\top\n", " \\right]\\\\\n", " &= \\E\\left[\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\right] +\n", " \\E\\left[\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i \\boldsymbol{\\epsilon}_i^\\top\n", " \\right] +\n", " \\E\\left[\n", " \\boldsymbol{\\epsilon}_i \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\right] +\n", " \\E\\left[ \\boldsymbol{\\epsilon}_i \\boldsymbol{\\epsilon}_i^\\top \\right]\n", " & \\quad & \\text{(2.16)}\\\\\n", " &= \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top +\n", " \\boldsymbol{\\Phi} \\E\\left[\n", " \\mathbf{h}_i \\boldsymbol{\\epsilon}_i^\\top\n", " \\right] +\n", " \\E\\left[ \\boldsymbol{\\epsilon}_i \\mathbf{h}_i^\\top \\right]\n", " \\boldsymbol{\\Phi}^\\top +\n", " \\boldsymbol{\\Sigma}\n", " & \\quad & \\text{(2.15), (2.16)}\\\\\n", " &= \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. _prince2012computer-ex-7.9:\n", "\n", "Exercise 7.9\n", "============\n", "\n", ".. 
math::\n", "\n", " q_i(\\mathbf{h}_i)\n", " &= Pr(\\mathbf{h}_i \\mid \\mathbf{x}_i, \\boldsymbol{\\theta})\n", " & \\quad & \\text{(7.11)}\\\\\n", " &= \\frac{\n", " Pr(\\mathbf{x}_i \\mid \\mathbf{h}_i, \\boldsymbol{\\theta})\n", " Pr(\\mathbf{h}_i \\mid \\boldsymbol{\\theta})\n", " }{\n", " Pr(\\mathbf{x}_i \\mid \\boldsymbol{\\theta})\n", " }\\\\\n", " &= \\frac{\n", " \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu} + \\boldsymbol{\\Phi} \\mathbf{h}_i,\n", " \\boldsymbol{\\Sigma}\n", " \\right]\n", " \\NormDist_{\\mathbf{h}_i}\\left[ \\boldsymbol{0}, \\mathbf{I} \\right]\n", " }{\n", " \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}\n", " \\right]\n", " }\\\\\n", " &= \\frac{\n", " \\kappa_1 \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}',\n", " \\boldsymbol{\\Sigma}'\n", " \\right]\n", " \\NormDist_{\\mathbf{h}_i}\\left[ \\boldsymbol{0}, \\mathbf{I} \\right]\n", " }{\n", " \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}\n", " \\right]\n", " }\n", " & \\quad & \\text{(a), (5.17), and Exercise 5.10}\\\\\n", " &= \\frac{\n", " \\kappa_1 \\kappa_2 \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\hat{\\boldsymbol{\\mu}}, \\hat{\\boldsymbol{\\Sigma}}\n", " \\right]\n", " }{\n", " \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}\n", " \\right]\n", " }\n", " & \\quad & \\text{(b), (5.14), (5.15), and Exercises 5.7 and 5.9}\\\\\n", " &= \\NormDist_{\\mathbf{h}_i}\\left[\n", " \\hat{\\boldsymbol{\\mu}}, \\hat{\\boldsymbol{\\Sigma}}\n", " \\right]\n", " & \\quad & \\text{(c)}\n", "\n", "See Exercise 5.7, Exercise 5.9, and Exercise 5.10 for details.\n", "\n", "(a)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\boldsymbol{\\Sigma}' &=\n", " \\left(\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\right)^{-1}\\\\\\\\\n", " \\boldsymbol{\\Phi}' &=\n", " \\boldsymbol{\\Sigma}' \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\\\\\\\\\n", " \\boldsymbol{\\mu}' &=\n", " -\\boldsymbol{\\Sigma}'\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu}\\\\\\\\\n", " \\kappa_1 &=\n", " \\frac{\n", " \\left\\vert \\boldsymbol{\\Sigma}' \\right\\vert^{1 / 2}\n", " }{\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\boldsymbol{\\Sigma}' \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\n", "\n", "(b)\n", "---\n", "\n", ".. math::\n", "\n", " \\hat{\\boldsymbol{\\Sigma}}\n", " &= \\left( \\boldsymbol{\\Sigma}'^{-1} + \\mathbf{I}^{-1} \\right)^{-1}\\\\\n", " &= \\left(\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\\\\\\\\\n", " \\hat{\\boldsymbol{\\mu}}\n", " &= \\hat{\\boldsymbol{\\Sigma}}\n", " \\left(\n", " \\boldsymbol{\\Sigma}'^{-1} \\left(\n", " \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}'\n", " \\right) +\n", " \\mathbf{I}^{-1} \\boldsymbol{0}\n", " \\right)\\\\\n", " &= \\hat{\\boldsymbol{\\Sigma}}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\\\\\\\\\n", " \\kappa_2\n", " &= \\NormDist_{\\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}'}\\left[\n", " \\boldsymbol{0}, \\boldsymbol{\\Sigma}' + \\mathbf{I}\n", " \\right]\\\\\n", " &= \\NormDist_{\\boldsymbol{0}}\\left[\n", " \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}',\n", " \\boldsymbol{\\Sigma}' + 
\\mathbf{I}\n", " \\right]\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{D / 2} \\lvert \\boldsymbol{\\Sigma}' + \\mathbf{I} \\rvert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " \\left( \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}' \\right)^\\top\n", " \\left( \\boldsymbol{\\Sigma}' + \\mathbf{I} \\right)^{-1}\n", " \\left( \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}' \\right)\n", " \\right]^{-0.5}\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{D / 2} \\lvert \\boldsymbol{\\Sigma}' + \\mathbf{I} \\rvert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\boldsymbol{\\Sigma}'\n", " \\left( \\boldsymbol{\\Sigma}' + \\mathbf{I} \\right)^{-1}\n", " \\boldsymbol{\\Sigma}' \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\n", "\n", "(c)\n", "---\n", "\n", ".. math::\n", "\n", " \\kappa_1 \\kappa_2\n", " &= \\frac{\n", " \\left\\vert \\boldsymbol{\\Sigma}' \\right\\vert^{1 / 2}\n", " }{\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\boldsymbol{\\Sigma}'\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\n", " \\NormDist_{\\boldsymbol{0}}\\left[\n", " \\boldsymbol{\\Phi}' \\mathbf{x}_i + \\boldsymbol{\\mu}',\n", " \\boldsymbol{\\Sigma}' + \\mathbf{I}\n", " \\right]\\\\\n", " &= \\frac{\n", " \\left\\vert \\boldsymbol{\\Sigma}' \\right\\vert^{1 / 2}\n", " }{\n", " (2 \\pi)^{D / 2} \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{1 / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma}' + \\mathbf{I} \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} 
-\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\boldsymbol{\\Sigma}'\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} +\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\boldsymbol{\\Sigma}'\n", " \\left( \\boldsymbol{\\Sigma}' + \\mathbf{I} \\right)^{-1}\n", " \\boldsymbol{\\Sigma}' \\boldsymbol{\\Phi}^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " \\right)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{D / 2} \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert^{1 / 2}\n", " \\left\\vert \\boldsymbol{\\Sigma}'^{-1} + \\mathbf{I} \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left(\n", " \\boldsymbol{\\Sigma}' -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\boldsymbol{\\Sigma}' + \\mathbf{I} \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'\n", " \\right)\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{D / 2}\n", " \\left\\vert\n", " \\boldsymbol{\\Sigma} + \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}'^{-1} \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right)\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\n", " & \\quad & \\text{(c.1) and (d)}\\\\\n", " &= \\frac{1}{\n", " (2 \\pi)^{D / 2}\n", " \\left\\vert\n", " \\boldsymbol{\\Sigma} + \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right\\vert^{1 / 2}\n", " }\n", " \\exp\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top\n", " \\left(\n", " \\boldsymbol{\\Sigma} + \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right)^{-1}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " \\right]^{-0.5}\n", " & \\quad & \\text{Sherman–Morrison–Woodbury formula}\\\\\n", " &= \\NormDist_{\\mathbf{x}_i}\\left[\n", " \\boldsymbol{\\mu},\n", " \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top + \\boldsymbol{\\Sigma}\n", " \\right]\n", "\n", "(c.1)\n", "-----\n", "\n", ".. math::\n", "\n", " \\left(\n", " \\boldsymbol{\\Sigma}' -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}'\\right)^{-1} \\boldsymbol{\\Sigma}'\n", " \\right)\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}'^{-1} \\right)\n", " &= \\boldsymbol{\\Sigma}' + \\mathbf{I} -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}' -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\\\\\n", " &= \\mathbf{I} +\n", " \\boldsymbol{\\Sigma}' \\left[\n", " \\mathbf{I} - \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\right] -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'\\\\\n", " &= \\mathbf{I}\n", " & \\quad & \\text{(c.2)}\\\\\n", " \\left(\n", " \\boldsymbol{\\Sigma}' -\n", " \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'\n", " \\right)\n", " &= \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}'^{-1} \\right)^{-1}\n", "\n", "(c.2)\n", "-----\n", "\n", ".. 
math::\n", "\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right) \\boldsymbol{\\Sigma}'^{-1}\n", " &= \\boldsymbol{\\Sigma}'^{-1} + \\mathbf{I}\n", " & \\quad & \\text{Exercise 5.9}\\\\\n", " \\boldsymbol{\\Sigma}'^{-1}\n", " &= \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\left( \\boldsymbol{\\Sigma}'^{-1} + \\mathbf{I} \\right)\\\\\n", " \\boldsymbol{\\Sigma}'^{-1} -\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'^{-1}\n", " &= \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\\\\\n", " \\mathbf{I} - \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " &= \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'\\\\\n", " \\boldsymbol{\\Sigma}' \\left[\n", " \\mathbf{I} - \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\right]\n", " &= \\boldsymbol{\\Sigma}'\n", " \\left( \\mathbf{I} + \\boldsymbol{\\Sigma}' \\right)^{-1}\n", " \\boldsymbol{\\Sigma}'\n", "\n", "(d)\n", "---\n", "\n", ".. math::\n", "\n", " \\left(\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert\n", " \\left\\vert \\boldsymbol{\\Sigma}'^{-1} + \\mathbf{I} \\right\\vert\n", " \\right)^{-1 / 2}\n", " &= \\left(\n", " \\left\\vert \\boldsymbol{\\Sigma} \\right\\vert\n", " \\left\\vert\n", " \\mathbf{I} +\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\right\\vert\n", " \\right)^{-1 / 2}\\\\\n", " &= \\left\\vert\n", " \\boldsymbol{\\Sigma} + \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right\\vert^{-1 / 2}\n", " & \\quad & \\text{Sylvester's determinant theorem}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Exercise 7.10\n", "=============\n", "\n", ".. 
math::\n", "\n", " \\hat{\\boldsymbol{\\theta}}\n", " &= \\argmax_{\\boldsymbol{\\theta}}\n", " \\sum_{i = 1}^I \\int q_i(\\mathbf{h}_i)\n", " \\log Pr(\\mathbf{x}_i, \\mathbf{h}_i \\mid \\boldsymbol{\\theta})\n", " d\\mathbf{h}_i\n", " & \\quad & \\text{(7.12)}\\\\\n", " &= \\argmax_{\\boldsymbol{\\theta}} \\sum_{i = 1}^I\n", " \\E\\left[\n", " \\log Pr(\\mathbf{x}_i, \\mathbf{h}_i \\mid \\boldsymbol{\\theta})\n", " \\right]\n", " & \\quad & \\text{(7.36)}\\\\\n", " &= \\argmax_{\\boldsymbol{\\theta}} L(\\boldsymbol{\\theta})\n", "\n", "where :math:`L` is the expected log-likelihood of the data given the hidden\n", "variables (the prior :math:`Pr(\\mathbf{h}_i)` does not depend on\n", ":math:`\\boldsymbol{\\theta}` and is dropped), with expectations taken under\n", ":math:`q_i(\\mathbf{h}_i)`:\n", "\n", ".. math::\n", "\n", " L\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\E\\left[\n", " D \\log(2\\pi) + \\log \\lvert \\boldsymbol{\\Sigma} \\rvert +\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i)^\\top\n", " \\boldsymbol{\\Sigma}^{-1}\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i)\n", " \\right]\n", " & \\quad & \\text{(7.37)}\\\\\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\E\\left[\n", " D \\log(2\\pi) + \\log \\lvert \\boldsymbol{\\Sigma} \\rvert +\n", " \\mathbf{x}_i^\\top \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i +\n", " \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i +\n", " 2 \\boldsymbol{\\mu}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " 2 \\mathbf{x}_i^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " 2 \\mathbf{x}_i^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right].\n", "\n", "(a)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\mu}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I \\E\\left[\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} +\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\mathbf{h}_i -\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i\n", " \\right]\n", " & \\quad & \\text{(C.33), (C.27), (C.28)}\\\\\n", " 0\n", " &= \\sum_{i = 1}^I\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i] +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i\n", " & \\quad & \\E \\text{ is a linear operator}\\\\\n", " &= \\sum_{i = 1}^I\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\left(\n", " \\left(\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\left( \\mathbf{x}_i - \\boldsymbol{\\mu} \\right)\n", " \\right) +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i\n", " & \\quad & \\text{(7.35)}\\\\\n", " &= \\sum_{i = 1}^I\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\left(\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right) \\mathbf{x}_i -\n", " \\left(\n", " \\boldsymbol{\\Sigma}^{-1} -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} \\left(\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi} +\n", " \\mathbf{I}\n", " \\right)^{-1}\n", " \\boldsymbol{\\Phi}^\\top \\boldsymbol{\\Sigma}^{-1}\n", " \\right) \\boldsymbol{\\mu}\\\\\n", " &= \\sum_{i = 1}^I\n", " \\left(\n", " \\boldsymbol{\\Sigma} + \\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right)^{-1} \\mathbf{x}_i -\n", " \\left(\n", " \\boldsymbol{\\Sigma} + 
\\boldsymbol{\\Phi} \\boldsymbol{\\Phi}^\\top\n", " \\right)^{-1} \\boldsymbol{\\mu}\n", " & \\quad & \\text{Sherman-Morrison-Woodbury formula}\\\\\n", " \\boldsymbol{\\mu} &= \\frac{1}{I} \\sum_{i = 1}^I \\mathbf{x}_i\n", "\n", "(b)\n", "---\n", "\n", ".. math::\n", "\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\Phi}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I\n", " \\E\\left[\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\mathbf{h}_i \\mathbf{h}_i^\\top +\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu} \\mathbf{h}_i^\\top -\n", " 2 \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i \\mathbf{h}_i^\\top\n", " \\right]\n", " & \\quad & \\text{(C.34), (C.29)}\\\\\n", " 0\n", " &= \\sum_{i = 1}^I\n", " -\\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\Phi}\n", " \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right] -\n", " \\boldsymbol{\\Sigma}^{-1} \\boldsymbol{\\mu}\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right] +\n", " \\boldsymbol{\\Sigma}^{-1} \\mathbf{x}_i\n", " \\E\\left[ \\mathbf{h}_i^\\top \\right]\n", " & \\quad & \\E \\text{ is a linear operator}\\\\\n", " \\boldsymbol{\\Phi}\n", " &= \\left(\n", " \\sum_{i = 1}^I\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu}) \\E\\left[ \\mathbf{h}_i^\\top \\right]\n", " \\right)\n", " \\left(\n", " \\sum_{i = 1}^I \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right]\n", " \\right)^{-1}\n", "\n", "(c)\n", "---\n", "\n", ".. 
math::\n", "\n", " \\DeclareMathOperator{\\diag}{\\mathrm{diag}}\n", " \\frac{\\partial L}{\\partial \\boldsymbol{\\Sigma}}\n", " &= -\\frac{1}{2} \\sum_{i = 1}^I\n", " \\E\\left[\n", " \\boldsymbol{\\Sigma}^{-\\top} -\n", " \\boldsymbol{\\Sigma}^{-\\top}\n", " \\left(\n", " \\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\left(\n", " \\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\boldsymbol{\\Sigma}^{-\\top}\n", " \\right]\n", " & \\quad & \\text{(C.38) and Matrix Cookbook (61)}\\\\\n", " \\boldsymbol{\\Sigma}\n", " &= \\frac{1}{I} \\sum_{i = 1}^I\n", " \\E\\left[\n", " \\left(\n", " \\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)\n", " \\left(\n", " \\mathbf{x}_i - \\boldsymbol{\\mu} - \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\right)^\\top\n", " \\right]\n", " & \\quad & \\E \\text{ is a linear operator}\\\\\n", " &= \\frac{1}{I} \\sum_{i = 1}^I\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top +\n", " \\E\\left[\n", " 2 \\boldsymbol{\\Phi} \\mathbf{h}_i \\boldsymbol{\\mu}^\\top -\n", " 2 \\boldsymbol{\\Phi} \\mathbf{h}_i \\mathbf{x}_i^\\top +\n", " \\boldsymbol{\\Phi} \\mathbf{h}_i\n", " \\mathbf{h}_i^\\top \\boldsymbol{\\Phi}^\\top\n", " \\right]\n", " & \\quad & \\text{expanded version of (7.37)}\\\\\n", " &= \\frac{1}{I} \\sum_{i = 1}^I\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top -\n", " 2 \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_i - \\boldsymbol{\\mu}\\right)^\\top +\n", " \\boldsymbol{\\Phi} \\E\\left[ \\mathbf{h}_i \\mathbf{h}_i^\\top \\right]\n", " \\boldsymbol{\\Phi}^\\top\\\\\n", " &= \\frac{1}{I} \\sum_{i = 1}^I\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top -\n", " \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_i - \\boldsymbol{\\mu} 
\\right)^\\top\n", " & \\quad & \\text{(b)}\\\\\n", " &= \\frac{1}{I} \\sum_{i = 1}^I \\diag\\left[\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})\n", " (\\mathbf{x}_i - \\boldsymbol{\\mu})^\\top -\n", " \\boldsymbol{\\Phi} \\E[\\mathbf{h}_i]\n", " \\left( \\mathbf{x}_i - \\boldsymbol{\\mu}\\right)^\\top\n", " \\right]\n", " & \\quad & \\boldsymbol{\\Sigma} \\text{ diagonal constraint.}" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }