{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "***************************\n",
    "Models for Chains and Trees\n",
    "***************************\n",
    "\n",
    "MAP Inference for Chains\n",
    "========================\n",
    "\n",
    "- Any minimization problem of the form (11.9) can be solved using the Viterbi\n",
    "  algorithm (see Figure 11.3).\n",
    "\n",
    "  - Reformulate the problem as finding the minimum cost path.\n",
    "  - Factorizing the joint probability distribution (using conditional\n",
    "    independence structure) leads to efficient inference.\n",
    "\n",
    "    - See (11.1) and (11.2).\n",
    "    - The forward-backward algorithm uses conditional independence instead\n",
    "      of factorization."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.1\n",
    "=============\n",
    "\n",
    "Notice that only the costs for :math:`\\{ V_{n, 2} \\}_{n = 1}^N` have changed\n",
    "compared to the example in the book.  This exercise is a bit too tedious for\n",
    "those that have existing experience with dynamic programmming."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _prince2012computer-ex-11.2:\n",
    "\n",
    "Exercise 11.2\n",
    "=============\n",
    "\n",
    "The Viterbi algorithm takes :math:`\\mathcal{O}\\left( N K^2 \\right)` for a model\n",
    "with :math:`N` variables, each of which takes on :math:`K` possible values.\n",
    "Reformulating this as a graph :math:`G`, there are :math:`|V| = NK` nodes and\n",
    ":math:`|E| = (N - 1) K^2` directed edges.\n",
    "\n",
    "Dijkstra's algorithm finds the shortest paths between nodes in a graph.  As\n",
    "illustrated in Figure 11.23, :math:`G` needs to be modified to have an\n",
    "additional two nodes and :math:`2K` edges i.e. :math:`G' = (V', E')` where\n",
    ":math:`\\left\\vert V' \\right\\vert = NK + 2` and\n",
    ":math:`\\left\\vert E' \\right\\vert = (N - 1) K^2 + 2K`.\n",
    "\n",
    "Sleator-Tarjan have shown that using a Fibonacci heap achieves a worst-case\n",
    "analysis of\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\mathcal{O}\\left(\n",
    "     \\left\\vert E' \\right\\vert +\n",
    "     \\left\\vert V' \\right\\vert \\log \\left\\vert V' \\right\\vert\n",
    "   \\right) =\n",
    "   \\mathcal{O}\\left( (N - 1) K^2 + 2K + (NK + 2) \\log(NK + 2) \\right).\n",
    "\n",
    "For DAGs, performing a topological sort and looping over the result relaxing\n",
    "the edge weights takes\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\mathcal{O}\\left(\n",
    "     \\left\\vert E' \\right\\vert + \\left\\vert V' \\right\\vert\n",
    "   \\right) =\n",
    "   \\mathcal{O}\\left( (N - 1) K^2 + 2K + (NK + 2) \\right).\n",
    "\n",
    "SSSP outperforms dynamic programming when :math:`K \\gg N`."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.3\n",
    "=============\n",
    "\n",
    "The overall joint probability is\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(\\mathbf{x}_{1 \\ldots 6}, w_{1 \\ldots 6})\n",
    "    &= Pr(\\mathbf{x}_{1 \\ldots 6} \\mid w_{1 \\ldots 6})\n",
    "       Pr(w_{1 \\ldots 6})\\\\\n",
    "    &= \\prod_{i = 1}^6 Pr(\\mathbf{x}_i \\mid w_i) \\cdot\n",
    "       Pr(w_6) Pr(w_5 \\mid w_6) Pr(w_2 \\mid w_5) Pr(w_4 \\mid w_5)\n",
    "         Pr(w_1 \\mid w_2) Pr(w_3 \\mid w_4).\n",
    "\n",
    "The cost function for the MAP estimation is\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\DeclareMathOperator*{\\argmax}{arg\\,max}\\\\\n",
    "   \\DeclareMathOperator*{\\argmin}{arg\\,min}\\\\\n",
    "   \\argmax_{\\mathbf{w}}\n",
    "     Pr(w_{1 \\ldots 6} \\mid \\mathbf{x}_{1 \\ldots 6})\n",
    "    &= \\argmin_{\\mathbf{w}}\n",
    "         -\\log Pr(\\mathbf{x}_{1 \\ldots 6}, w_{1 \\ldots 6})\n",
    "       & \\quad & \\text{(11.7)}\\\\\n",
    "    &= \\argmin_{\\mathbf{w}}\n",
    "         \\left( \\sum_{n = 1}^6 U_n(w_n) \\right) -\n",
    "         \\log Pr(w_6) + P_5(w_5, w_6) + P_2(w_2, w_5) + P_4(w_4, w_5) +\n",
    "         P_1(w_1, w_2) + P_3(w_3, w_4)\n",
    "       & \\quad & \\text{(11.10).}\n",
    "\n",
    "Notice how reversing the graphical model removed the three-wise cost\n",
    ":math:`T_5` i.e. the largest number of incoming connections is reduced to one."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.4\n",
    "=============\n",
    "\n",
    "This exercise is a bit too tedious for those that have existing experience with\n",
    "dynamic programmming.  The only novel item is (11.19), but that's just a 3D\n",
    "lookup table."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.5\n",
    "=============\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\newcommand\\qqquad{\\hskip3em}\n",
    "   \\begin{align}\n",
    "     \\hat{w}_N\n",
    "      &= \\argmax_{w_N} \\log Pr(\\mathbf{x}_N \\mid w_N) +\n",
    "           \\max_{w_1} \\log Pr(\\mathbf{x}_1 \\mid w_1) +\n",
    "             \\max_{w_2} \\log Pr(\\mathbf{x}_2 \\mid w_2) +\n",
    "                 \\log Pr(w_2 \\mid w_1) +\\\\\n",
    "      &\\qquad  \\max_{w_3} \\log Pr(\\mathbf{x}_3 \\mid w_3) +\n",
    "                   \\log Pr(w_3 \\mid w_2) + \\ldots\\\\\n",
    "      &\\qqquad   \\max_{w_{N - 1}} \\log Pr(\\mathbf{x}_{N - 1} \\mid w_{N - 1}) +\n",
    "                     \\log Pr(w_{N - 1} \\mid w_{N - 2}) +\n",
    "                     \\log Pr(w_N \\mid w_{N - 1})\\\\\n",
    "      &= \\argmax_{w_N} \\log Pr(\\mathbf{x}_N \\mid w_N) +\n",
    "           \\max_{w_{N - 1}} \\log Pr(w_N \\mid w_{N - 1}) +\n",
    "               \\log Pr(\\mathbf{x}_{N - 1} \\mid w_{N - 1}) +\\\\\n",
    "      &\\qquad \\max_{w_{N - 2}} \\log Pr(w_{N - 1} \\mid w_{N - 2}) +\n",
    "                 \\log Pr(\\mathbf{x}_{N - 2} \\mid w_{N - 2}) + \\ldots\\\\\n",
    "      &\\qqquad \\max_{w_2} \\log Pr(w_3 \\mid w_2) +\n",
    "                   \\log Pr(\\mathbf{x}_2 \\mid w_2) +\n",
    "                 \\max_{w_1} \\log Pr(w_2 \\mid w_1) +\n",
    "                     \\log Pr(\\mathbf{x}_1 \\mid w_1)\\\\\n",
    "      &= \\argmax_{w_N} f_N[w_N]\n",
    "   \\end{align}\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   f_n[w_n] =\n",
    "   \\log Pr(\\mathbf{x}_n \\mid w_n) +\n",
    "       \\max_{w_{n - 1}} \\log Pr(w_n \\mid w_{n - 1}) + f_{n - 1}[w_{n - 1}]\n",
    "\n",
    "and\n",
    "\n",
    ".. math::\n",
    "\n",
    "   f_1[w_1] = \\log Pr(\\mathbf{x}_1 \\mid w_1)."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.6\n",
    "=============\n",
    "\n",
    "Given the joint probability\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(\\mathbf{x}_{1 \\ldots N}, w_{1 \\ldots N}) =\n",
    "   \\prod_{n = 1}^N Pr(\\mathbf{x}_n \\mid w_n) \\cdot\n",
    "   Pr(w_1) \\prod_{n = 2}^N Pr(w_n \\mid w_{n - 1}),\n",
    "\n",
    "the marginal distribution for any arbitrary variable :math:`w_n` is\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\begin{align}\n",
    "     Pr(w_n \\mid \\mathbf{x}_{1 \\ldots N})\n",
    "      &= \\frac{\n",
    "           Pr(w_n, \\mathbf{x}_{1 \\ldots N})\n",
    "         }{\n",
    "           Pr(\\mathbf{x}_{1 \\ldots N})\n",
    "         }\\\\\n",
    "      &\\propto Pr(w_n, \\mathbf{x}_{1 \\ldots N})\\\\\n",
    "      &= \\sum_{w_1} \\cdots \\sum_{w_{n - 1}} \\sum_{w_{n + 1}} \\cdots \\sum_{w_N}\n",
    "           Pr(\\mathbf{x}_{1 \\ldots N}, w_{1 \\ldots N})\\\\\n",
    "      &= Pr(\\mathbf{x}_n \\mid w_n)\n",
    "         \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_1)\n",
    "           \\sum_{w_2} Pr(\\mathbf{x}_2 \\mid w_2) Pr(w_2 \\mid w_1) \\cdots\\\\\n",
    "      &\\qquad \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "                  Pr(w_{n - 1} \\mid w_{n - 2}) Pr(w_n \\mid w_{n - 1})\n",
    "                \\sum_{w_{n + 1}} Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1})\n",
    "                    Pr(w_{n + 1} \\mid w_n) \\cdots\\\\\n",
    "      &\\qqquad    \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N) Pr(w_N \\mid w_{N - 1})\\\\\n",
    "      &= Pr(\\mathbf{x}_n \\mid w_n) \\alpha_{n - 1}[w_n] \\beta_{n + 1}[w_n]\n",
    "      & \\quad & \\text{(a) and (b)}\n",
    "   \\end{align}\n",
    "\n",
    "(a)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   & \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_1)\n",
    "       \\sum_{w_2} Pr(\\mathbf{x}_2 \\mid w_2) Pr(w_2 \\mid w_1) \\cdots\n",
    "         \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "             Pr(w_{n - 1} \\mid w_{n - 2}) Pr(w_n \\mid w_{n - 1})\\\\\n",
    "   &= \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "          Pr(w_n \\mid w_{n - 1})\n",
    "        \\sum_{w_{n - 2}} Pr(\\mathbf{x}_{n - 2} \\mid w_{n - 2})\n",
    "            Pr(w_{n - 1} \\mid w_{n - 2}) \\cdots\n",
    "          \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_2 \\mid w_1) Pr(w_1)\\\\\n",
    "   &= \\alpha_{n - 1}[w_n]\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\alpha_{n - 1}[w_n] =\n",
    "   \\sum_{w_{n - 1}}\n",
    "     Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1}) Pr(w_n \\mid w_{n - 1})\n",
    "     \\alpha_{n - 2}[w_{n - 1}]\n",
    "\n",
    "and\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\alpha_1[w_2] =\n",
    "   \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_2 \\mid w_1) Pr(w_1).\n",
    "\n",
    "(b)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\sum_{w_{n + 1}} Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1}) Pr(w_{n + 1} \\mid w_n)\n",
    "     \\sum_{w_{n + 2}} Pr(\\mathbf{x}_{n + 2} \\mid w_{n + 2})\n",
    "         Pr(w_{n + 2} \\mid w_{n + 1}) \\cdots\n",
    "       \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N) Pr(w_N \\mid w_{N - 1}) =\n",
    "   \\beta_{n + 1}[w_n]\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\beta_{n + 1}[w_n] =\n",
    "   \\sum_{w_{n + 1}}\n",
    "     Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1}) Pr(w_{n + 1} \\mid w_n)\n",
    "     \\beta_{n + 2}[w_{n + 1}]\n",
    "\n",
    "and\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\beta_N[w_{N - 1}] =\n",
    "   \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N) Pr(w_N \\mid w_{N - 1})."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.7\n",
    "=============\n",
    "\n",
    "Given the joint probability\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(\\mathbf{x}_{1 \\ldots N}, w_{1 \\ldots N}) =\n",
    "   \\prod_{n = 1}^N Pr(\\mathbf{x}_n \\mid w_n) \\cdot\n",
    "   Pr(w_1) \\prod_{n = 2}^N Pr(w_n \\mid w_{n - 1}),\n",
    "\n",
    "the marginal distribution for any arbitrary variable pairs :math:`w_m` and\n",
    ":math:`w_n` where :math:`1 \\leq m < n \\leq N` is\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_m, w_n \\mid \\mathbf{x}_{1 \\ldots N})\n",
    "    &= \\frac{\n",
    "         Pr(w_m, w_n, \\mathbf{x}_{1 \\ldots N})\n",
    "       }{\n",
    "         Pr(\\mathbf{x}_{1 \\ldots N})\n",
    "       }\\\\\n",
    "    &\\propto Pr(w_m, w_n, \\mathbf{x}_{1 \\ldots N})\\\\\n",
    "    &= \\sum_{w_1} \\cdots \\sum_{w_{m - 1}} \\sum_{w_{m + 1}} \\cdots\n",
    "         \\sum_{w_{n - 1}} \\sum_{w_{n + 1}} \\cdots\n",
    "           \\sum_{w_N} Pr(\\mathbf{x}_{1 \\ldots N}, w_{1 \\ldots N})\\\\\n",
    "    &= Pr(\\mathbf{x}_m \\mid w_m) Pr(\\mathbf{x}_n \\mid w_n)\n",
    "       \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_1)\n",
    "         \\sum_{w_2} Pr(\\mathbf{x}_2 \\mid w_2) Pr(w_2 \\mid w_1) \\cdots\\\\\n",
    "    &\\qquad \\sum_{w_{m - 1}} Pr(\\mathbf{x}_{m - 1} \\mid w_{m - 1})\n",
    "                Pr(w_{m - 1} \\mid w_{m - 2}) Pr(w_m \\mid w_{m - 1})\n",
    "              \\sum_{w_{m + 1}} Pr(\\mathbf{x}_{m + 1} \\mid w_{m + 1})\n",
    "                  Pr(w_{m + 1} \\mid w_m) \\cdots\\\\\n",
    "    &\\qqquad    \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "                    Pr(w_{n - 1} \\mid w_{n - 2}) Pr(w_n \\mid w_{n - 1})\n",
    "                  \\sum_{w_{n + 1}} Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1})\n",
    "                      Pr(w_{n + 1} \\mid w_n) \\cdots\n",
    "                    \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N)\n",
    "                               Pr(w_N \\mid w_{N - 1})\\\\\n",
    "    &= Pr(\\mathbf{x}_m \\mid w_m) Pr(\\mathbf{x}_n \\mid w_n)\n",
    "       \\alpha_{m - 1}[w_m] \\gamma[w_m, w_n] \\beta_{n + 1}[w_n]\n",
    "       & \\quad & \\text{(a), (b), and (c)}\n",
    "\n",
    "(a)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   & \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_1)\n",
    "       \\sum_{w_2} Pr(\\mathbf{x}_2 \\mid w_2) Pr(w_2 \\mid w_1) \\cdots\n",
    "         \\sum_{w_{m - 1}} Pr(\\mathbf{x}_{m - 1} \\mid w_{m - 1})\n",
    "             Pr(w_{m - 1} \\mid w_{m - 2}) Pr(w_m \\mid w_{m - 1})\\\\\n",
    "    &= \\sum_{w_{m - 1}} Pr(\\mathbf{x}_{m - 1} \\mid w_{m - 1})\n",
    "           Pr(w_m \\mid w_{m - 1})\n",
    "         \\sum_{w_{m - 2}} Pr(\\mathbf{x}_{m - 2} \\mid w_{m - 2})\n",
    "             Pr(w_{m - 1} \\mid w_{m - 2}) \\cdots\n",
    "           \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_2 \\mid w_1) Pr(w_1)\\\\\n",
    "    &= \\alpha_{m - 1}[w_m]\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\alpha_{m - 1}[w_m]\n",
    "    &= \\sum_{w_{m - 1}} Pr(\\mathbf{x}_{m - 1} \\mid w_{m - 1})\n",
    "                        Pr(w_m \\mid w_{m - 1}) \\alpha_{m - 2}[w_{m - 1}],\\\\\\\\\\\\\n",
    "   \\alpha_1[w_2]\n",
    "    &= \\sum_{w_1} Pr(\\mathbf{x}_1 \\mid w_1) Pr(w_2 \\mid w_1) Pr(w_1),\\\\\\\\\\\\\n",
    "   \\alpha_0[w_1] &= Pr(w_1).\n",
    "\n",
    "(b)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\beta_{n + 1}[w_n] =\n",
    "   \\sum_{w_{n + 1}} Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1})\n",
    "       Pr(w_{n + 1} \\mid w_n) \\cdots\n",
    "     \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N) Pr(w_N \\mid w_{N - 1})\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\beta_{n + 1}[w_n]\n",
    "    &= \\sum_{w_{n + 1}} Pr(\\mathbf{x}_{n + 1} \\mid w_{n + 1})\n",
    "                        Pr(w_{n + 1} \\mid w_n) \\beta_{n + 2}[w_{n + 1}],\\\\\\\\\\\\\n",
    "   \\beta_N[w_{N - 1}]\n",
    "    &= \\sum_{w_N} Pr(\\mathbf{x}_N \\mid w_N) Pr(w_N \\mid w_{N - 1}),\\\\\\\\\\\\\n",
    "   \\beta_{N + 1}[w_N] &= 1.\n",
    "\n",
    "(c)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\gamma[w_m, w_n] =\n",
    "   \\sum_{w_{m + 1}} Pr(\\mathbf{x}_{m + 1} \\mid w_{m + 1})\n",
    "       Pr(w_{m + 1} \\mid w_m) \\cdots\n",
    "     \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "                      Pr(w_{n - 1} \\mid w_{n - 2}) Pr(w_n \\mid w_{n - 1})\n",
    "\n",
    "Notice that\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_m, w_n \\mid \\mathbf{x}_{1 \\ldots N})\n",
    "    &= Pr(w_m \\mid \\mathbf{x}_{1 \\ldots N})\n",
    "       Pr(w_n \\mid w_m, \\mathbf{x}_{1 \\ldots N})\\\\\n",
    "    &= Pr(w_m \\mid \\mathbf{x}_{1 \\ldots N})\n",
    "       Pr(w_n \\mid w_m, \\mathbf{x}_{(m + 1) \\ldots N})\n",
    "\n",
    "is reduced to :math:`Pr(w_m \\mid \\mathbf{x}_{1 \\ldots N})` when\n",
    ":math:`m = n` i.e. :math:`\\gamma = 1`.\n",
    "\n",
    "When :math:`n = m + 1`, the expression reduces to\n",
    ":math:`\\gamma = Pr(w_n \\mid w_m)` because :math:`w_n` and :math:`w_m` are not\n",
    "included in the marginalization.\n",
    "\n",
    "The derivation of the last case when :math:`n > m + 1` is similar to (a):\n",
    "\n",
    ".. math::\n",
    "\n",
    "   & \\sum_{w_{m + 1}} Pr(\\mathbf{x}_{m + 1} \\mid w_{m + 1})\n",
    "         Pr(w_{m + 1} \\mid w_m) \\cdots\n",
    "       \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "                        Pr(w_{n - 1} \\mid w_{n - 2}) Pr(w_n \\mid w_{n - 1})\\\\\n",
    "    &= \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "           Pr(w_n \\mid w_{n - 1})\n",
    "         \\sum_{w_{n - 2}} Pr(\\mathbf{x}_{n - 2} \\mid w_{n - 2})\n",
    "             Pr(w_{n - 1} \\mid w_{n - 2}) \\cdots\n",
    "           \\sum_{w_{m + 1}} Pr(\\mathbf{x}_{m + 1} \\mid w_{m + 1})\n",
    "               Pr(w_{m + 2} \\mid w_{m + 1}) Pr(w_{m + 1} \\mid w_m)\\\\\n",
    "    &= \\gamma_{n - 1}[w_n]\n",
    "\n",
    "where\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\gamma_{n - 1}[w_n] =\n",
    "   \\sum_{w_{n - 1}} Pr(\\mathbf{x}_{n - 1} \\mid w_{n - 1})\n",
    "                    Pr(w_n \\mid w_{n - 1}) \\gamma_{n - 2}[w_{n - 1}]\n",
    "\n",
    "and\n",
    "\n",
    ".. math::\n",
    "\n",
    "   \\gamma_{m + 1}[w_{m + 2}] =\n",
    "   \\sum_{w_{m + 1}} Pr(\\mathbf{x}_{m + 1} \\mid w_{m + 1})\n",
    "                    Pr(w_{m + 2} \\mid w_{m + 1}) Pr(w_{m + 1} \\mid w_m)."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.8\n",
    "=============\n",
    "\n",
    "One can conclude that factor graphs enable one to determine the maximal cliques\n",
    "in undirected graphical model.\n",
    "\n",
    "(1)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(x_1, x_2, x_3) =\n",
    "   \\frac{1}{Z_1} \\phi_{12}[x_1, x_2] \\phi_{23}[x_2, x_3] \\phi_{31}[x_3, x_1]\n",
    "\n",
    "(1.a)\n",
    "^^^^^\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (x_1) \\leftrightarrow (x_2)\\\\\n",
    "   (x_2) \\leftrightarrow (x_3)\\\\\n",
    "   (x_3) \\leftrightarrow (x_1)\n",
    "\n",
    "(1.b)\n",
    "^^^^^\n",
    ".. math::\n",
    "\n",
    "   (x_1) -[A]- (x_2)\\\\\n",
    "   (x_2) -[B]- (x_3)\\\\\n",
    "   (x_3) -[C]- (x_1)\n",
    "\n",
    "(2)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(x_1, x_2, x_3) =\n",
    "   \\frac{1}{Z_2} \\phi_{123}[x_1, x_2, x_3]\n",
    "\n",
    "(2.a)\n",
    "^^^^^\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (x_1) \\leftrightarrow (x_2) \\leftrightarrow (x_3)\n",
    "\n",
    "(2.b)\n",
    "^^^^^\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (x_1) -[A]-\\\\\n",
    "   (x_2) -[A]-\\\\\n",
    "   (x_3) -[A]-"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.9\n",
    "=============\n",
    "\n",
    ":cite:`brownmpugb` was very useful in answering how to define the factors of a\n",
    "PDF for undirected graphical models.  In short, anything goes as long as the set\n",
    "of potential functions :math:`\\phi` for a particular factorization account for\n",
    "the structure of the graph (e.g. edges being shared between cliques).\n",
    "\n",
    "According to :cite:`kschischang2001factor`, factor graphs of single components\n",
    "are allowed.  This makes sense because the independence of the nodes are lost\n",
    "otherwise.\n",
    "\n",
    "(a) and (d) are the only ones that do not have a cycle i.e. takes the form of a\n",
    "chain.\n",
    "\n",
    "(a)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_{1 \\ldots 6}) =\n",
    "   Pr(w_2) Pr(w_6) Pr(w_1 | w_2) Pr(w_5 | w_2, w_6) Pr(w_4 | w_5) Pr(w_3 | w_4)\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (w_2) -[A]\\\\\n",
    "   (w_6) -[B]\\\\\n",
    "   (w_1) -[C]- (w_2)\\\\\n",
    "   (w_5) -[D]-\\\\\n",
    "   (w_2) -[D]-\\\\\n",
    "   (w_6) -[D]-\\\\\n",
    "   (w_4) -[E]- (w_5)\\\\\n",
    "   (w_3) -[F]- (w_4)\n",
    "\n",
    "(b)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_{1 \\ldots 6}) =\n",
    "   \\frac{1}{Z} \\phi_1[w_1, w_2, w_4] \\phi_2[w_1, w_3, w_4]\n",
    "     \\phi_3[w_2, w_4, w_5] \\phi_4[w_5, w_6]\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (w_1) -[A]-\\\\\n",
    "   (w_2) -[A]-\\\\\n",
    "   (w_4) -[A]-\\\\\n",
    "   (w_1) -[B]-\\\\\n",
    "   (w_3) -[B]-\\\\\n",
    "   (w_4) -[B]-\\\\\n",
    "   (w_2) -[C]-\\\\\n",
    "   (w_4) -[C]-\\\\\n",
    "   (w_5) -[C]-\\\\\n",
    "   (w_5) -[D]- (w_6)\n",
    "\n",
    "(c)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_{1 \\ldots 6}) =\n",
    "   Pr(w_2) Pr(w_6) Pr(w_1 | w_2) Pr(w_5 | w_2, w_6)\n",
    "     Pr(w_4 | w_1, w_5) Pr(w_3 | w_4)\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (w_2) -[A]-\\\\\n",
    "   (w_6) -[B]-\\\\\n",
    "   (w_1) -[C]- (w_2)\\\\\n",
    "   (w_5) -[D]-\\\\\n",
    "   (w_2) -[D]-\\\\\n",
    "   (w_6) -[D]-\\\\\n",
    "   (w_4) -[E]-\\\\\n",
    "   (w_1) -[E]-\\\\\n",
    "   (w_5) -[E]-\\\\\n",
    "   (w_3) -[F]- (w_4)\n",
    "\n",
    "(d)\n",
    "---\n",
    "\n",
    ".. math::\n",
    "\n",
    "   Pr(w_{1 \\ldots 6}) =\n",
    "   \\frac{1}{Z} \\phi_1[w_1, w_3, w_4] \\phi_2[w_2, w_4, w_5] \\phi_3[w_5, w_6]\n",
    "\n",
    ".. math::\n",
    "\n",
    "   (w_1) -[A]-\\\\\n",
    "   (w_2) -[A]-\\\\\n",
    "   (w_4) -[A]-\\\\\n",
    "   (w_2) -[B]-\\\\\n",
    "   (w_4) -[B]-\\\\\n",
    "   (w_5) -[B]-\\\\\n",
    "   (w_5) -[C]- (w_6)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.10\n",
    "==============\n",
    "\n",
    "Notice that each variable :math:`w_i` is connected to :math:`w_{i + 1}` and\n",
    ":math:`w_{i + 2}`.  To reduce the chain model to one dependent predecessor,\n",
    "define :math:`w'_j = \\{ w_{2j - 1}, w_{2j} \\}` for all\n",
    ":math:`j = [1, 2, \\ldots, N / 2]`.\n",
    "\n",
    "According to :ref:`Exercise 11.2 <prince2012computer-ex-11.2>`, a HMM with\n",
    ":math:`N` variables each taking :math:`K` values has a time complexity of\n",
    ":math:`\\mathcal{O}\\left( N K^2 \\right)` using the Viterbi algorithm.\n",
    "\n",
    "The combined chain model will have :math:`N / 2` variables each taking\n",
    ":math:`K^2` values hence the overall time complexity is\n",
    ":math:`\\mathcal{O}\\left( N K^4 \\right)`."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.11\n",
    "==============\n",
    "\n",
    "Instead of organizing each scanline into a chain model, a :math:`n \\times n`\n",
    "grid model could be imposed over each pixel in the scanline of each image."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Exercise 11.12\n",
    "==============\n",
    "\n",
    "Run belief propagation on the factor graph :cite:`weiss2000correctness`."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. rubric:: References\n",
    "\n",
    ".. bibliography:: chapter-11.bib"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}