{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cee75ce4",
   "metadata": {},
   "source": [
    "# Etymology of Entropy\n",
    "\n",
    "This lecture describes and compares several notions of entropy.\n",
    "\n",
    "Among the senses of entropy, we’ll encounter these\n",
    "\n",
    "- A measure of **uncertainty** of a random variable advanced by Claude Shannon [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)]  \n",
    "- A key object governing thermodynamics  \n",
    "- Kullback and Leibler’s measure of the statistical divergence between two probability distributions  \n",
    "- A measure of the volatility of stochastic discount factors that appear in asset pricing theory  \n",
    "- Measures of unpredictability that occur in classical Wiener-Kolmogorov linear prediction theory  \n",
    "- A frequency domain criterion for constructing robust decision rules  \n",
    "\n",
    "\n",
    "The concept of entropy plays an important role in robust control formulations described in this lecture\n",
    "[Risk and Model Uncertainty](https://python-advanced.quantecon.org/five_preferences.html) and in this lecture\n",
    "[Robustness](https://python-advanced.quantecon.org/robustness.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4321433a",
   "metadata": {},
   "source": [
    "## Information Theory\n",
    "\n",
    "In information theory [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)],  entropy is a measure of the unpredictability of a random variable.\n",
    "\n",
    "To illustrate\n",
    "things, let $ X $ be a discrete random variable taking values $ x_1, \\ldots, x_n $\n",
    "with probabilities $ p_i = \\textrm{Prob}(X = x_i) \\geq 0, \\sum_i p_i =1 $.\n",
    "\n",
    "Claude Shannon’s [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)] definition  of entropy is\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon1'></a>\n",
    "$$\n",
    "H(p) = \\sum_i p_i \\log_b (p_i^{-1}) = - \\sum_i p_i \\log_b (p_i) . \\tag{26.1}\n",
    "$$\n",
    "\n",
    "where $ \\log_b $ denotes the log function with base $ b $.\n",
    "\n",
    "Inspired by the limit\n",
    "\n",
    "$$\n",
    "\\lim_{p\\downarrow 0} p \\log p = \\lim_{p \\downarrow 0} \\frac{\\log p}{p^{-1}} = \\lim_{p \\downarrow 0}p = 0,\n",
    "$$\n",
    "\n",
    "we set $ p \\log p = 0 $ in equation [(26.1)](#equation-eq-shannon1).\n",
    "\n",
    "Typical bases for the logarithm are $ 2 $,  $ e $, and $ 10 $.\n",
    "\n",
    "In the information theory literature, logarithms of base $ 2 $, $ e $, and $ 10 $ are associated with units of information\n",
    "called bits, nats, and dits, respectively.\n",
    "\n",
    "Shannon typically used base $ 2 $."
   ]
  },
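  {
   "cell_type": "markdown",
   "id": "ent-ex-info-md",
   "metadata": {},
   "source": [
    "As a quick illustration (a minimal sketch, not part of the original lecture), the following code computes $ H(p) $ for a probability vector and reports it in bits, nats, and dits; the vector `p` and the helper `shannon_entropy` are illustrative choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-info-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def shannon_entropy(p, base=2):\n",
    "    # Shannon entropy of a probability vector p, using the convention 0 log 0 = 0\n",
    "    p = np.asarray(p, dtype=float)\n",
    "    p = p[p > 0]                      # drop zero-probability states\n",
    "    return -(p * np.log(p)).sum() / np.log(base)\n",
    "\n",
    "p = [0.5, 0.25, 0.125, 0.125]         # an illustrative distribution\n",
    "for base, unit in [(2, 'bits'), (np.e, 'nats'), (10, 'dits')]:\n",
    "    print(unit, ':', shannon_entropy(p, base))"
   ]
  },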
  {
   "cell_type": "markdown",
   "id": "f8cd1ea7",
   "metadata": {},
   "source": [
    "## A Measure of Unpredictability\n",
    "\n",
    "For a discrete random variable $ X $ with probability density $ p = \\{p_i\\}_{i=1}^n $,   the **surprisal**\n",
    "for state $ i $ is  $ s_i = \\log\\left(\\frac{1}{p_i}\\right) $.\n",
    "\n",
    "The quantity $ \\log\\left(\\frac{1}{p_i}\\right) $ is called the **surprisal** because it is inversely related to the likelihood that state\n",
    "$ i $ will occur.\n",
    "\n",
    "Note that entropy $ H(p) $ equals the **expected surprisal**\n",
    "\n",
    "$$\n",
    "H(p) = \\sum_i p_i s_i = \\sum_i p_i \\log\\left(\\frac{1}{p_i} \\right) = -  \\sum_i p_i \\log\\left(p_i \\right).\n",
    "$$"
   ]
  },
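  {
   "cell_type": "markdown",
   "id": "ent-ex-surprisal-md",
   "metadata": {},
   "source": [
    "A two-line check (an illustrative sketch, not lecture code) that entropy equals expected surprisal for an example distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-surprisal-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "p = np.array([0.5, 0.25, 0.125, 0.125])   # an illustrative distribution\n",
    "s = np.log(1 / p)                         # surprisal of each state, in nats\n",
    "print('expected surprisal:', p @ s)\n",
    "print('-sum p log p      :', -(p * np.log(p)).sum())"
   ]
  },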
  {
   "cell_type": "markdown",
   "id": "fe94dc24",
   "metadata": {},
   "source": [
    "### Example\n",
    "\n",
    "Take a possibly unfair coin, so $ X = \\{0,1\\} $ with $ p = {\\rm Prob}(X=1) = p \\in [0,1] $.\n",
    "\n",
    "Then\n",
    "\n",
    "$$\n",
    "H(p) = -(1-p)\\log (1-p) - p \\log p.\n",
    "$$\n",
    "\n",
    "Evidently,\n",
    "\n",
    "$$\n",
    "H'(p) = \\log(1-p) - \\log p = 0\n",
    "$$\n",
    "\n",
    "at $ p=.5 $ and $ H''(p) = -\\frac{1}{1-p} -\\frac{1}{p} < 0 $ for $ p\\in (0,1) $.\n",
    "\n",
    "So $ p=.5 $ maximizes entropy, while entropy is minimized at $ p=0 $ and $ p=1 $.\n",
    "\n",
    "Thus, among all coins,  a fair coin is the most unpredictable.\n",
    "\n",
    "See Fig. 26.1\n",
    "\n",
    "![MyGraph5.png](MyGraph5.png)\n",
    "\n",
    "Entropy as a function of $ \\hat \\pi_1 $ when $ \\pi_1 = .5 $.  "
   ]
  },
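  {
   "cell_type": "markdown",
   "id": "ent-ex-coin-md",
   "metadata": {},
   "source": [
    "Here is a small numerical sketch (illustrative, not lecture code) confirming that a fair coin maximizes entropy: it evaluates $ H(p) $ on a grid of values for $ p $ and reports the maximizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-coin-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def coin_entropy(p):\n",
    "    # entropy (in nats) of a coin with Prob(X=1) = p, using 0 log 0 = 0\n",
    "    return -sum(q * np.log(q) for q in (p, 1 - p) if q > 0)\n",
    "\n",
    "grid = np.linspace(0.001, 0.999, 999)\n",
    "H = np.array([coin_entropy(p) for p in grid])\n",
    "print('entropy-maximizing p on the grid:', grid[H.argmax()])\n",
    "print('H(0.5) =', coin_entropy(0.5), '   log 2 =', np.log(2))\n",
    "print('H(0)   =', coin_entropy(0.0), '   H(1) =', coin_entropy(1.0))"
   ]
  },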
  {
   "cell_type": "markdown",
   "id": "75cc50d3",
   "metadata": {},
   "source": [
    "### Example\n",
    "\n",
    "Take an $ n $-sided possibly unfair die with  a probability distribution $ \\{p_i\\}_{i=1}^n $.\n",
    "The die is fair if $ p_i = \\frac{1}{n} \\forall i $.\n",
    "\n",
    "Among all dies, a fair die  maximizes entropy.\n",
    "\n",
    "For a fair die,\n",
    "entropy equals $ H(p) = - n^{-1} \\sum_i \\log \\left( \\frac{1}{n} \\right) = \\log(n) $.\n",
    "\n",
    "To specify the expected number of bits needed to isolate the outcome of one roll of a fair $ n $-sided die requires $ \\log_2 (n) $ bits of information.\n",
    "\n",
    "For example,\n",
    "if $ n=2 $, $ \\log_2(2) =1 $.\n",
    "\n",
    "For $ n=3 $, $ \\log_2(3) = 1.585 $."
   ]
  },
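  {
   "cell_type": "markdown",
   "id": "ent-ex-die-md",
   "metadata": {},
   "source": [
    "The same calculation for a fair $ n $-sided die (an illustrative sketch, not lecture code): its base-2 entropy equals $ \\log_2(n) $ bits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-die-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "for n in [2, 3, 6, 20]:\n",
    "    p = np.full(n, 1 / n)                   # a fair n-sided die\n",
    "    H_bits = -(p * np.log2(p)).sum()\n",
    "    print(n, 'sides:', H_bits, 'bits;   log2(n) =', np.log2(n))"
   ]
  },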
  {
   "cell_type": "markdown",
   "id": "3c46418a",
   "metadata": {},
   "source": [
    "## Mathematical Properties of Entropy\n",
    "\n",
    "For a discrete random variable with probability vector $ p $, entropy $ H(p) $ is\n",
    "a function that satisfies\n",
    "\n",
    "- $ H $ is *continuous*.  \n",
    "- $ H $ is *symmetric*: $ H(p_1, p_2, \\ldots, p_n) = H(p_{r_1}, \\ldots, p_{r_n}) $ for any permutation $ r_1, \\ldots, r_n $ of $ 1,\\ldots, n $.  \n",
    "- A uniform distribution maximizes $ H(p) $: $ H(p_1, \\ldots, p_n) \\leq H(\\frac{1}{n}, \\ldots, \\frac{1}{n}) . $  \n",
    "- Maximum entropy increases with the number of states:\n",
    "  $ H(\\frac{1}{n}, \\ldots, \\frac{1}{n} ) \\leq H(\\frac{1}{n+1} , \\ldots, \\frac{1}{n+1}) $.  \n",
    "- Entropy is not affected by events zero probability.  "
   ]
  },
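  {
   "cell_type": "markdown",
   "id": "ent-ex-props-md",
   "metadata": {},
   "source": [
    "The following sketch (illustrative only, with a randomly drawn probability vector) checks these properties numerically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-props-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def H(p):\n",
    "    # entropy in nats, ignoring zero-probability states\n",
    "    p = np.asarray(p, dtype=float)\n",
    "    p = p[p > 0]\n",
    "    return -(p * np.log(p)).sum()\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n = 5\n",
    "p = rng.dirichlet(np.ones(n))               # a random probability vector\n",
    "\n",
    "print('symmetry            :', np.isclose(H(p), H(p[::-1])))\n",
    "print('uniform dominates   :', H(p) <= H(np.full(n, 1 / n)) + 1e-12)\n",
    "print('grows with n        :', H(np.full(n, 1 / n)) <= H(np.full(n + 1, 1 / (n + 1))))\n",
    "print('zero-prob irrelevant:', np.isclose(H(p), H(np.append(p, 0.0))))"
   ]
  },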
  {
   "cell_type": "markdown",
   "id": "79780f5c",
   "metadata": {},
   "source": [
    "## Conditional Entropy\n",
    "\n",
    "Let $ (X,Y) $ be a bivariate discrete random vector with  outcomes $ x_1, \\ldots, x_n $ and $ y_1, \\ldots, y_m $, respectively,\n",
    "occurring with probability density $ p(x_i, y_i) $.\n",
    "\n",
    "Conditional entropy $ H(X| Y) $ is\n",
    "defined as\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon2'></a>\n",
    "$$\n",
    "H(X | Y) = \\sum_{i,j} p(x_i,y_j) \\log \\frac{p(y_j)}{p(x_i,y_j)}. \\tag{26.2}\n",
    "$$\n",
    "\n",
    "Here $ \\frac{p(y_j)}{p(x_i,y_j)} $, the reciprocal of the conditional probability of $ x_i $ given $ y_j $, can be defined as the **conditional surprisal**."
   ]
  },
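  {
   "cell_type": "markdown",
   "id": "ent-ex-condent-md",
   "metadata": {},
   "source": [
    "To make formula (26.2) concrete, here is a minimal sketch (the joint distribution `P` is an illustrative example) that computes $ H(X|Y) $ directly and checks it against the identity $ H(X|Y) = H(X,Y) - H(Y) $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-condent-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# an illustrative joint distribution p(x_i, y_j): rows index x, columns index y\n",
    "P = np.array([[0.25, 0.10],\n",
    "              [0.05, 0.60]])\n",
    "p_y = P.sum(axis=0)                         # marginal distribution of Y\n",
    "\n",
    "# H(X|Y) = sum_{i,j} p(x_i, y_j) log( p(y_j) / p(x_i, y_j) ), equation (26.2)\n",
    "H_X_given_Y = (P * np.log(p_y[np.newaxis, :] / P)).sum()\n",
    "print('H(X|Y)        :', H_X_given_Y)\n",
    "\n",
    "# check against H(X,Y) - H(Y)\n",
    "H_XY = -(P * np.log(P)).sum()\n",
    "H_Y = -(p_y * np.log(p_y)).sum()\n",
    "print('H(X,Y) - H(Y) :', H_XY - H_Y)"
   ]
  },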
  {
   "cell_type": "markdown",
   "id": "d9460d00",
   "metadata": {},
   "source": [
    "## Independence as Maximum Conditional Entropy\n",
    "\n",
    "Let $ m=n $ and $ [x_1, \\ldots, x_n ] = [y_1, \\ldots, y_n] $.\n",
    "\n",
    "Let $ \\sum_j p(x_i,y_j) = \\sum_j p(x_j, y_i) $ for all $ i $,\n",
    "so that the marginal distributions of $ x $ and $ y $ are identical.\n",
    "\n",
    "Thus, $ x $ and $ y $ are identically distributed, but they\n",
    "are not necessarily independent.\n",
    "\n",
    "Consider the following problem:\n",
    "choose a joint distribution  $ p(x_i,y_j) $ to maximize  conditional entropy\n",
    "[(26.2)](#equation-eq-shannon2) subject to the restriction that  $ x $ and $ y $ are identically distributed.\n",
    "\n",
    "The conditional-entropy-maximizing  $ p(x_i,y_j) $ sets\n",
    "\n",
    "$$\n",
    "\\frac{p(x_i,y_j)}{p(y_j)} = \\sum_j p(x_i, y_j) = p(x_i)  \\forall i .\n",
    "$$\n",
    "\n",
    "Thus, among all joint distributions with identical marginal distributions,\n",
    "the conditional entropy maximizing joint distribution makes $ x $ and $ y $ be\n",
    "independent."
   ]
  },
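  {
   "cell_type": "markdown",
   "id": "ent-ex-indep-md",
   "metadata": {},
   "source": [
    "A small numerical check of this claim (illustrative, not from the lecture): fixing a common marginal, conditional entropy is larger under the independent joint distribution than under a dependent joint distribution with the same marginals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-indep-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def cond_entropy(P):\n",
    "    # H(X|Y) from equation (26.2) for a joint probability matrix P (rows x, columns y)\n",
    "    p_y = P.sum(axis=0)\n",
    "    mask = P > 0\n",
    "    safe_P = np.where(mask, P, 1.0)         # placeholder where p = 0\n",
    "    return np.where(mask, P * np.log(p_y / safe_P), 0.0).sum()\n",
    "\n",
    "marginal = np.array([0.3, 0.7])             # common marginal for x and y\n",
    "\n",
    "P_indep = np.outer(marginal, marginal)      # independent joint distribution\n",
    "P_dep = np.array([[0.25, 0.05],             # same marginals, but dependent\n",
    "                  [0.05, 0.65]])\n",
    "\n",
    "print('H(X|Y), independent:', cond_entropy(P_indep))\n",
    "print('H(X|Y), dependent  :', cond_entropy(P_dep))"
   ]
  },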
  {
   "cell_type": "markdown",
   "id": "573248e7",
   "metadata": {},
   "source": [
    "## Thermodynamics\n",
    "\n",
    "Josiah Willard Gibbs (see [https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs](https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs)) defined entropy as\n",
    "\n",
    "\n",
    "<a id='equation-eq-gibbs'></a>\n",
    "$$\n",
    "S = - k_B \\sum_i p_i  \\log p_i \\tag{26.3}\n",
    "$$\n",
    "\n",
    "where $ p_i $ is the probability of a micro state and $ k_B $ is Boltzmann’s constant.\n",
    "\n",
    "- The Boltzmann constant $ k_b $ relates energy at the micro  particle level with the temperature observed at the macro level. It equals what is called a gas constant  divided by an Avogadro constant.  \n",
    "\n",
    "\n",
    "The second law of thermodynamics states that the entropy of a closed physical system increases until $ S $ defined in [(26.3)](#equation-eq-gibbs) attains a maximum."
   ]
  },
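  {
   "cell_type": "markdown",
   "id": "ent-ex-gibbs-md",
   "metadata": {},
   "source": [
    "As a small numerical illustration (the microstate energies and temperature below are made up for this sketch), Gibbs entropy (26.3) can be evaluated for a system whose microstate probabilities are Boltzmann weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-gibbs-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "k_B = 1.380649e-23                      # Boltzmann constant, J/K (exact SI value)\n",
    "\n",
    "E = np.array([0.0, 1.0e-21, 2.0e-21])   # illustrative microstate energies, in joules\n",
    "T = 300.0                               # temperature, in kelvin\n",
    "\n",
    "w = np.exp(-E / (k_B * T))              # Boltzmann weights\n",
    "p = w / w.sum()                         # microstate probabilities\n",
    "S = -k_B * (p * np.log(p)).sum()        # Gibbs entropy, equation (26.3)\n",
    "print('microstate probabilities:', p)\n",
    "print('S =', S, 'J/K')"
   ]
  },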
  {
   "cell_type": "markdown",
   "id": "783d211e",
   "metadata": {},
   "source": [
    "## Statistical  Divergence\n",
    "\n",
    "Let $ X $ be a discrete state space $ x_1, \\ldots, x_n $ and let $ p $ and $ q $ be  two discrete probability\n",
    "distributions on $ X $.\n",
    "\n",
    "Assume that $ \\frac{p_i}{q_t} \\in (0,\\infty) $ for all $ i $ for which $ p_i >0 $.\n",
    "\n",
    "Then the Kullback-Leibler statistical divergence, also called **relative entropy**,\n",
    "is defined as\n",
    "\n",
    "\n",
    "<a id='equation-eq-shanno3'></a>\n",
    "$$\n",
    "D(p|q) = \\sum_i p_i \\log \\left(\\frac{p_i}{q_i}\\right) = \\sum_i q_i \\left( \\frac{p_i}{q_i}\\right) \\log\\left( \\frac{p_i}{q_i}\\right) . \\tag{26.4}\n",
    "$$\n",
    "\n",
    "Evidently,\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "D(p|q) & = - \\sum_i p_i \\log q_i + \\sum_i p_i \\log p_i \\cr\n",
    "  & =  H(p,q) - H(p)   ,\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "where $ H(p,q) = \\sum_i p_i \\log   q_i $ is the cross-entropy.\n",
    "\n",
    "It is easy to verify, as we have done above, that $ D(p|q) \\geq 0 $ and that $ D(p|q) = 0 $ implies that $ p_i = q_i $ when $ q_i >0 $."
   ]
  },
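  {
   "cell_type": "markdown",
   "id": "ent-ex-kl-md",
   "metadata": {},
   "source": [
    "A short sketch (with made-up distributions `p` and `q`) that computes $ D(p|q) $, verifies the cross-entropy decomposition, and illustrates nonnegativity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-kl-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def kl(p, q):\n",
    "    # relative entropy D(p|q) = sum_i p_i log(p_i / q_i), with 0 log 0 = 0\n",
    "    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)\n",
    "    mask = p > 0\n",
    "    return (p[mask] * np.log(p[mask] / q[mask])).sum()\n",
    "\n",
    "p = np.array([0.5, 0.3, 0.2])\n",
    "q = np.array([0.4, 0.4, 0.2])\n",
    "\n",
    "cross_entropy = -(p * np.log(q)).sum()      # H(p, q)\n",
    "entropy = -(p * np.log(p)).sum()            # H(p)\n",
    "print('D(p|q)        :', kl(p, q))\n",
    "print('H(p,q) - H(p) :', cross_entropy - entropy)\n",
    "print('D(p|p)        :', kl(p, p))\n",
    "print('D(p|q) >= 0   :', kl(p, q) >= 0)"
   ]
  },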
  {
   "cell_type": "markdown",
   "id": "ebc3331d",
   "metadata": {},
   "source": [
    "## Continuous distributions\n",
    "\n",
    "For a continuous random variable, Kullback-Leibler divergence between two densities $ p $ and $ q $ is defined as\n",
    "\n",
    "$$\n",
    "D(p|q) = \\int p(x) \\log \\left(\\frac{p(x)}{q(x)} \\right) d \\, x .\n",
    "$$"
   ]
  },
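  {
   "cell_type": "markdown",
   "id": "ent-ex-klcont-md",
   "metadata": {},
   "source": [
    "The integral can be evaluated numerically. The sketch below (illustrative; the two Gaussian densities are arbitrary choices) integrates the definition with `scipy.integrate.quad` over a range containing essentially all of the mass of $ p $ and compares the result with the known closed form for two univariate normal distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-klcont-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.integrate import quad\n",
    "from scipy.stats import norm\n",
    "\n",
    "# two illustrative densities: p = N(0, 1), q = N(1, 1.5^2)\n",
    "p = norm(0, 1)\n",
    "q = norm(1, 1.5)\n",
    "\n",
    "integrand = lambda x: p.pdf(x) * np.log(p.pdf(x) / q.pdf(x))\n",
    "D_numeric, _ = quad(integrand, -10, 10)     # tails beyond [-10, 10] are negligible\n",
    "\n",
    "# closed form for two univariate normals, as a check\n",
    "mu0, s0, mu1, s1 = 0.0, 1.0, 1.0, 1.5\n",
    "D_closed = np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5\n",
    "\n",
    "print('numerical  :', D_numeric)\n",
    "print('closed form:', D_closed)"
   ]
  },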
  {
   "cell_type": "markdown",
   "id": "10ecb73e",
   "metadata": {},
   "source": [
    "## Relative entropy and Gaussian distributions\n",
    "\n",
    "We want to compute relative entropy for two continuous densities $ \\phi $ and $ \\hat \\phi $ when\n",
    "$ \\phi $ is $ {\\cal N}(0,I) $ and  $ {\\hat \\phi} $ is $ {\\cal N}(w, \\Sigma) $, where the covariance matrix $ \\Sigma $ is nonsingular.\n",
    "\n",
    "We seek a formula for\n",
    "\n",
    "$$\n",
    "\\textrm{ent} = \\int (\\log {\\hat \\phi(\\varepsilon)} - \\log \\phi(\\varepsilon) ){\\hat \\phi(\\varepsilon)} d \\varepsilon.\n",
    "$$\n",
    "\n",
    "**Claim**\n",
    "\n",
    "\n",
    "<a id='equation-eq-relentropy101'></a>\n",
    "$$\n",
    "\\textrm{ent} = %\\int (\\log {\\hat \\phi} - \\log \\phi ){\\hat \\phi} d \\varepsilon=\n",
    "-{1 \\over 2} \\log\n",
    "\\det \\Sigma + {1 \\over 2}w'w + {1 \\over 2}\\mathrm{trace} (\\Sigma - I)\n",
    ". \\tag{26.5}\n",
    "$$\n",
    "\n",
    "**Proof**\n",
    "\n",
    "The log likelihood ratio is\n",
    "\n",
    "\n",
    "<a id='equation-footnote2'></a>\n",
    "$$\n",
    "\\log {\\hat \\phi}(\\varepsilon) - \\log \\phi(\\varepsilon) =\n",
    "{1 \\over 2} \\left[ - (\\varepsilon - w)' \\Sigma^{-1} (\\varepsilon - w)\n",
    "    +  \\varepsilon' \\varepsilon - \\log \\det\n",
    "    \\Sigma\\right] . \\tag{26.6}\n",
    "$$\n",
    "\n",
    "Observe that\n",
    "\n",
    "$$\n",
    "- \\int {1 \\over 2} (\\varepsilon - w)' \\Sigma^{-1} (\\varepsilon -\n",
    "w) {\\hat \\phi}(\\varepsilon) d\\varepsilon = - {1 \\over 2}\\mathrm{trace}(I).\n",
    "$$\n",
    "\n",
    "Applying the identity $ \\varepsilon = w + (\\varepsilon - w) $ gives\n",
    "\n",
    "$$\n",
    "{1\\over 2}\\varepsilon' \\varepsilon = {1 \\over 2}w' w + {1 \\over 2}\n",
    "(\\varepsilon - w)' (\\varepsilon - w) +  w' (\\varepsilon - w).\n",
    "$$\n",
    "\n",
    "Taking mathematical expectations\n",
    "\n",
    "$$\n",
    "{1 \\over 2} \\int \\varepsilon' \\varepsilon {\\hat \\phi}(\\varepsilon) d\n",
    "\\varepsilon = {1\\over 2} w'w + {1 \\over 2} \\mathrm{trace}(\\Sigma).\n",
    "$$\n",
    "\n",
    "Combining terms gives\n",
    "\n",
    "\n",
    "<a id='equation-eq-relentropy'></a>\n",
    "$$\n",
    "\\textrm{ent} = \\int (\\log {\\hat \\phi} - \\log \\phi ){\\hat \\phi} d \\varepsilon= -{1 \\over 2} \\log\n",
    "\\det \\Sigma + {1 \\over 2}w'w + {1 \\over 2}\\mathrm{trace} (\\Sigma - I)\n",
    ". \\tag{26.7}\n",
    "$$\n",
    "\n",
    "which agrees with equation [(26.5)](#equation-eq-relentropy101).\n",
    "Notice the separate  appearances of the mean distortion $ w $ and the covariance distortion\n",
    "$ \\Sigma - I $ in equation [(26.7)](#equation-eq-relentropy).\n",
    "\n",
    "**Extension**\n",
    "\n",
    "Let  $ N_0 = {\\mathcal N}(\\mu_0,\\Sigma_0) $ and $ N_1={\\mathcal N}(\\mu_1, \\Sigma_1) $ be two multivariate Gaussian\n",
    "distributions.\n",
    "\n",
    "Then\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon5'></a>\n",
    "$$\n",
    "D(N_0|N_1) = \\frac{1}{2} \\left(\\mathrm {trace} (\\Sigma_1^{-1} \\Sigma_0)\n",
    "+ (\\mu_1 -\\mu_0)' \\Sigma_1^{-1} (\\mu_1 - \\mu_0) - \\log\\left( \\frac{ \\mathrm {det }\\Sigma_0 }{\\mathrm {det}\\Sigma_1}\\right)\n",
    "   - k \\right). \\tag{26.8}\n",
    "$$"
   ]
  },
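  {
   "cell_type": "markdown",
   "id": "ent-ex-gauss-md",
   "metadata": {},
   "source": [
    "The following sketch (the mean distortion $ w $ and covariance matrix $ \\Sigma $ are arbitrary illustrative choices) checks formula (26.5) by Monte Carlo and verifies that it agrees with formula (26.8) when $ N_0 = {\\mathcal N}(w, \\Sigma) $ and $ N_1 = {\\mathcal N}(0, I) $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-gauss-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import multivariate_normal\n",
    "\n",
    "k = 2                                       # dimension (illustrative)\n",
    "w = np.array([0.5, -0.3])\n",
    "Sigma = np.array([[1.2, 0.3],\n",
    "                  [0.3, 0.8]])\n",
    "\n",
    "phi = multivariate_normal(np.zeros(k), np.eye(k))      # N(0, I)\n",
    "phi_hat = multivariate_normal(w, Sigma)                # N(w, Sigma)\n",
    "\n",
    "# formula (26.5)\n",
    "ent_formula = (-0.5 * np.log(np.linalg.det(Sigma))\n",
    "               + 0.5 * w @ w\n",
    "               + 0.5 * np.trace(Sigma - np.eye(k)))\n",
    "\n",
    "# Monte Carlo estimate of int (log phi_hat - log phi) phi_hat d eps\n",
    "draws = phi_hat.rvs(size=200_000, random_state=42)\n",
    "ent_mc = np.mean(phi_hat.logpdf(draws) - phi.logpdf(draws))\n",
    "\n",
    "# formula (26.8) with N_0 = N(w, Sigma), N_1 = N(0, I)\n",
    "D_gauss = 0.5 * (np.trace(Sigma) + w @ w - np.log(np.linalg.det(Sigma)) - k)\n",
    "\n",
    "print('formula (26.5):', ent_formula)\n",
    "print('Monte Carlo   :', ent_mc)\n",
    "print('formula (26.8):', D_gauss)"
   ]
  },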
  {
   "cell_type": "markdown",
   "id": "dd21c8bb",
   "metadata": {},
   "source": [
    "## Von Neumann Entropy\n",
    "\n",
    "Let $ P $ and $ Q $ be two positive-definite symmetric matrices.\n",
    "\n",
    "A measure of the divergence between two $ P $ and $ Q $ is\n",
    "\n",
    "$$\n",
    "D(P|Q)= \\textrm{trace} ( P \\ln P - P \\ln Q - P + Q)\n",
    "$$\n",
    "\n",
    "where the log of a matrix is defined here  ([https://en.wikipedia.org/wiki/Logarithm_of_a_matrix](https://en.wikipedia.org/wiki/Logarithm_of_a_matrix)).\n",
    "\n",
    "A density matrix $ P $ from quantum mechanics is a positive definite matrix with trace $ 1 $.\n",
    "\n",
    "The von Neumann entropy of a density matrix $ P $ is\n",
    "\n",
    "$$\n",
    "S = - \\textrm{trace} (P \\ln P)\n",
    "$$"
   ]
  },
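  {
   "cell_type": "markdown",
   "id": "ent-ex-vn-md",
   "metadata": {},
   "source": [
    "A minimal sketch (the matrices `P` and `Q` below are illustrative density matrices) that evaluates $ D(P|Q) $ and the von Neumann entropy using `scipy.linalg.logm`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-vn-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.linalg import logm\n",
    "\n",
    "def matrix_divergence(P, Q):\n",
    "    # D(P|Q) = trace(P log P - P log Q - P + Q) for symmetric positive definite P, Q\n",
    "    return np.trace(P @ logm(P) - P @ logm(Q) - P + Q).real\n",
    "\n",
    "def von_neumann_entropy(P):\n",
    "    # S = -trace(P log P) for a density matrix P\n",
    "    return -np.trace(P @ logm(P)).real\n",
    "\n",
    "# illustrative density matrices (symmetric, positive definite, trace 1)\n",
    "P = np.array([[0.7, 0.1],\n",
    "              [0.1, 0.3]])\n",
    "Q = np.array([[0.5, 0.0],\n",
    "              [0.0, 0.5]])\n",
    "\n",
    "print('trace P =', np.trace(P), '  trace Q =', np.trace(Q))\n",
    "print('D(P|Q)  =', matrix_divergence(P, Q))\n",
    "print('S(P)    =', von_neumann_entropy(P))\n",
    "print('S(Q)    =', von_neumann_entropy(Q), '  (= log 2 for the maximally mixed state)')"
   ]
  },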
  {
   "cell_type": "markdown",
   "id": "069913e9",
   "metadata": {},
   "source": [
    "## Backus-Chernov-Zin Entropy\n",
    "\n",
    "After flipping signs, [[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)]  use Kullback-Leibler relative entropy as a measure of volatility of stochastic discount factors that they\n",
    "assert is useful for characterizing features of both the data and various theoretical models of stochastic discount factors.\n",
    "\n",
    "Where $ p_{t+1} $ is the physical or true measure, $ p_{t+1}^* $ is the risk-neutral measure, and $ E_t $ denotes conditional\n",
    "expectation under the $ p_{t+1} $ measure, [[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)] define entropy as\n",
    "\n",
    "\n",
    "<a id='equation-eq-bcz1'></a>\n",
    "$$\n",
    "L_t (p_{t+1}^*/p_{t+1}) = - E_t \\log( p_{t+1}^*/p_{t+1}). \\tag{26.9}\n",
    "$$\n",
    "\n",
    "Evidently, by virtue of the minus sign in equation [(26.9)](#equation-eq-bcz1),\n",
    "\n",
    "\n",
    "<a id='equation-eq-bcz2'></a>\n",
    "$$\n",
    "L_t (p_{t+1}^*/p_{t+1})  = D_{KL,t}( p_{t+1}^*|p_{t+1}), \\tag{26.10}\n",
    "$$\n",
    "\n",
    "where $ D_{KL,t} $ denotes conditional relative entropy.\n",
    "\n",
    "Let $ m_{t+1} $ be a stochastic discount factor, $ r_{t+1} $ a gross one-period return on a risky\n",
    "security, and $ (r_{t+1}^1)^{-1}\\equiv q_t^1 = E_t m_{t+1} $ be the reciprocal of a risk-free one-period gross rate of return.\n",
    "Then\n",
    "\n",
    "$$\n",
    "E_t (m_{t+1} r_{t+1}) = 1\n",
    "$$\n",
    "\n",
    "[[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)] note that a stochastic discount factor satisfies\n",
    "\n",
    "$$\n",
    "m_{t+1} = q_t^1 p_{t+1}^*/p_{t+1} .\n",
    "$$\n",
    "\n",
    "They derive the following **entropy bound**\n",
    "\n",
    "$$\n",
    "E L_t (m_{t+1}) \\geq E (\\log r_{t+1} - \\log r_{t+1}^1 )\n",
    "$$\n",
    "\n",
    "which they propose as a complement to a Hansen-Jagannathan [[Hansen and Jagannathan, 1991](https://python-advanced.quantecon.org/zreferences.html#id23)] bound."
   ]
  },
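  {
   "cell_type": "markdown",
   "id": "ent-ex-bcz-md",
   "metadata": {},
   "source": [
    "Combining (26.9) with $ m_{t+1} = q_t^1 p_{t+1}^*/p_{t+1} $ and $ q_t^1 = E_t m_{t+1} $ gives $ L_t(p_{t+1}^*/p_{t+1}) = \\log E_t m_{t+1} - E_t \\log m_{t+1} $, the entropy of the stochastic discount factor that appears in the bound. As a small illustration (the lognormal specification and parameters below are made up for this sketch, not taken from the lecture), if $ \\log m_{t+1} \\sim {\\mathcal N}(\\mu, \\sigma^2) $, then this entropy equals $ \\sigma^2/2 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-bcz-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# illustrative conditionally lognormal SDF: log m ~ N(mu, sigma^2)\n",
    "mu, sigma = -0.05, 0.15\n",
    "log_m = rng.normal(mu, sigma, size=1_000_000)\n",
    "m = np.exp(log_m)\n",
    "\n",
    "L_mc = np.log(m.mean()) - log_m.mean()      # log E m - E log m\n",
    "print('Monte Carlo entropy:', L_mc)\n",
    "print('sigma^2 / 2        :', sigma**2 / 2)"
   ]
  },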
  {
   "cell_type": "markdown",
   "id": "69a98e36",
   "metadata": {},
   "source": [
    "## Wiener-Kolmogorov Prediction Error Formula as Entropy\n",
    "\n",
    "Let $ \\{x_t\\}_{t=-\\infty}^\\infty $ be a covariance stationary stochastic process with\n",
    "mean zero and spectral density $ S_x(\\omega) $.\n",
    "\n",
    "The variance of $ x $ is\n",
    "\n",
    "$$\n",
    "\\sigma_x^2 =\\left( \\frac{1}{2\\pi}\\right) \\int_{-\\pi}^\\pi  S_x (\\omega) d \\omega .\n",
    "$$\n",
    "\n",
    "As described  in chapter XIV of  [[Sargent, 1987](https://python-advanced.quantecon.org/zreferences.html#id200)], the Wiener-Kolmogorov formula for the one-period ahead prediction error is\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon6'></a>\n",
    "$$\n",
    "\\sigma_\\epsilon^2 = \\exp\\left[\\left( \\frac{1}{2\\pi}\\right) \\int_{-\\pi}^\\pi \\log S_x (\\omega) d \\omega \\right]. \\tag{26.11}\n",
    "$$\n",
    "\n",
    "Occasionally the logarithm of  the one-step-ahead prediction error $ \\sigma_\\epsilon^2 $\n",
    "is called entropy because it measures unpredictability.\n",
    "\n",
    "Consider the following problem  reminiscent of one  described earlier.\n",
    "\n",
    "**Problem:**\n",
    "\n",
    "Among all covariance stationary univariate processes with unconditional variance $ \\sigma_x^2 $, find a process with maximal\n",
    "one-step-ahead prediction error.\n",
    "\n",
    "The maximizer  is  a process with spectral density\n",
    "\n",
    "$$\n",
    "S_x(\\omega) = 2 \\pi \\sigma_x^2.\n",
    "$$\n",
    "\n",
    "Thus,  among\n",
    "all univariate covariance stationary processes with variance $ \\sigma_x^2 $, a process with a flat spectral density is the most uncertain, in the sense of one-step-ahead prediction error variance.\n",
    "\n",
    "This no-patterns-across-time outcome for a temporally dependent process resembles the no-pattern-across-states outcome for the static entropy maximizing coin or die  in the classic information theoretic\n",
    "analysis described above."
   ]
  },
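  {
   "cell_type": "markdown",
   "id": "ent-ex-wk-md",
   "metadata": {},
   "source": [
    "A numerical check of formula (26.11) for an illustrative AR(1) process $ x_t = \\rho x_{t-1} + \\epsilon_t $ with innovation variance $ \\sigma_\\epsilon^2 $ and spectral density $ S_x(\\omega) = \\sigma_\\epsilon^2/|1 - \\rho e^{-i\\omega}|^2 $: the right side of (26.11) recovers $ \\sigma_\\epsilon^2 $, which is smaller than the unconditional variance $ \\sigma_x^2 = \\sigma_\\epsilon^2/(1-\\rho^2) $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-wk-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.integrate import quad\n",
    "\n",
    "rho, sigma_eps = 0.8, 1.0                   # illustrative AR(1) parameters\n",
    "\n",
    "def S_x(omega):\n",
    "    # spectral density of x_t = rho x_{t-1} + eps_t with Var(eps) = sigma_eps**2\n",
    "    return sigma_eps**2 / np.abs(1 - rho * np.exp(-1j * omega))**2\n",
    "\n",
    "integral, _ = quad(lambda w: np.log(S_x(w)), -np.pi, np.pi)\n",
    "print('exp[(1/2 pi) int log S_x]:', np.exp(integral / (2 * np.pi)))   # formula (26.11)\n",
    "print('sigma_eps**2             :', sigma_eps**2)\n",
    "\n",
    "variance, _ = quad(S_x, -np.pi, np.pi)\n",
    "print('sigma_x**2               :', variance / (2 * np.pi))           # = sigma_eps**2/(1 - rho**2)"
   ]
  },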
  {
   "cell_type": "markdown",
   "id": "6b7eb5d0",
   "metadata": {},
   "source": [
    "## Multivariate Processes\n",
    "\n",
    "Let $ y_t $ be an $ n \\times 1 $ covariance stationary stochastic process with mean $ 0 $ with\n",
    "matrix covariogram $ C_y(j) = E y_t y_{t-j}' $ and spectral density matrix\n",
    "\n",
    "$$\n",
    "S_y(\\omega) = \\sum_{j=-\\infty}^\\infty e^{- i \\omega j} C_y(j), \\quad \\omega \\in [-\\pi, \\pi].\n",
    "$$\n",
    "\n",
    "Let\n",
    "\n",
    "$$\n",
    "y_t = D(L) \\epsilon_t  \\equiv \\sum_{j=0}^\\infty D_j \\epsilon_t\n",
    "$$\n",
    "\n",
    "be a Wold representation for $ y $, where $ D(0)\\epsilon_t $ is a\n",
    "vector of one-step-ahead errors in predicting $ y_t $ conditional on the infinite history $ y^{t-1} = [y_{t-1}, y_{t-2}, \\ldots ] $ and\n",
    "$ \\epsilon_t $ is an $ n\\times 1 $ vector of serially uncorrelated random disturbances with mean zero and identity contemporaneous\n",
    "covariance matrix $ E \\epsilon_t \\epsilon_t' = I $.\n",
    "\n",
    "Linear-least-squares predictors have one-step-ahead prediction error $ D(0)  D(0)' $ that satisfies\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon22'></a>\n",
    "$$\n",
    "\\log \\det [D(0) D(0)'] = \\left(\\frac{1}{2 \\pi} \\right) \\int_{-\\pi}^\\pi \\log \\det [S_y(\\omega)] d \\omega. \\tag{26.12}\n",
    "$$\n",
    "\n",
    "Being a  measure of the unpredictability of an $ n \\times 1 $ vector covariance stationary  stochastic process,\n",
    "the left side of  [(26.12)](#equation-eq-shannon22)  is sometimes called entropy."
   ]
  },
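  {
   "cell_type": "markdown",
   "id": "ent-ex-mv-md",
   "metadata": {},
   "source": [
    "A check of formula (26.12) for an illustrative invertible first-order vector moving average $ y_t = \\epsilon_t + \\Theta \\epsilon_{t-1} $ with $ E \\epsilon_t \\epsilon_t' = I $, so that $ D(0) = I $ and both sides of (26.12) equal $ \\log \\det I = 0 $ up to numerical error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ent-ex-mv-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.integrate import quad\n",
    "\n",
    "# illustrative invertible VMA(1): y_t = eps_t + Theta eps_{t-1}, E eps eps' = I\n",
    "Theta = np.array([[0.5, 0.2],\n",
    "                  [0.1, 0.3]])\n",
    "I = np.eye(2)\n",
    "\n",
    "def log_det_S(omega):\n",
    "    # log det S_y(omega), where S_y(omega) = D(e^{-i omega}) D(e^{-i omega})* and D(z) = I + Theta z\n",
    "    D = I + Theta * np.exp(-1j * omega)\n",
    "    S = D @ D.conj().T\n",
    "    return np.log(np.linalg.det(S).real)\n",
    "\n",
    "integral, _ = quad(log_det_S, -np.pi, np.pi)\n",
    "print('(1/2 pi) int log det S_y :', integral / (2 * np.pi))\n",
    "print('log det[D(0) D(0).T]     :', np.log(np.linalg.det(I @ I.T)))"
   ]
  },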
  {
   "cell_type": "markdown",
   "id": "c51a4c55",
   "metadata": {},
   "source": [
    "## Frequency Domain Robust Control\n",
    "\n",
    "Chapter 8 of [[Hansen and Sargent, 2008](https://python-advanced.quantecon.org/zreferences.html#id108)]  adapts work in the control theory literature to define a\n",
    "**frequency domain entropy** criterion for  robust control as\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon21'></a>\n",
    "$$\n",
    "\\int_\\Gamma \\log \\det [ \\theta I - G_F(\\zeta)' G_F(\\zeta) ] d \\lambda(\\zeta) , \\tag{26.13}\n",
    "$$\n",
    "\n",
    "where $ \\theta \\in (\\underline \\theta, +\\infty) $ is a positive robustness parameter and $ G_F(\\zeta) $ is a $ \\zeta $-transform of the\n",
    "objective function.\n",
    "\n",
    "Hansen and Sargent [[Hansen and Sargent, 2008](https://python-advanced.quantecon.org/zreferences.html#id108)] show that criterion [(26.13)](#equation-eq-shannon21)  can be represented as\n",
    "\n",
    "\n",
    "<a id='equation-eq-shannon220'></a>\n",
    "$$\n",
    "\\log \\det [ D(0)' D(0)] = \\int_\\Gamma \\log \\det [ \\theta I - G_F(\\zeta)' G_F(\\zeta) ] d \\lambda(\\zeta) , \\tag{26.14}\n",
    "$$\n",
    "\n",
    "for an appropriate covariance stationary stochastic process derived from $ \\theta, G_F(\\zeta) $.\n",
    "\n",
    "This explains the\n",
    "moniker **maximum entropy** robust control for decision rules $ F $ designed to maximize  criterion [(26.13)](#equation-eq-shannon21)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a52ba3a",
   "metadata": {},
   "source": [
    "## Relative Entropy for a Continuous Random Variable\n",
    "\n",
    "Let $ x $ be  a continuous random variable with density $ \\phi(x) $, and let $ g(x) $ be a nonnegative random variable satisfying $ \\int g(x) \\phi(x) dx =1 $.\n",
    "\n",
    "The relative entropy of the distorted density $ \\hat \\phi(x) = g(x) \\phi(x) $  is defined\n",
    "as\n",
    "\n",
    "$$\n",
    "\\textrm{ent}(g) = \\int g(x) \\log g(x) \\phi(x) d x .\n",
    "$$\n",
    "\n",
    "Fig. 26.2 plots the functions $ g \\log g $ and $ g -1 $\n",
    "over the interval $ g \\geq 0 $.\n",
    "\n",
    "That relative entropy $ \\textrm{ent}(g) \\geq 0 $ can be established by noting (a) that  $ g \\log g \\geq g-1 $ (see  Fig. 26.2)\n",
    "and (b) that under $ \\phi $, $ E g =1 $.\n",
    "\n",
    "Fig. 26.3 and Fig. 26.4 display aspects of relative entropy visually for a continuous random variable $ x $ for\n",
    "two densities with likelihood ratio $ g \\geq 0 $.\n",
    "\n",
    "Where the numerator density is $ {\\mathcal N}(0,1) $, for two denominator  Gaussian densities $ {\\mathcal N}(0,1.5) $ and $ {\\mathcal N}(0,.95) $, respectively, Fig. 26.3 and Fig. 26.4  display the functions  $ g \\log g $ and $ g -1 $ as functions of $ x $.\n",
    "\n",
    "![entropy_glogg.png](entropy_glogg.png)\n",
    "\n",
    "The function $ g \\log g $ for $ g \\geq 0 $. For a random variable $ g $ with $ E g =1 $, $ E g \\log g \\geq 0 $.  \n",
    "![entropy_1_over_15.jpg](entropy_1_over_15.jpg)\n",
    "\n",
    "Graphs of $ g \\log g $ and $ g-1 $ where  $ g $ is the ratio of the density of a $ {\\mathcal N}(0,1) $ random variable to the density of a $ {\\mathcal N}(0,1.5) $ random variable.\n",
    "Under the $ {\\mathcal N}(0,1.5) $ density, $ E g =1 $.  \n",
    "![entropy_1_over_95.png](entropy_1_over_95.png)\n",
    "\n",
    "$ g \\log g $ and $ g-1 $ where  $ g $ is the ratio of the density of a $ {\\mathcal N}(0,1) $ random variable to the density of a $ {\\mathcal N}(0,1.5) $ random variable.\n",
    "Under the $ {\\mathcal N}(0,1.5) $ density, $ E g =1 $.  "
   ]
  }
 ],
 "metadata": {
  "date": 1754356844.5925684,
  "filename": "entropy.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Etymology of Entropy"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}