{ "cells": [ { "cell_type": "markdown", "id": "cee75ce4", "metadata": {}, "source": [ "# Etymology of Entropy\n", "\n", "This lecture describes and compares several notions of entropy.\n", "\n", "Among the senses of entropy, we’ll encounter these\n", "\n", "- A measure of **uncertainty** of a random variable advanced by Claude Shannon [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)] \n", "- A key object governing thermodynamics \n", "- Kullback and Leibler’s measure of the statistical divergence between two probability distributions \n", "- A measure of the volatility of stochastic discount factors that appear in asset pricing theory \n", "- Measures of unpredictability that occur in classical Wiener-Kolmogorov linear prediction theory \n", "- A frequency domain criterion for constructing robust decision rules \n", "\n", "\n", "The concept of entropy plays an important role in robust control formulations described in this lecture\n", "[Risk and Model Uncertainty](https://python-advanced.quantecon.org/five_preferences.html) and in this lecture\n", "[Robustness](https://python-advanced.quantecon.org/robustness.html)." ] }, { "cell_type": "markdown", "id": "4321433a", "metadata": {}, "source": [ "## Information Theory\n", "\n", "In information theory [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)], entropy is a measure of the unpredictability of a random variable.\n", "\n", "To illustrate\n", "things, let $ X $ be a discrete random variable taking values $ x_1, \\ldots, x_n $\n", "with probabilities $ p_i = \\textrm{Prob}(X = x_i) \\geq 0, \\sum_i p_i =1 $.\n", "\n", "Claude Shannon’s [[Shannon and Weaver, 1949](https://python-advanced.quantecon.org/zreferences.html#id6)] definition of entropy is\n", "\n", "\n", "<a id='equation-eq-shannon1'></a>\n", "$$\n", "H(p) = \\sum_i p_i \\log_b (p_i^{-1}) = - \\sum_i p_i \\log_b (p_i) . \\tag{26.1}\n", "$$\n", "\n", "where $ \\log_b $ denotes the log function with base $ b $.\n", "\n", "Inspired by the limit\n", "\n", "$$\n", "\\lim_{p\\downarrow 0} p \\log p = \\lim_{p \\downarrow 0} \\frac{\\log p}{p^{-1}} = \\lim_{p \\downarrow 0}p = 0,\n", "$$\n", "\n", "we set $ p \\log p = 0 $ in equation [(26.1)](#equation-eq-shannon1).\n", "\n", "Typical bases for the logarithm are $ 2 $, $ e $, and $ 10 $.\n", "\n", "In the information theory literature, logarithms of base $ 2 $, $ e $, and $ 10 $ are associated with units of information\n", "called bits, nats, and dits, respectively.\n", "\n", "Shannon typically used base $ 2 $." 
] }, { "cell_type": "markdown", "id": "f8cd1ea7", "metadata": {}, "source": [ "## A Measure of Unpredictability\n", "\n", "For a discrete random variable $ X $ with probability density $ p = \\{p_i\\}_{i=1}^n $, the **surprisal**\n", "for state $ i $ is $ s_i = \\log\\left(\\frac{1}{p_i}\\right) $.\n", "\n", "The quantity $ \\log\\left(\\frac{1}{p_i}\\right) $ is called the **surprisal** because it is inversely related to the likelihood that state\n", "$ i $ will occur.\n", "\n", "Note that entropy $ H(p) $ equals the **expected surprisal**\n", "\n", "$$\n", "H(p) = \\sum_i p_i s_i = \\sum_i p_i \\log\\left(\\frac{1}{p_i} \\right) = - \\sum_i p_i \\log\\left(p_i \\right).\n", "$$" ] }, { "cell_type": "markdown", "id": "fe94dc24", "metadata": {}, "source": [ "### Example\n", "\n", "Take a possibly unfair coin, so $ X = \\{0,1\\} $ with $ p = {\\rm Prob}(X=1) = p \\in [0,1] $.\n", "\n", "Then\n", "\n", "$$\n", "H(p) = -(1-p)\\log (1-p) - p \\log p.\n", "$$\n", "\n", "Evidently,\n", "\n", "$$\n", "H'(p) = \\log(1-p) - \\log p = 0\n", "$$\n", "\n", "at $ p=.5 $ and $ H''(p) = -\\frac{1}{1-p} -\\frac{1}{p} < 0 $ for $ p\\in (0,1) $.\n", "\n", "So $ p=.5 $ maximizes entropy, while entropy is minimized at $ p=0 $ and $ p=1 $.\n", "\n", "Thus, among all coins, a fair coin is the most unpredictable.\n", "\n", "See Fig. 26.1\n", "\n", "\n", "\n", "Entropy as a function of $ \\hat \\pi_1 $ when $ \\pi_1 = .5 $. " ] }, { "cell_type": "markdown", "id": "75cc50d3", "metadata": {}, "source": [ "### Example\n", "\n", "Take an $ n $-sided possibly unfair die with a probability distribution $ \\{p_i\\}_{i=1}^n $.\n", "The die is fair if $ p_i = \\frac{1}{n} \\forall i $.\n", "\n", "Among all dies, a fair die maximizes entropy.\n", "\n", "For a fair die,\n", "entropy equals $ H(p) = - n^{-1} \\sum_i \\log \\left( \\frac{1}{n} \\right) = \\log(n) $.\n", "\n", "To specify the expected number of bits needed to isolate the outcome of one roll of a fair $ n $-sided die requires $ \\log_2 (n) $ bits of information.\n", "\n", "For example,\n", "if $ n=2 $, $ \\log_2(2) =1 $.\n", "\n", "For $ n=3 $, $ \\log_2(3) = 1.585 $." ] }, { "cell_type": "markdown", "id": "3c46418a", "metadata": {}, "source": [ "## Mathematical Properties of Entropy\n", "\n", "For a discrete random variable with probability vector $ p $, entropy $ H(p) $ is\n", "a function that satisfies\n", "\n", "- $ H $ is *continuous*. \n", "- $ H $ is *symmetric*: $ H(p_1, p_2, \\ldots, p_n) = H(p_{r_1}, \\ldots, p_{r_n}) $ for any permutation $ r_1, \\ldots, r_n $ of $ 1,\\ldots, n $. \n", "- A uniform distribution maximizes $ H(p) $: $ H(p_1, \\ldots, p_n) \\leq H(\\frac{1}{n}, \\ldots, \\frac{1}{n}) . $ \n", "- Maximum entropy increases with the number of states:\n", " $ H(\\frac{1}{n}, \\ldots, \\frac{1}{n} ) \\leq H(\\frac{1}{n+1} , \\ldots, \\frac{1}{n+1}) $. \n", "- Entropy is not affected by events zero probability. " ] }, { "cell_type": "markdown", "id": "79780f5c", "metadata": {}, "source": [ "## Conditional Entropy\n", "\n", "Let $ (X,Y) $ be a bivariate discrete random vector with outcomes $ x_1, \\ldots, x_n $ and $ y_1, \\ldots, y_m $, respectively,\n", "occurring with probability density $ p(x_i, y_i) $.\n", "\n", "Conditional entropy $ H(X| Y) $ is\n", "defined as\n", "\n", "\n", "<a id='equation-eq-shannon2'></a>\n", "$$\n", "H(X | Y) = \\sum_{i,j} p(x_i,y_j) \\log \\frac{p(y_j)}{p(x_i,y_j)}. 
\\tag{26.2}\n", "$$\n", "\n", "Here $ \\frac{p(y_j)}{p(x_i,y_j)} $, the reciprocal of the conditional probability of $ x_i $ given $ y_j $, can be defined as the **conditional surprisal**." ] }, { "cell_type": "markdown", "id": "d9460d00", "metadata": {}, "source": [ "## Independence as Maximum Conditional Entropy\n", "\n", "Let $ m=n $ and $ [x_1, \\ldots, x_n ] = [y_1, \\ldots, y_n] $.\n", "\n", "Let $ \\sum_j p(x_i,y_j) = \\sum_j p(x_j, y_i) $ for all $ i $,\n", "so that the marginal distributions of $ x $ and $ y $ are identical.\n", "\n", "Thus, $ x $ and $ y $ are identically distributed, but they\n", "are not necessarily independent.\n", "\n", "Consider the following problem:\n", "choose a joint distribution $ p(x_i,y_j) $ to maximize conditional entropy\n", "[(26.2)](#equation-eq-shannon2) subject to the restriction that $ x $ and $ y $ are identically distributed.\n", "\n", "The conditional-entropy-maximizing $ p(x_i,y_j) $ sets\n", "\n", "$$\n", "\\frac{p(x_i,y_j)}{p(y_j)} = \\sum_j p(x_i, y_j) = p(x_i) \\forall i .\n", "$$\n", "\n", "Thus, among all joint distributions with identical marginal distributions,\n", "the conditional entropy maximizing joint distribution makes $ x $ and $ y $ be\n", "independent." ] }, { "cell_type": "markdown", "id": "573248e7", "metadata": {}, "source": [ "## Thermodynamics\n", "\n", "Josiah Willard Gibbs (see [https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs](https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs)) defined entropy as\n", "\n", "\n", "<a id='equation-eq-gibbs'></a>\n", "$$\n", "S = - k_B \\sum_i p_i \\log p_i \\tag{26.3}\n", "$$\n", "\n", "where $ p_i $ is the probability of a micro state and $ k_B $ is Boltzmann’s constant.\n", "\n", "- The Boltzmann constant $ k_b $ relates energy at the micro particle level with the temperature observed at the macro level. It equals what is called a gas constant divided by an Avogadro constant. \n", "\n", "\n", "The second law of thermodynamics states that the entropy of a closed physical system increases until $ S $ defined in [(26.3)](#equation-eq-gibbs) attains a maximum." ] }, { "cell_type": "markdown", "id": "783d211e", "metadata": {}, "source": [ "## Statistical Divergence\n", "\n", "Let $ X $ be a discrete state space $ x_1, \\ldots, x_n $ and let $ p $ and $ q $ be two discrete probability\n", "distributions on $ X $.\n", "\n", "Assume that $ \\frac{p_i}{q_t} \\in (0,\\infty) $ for all $ i $ for which $ p_i >0 $.\n", "\n", "Then the Kullback-Leibler statistical divergence, also called **relative entropy**,\n", "is defined as\n", "\n", "\n", "<a id='equation-eq-shanno3'></a>\n", "$$\n", "D(p|q) = \\sum_i p_i \\log \\left(\\frac{p_i}{q_i}\\right) = \\sum_i q_i \\left( \\frac{p_i}{q_i}\\right) \\log\\left( \\frac{p_i}{q_i}\\right) . \\tag{26.4}\n", "$$\n", "\n", "Evidently,\n", "\n", "$$\n", "\\begin{aligned}\n", "D(p|q) & = - \\sum_i p_i \\log q_i + \\sum_i p_i \\log p_i \\cr\n", " & = H(p,q) - H(p) ,\n", " \\end{aligned}\n", "$$\n", "\n", "where $ H(p,q) = \\sum_i p_i \\log q_i $ is the cross-entropy.\n", "\n", "It is easy to verify, as we have done above, that $ D(p|q) \\geq 0 $ and that $ D(p|q) = 0 $ implies that $ p_i = q_i $ when $ q_i >0 $." 
] }, { "cell_type": "markdown", "id": "ebc3331d", "metadata": {}, "source": [ "## Continuous distributions\n", "\n", "For a continuous random variable, Kullback-Leibler divergence between two densities $ p $ and $ q $ is defined as\n", "\n", "$$\n", "D(p|q) = \\int p(x) \\log \\left(\\frac{p(x)}{q(x)} \\right) d \\, x .\n", "$$" ] }, { "cell_type": "markdown", "id": "10ecb73e", "metadata": {}, "source": [ "## Relative entropy and Gaussian distributions\n", "\n", "We want to compute relative entropy for two continuous densities $ \\phi $ and $ \\hat \\phi $ when\n", "$ \\phi $ is $ {\\cal N}(0,I) $ and $ {\\hat \\phi} $ is $ {\\cal N}(w, \\Sigma) $, where the covariance matrix $ \\Sigma $ is nonsingular.\n", "\n", "We seek a formula for\n", "\n", "$$\n", "\\textrm{ent} = \\int (\\log {\\hat \\phi(\\varepsilon)} - \\log \\phi(\\varepsilon) ){\\hat \\phi(\\varepsilon)} d \\varepsilon.\n", "$$\n", "\n", "**Claim**\n", "\n", "\n", "<a id='equation-eq-relentropy101'></a>\n", "$$\n", "\\textrm{ent} = %\\int (\\log {\\hat \\phi} - \\log \\phi ){\\hat \\phi} d \\varepsilon=\n", "-{1 \\over 2} \\log\n", "\\det \\Sigma + {1 \\over 2}w'w + {1 \\over 2}\\mathrm{trace} (\\Sigma - I)\n", ". \\tag{26.5}\n", "$$\n", "\n", "**Proof**\n", "\n", "The log likelihood ratio is\n", "\n", "\n", "<a id='equation-footnote2'></a>\n", "$$\n", "\\log {\\hat \\phi}(\\varepsilon) - \\log \\phi(\\varepsilon) =\n", "{1 \\over 2} \\left[ - (\\varepsilon - w)' \\Sigma^{-1} (\\varepsilon - w)\n", " + \\varepsilon' \\varepsilon - \\log \\det\n", " \\Sigma\\right] . \\tag{26.6}\n", "$$\n", "\n", "Observe that\n", "\n", "$$\n", "- \\int {1 \\over 2} (\\varepsilon - w)' \\Sigma^{-1} (\\varepsilon -\n", "w) {\\hat \\phi}(\\varepsilon) d\\varepsilon = - {1 \\over 2}\\mathrm{trace}(I).\n", "$$\n", "\n", "Applying the identity $ \\varepsilon = w + (\\varepsilon - w) $ gives\n", "\n", "$$\n", "{1\\over 2}\\varepsilon' \\varepsilon = {1 \\over 2}w' w + {1 \\over 2}\n", "(\\varepsilon - w)' (\\varepsilon - w) + w' (\\varepsilon - w).\n", "$$\n", "\n", "Taking mathematical expectations\n", "\n", "$$\n", "{1 \\over 2} \\int \\varepsilon' \\varepsilon {\\hat \\phi}(\\varepsilon) d\n", "\\varepsilon = {1\\over 2} w'w + {1 \\over 2} \\mathrm{trace}(\\Sigma).\n", "$$\n", "\n", "Combining terms gives\n", "\n", "\n", "<a id='equation-eq-relentropy'></a>\n", "$$\n", "\\textrm{ent} = \\int (\\log {\\hat \\phi} - \\log \\phi ){\\hat \\phi} d \\varepsilon= -{1 \\over 2} \\log\n", "\\det \\Sigma + {1 \\over 2}w'w + {1 \\over 2}\\mathrm{trace} (\\Sigma - I)\n", ". \\tag{26.7}\n", "$$\n", "\n", "which agrees with equation [(26.5)](#equation-eq-relentropy101).\n", "Notice the separate appearances of the mean distortion $ w $ and the covariance distortion\n", "$ \\Sigma - I $ in equation [(26.7)](#equation-eq-relentropy).\n", "\n", "**Extension**\n", "\n", "Let $ N_0 = {\\mathcal N}(\\mu_0,\\Sigma_0) $ and $ N_1={\\mathcal N}(\\mu_1, \\Sigma_1) $ be two multivariate Gaussian\n", "distributions.\n", "\n", "Then\n", "\n", "\n", "<a id='equation-eq-shannon5'></a>\n", "$$\n", "D(N_0|N_1) = \\frac{1}{2} \\left(\\mathrm {trace} (\\Sigma_1^{-1} \\Sigma_0)\n", "+ (\\mu_1 -\\mu_0)' \\Sigma_1^{-1} (\\mu_1 - \\mu_0) - \\log\\left( \\frac{ \\mathrm {det }\\Sigma_0 }{\\mathrm {det}\\Sigma_1}\\right)\n", " - k \\right). 
\\tag{26.8}\n", "$$" ] }, { "cell_type": "markdown", "id": "dd21c8bb", "metadata": {}, "source": [ "## Von Neumann Entropy\n", "\n", "Let $ P $ and $ Q $ be two positive-definite symmetric matrices.\n", "\n", "A measure of the divergence between two $ P $ and $ Q $ is\n", "\n", "$$\n", "D(P|Q)= \\textrm{trace} ( P \\ln P - P \\ln Q - P + Q)\n", "$$\n", "\n", "where the log of a matrix is defined here ([https://en.wikipedia.org/wiki/Logarithm_of_a_matrix](https://en.wikipedia.org/wiki/Logarithm_of_a_matrix)).\n", "\n", "A density matrix $ P $ from quantum mechanics is a positive definite matrix with trace $ 1 $.\n", "\n", "The von Neumann entropy of a density matrix $ P $ is\n", "\n", "$$\n", "S = - \\textrm{trace} (P \\ln P)\n", "$$" ] }, { "cell_type": "markdown", "id": "069913e9", "metadata": {}, "source": [ "## Backus-Chernov-Zin Entropy\n", "\n", "After flipping signs, [[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)] use Kullback-Leibler relative entropy as a measure of volatility of stochastic discount factors that they\n", "assert is useful for characterizing features of both the data and various theoretical models of stochastic discount factors.\n", "\n", "Where $ p_{t+1} $ is the physical or true measure, $ p_{t+1}^* $ is the risk-neutral measure, and $ E_t $ denotes conditional\n", "expectation under the $ p_{t+1} $ measure, [[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)] define entropy as\n", "\n", "\n", "<a id='equation-eq-bcz1'></a>\n", "$$\n", "L_t (p_{t+1}^*/p_{t+1}) = - E_t \\log( p_{t+1}^*/p_{t+1}). \\tag{26.9}\n", "$$\n", "\n", "Evidently, by virtue of the minus sign in equation [(26.9)](#equation-eq-bcz1),\n", "\n", "\n", "<a id='equation-eq-bcz2'></a>\n", "$$\n", "L_t (p_{t+1}^*/p_{t+1}) = D_{KL,t}( p_{t+1}^*|p_{t+1}), \\tag{26.10}\n", "$$\n", "\n", "where $ D_{KL,t} $ denotes conditional relative entropy.\n", "\n", "Let $ m_{t+1} $ be a stochastic discount factor, $ r_{t+1} $ a gross one-period return on a risky\n", "security, and $ (r_{t+1}^1)^{-1}\\equiv q_t^1 = E_t m_{t+1} $ be the reciprocal of a risk-free one-period gross rate of return.\n", "Then\n", "\n", "$$\n", "E_t (m_{t+1} r_{t+1}) = 1\n", "$$\n", "\n", "[[Backus *et al.*, 2014](https://python-advanced.quantecon.org/zreferences.html#id7)] note that a stochastic discount factor satisfies\n", "\n", "$$\n", "m_{t+1} = q_t^1 p_{t+1}^*/p_{t+1} .\n", "$$\n", "\n", "They derive the following **entropy bound**\n", "\n", "$$\n", "E L_t (m_{t+1}) \\geq E (\\log r_{t+1} - \\log r_{t+1}^1 )\n", "$$\n", "\n", "which they propose as a complement to a Hansen-Jagannathan [[Hansen and Jagannathan, 1991](https://python-advanced.quantecon.org/zreferences.html#id23)] bound." 
] }, { "cell_type": "markdown", "id": "69a98e36", "metadata": {}, "source": [ "## Wiener-Kolmogorov Prediction Error Formula as Entropy\n", "\n", "Let $ \\{x_t\\}_{t=-\\infty}^\\infty $ be a covariance stationary stochastic process with\n", "mean zero and spectral density $ S_x(\\omega) $.\n", "\n", "The variance of $ x $ is\n", "\n", "$$\n", "\\sigma_x^2 =\\left( \\frac{1}{2\\pi}\\right) \\int_{-\\pi}^\\pi S_x (\\omega) d \\omega .\n", "$$\n", "\n", "As described in chapter XIV of [[Sargent, 1987](https://python-advanced.quantecon.org/zreferences.html#id200)], the Wiener-Kolmogorov formula for the one-period ahead prediction error is\n", "\n", "\n", "<a id='equation-eq-shannon6'></a>\n", "$$\n", "\\sigma_\\epsilon^2 = \\exp\\left[\\left( \\frac{1}{2\\pi}\\right) \\int_{-\\pi}^\\pi \\log S_x (\\omega) d \\omega \\right]. \\tag{26.11}\n", "$$\n", "\n", "Occasionally the logarithm of the one-step-ahead prediction error $ \\sigma_\\epsilon^2 $\n", "is called entropy because it measures unpredictability.\n", "\n", "Consider the following problem reminiscent of one described earlier.\n", "\n", "**Problem:**\n", "\n", "Among all covariance stationary univariate processes with unconditional variance $ \\sigma_x^2 $, find a process with maximal\n", "one-step-ahead prediction error.\n", "\n", "The maximizer is a process with spectral density\n", "\n", "$$\n", "S_x(\\omega) = 2 \\pi \\sigma_x^2.\n", "$$\n", "\n", "Thus, among\n", "all univariate covariance stationary processes with variance $ \\sigma_x^2 $, a process with a flat spectral density is the most uncertain, in the sense of one-step-ahead prediction error variance.\n", "\n", "This no-patterns-across-time outcome for a temporally dependent process resembles the no-pattern-across-states outcome for the static entropy maximizing coin or die in the classic information theoretic\n", "analysis described above." ] }, { "cell_type": "markdown", "id": "6b7eb5d0", "metadata": {}, "source": [ "## Multivariate Processes\n", "\n", "Let $ y_t $ be an $ n \\times 1 $ covariance stationary stochastic process with mean $ 0 $ with\n", "matrix covariogram $ C_y(j) = E y_t y_{t-j}' $ and spectral density matrix\n", "\n", "$$\n", "S_y(\\omega) = \\sum_{j=-\\infty}^\\infty e^{- i \\omega j} C_y(j), \\quad \\omega \\in [-\\pi, \\pi].\n", "$$\n", "\n", "Let\n", "\n", "$$\n", "y_t = D(L) \\epsilon_t \\equiv \\sum_{j=0}^\\infty D_j \\epsilon_t\n", "$$\n", "\n", "be a Wold representation for $ y $, where $ D(0)\\epsilon_t $ is a\n", "vector of one-step-ahead errors in predicting $ y_t $ conditional on the infinite history $ y^{t-1} = [y_{t-1}, y_{t-2}, \\ldots ] $ and\n", "$ \\epsilon_t $ is an $ n\\times 1 $ vector of serially uncorrelated random disturbances with mean zero and identity contemporaneous\n", "covariance matrix $ E \\epsilon_t \\epsilon_t' = I $.\n", "\n", "Linear-least-squares predictors have one-step-ahead prediction error $ D(0) D(0)' $ that satisfies\n", "\n", "\n", "<a id='equation-eq-shannon22'></a>\n", "$$\n", "\\log \\det [D(0) D(0)'] = \\left(\\frac{1}{2 \\pi} \\right) \\int_{-\\pi}^\\pi \\log \\det [S_y(\\omega)] d \\omega. \\tag{26.12}\n", "$$\n", "\n", "Being a measure of the unpredictability of an $ n \\times 1 $ vector covariance stationary stochastic process,\n", "the left side of [(26.12)](#equation-eq-shannon22) is sometimes called entropy." 
] }, { "cell_type": "markdown", "id": "c51a4c55", "metadata": {}, "source": [ "## Frequency Domain Robust Control\n", "\n", "Chapter 8 of [[Hansen and Sargent, 2008](https://python-advanced.quantecon.org/zreferences.html#id108)] adapts work in the control theory literature to define a\n", "**frequency domain entropy** criterion for robust control as\n", "\n", "\n", "<a id='equation-eq-shannon21'></a>\n", "$$\n", "\\int_\\Gamma \\log \\det [ \\theta I - G_F(\\zeta)' G_F(\\zeta) ] d \\lambda(\\zeta) , \\tag{26.13}\n", "$$\n", "\n", "where $ \\theta \\in (\\underline \\theta, +\\infty) $ is a positive robustness parameter and $ G_F(\\zeta) $ is a $ \\zeta $-transform of the\n", "objective function.\n", "\n", "Hansen and Sargent [[Hansen and Sargent, 2008](https://python-advanced.quantecon.org/zreferences.html#id108)] show that criterion [(26.13)](#equation-eq-shannon21) can be represented as\n", "\n", "\n", "<a id='equation-eq-shannon220'></a>\n", "$$\n", "\\log \\det [ D(0)' D(0)] = \\int_\\Gamma \\log \\det [ \\theta I - G_F(\\zeta)' G_F(\\zeta) ] d \\lambda(\\zeta) , \\tag{26.14}\n", "$$\n", "\n", "for an appropriate covariance stationary stochastic process derived from $ \\theta, G_F(\\zeta) $.\n", "\n", "This explains the\n", "moniker **maximum entropy** robust control for decision rules $ F $ designed to maximize criterion [(26.13)](#equation-eq-shannon21)." ] }, { "cell_type": "markdown", "id": "2a52ba3a", "metadata": {}, "source": [ "## Relative Entropy for a Continuous Random Variable\n", "\n", "Let $ x $ be a continuous random variable with density $ \\phi(x) $, and let $ g(x) $ be a nonnegative random variable satisfying $ \\int g(x) \\phi(x) dx =1 $.\n", "\n", "The relative entropy of the distorted density $ \\hat \\phi(x) = g(x) \\phi(x) $ is defined\n", "as\n", "\n", "$$\n", "\\textrm{ent}(g) = \\int g(x) \\log g(x) \\phi(x) d x .\n", "$$\n", "\n", "Fig. 26.2 plots the functions $ g \\log g $ and $ g -1 $\n", "over the interval $ g \\geq 0 $.\n", "\n", "That relative entropy $ \\textrm{ent}(g) \\geq 0 $ can be established by noting (a) that $ g \\log g \\geq g-1 $ (see Fig. 26.2)\n", "and (b) that under $ \\phi $, $ E g =1 $.\n", "\n", "Fig. 26.3 and Fig. 26.4 display aspects of relative entropy visually for a continuous random variable $ x $ for\n", "two densities with likelihood ratio $ g \\geq 0 $.\n", "\n", "Where the numerator density is $ {\\mathcal N}(0,1) $, for two denominator Gaussian densities $ {\\mathcal N}(0,1.5) $ and $ {\\mathcal N}(0,.95) $, respectively, Fig. 26.3 and Fig. 26.4 display the functions $ g \\log g $ and $ g -1 $ as functions of $ x $.\n", "\n", "\n", "\n", "The function $ g \\log g $ for $ g \\geq 0 $. For a random variable $ g $ with $ E g =1 $, $ E g \\log g \\geq 0 $. \n", "\n", "\n", "Graphs of $ g \\log g $ and $ g-1 $ where $ g $ is the ratio of the density of a $ {\\mathcal N}(0,1) $ random variable to the density of a $ {\\mathcal N}(0,1.5) $ random variable.\n", "Under the $ {\\mathcal N}(0,1.5) $ density, $ E g =1 $. \n", "\n", "\n", "$ g \\log g $ and $ g-1 $ where $ g $ is the ratio of the density of a $ {\\mathcal N}(0,1) $ random variable to the density of a $ {\\mathcal N}(0,1.5) $ random variable.\n", "Under the $ {\\mathcal N}(0,1.5) $ density, $ E g =1 $. " ] } ], "metadata": { "date": 1754356844.5925684, "filename": "entropy.md", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "title": "Etymology of Entropy" }, "nbformat": 4, "nbformat_minor": 5 }