How to implement Kullback-Leibler divergence using Mathematica's probability and distribution functions?

In the continuous case the Kullback-Leibler divergence from distribution $Q$ to distribution $P$ is defined as

\begin{equation} D_{KL}(P \,\|\, Q) = \int\limits_{-\infty}^{+\infty} p(x) \cdot \log \frac{p(x)}{q(x)} \, dx \end{equation}

So what you need is the probability density function, which is PDF in Mathematica:

distP = NormalDistribution[]; (* lowercase names avoid clashes with built-in symbols *)
distQ = GumbelDistribution[];

klDivergenceContinuous = Function[ { dist1, dist2 },
    NIntegrate[
        PDF[ dist1, x ] * ( Log @ PDF[ dist1, x ] - Log @ PDF[ dist2, x ] ),
        { x, -Infinity, Infinity }
    ]
];

klDivergenceContinuous[ distP, distQ ]

0.229783

Update: More general solution

The definition of the expected value in the continuous case is given as

\begin{equation} \mathbb{E}[ f(X) ] = \int\limits_{-\infty}^{+\infty} f(x) \cdot p(x) \, dx \end{equation}

We can use this definition to find the KL divergence. Since Mathematica's implementation of Expectation also handles discrete distributions (effectively summing over all values instead of integrating over them), the following routine might turn out to be more general:

Options[ klDivergence ] = Options @ Expectation;

klDivergence[ distP_ , distQ_, opts:OptionsPattern[ klDivergence ] ] := Expectation[ 
    Log @ PDF[ distP, \[FormalX] ] - Log @ PDF[ distQ, \[FormalX] ], 
    \[FormalX] \[Distributed] distP,
    opts
]

klDivergence[ 1, distP_, distQ_, opts:OptionsPattern[ klDivergence ] ] := klDivergence[
    distP, 
    distQ, 
    opts 
]

klDivergence[ n_Integer, distP_, distQ_, opts:OptionsPattern[ klDivergence ] ] 
    /; n > 0 := Module[
    {
        vars = Table[ Unique[ \[FormalX] ], n ]
    },
    Expectation[ 
        Log @ PDF[ distP, vars ] - Log @ PDF[ distQ, vars ], 
        vars \[Distributed] distP,
        opts
    ]  
]
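
The multivariate form can be sanity-checked against a known closed form (this check is my addition, not part of the original answer): for two multinormal distributions with identity covariance, the KL divergence equals half the squared distance between the mean vectors, so the following should evaluate to $1$:

klDivergence[ 2,
    MultinormalDistribution[ { 0, 0 }, IdentityMatrix[ 2 ] ],
    MultinormalDistribution[ { 1, 1 }, IdentityMatrix[ 2 ] ]
]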

This might (in principle) work for discrete as well as for continuous distributions:

Continuous Distributions (OP)

klDivergence[ distP, distQ ]

$-\frac{1}{2} + \sqrt{\mathrm{e}} - \frac{1}{2} \log (2\pi)$

N[%]

0.229783
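
Because the options are forwarded to Expectation, symbolic parameters can also be handled by passing Assumptions (this example is my addition; for two unit-variance normals the closed form $\frac{(a-b)^2}{2}$ is standard, and the call should reproduce it):

klDivergence[
    NormalDistribution[ a, 1 ],
    NormalDistribution[ b, 1 ],
    Assumptions -> Element[ { a, b }, Reals ]
]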

Discrete Distributions

We may use the example given in (104506) to check how this works out in the discrete case:

pmfA = EmpiricalDistribution[ {1/6, 1/6, 1/6, 1/6, 1/6, 1/6} -> Range[6] ];
pmfB = EmpiricalDistribution[ {1/4, 1/4, 1/8, 1/8, 1/8, 1/8} -> Range[6] ];

klDivergence[ pmfA, pmfB ]

1/3 (Log[4] - 3 Log[6] + 2 Log[8])
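
As a cross-check (my addition), the defining sum $\sum_i p_i \log \frac{p_i}{q_i}$ can be evaluated directly; FullSimplify should confirm that it agrees with the Expectation result:

Total @ MapThread[
    #1 Log[ #1 / #2 ] &,
    { { 1/6, 1/6, 1/6, 1/6, 1/6, 1/6 }, { 1/4, 1/4, 1/8, 1/8, 1/8, 1/8 } }
]

FullSimplify[ % == klDivergence[ pmfA, pmfB ] ] (* should give True *)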

The problems arising from a discrete distribution containing zero probabilities (cf. the question linked above) may be solved by dropping the zero entries from the distribution:

(* suppose the weights were {1/4, 1/4, 1/2, 0, 0, 0}; keep only the nonzero entries *)
pmfBzero = EmpiricalDistribution[ {1/4, 1/4, 1/2} -> Range[3] ];

klDivergence[ pmfBzero, pmfA ]

Log[ 3/Sqrt[2] ]
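
To automate this workaround, a small helper can strip zero-weight entries before constructing the distribution (a sketch; dropZeros is my own name, not a built-in):

(* keep only the positions with positive weight *)
dropZeros[ weights_List -> values_List ] := With[
    {
        pos = Flatten @ Position[ weights, _?Positive, { 1 } ]
    },
    EmpiricalDistribution[ weights[[ pos ]] -> values[[ pos ]] ]
];

klDivergence[ dropZeros[ { 1/4, 1/4, 1/2, 0, 0, 0 } -> Range[ 6 ] ], pmfA ]

which should reproduce Log[ 3/Sqrt[2] ] from above.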


I know that this question already has a very good answer, but I still wanted to present the Wolfram Function Repository function I wrote specifically for this purpose (before I even found this question).

Like the solution by gwr, it uses (N)Expectation to compute the result, though I use LogLikelihood instead of Log @ PDF[...]. The function also checks the domains of the distributions for you.
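
Usage should look something like this (the resource name is given from memory, so treat it as an assumption and check the Function Repository entry):

(* resource name assumed, not taken from this post *)
ResourceFunction[ "KullbackLeiblerDivergence" ][ NormalDistribution[], GumbelDistribution[] ]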

Edit: The current version will complain if you use an EmpiricalDistribution, but I submitted a new version that fixes that problem.