Title: | Sample-Based Estimation of Kullback-Leibler Divergence |
---|---|
Description: | Estimation algorithms for Kullback-Leibler divergence between two probability distributions, based on one or two samples, and including uncertainty quantification. Distributions can be uni- or multivariate and continuous, discrete or mixed. |
Authors: | Niklas Hartung [aut, cre, cph] |
Maintainer: | Niklas Hartung <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0.9001 |
Built: | 2024-11-01 11:15:39 UTC |
Source: | https://github.com/niklhart/kldest |
Combinations of input arguments
combinations(...)
... | Any number of atomic vectors.
A data frame with columns named as the inputs, containing all input combinations.
combinations(a = 1:2, b = letters[1:3], c = LETTERS[1:2])
Specify a matrix with constant values on the diagonal and on the off-diagonals. Such matrices can be used to vary the degree of dependency in covariate matrices, for example when evaluating accuracy of KL-divergence estimation algorithms.
constDiagMatrix(dim = 1, diag = 1, offDiag = 0)
dim | Dimension
diag | Value at the diagonal
offDiag | Value at off-diagonals
A dim-by-dim matrix
constDiagMatrix(dim = 3, diag = 1, offDiag = 0.9)
Subsampling-based confidence intervals computed by kld_ci_subsampling()
require the convergence rate of the KL divergence estimator as an input. The
default rate of 0.5
assumes that the variance term dominates the bias term.
For high-dimensional problems, depending on the data, the convergence rate
might be lower. This function allows the convergence rate to be derived empirically.
convergence_rate(
  estimator,
  X,
  Y = NULL,
  q = NULL,
  n.sizes = 4,
  spacing.factor = 1.5,
  typical.subsample = function(n) sqrt(n),
  B = 500L,
  plot = FALSE
)
estimator | A KL divergence estimator.
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (vectors, matrices or data frames). Y is only needed in the two-sample problem.
q | The density function of the approximate distribution q (one-sample problem).
n.sizes | Number of different subsample sizes to use (default: 4).
spacing.factor | Multiplicative factor controlling the spacing of sample sizes (default: 1.5).
typical.subsample | A function that produces a typical subsample size, used as the geometric mean of subsample sizes (default: function(n) sqrt(n)).
B | Number of subsamples to draw per subsample size (default: 500).
plot | A boolean (default: FALSE).
References:
Politis, Romano and Wolf, "Subsampling", Chapter 8 (1999), for theory.
The implementation has been adapted from lecture notes by C. J. Geyer, https://www.stat.umn.edu/geyer/5601/notes/sub.pdf
A scalar, the parameter $\beta$ in the empirical convergence rate $n^{-\beta}$ of the estimator to the true KL divergence. It can be used in the convergence.rate argument of kld_ci_subsampling() as convergence.rate = function(n) n^beta.
# NN method usually has a convergence rate around 0.5:
set.seed(0)
convergence_rate(kld_est_nn, X = rnorm(1000), Y = rnorm(1000, mean = 1, sd = 2))
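As the Value section above describes, the returned exponent can be fed back into kld_ci_subsampling(); a minimal sketch (illustrative, not taken from the package examples):

# estimate the convergence rate empirically, then reuse it for the CI
set.seed(0)
X <- rnorm(1000)
Y <- rnorm(1000, mean = 1, sd = 2)
beta <- convergence_rate(kld_est_nn, X = X, Y = Y)
kld_ci_subsampling(X, Y, estimator = kld_est_nn,
                   convergence.rate = function(n) n^beta)$ci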
Detect if a one- or two-sample problem is specified
is_two_sample(Y, q)
is_two_sample(Y, q)
Y | A vector, matrix, data frame or NULL.
q | A function or NULL.
TRUE for a two-sample problem (i.e., Y non-null and q = NULL) and FALSE for a one-sample problem (i.e., Y = NULL and q non-null).
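This help page has no examples; a minimal illustration of the return value (assuming the function is accessible, it may be an internal helper):

is_two_sample(Y = rnorm(10), q = NULL)              # TRUE:  two-sample problem
is_two_sample(Y = NULL, q = function(x) dnorm(x))   # FALSE: one-sample problem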
This function computes a confidence interval for KL divergence based on Efron's bootstrap. The approach only works for kernel density-based estimators since nearest neighbour-based estimators cannot deal with the ties produced when sampling with replacement.
kld_ci_bootstrap(
  X,
  Y,
  estimator = kld_est_kde1,
  B = 500L,
  alpha = 0.05,
  method = c("quantile", "se"),
  include.boot = FALSE,
  ...
)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (numeric vectors, matrices or data frames).
estimator | A function expecting two inputs X and Y, the KL divergence estimator to use (default: kld_est_kde1).
B | Number of bootstrap replicates (default: 500).
alpha | Error level, defaults to 0.05.
method | Either "quantile" (the default) or "se", specifying how the confidence interval is computed from the bootstrap replicates.
include.boot | Boolean, whether to include the bootstrap estimates in the output (default: FALSE).
... | Arguments passed on to estimator.
Reference:
Efron, "Bootstrap Methods: Another Look at the Jackknife", The Annals of Statistics, Vol. 7, No. 1 (1979).
A list with the following fields:
"est" (the estimated KL divergence),
"boot" (a length B numeric vector with KL divergence estimates on the bootstrap subsamples), only included if include.boot = TRUE,
"ci" (a length 2 vector containing the lower and upper limits of the estimated confidence interval).
# 1D Gaussian, two samples
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_kde1(X, Y)
kld_ci_bootstrap(X, Y)
This function computes a confidence interval for KL divergence based on the subsampling bootstrap introduced by Politis and Romano. See Details for theoretical properties of this method.
kld_ci_subsampling(
  X,
  Y = NULL,
  q = NULL,
  estimator = kld_est_nn,
  B = 500L,
  alpha = 0.05,
  subsample.size = function(x) x^(2/3),
  convergence.rate = sqrt,
  method = c("quantile", "se"),
  include.boot = FALSE,
  n.cores = 1L,
  ...
)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (vectors, matrices or data frames). Y is only needed in the two-sample problem.
q | The density function of the approximate distribution q (one-sample problem).
estimator | The Kullback-Leibler divergence estimation method; a function expecting two inputs (X and Y, or X and q, depending on the problem type). Defaults to kld_est_nn.
B | Number of bootstrap replicates (default: 500).
alpha | Error level, defaults to 0.05.
subsample.size | A function specifying the size of the subsamples, defaults to function(x) x^(2/3).
convergence.rate | A function computing the convergence rate of the estimator as a function of sample sizes. Defaults to sqrt (i.e., a square-root convergence rate).
method | Either "quantile" (the default) or "se", specifying how the confidence interval is computed from the subsample estimates.
include.boot | Boolean, whether to include the subsample estimates in the output (default: FALSE).
n.cores | Number of cores to use in parallel computing (defaults to 1, i.e. no parallelization).
... | Arguments passed on to estimator.
In general terms, letting $s_n$ be the subsample size for a sample of size $n$, and $\tau_n$ the convergence rate of the estimator, a confidence interval calculated by subsampling has asymptotic coverage $1 - \alpha$ as long as $s_n/n \rightarrow 0$, $s_n \rightarrow \infty$ and $\tau_{s_n}/\tau_n \rightarrow 0$.
In many cases, the convergence rate of the nearest-neighbour based KL divergence estimator is $\tau_n = \sqrt{n}$ and the condition on the subsample size reduces to $s_n/n \rightarrow 0$ and $s_n \rightarrow \infty$. By default, $s_n = n^{2/3}$. In a two-sample problem, $n$ and $s_n$ are replaced by effective sample sizes $n_\text{eff} = \min(n,m)$ and $s_{n,\text{eff}} = \min(s_n, s_m)$.
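To illustrate these conditions, a sketch with a non-default subsample size and an assumed slower convergence rate (the specific exponents are arbitrary and for illustration only):

set.seed(0)
X <- rnorm(500)
Y <- rnorm(500, mean = 1, sd = 2)
kld_ci_subsampling(X, Y,
                   subsample.size   = function(n) n^0.6,      # s_n = n^0.6, so s_n/n -> 0
                   convergence.rate = function(n) n^0.4)$ci   # assumed rate n^0.4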
Reference:
Politis and Romano, "Large sample confidence regions based on subsamples under minimal assumptions", The Annals of Statistics, Vol. 22, No. 4 (1994).
A list with the following fields:
"est" (the estimated KL divergence),
"ci" (a length 2 vector containing the lower and upper limits of the estimated confidence interval),
"boot" (a length B numeric vector with KL divergence estimates on the bootstrap subsamples), only included if include.boot = TRUE.
# 1D Gaussian (one- and two-sample problems)
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_nn(X, Y = Y)
kld_est_nn(X, q = q)
kld_ci_subsampling(X, Y)$ci
kld_ci_subsampling(X, q = q)$ci
Analytical KL divergence for two discrete distributions
kld_discrete(P, Q)
P, Q | Numerical arrays with the same dimensions, representing discrete probability distributions
A scalar (the Kullback-Leibler divergence)
# 1-D example
P <- 1:4/10
Q <- rep(0.25, 4)
kld_discrete(P, Q)

# The above example in 2-D
P <- matrix(1:4/10, nrow = 2)
Q <- matrix(0.25, nrow = 2, ncol = 2)
kld_discrete(P, Q)
For two mixed continuous/discrete distributions with densities $p$ and $q$, and denoting $x = (x_\text{c}, x_\text{d})$, the Kullback-Leibler divergence $D_{KL}(p\,\|\,q)$ is given as
$$D_{KL}(p\,\|\,q) = E_p\!\left[\log \frac{p(x_\text{c}, x_\text{d})}{q(x_\text{c}, x_\text{d})}\right].$$
Conditioning on the discrete variables $x_\text{d}$, this can be re-written as
$$D_{KL}(p\,\|\,q) = \sum_{x_\text{d}} p(x_\text{d})\, D_{KL}\big(p(\cdot\mid x_\text{d})\,\|\,q(\cdot\mid x_\text{d})\big) + D_{KL}\big(p_{x_\text{d}}\,\|\,q_{x_\text{d}}\big).$$
Here, the terms $D_{KL}\big(p(\cdot\mid x_\text{d})\,\|\,q(\cdot\mid x_\text{d})\big)$ are approximated via nearest neighbour- or kernel-based density estimates on the datasets X and Y stratified by the discrete variables, and $D_{KL}\big(p_{x_\text{d}}\,\|\,q_{x_\text{d}}\big)$ is approximated using relative frequencies.
kld_est(
  X,
  Y = NULL,
  q = NULL,
  estimator.continuous = kld_est_nn,
  estimator.discrete = kld_est_discrete,
  vartype = NULL
)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (vectors, matrices or data frames), possibly containing both continuous and discrete variables.
q | The density function of the approximate distribution q (one-sample problem). For mixed data, a list with a conditional density of the continuous given the discrete variables and a probability mass function of the discrete variables (see the one-sample example below).
estimator.continuous, estimator.discrete | KL divergence estimators for continuous and discrete data, respectively. Both are functions with two arguments X and Y.
vartype | A length-d character vector with entries "c" (continuous) or "d" (discrete), specifying the type of each variable. If NULL (the default), variable types are inferred from the data.
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# 2D example, two samples
set.seed(0)
X <- data.frame(cont  = rnorm(10),
                discr = c(rep('a',4), rep('b',6)))
Y <- data.frame(cont  = c(rnorm(5), rnorm(5, sd = 2)),
                discr = c(rep('a',5), rep('b',5)))
kld_est(X, Y)

# 2D example, one sample
set.seed(0)
X <- data.frame(cont  = rnorm(10),
                discr = c(rep(0,4), rep(1,6)))
q <- list(cond = function(xc, xd) dnorm(xc, mean = xd, sd = 1),
          disc = function(xd) dbinom(xd, size = 1, prob = 0.5))
kld_est(X, q = q, vartype = c("c","d"))
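To make the decomposition in the description concrete, here is a hand-rolled two-sample version computed directly from the formula (a sketch for illustration only; kld_est() does this internally and handles edge cases):

set.seed(1)
X <- data.frame(cont = rnorm(100), discr = sample(c("a","b"), 100, replace = TRUE))
Y <- data.frame(cont = rnorm(100, mean = 1), discr = sample(c("a","b"), 100, replace = TRUE))
p_d <- table(X$discr) / nrow(X)                  # relative frequencies of x_d in X
# continuous part: stratified NN estimates, weighted by p(x_d)
cont_part <- sum(vapply(names(p_d), function(l) {
  p_d[[l]] * kld_est_nn(X$cont[X$discr == l], Y$cont[Y$discr == l])
}, numeric(1)))
# discrete part: plug-in estimate on the discrete margins
disc_part <- kld_est_discrete(X$discr, Y$discr)
cont_part + disc_part                            # compare with kld_est(X, Y)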
This is the bias-reduced generalized k-NN based KL divergence estimator from Wang et al. (2009), specified in Eq. (29).
kld_est_brnn(X, Y, max.k = 100, warn.max.k = TRUE, eps = 0)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (numeric vectors, matrices or data frames).
max.k | Maximum number of nearest neighbours to compute (default: 100).
warn.max.k | If TRUE (the default), a warning is given if max.k is reached in the adaptive choice of the number of neighbours.
eps | Error bound in the nearest neighbour search. A value of eps = 0 (the default) implies an exact nearest neighbour search; for eps > 0, approximate nearest neighbours are used, which may be faster for large or high-dimensional datasets.
Finite sample bias reduction is achieved by an adaptive choice of the number of nearest neighbours. Fixing the number of nearest neighbours upfront, as done in kld_est_nn(), may result in very different distances $\rho^l_i$ and $\nu^k_i$ of a datapoint $x_i$ to its $l$-th nearest neighbour in $X$ and its $k$-th nearest neighbour in $Y$, respectively, which may lead to unequal biases in NN density estimation, especially in a high-dimensional setting.
To overcome this issue, the numbers of neighbours $l_i$ and $k_i$ are chosen here in a way that renders $\rho^{l_i}_i$ and $\nu^{k_i}_i$ comparable, by taking the largest possible number of neighbours whose distances do not exceed $\delta_i := \max(\rho^1_i, \nu^1_i)$.
Since the bias reduction explicitly uses both samples X
and Y
, one-sample
estimation is not possible using this method.
Reference: Wang, Kulkarni and Verdú, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009). DOI: https://doi.org/10.1109/TIT.2009.2016060
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# KL-D between one or two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_nn(X, Y)
kld_est_nn(X, q = q)
kld_est_nn(X, Y, k = 5)
kld_est_nn(X, q = q, k = 5)
kld_est_brnn(X, Y)

# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2), nrow = 2))
kld_est_nn(X, Y)
kld_est_nn(X, Y, k = 5)
kld_est_brnn(X, Y)
Plug-in KL divergence estimator for samples from discrete distributions
kld_est_discrete(X, Y = NULL, q = NULL)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (vectors, matrices or data frames with discrete entries).
q | The probability mass function of the approximate distribution q (one-sample problem).
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# 1D example, two samples
X <- c(rep('M',5), rep('F',5))
Y <- c(rep('M',6), rep('F',4))
kld_est_discrete(X, Y)

# 1D example, one sample
X <- c(rep(0,4), rep(1,6))
q <- function(x) dbinom(x, size = 1, prob = 0.5)
kld_est_discrete(X, q = q)
Disclaimer: this function does not use binning or the fast Fourier transform and is hence extremely slow even for moderate datasets. For this reason, it is currently not exported.
kld_est_kde(X, Y, hX = NULL, hY = NULL, rule = c("Silverman", "Scott"))
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (numeric vectors, matrices or data frames).
hX, hY | Positive scalars or length-d vectors, the kernel bandwidths for X and Y. If unspecified (NULL), they are derived from the data using the heuristic given in rule.
rule | A heuristic for computing the arguments hX and hY. The default is Silverman's rule of thumb; as an alternative, Scott's rule can be used.
This estimation method approximates the densities of the unknown distributions
and
by kernel density estimates, using a sample size- and
dimension-dependent bandwidth parameter and a Gaussian kernel. It works for
any number of dimensions but is very slow.
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# KL-D between two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_kde1(X, Y)
kld_est_nn(X, Y)
kld_est_brnn(X, Y)

# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2), nrow = 2))
kld_est_kde2(X, Y)
kld_est_nn(X, Y)
kld_est_brnn(X, Y)
This estimation method approximates the densities of the unknown distributions $p$ and $q$ by kernel density estimates, using the function density() from package stats. Only the two-sample problem, not the one-sample problem, is implemented.
kld_est_kde1(X, Y, MC = FALSE, ...)
X, Y | Numeric vectors or single-column matrices, representing samples from the true distribution p and the approximate distribution q, respectively.
MC | A boolean: use a Monte Carlo approximation instead of numerical integration via the trapezoidal rule (default: FALSE).
... | Further parameters passed on to density().
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# KL-D between two samples from 1D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_kde1(X, Y)
kld_est_kde1(X, Y, MC = TRUE)
This estimation method approximates the densities of the unknown bivariate distributions $p$ and $q$ by kernel density estimates, using function 'bkde' from package 'KernSmooth'. If 'KernSmooth' is not installed, a message is issued and the (much) slower function 'kld_est_kde' is used instead.
kld_est_kde2(
  X,
  Y,
  MC = FALSE,
  hX = NULL,
  hY = NULL,
  rule = c("Silverman", "Scott"),
  eps = 1e-05
)
X, Y | Two-column matrices or data frames, representing samples from the true bivariate distribution p and the approximate distribution q, respectively.
MC | A boolean: use a Monte Carlo approximation instead of numerical integration via the trapezoidal rule (default: FALSE).
hX, hY | Bandwidths for the kernel density estimates of p and q, respectively. If unspecified (NULL), they are derived from the data using the heuristic given in rule.
rule | A heuristic to derive the parameters hX and hY; either "Silverman" (the default) or "Scott".
eps | A nonnegative scalar (default: 1e-05); if eps > 0, it is used as a lower cutoff for estimated density values to stabilize the log ratio.
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(1000)
X2 <- rnorm(1000)
Y1 <- rnorm(1000)
Y2 <- Y1 + rnorm(1000)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2), nrow = 2))
kld_est_kde2(X, Y)
Estimation of KL divergence between continuous distributions based on the Donsker-Varadhan representation
using Monte Carlo averages to approximate the expectations, and optimizing
over a class of neural networks. The torch
package is required to use this
function.
kld_est_neural(
  X,
  Y,
  d_hidden = 1024,
  learning_rate = 1e-04,
  epochs = 5000,
  device = c("cpu", "cuda", "mps"),
  verbose = FALSE
)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (numeric vectors, matrices or data frames).
d_hidden | Number of nodes in the hidden layer (default: 1024).
learning_rate | Learning rate during gradient descent (default: 1e-04).
epochs | Number of training epochs (default: 5000).
device | Calculation device, either "cpu" (the default), "cuda" or "mps".
verbose | Generate a progress report to the console during training of the neural network (default: FALSE).
Disclaimer: this is a simple test implementation which is not optimized by any means. In particular:
it uses a fully connected network with (only) a single hidden layer,
it uses standard gradient descent on the full dataset rather than more advanced optimizers.
Also, the syntax is likely to change in the future.
Estimation is done as described for mutual information in Belghazi et al. (see ref. below), except that standard gradient descent is used on the full samples X and Y instead of using batches. Indeed, in the case where X and Y have a different length, batch sampling is not that straightforward.
Reference: Belghazi et al., Mutual Information Neural Estimation, PMLR 80:531-540, 2018.
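For reference, the Donsker-Varadhan representation underlying this estimator is the standard variational formula (stated here for orientation, not quoted from the package documentation):
$$D_{KL}(p\,\|\,q) = \sup_{T} \Big\{ E_p\big[T(x)\big] - \log E_q\big[e^{T(x)}\big] \Big\},$$
where the supremum over functions $T$ is approximated by a neural network and the expectations by sample averages over X and Y.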
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# 2D example
# analytical solution
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2), nrow = 2))
# sample generation
set.seed(0)
nxy <- 1000
X1 <- rnorm(nxy)
X2 <- rnorm(nxy)
Y1 <- rnorm(nxy)
Y2 <- Y1 + rnorm(nxy)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
# Estimation
kld_est_nn(X, Y)
## Not run:
# requires the torch package and takes ~1 min
kld_est_neural(X, Y)
## End(Not run)
This function estimates Kullback-Leibler divergence $D_{KL}(p\,\|\,q)$ between two continuous distributions $p$ and $q$ using nearest-neighbour (NN) density estimation in a Monte Carlo approximation of $D_{KL}(p\,\|\,q)$.
kld_est_nn(X, Y = NULL, q = NULL, k = 1L, eps = 0, log.q = FALSE)
X, Y | Samples from the true distribution p and the approximate distribution q, respectively (numeric vectors, matrices or data frames). Y is only needed in the two-sample problem.
q | The density function of the approximate distribution q (one-sample problem).
k | The number of nearest neighbours to consider for NN density estimation (default: 1). Larger values for k reduce the variance of the density estimates but may increase the bias.
eps | Error bound in the nearest neighbour search. A value of eps = 0 (the default) implies an exact nearest neighbour search; for eps > 0, approximate nearest neighbours are used, which may be faster for large or high-dimensional datasets.
log.q | If TRUE, the function q is assumed to return log-density instead of density values (default: FALSE).
Input for estimation is a sample X from $p$ and either the density function q of $q$ (one-sample problem) or a sample Y of $q$ (two-sample problem). In the two-sample problem, it is the estimator in Eq. (5) of Wang et al. (2009). In the one-sample problem, the asymptotic bias (the expectation of a Gamma distribution) is subtracted, see Pérez-Cruz (2008), Eq. (18).
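Up to notation, the two-sample k-NN estimator referred to above has the familiar form from the k-NN literature (reproduced here for orientation, not quoted from the package documentation):
$$\hat D_{KL}(p\,\|\,q) = \frac{d}{n}\sum_{i=1}^{n}\log\frac{\nu_k(i)}{\rho_k(i)} + \log\frac{m}{n-1},$$
where $d$ is the dimension, $n$ and $m$ are the sizes of X and Y, $\rho_k(i)$ is the distance of $x_i$ to its $k$-th nearest neighbour within X (excluding $x_i$ itself), and $\nu_k(i)$ its distance to the $k$-th nearest neighbour in Y.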
References:
Wang, Kulkarni and Verdú, "Divergence Estimation for Multidimensional Densities Via k-Nearest-Neighbor Distances", IEEE Transactions on Information Theory, Vol. 55, No. 5 (2009).
Pérez-Cruz, "Kullback-Leibler Divergence Estimation of Continuous Distributions", IEEE International Symposium on Information Theory (2008).
A scalar, the estimated Kullback-Leibler divergence $\hat D_{KL}(p\,\|\,q)$.
# KL-D between one or two samples from 1-D Gaussians:
set.seed(0)
X <- rnorm(100)
Y <- rnorm(100, mean = 1, sd = 2)
q <- function(x) dnorm(x, mean = 1, sd = 2)
kld_gaussian(mu1 = 0, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_est_nn(X, Y)
kld_est_nn(X, q = q)
kld_est_nn(X, Y, k = 5)
kld_est_nn(X, q = q, k = 5)
kld_est_brnn(X, Y)

# KL-D between two samples from 2-D Gaussians:
set.seed(0)
X1 <- rnorm(100)
X2 <- rnorm(100)
Y1 <- rnorm(100)
Y2 <- Y1 + rnorm(100)
X <- cbind(X1, X2)
Y <- cbind(Y1, Y2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(0,2), sigma2 = matrix(c(1,1,1,2), nrow = 2))
kld_est_nn(X, Y)
kld_est_nn(X, Y, k = 5)
kld_est_brnn(X, Y)
This function computes $D_{KL}(p\,\|\,q)$, where $p \sim \text{Exp}(\lambda_1)$ and $q \sim \text{Exp}(\lambda_2)$, in rate parametrization.
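In closed form (a standard result, not quoted from the package documentation):
$$D_{KL}\big(\text{Exp}(\lambda_1)\,\|\,\text{Exp}(\lambda_2)\big) = \log\frac{\lambda_1}{\lambda_2} + \frac{\lambda_2}{\lambda_1} - 1,$$
so the example below evaluates to $\log(1/2) + 2 - 1 = 1 - \log 2 \approx 0.307$.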
kld_exponential(lambda1, lambda2)
lambda1 | A scalar (rate parameter of true exponential distribution)
lambda2 | A scalar (rate parameter of approximate exponential distribution)
A scalar (the Kullback-Leibler divergence)
kld_exponential(lambda1 = 1, lambda2 = 2)
This function computes $D_{KL}(p\,\|\,q)$, where $p \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $q \sim \mathcal{N}(\mu_2, \Sigma_2)$.
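In closed form (a standard result, not quoted from the package documentation):
$$D_{KL}\big(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\big) = \frac{1}{2}\left[\operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big) + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1) - d + \log\frac{\det\Sigma_2}{\det\Sigma_1}\right],$$
where $d$ is the dimension.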
kld_gaussian(mu1, sigma1, mu2, sigma2)
mu1 | A numeric vector (mean of true Gaussian)
sigma1 | A s.p.d. matrix (covariance matrix of true Gaussian)
mu2 | A numeric vector (mean of approximate Gaussian)
sigma2 | A s.p.d. matrix (covariance matrix of approximate Gaussian)
A scalar (the Kullback-Leibler divergence)
kld_gaussian(mu1 = 1, sigma1 = 1, mu2 = 1, sigma2 = 2^2)
kld_gaussian(mu1 = rep(0,2), sigma1 = diag(2),
             mu2 = rep(1,2), sigma2 = matrix(c(1,0.5,0.5,1), nrow = 2))
This function computes $D_{KL}(p\,\|\,q)$, where $p \sim \mathcal{U}(a_1, b_1)$ and $q \sim \mathcal{U}(a_2, b_2)$, with $a_2 \le a_1 < b_1 \le b_2$.
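In closed form (a standard result, not quoted from the package documentation): under the support condition above,
$$D_{KL}\big(\mathcal{U}(a_1,b_1)\,\|\,\mathcal{U}(a_2,b_2)\big) = \log\frac{b_2-a_2}{b_1-a_1},$$
so the example below evaluates to $\log 2 \approx 0.693$.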
kld_uniform(a1, b1, a2, b2)
a1, b1 | Range of true uniform distribution
a2, b2 | Range of approximate uniform distribution
A scalar (the Kullback-Leibler divergence)
kld_uniform(a1 = 0, b1 = 1, a2 = 0, b2 = 2)
This function computes $D_{KL}(p\,\|\,q)$, where $p \sim \mathcal{U}(a, b)$ and $q \sim \mathcal{N}(\mu, \sigma^2)$.
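Worked out directly from the definition (a standard computation, not quoted from the package documentation):
$$D_{KL}\big(\mathcal{U}(a,b)\,\|\,\mathcal{N}(\mu,\sigma^2)\big) = -\log(b-a) + \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{(b-\mu)^3 - (a-\mu)^3}{6\,\sigma^2\,(b-a)},$$
which for the defaults a = 0, b = 1, mu = 0, sigma2 = 1 gives $\tfrac{1}{2}\log(2\pi) + \tfrac{1}{6} \approx 1.086$.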
kld_uniform_gaussian(a = 0, b = 1, mu = 0, sigma2 = 1)
a, b | Parameters of uniform (true) distribution
mu, sigma2 | Parameters of Gaussian (approximate) distribution
A scalar (the Kullback-Leibler divergence)
kld_uniform_gaussian(a = 0, b = 1, mu = 0, sigma2 = 1)
Probability density function of multivariate Gaussian distribution
mvdnorm(x, mu, Sigma)
x | A numeric vector of length d, the point at which the density is evaluated.
mu | A numeric vector of length d, the mean of the Gaussian.
Sigma | A d-by-d covariance matrix of the Gaussian.
The probability density of $\mathcal{N}(\mu, \Sigma)$ evaluated at x.
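For reference, the density being evaluated is the standard multivariate normal density
$$\phi(x;\mu,\Sigma) = (2\pi)^{-d/2}\,\det(\Sigma)^{-1/2}\exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right).$$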
# 1D example
mvdnorm(x = 2, mu = 1, Sigma = 2)
dnorm(x = 2, mean = 1, sd = sqrt(2))

# Independent 2D example
mvdnorm(x = c(2,2), mu = c(1,1), Sigma = diag(1:2))
prod(dnorm(x = c(2,2), mean = c(1,1), sd = sqrt(1:2)))

# Correlated 2D example
mvdnorm(x = c(2,2), mu = c(1,1), Sigma = matrix(c(2,1,1,2), nrow = 2))
Since Kullback-Leibler divergence is scale-invariant, its sample-based approximations can be computed on a conveniently chosen scale. This helper function transforms each variable in such a way that all marginal distributions of the joint dataset are uniform. In this way, the scales of different variables are rendered comparable, with the idea that neighbour-based methods perform better in this situation.
to_uniform_scale(X, Y)
X, Y | Samples from the distributions p and q, respectively (numeric vectors, matrices or data frames).
A list with fields X and Y, containing the transformed samples.
# 2D example
n <- 10L
X <- cbind(rnorm(n, mean = 0, sd = 3), rnorm(n, mean = 1, sd = 2))
Y <- cbind(rnorm(n, mean = 1, sd = 2), rnorm(n, mean = 0, sd = 2))
to_uniform_scale(X, Y)
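Relating to the description above, a minimal sketch of one way such a rank-based transform can be implemented (the helper name is hypothetical, and the packaged to_uniform_scale() may differ in details):

# illustrative only: map each column to (0,1) via the empirical CDF of the pooled data
to_uniform_scale_sketch <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Z <- rbind(X, Y)                           # pooled sample, one column per variable
  U <- apply(Z, 2, function(z) rank(z) / (length(z) + 1))
  list(X = U[seq_len(nrow(X)), , drop = FALSE],
       Y = U[nrow(X) + seq_len(nrow(Y)), , drop = FALSE])
}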
Matrix trace operator
tr(M)
M | A square matrix
The matrix trace (a scalar)
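This helper has no examples on this page; a tiny illustration (the trace is the sum of the diagonal; tr() may be an internal helper):

tr(matrix(1:4, nrow = 2))   # 1 + 4 = 5
tr(diag(3))                 # 3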
Trapezoidal integration in 1 or 2 dimensions
trapz(h, fx)
h | A length-d numeric vector of grid spacings (one per dimension).
fx | A d-dimensional array of function values evaluated on the grid (a numeric vector in 1-D, a matrix in 2-D).
The trapezoidal approximation of the integral.
# 1D example
trapz(h = 1, fx = 1:10)

# 2D example
trapz(h = c(1,1), fx = matrix(1:10, nrow = 2))