Skip to content

Composite Distributions

Composite distributions

This section contains composite distributions in the sense that they refer to other distribution which they combine or modify in some way.

Mixture distribution

The mixture distribution, sometimes called addition of distributions, is a (weighted) sum of distributions \(f_i(\vec{x})\), depending on the same variable(s) \(\vec{x}\):

\[ \begin{aligned} \text{MixturePdf}(x) = \frac{1}{\ensuremath{\mathcal{M}}}\sum_{i=1}^{n} c_i \cdot f_i(\vec{x}) , \end{aligned} \]

where the \(c_i\) are coefficients and \(\vec{x}\) is the vector of variables.

  • name: custom unique string
  • type: mixture_dist
  • name: name of the variable \(x\) (optional, since the variable is fully defined by the summands)
  • summands: array of names referencing distributions
  • coefficients: array of names of coefficients \(c_i\) or numbers to be added
  • extended: boolean denoting whether this is an extended distribution (optional, as it can be inferred from the lengths of the lists for summands and coefficients) This distribution can be treated as extended or as non-extended.
  • If the number of coefficients equals the number of distributions and extended is false, then the sum of the coefficient must be (approximately) one.
  • If the number of coefficients equals the number of distributions and extended is true, then the sum of the coefficients will be used as the Poisson rate parameter and the mixture distribution itself will use the coefficients normalized by their sum.
  • If the number of coefficients is one less than the number of distributions, then extended must be false and \(c_n\) is computed from the other coefficients as
\[ \begin{aligned} c_n &= 1 - \sum_{i=1}^{n-1}c_i . \end{aligned} \]

Product distribution

The product of s of independent distributions \(f_i\) is defined as

\[ \begin{aligned} \text{ProductPdf}(x) &= \frac{1}{\ensuremath{\mathcal{M}}}\prod_i^n f(x). \end{aligned} \]
  • name: custom string
  • type: product_dist
  • x: name of the variable \(x\) (optional, since the variable is fully defined by the factors)
  • factors: array of names referencing distributions

Density Function distribution

A distribution that is specified via a non-normalized density function \(f(x)\). Formally, it corresponds to the probability measure that has the density \(f(x)\) in respect to the Lebesgue measure.

The density function is normalized automatically by \(\ensuremath{\mathcal{M}}= \int f(x) dx\), so the PDF of the distribution is

\[ \begin{aligned} \text{DensityFunctionPdf}(f, x) &= \frac{f(x)}{\ensuremath{\mathcal{M}}} \end{aligned} \]

The distribution can be specified either via the non-normalized density function \(f(x)\) or via the non-normalized log-density function \(\log(f(x))\):

  • density_function\_dist$: Specified via the PDF \(f(x)\)
  • name: custom string
  • type: density_function_dist
  • function: The density function \(f\)
  • log_density_function\_dist: Specified via the PDF \(f(x)\)
  • name: custom string
  • type: log_density_function_dist
  • function: The density function \(f\)

Poisson point process

A Poisson point process distribution is understood here as the distribution of the outcomes of a (typically inhomogeneous) Poisson point process. In particle physics, such a distribution is often called an extended distribution [^barlow-extended-ml]. In HS3, a Poisson point process distribution is specified using a global-rate parameter \(\lambda\) and an underlying distribution \(m\), either explicitly or implicitly (see below). Random values are drawn from the Poisson point process distribution by drawing a random number \(n\) from a Poisson distribution of the global-rate \(\lambda\), and then drawing \(n\) random values from the underlying distribution \(m\). The resulting random values are vectors \(x\) of length \(n\), so their length varies. The PDF the Poisson point process distribution at an outcome \(x\) (a vector of length \(n\)) is

\[ \begin{aligned} \text{PoissonPointProcessPdf}(\lambda, m, x) &= \text{PoissonPdf}(n, \lambda) \cdot \prod_i^n \text{PDF}(m, x_i). \end{aligned} \]

In particle physics, the function \(\text{PoissonPointProcessPdf}(\lambda, m(\theta), x)\), for a fixed observation \(x\) and varying parameters \(\lambda\) and \(\theta\), is often called an extended likelihood. A a poisson point process distribution can be specified in HS3 in two ways:

  • rate_extended_dist: The global-rate parameter \(\lambda\) and underlying distribution \(m\) are specified explicitly:
  • name: custom string
  • type: rate_extended_dist
  • rate: The global rate \(\lambda\)
  • dist: The underlying distribution \(m\)

The name of the variable \(x\) is taken from the underlying distribution. The underlying distribution must not be referred to from other components of the statistical model.

  • rate_density_dist: Specified via a non-normalized rate-density function \(f\). Both the global-rate parameter and the underlying distribution are implicit: \(\lambda = \int f(y) d y\) and \(m = \text{DensityFunctionPdf}(f)\). More formally, the distribution corresponds to the inhomogeneous Poisson point process that is defined by a non-normalized rate measure which has density \(f\) in respect to the Lebesque measure.
  • name: custom string
  • type: rate_density_dist
  • x: name of the variable \(x\)
  • density: The rate-density function \(f\)

Bin count distribution

This is a binned version of the Poisson point process distribution (Poisson point process). Is is the distribution of the bin counts that result from histogramming the outcomes of a Poisson point process distribution using a given binning scheme (see Binned Data). Like the Poisson point process distribution, a Bin-counts distribution can either be specified via a global-rate parameter \(\lambda\) and an underlying distribution \(m\), or via a rate-density function \(f\) (in which case \(\lambda\) and \(m\) are implicit). In addition, the binning scheme also has to be specified in either case, unless it can be inferred (see below). For \(k\) bins, this type of distribution corresponds to a product of \(k\) Poisson distributions with rates

\[ \begin{aligned} \nu_i &= \lambda \cdot \int_{\text{bin}_i} \text{PDF}(y, m) dy \quad \text{equivalent to}\\ \nu_i &= \int_{\text{bin}_i} f(y) dy \end{aligned} \]

The PDF of the Bin-counts distribution at an outcome \(x\) (a vector of length \(k\), same as the number of bins) is

\[ \begin{aligned} \text{BinCountsPdf}(x) &= \prod_i^k \text{PoissonPdf}(x_i, \nu_i) \end{aligned} \]
  • bincounts_extended_dist: The global-rate parameter \(\lambda\) and underlying distribution \(m\) are specified explicitly:
  • name: custom string
  • type: bincounts_extended_dist
  • rate: The global rate \(\lambda\)
  • dist: The underlying distribution \(m\)
  • axes: a definition of the binning to be used, following the defintions in Binned data]. optional if dist is a binned distribution, in which case the same binning is used by default.

The name of the variable \(x\) is taken from the underlying distribution. The underlying distribution must not be referred to from other components of the statistical model. bincounts_density_dist: Specified via a non-normalized rate-density function \(f\). Both the global-rate parameter and the underlying distribution are implicit: \(\lambda = \int f(y) dy\) and \(m = \text{DensityFunctionPdf}(f)\).

More formally, the distribution corresponds to the inhomogeneous Poisson point process that is defined by a non-normalized rate measure which has density \(f\) in respect to the Lebesque measure.

  • name: custom string
  • type: bincounts_density_dist
  • x: name of the variable \(x\)
  • density: The rate-density function \(f\)
  • axes: a definition of the binning to be used, following the defintions in Binned Data.