Class GaussianDistribution

  • All Implemented Interfaces:
    java.io.Serializable, Distribution, ExponentialFamily

    public class GaussianDistribution
    extends AbstractDistribution
    implements ExponentialFamily
    The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes data clustering around a mean. The graph of the associated probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. The normal distribution is often used as a first approximation for any variable that tends to cluster around its mean.

    The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then the linear transform aX + b (for real numbers a ≠ 0 and b) is also normally distributed. If X1 and X2 are two independent normal random variables, then any linear combination of them is also normally distributed. The converse also holds: if X1 and X2 are independent and their sum X1 + X2 is normally distributed, then both X1 and X2 must be normal; this result is known as Cramér's theorem. Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) has the maximum entropy.

    The central limit theorem states that, under certain fairly common conditions, the sum of a large number of random variables is approximately normally distributed. For example, if X1, …, Xn is a sequence of i.i.d. random variables, each with mean μ and variance σ² but otherwise arbitrary distribution, then the central limit theorem states that

    √n ((1/n) Σ Xi − μ) → N(0, σ²)  in distribution as n → ∞.

    Versions of the theorem hold even when the summands Xi are not i.i.d., although some constraints on the degree of dependence and the growth rate of moments must still be imposed.
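The statement above can be checked numerically. The following is a minimal simulation sketch, not part of the documented class: it assumes Uniform(0, 1) summands (mean 1/2, variance 1/12), and the class name CltSketch and its method are hypothetical. It computes the scaled, centered sample mean many times and reports the mean and variance of the results, which should be close to 0 and 1/12.

```java
import java.util.Random;

/** Hypothetical demo class; it does not belong to the documented library. */
final class CltSketch {
    /**
     * Computes sqrt(n) * (sampleMean - mu) for `trials` independent samples of
     * n Uniform(0, 1) draws and returns {mean, variance} of those values.
     */
    static double[] scaledMeanMoments(int n, int trials, long seed) {
        Random rng = new Random(seed);
        double mu = 0.5; // mean of Uniform(0, 1)
        double sum = 0.0;
        double sumSq = 0.0;
        for (int t = 0; t < trials; t++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) {
                s += rng.nextDouble();
            }
            double z = Math.sqrt(n) * (s / n - mu);
            sum += z;
            sumSq += z * z;
        }
        double mean = sum / trials;
        double variance = sumSq / trials - mean * mean;
        return new double[] {mean, variance};
    }

    public static void main(String[] args) {
        double[] m = scaledMeanMoments(50, 20000, 42L);
        System.out.printf("mean=%.4f variance=%.4f target variance=%.4f%n",
                m[0], m[1], 1.0 / 12.0);
    }
}
```

Matching the first two moments does not by itself show normality; a histogram of the simulated values would be the natural next check.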

    Therefore, certain other distributions can be approximated by the normal distribution, for example:

    • The binomial distribution B(n, p) is approximately normal N(np, np(1-p)) for large n and for p not too close to zero or one.
    • The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
    • The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
    • The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
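As an illustration of the first approximation in the list, the sketch below compares the exact binomial probability P(K = k) with the density of N(np, np(1-p)) at k. It is a standalone example under stated assumptions: the class and method names are hypothetical, and the values n = 100, p = 0.5 are chosen only for the demonstration.

```java
/** Hypothetical demo class; it does not belong to the documented library. */
final class BinomialNormalSketch {
    /**
     * Exact P(K = k) for K ~ B(n, p), built from P(K = 0) with the ratio
     * P(K = j + 1) / P(K = j) = (n - j) / (j + 1) * p / (1 - p).
     * Suitable for moderate n only: (1 - p)^n underflows for very large n.
     */
    static double binomialPmf(int n, double p, int k) {
        double pmf = Math.pow(1.0 - p, n);
        for (int j = 0; j < k; j++) {
            pmf *= (double) (n - j) / (j + 1) * p / (1.0 - p);
        }
        return pmf;
    }

    /** Density of N(mean, variance) evaluated at x. */
    static double normalPdf(double mean, double variance, double x) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    public static void main(String[] args) {
        int n = 100;
        double p = 0.5;
        for (int k = 40; k <= 60; k += 5) {
            System.out.printf("k=%d binomial=%.6f normal=%.6f%n",
                    k, binomialPmf(n, p, k), normalPdf(n * p, n * p * (1 - p), k));
        }
    }
}
```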
    Author:
    Haifeng Li
    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      double cdf​(double x)
      Cumulative distribution function.
      double entropy()
      Shannon entropy of the distribution.
      static GaussianDistribution getInstance()  
      double logp​(double x)
      The density at x in log scale, which may prevent underflow.
      Mixture.Component M​(double[] x, double[] posteriori)
      The M step in the EM algorithm, which depends on the specific distribution.
      double mean()
      The mean of distribution.
      int npara()
      The number of parameters of the distribution.
      double p​(double x)
      The probability density function for continuous distribution or probability mass function for discrete distribution at x.
      double quantile​(double p)
      The quantile: the probability to the left of quantile(p) is p.
      double rand()
      Uses the Box-Muller algorithm to transform uniform deviates from Random.random() into Gaussian deviates.
      double randInverseCDF()
      Uses Inverse CDF method to generate a Gaussian deviate.
      double sd()
      The standard deviation of distribution.
      java.lang.String toString()  
      double var()
      The variance of distribution.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • GaussianDistribution

        public GaussianDistribution​(double mu,
                                    double sigma)
        Constructor
        Parameters:
        mu - mean.
        sigma - standard deviation.
      • GaussianDistribution

        public GaussianDistribution​(double[] data)
        Constructor. The mean and standard deviation are estimated from the data by maximum likelihood estimation (MLE).
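For reference, the textbook maximum likelihood estimates for a Gaussian are the sample mean and the standard deviation computed with divisor n. The sketch below shows that calculation with a hypothetical helper class. It is not the library's code, and this page does not state whether the constructor divides by n or applies the n - 1 correction.

```java
/** Hypothetical helper; it illustrates the textbook MLE, not the library's code. */
final class GaussianMleSketch {
    /** Returns {mu, sigma}: the sample mean and the MLE standard deviation (divisor n). */
    static double[] fit(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        double mu = sum / data.length;

        double squaredDeviations = 0.0;
        for (double x : data) {
            squaredDeviations += (x - mu) * (x - mu);
        }
        // Assumption: divisor n (maximum likelihood), not n - 1 (unbiased).
        double sigma = Math.sqrt(squaredDeviations / data.length);
        return new double[] {mu, sigma};
    }

    public static void main(String[] args) {
        double[] est = fit(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("mu=" + est[0] + " sigma=" + est[1]); // prints mu=5.0 sigma=2.0
    }
}
```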
    • Method Detail

      • npara

        public int npara()
        Description copied from interface: Distribution
        The number of parameters of the distribution.
        Specified by:
        npara in interface Distribution
      • mean

        public double mean()
        Description copied from interface: Distribution
        The mean of distribution.
        Specified by:
        mean in interface Distribution
      • var

        public double var()
        Description copied from interface: Distribution
        The variance of distribution.
        Specified by:
        var in interface Distribution
      • sd

        public double sd()
        Description copied from interface: Distribution
        The standard deviation of distribution.
        Specified by:
        sd in interface Distribution
      • entropy

        public double entropy()
        Description copied from interface: Distribution
        Shannon entropy of the distribution.
        Specified by:
        entropy in interface Distribution
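For a Gaussian, the differential entropy has the closed form ½ ln(2πe σ²), measured in nats. The sketch below evaluates that formula in a hypothetical standalone class. Which logarithm base the library method uses is not stated on this page, so natural logarithms are an assumption here.

```java
/** Hypothetical helper evaluating the closed-form Gaussian entropy in nats. */
final class GaussianEntropySketch {
    /** Differential entropy of N(mu, sigma^2); it does not depend on mu. */
    static double entropy(double sigma) {
        if (!(sigma > 0.0)) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }

    public static void main(String[] args) {
        System.out.println(entropy(1.0));
        // Doubling sigma adds ln 2 to the entropy.
        System.out.println(entropy(2.0) - entropy(1.0));
    }
}
```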
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • rand

        public double rand()
        Uses the Box-Muller algorithm to transform uniform deviates from Random.random() into Gaussian deviates.
        Specified by:
        rand in interface Distribution
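The basic (trigonometric) form of the Box-Muller transform can be sketched as follows. This is an illustration with a hypothetical class and java.util.Random as the uniform source. The library may use a different variant, such as the polar form, or a different random number generator, so this is not a reproduction of its implementation.

```java
import java.util.Random;

/** Hypothetical sketch of the basic Box-Muller transform. */
final class BoxMullerSketch {
    /** Draws one deviate from N(mu, sigma^2) using two uniform deviates. */
    static double sample(Random rng, double mu, double sigma) {
        double u1 = 1.0 - rng.nextDouble(); // in (0, 1], so log(u1) is finite
        double u2 = rng.nextDouble();       // in [0, 1)
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mu + sigma * z;              // shift and scale the standard deviate
    }

    public static void main(String[] args) {
        Random rng = new Random(7L);
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 3.0, 2.0));
        }
    }
}
```

The transform also yields a second independent deviate from the sine of the same angle; the sketch discards it for brevity.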
      • randInverseCDF

        public double randInverseCDF()
        Uses Inverse CDF method to generate a Gaussian deviate.
      • p

        public double p​(double x)
        Description copied from interface: Distribution
        The probability density function for continuous distribution or probability mass function for discrete distribution at x.
        Specified by:
        p in interface Distribution
      • logp

        public double logp​(double x)
        Description copied from interface: Distribution
        The density at x in log scale, which may prevent underflow.
        Specified by:
        logp in interface Distribution
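The underflow issue is easy to reproduce. In the sketch below (a hypothetical standalone class, not the library's code), the density of N(0, 1) at x = 40 underflows to zero in double precision, while the log density, computed directly from the formula -z²/2 - ln σ - ½ ln(2π), stays finite.

```java
/** Hypothetical sketch contrasting the density with the log density. */
final class LogDensitySketch {
    private static final double HALF_LOG_2PI = 0.5 * Math.log(2.0 * Math.PI);

    /** Density of N(mu, sigma^2) at x; underflows to 0.0 far in the tails. */
    static double p(double mu, double sigma, double x) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Log density computed without exponentiating, so it stays finite. */
    static double logp(double mu, double sigma, double x) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - HALF_LOG_2PI;
    }

    public static void main(String[] args) {
        System.out.println(p(0.0, 1.0, 40.0));    // prints 0.0 (exp(-800) underflows)
        System.out.println(logp(0.0, 1.0, 40.0)); // about -800.92
    }
}
```

Working in log scale matters when many small densities are multiplied, as in likelihood computations, because the product becomes a sum of finite log terms.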
      • cdf

        public double cdf​(double x)
        Description copied from interface: Distribution
        Cumulative distribution function. That is, the probability to the left of x.
        Specified by:
        cdf in interface Distribution
      • quantile

        public double quantile​(double p)
        The quantile: the probability to the left of quantile(p) is p. This is the inverse of the cdf. The original algorithm and a Perl implementation can be found at: http://www.math.uio.no/~jacklam/notes/invnorm/index.html
        Specified by:
        quantile in interface Distribution
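To make the inverse relationship concrete, the sketch below inverts the CDF numerically. It deliberately does not reproduce the rational approximation from the page linked above. Instead it swaps in a slower, simpler technique: Simpson's rule for the standard normal CDF and bisection for the inversion. The class and method names are hypothetical.

```java
/** Hypothetical sketch: quantile by bisection on a numerically integrated CDF. */
final class QuantileSketch {
    /** Standard normal density. */
    static double pdf(double t) {
        return Math.exp(-0.5 * t * t) / Math.sqrt(2.0 * Math.PI);
    }

    /** Phi(z) = 0.5 + integral of the density from 0 to z (composite Simpson's rule). */
    static double standardCdf(double z) {
        int n = 2000;     // even number of subintervals
        double h = z / n; // signed step, so negative z gives a negative integral
        double sum = pdf(0.0) + pdf(z);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * pdf(i * h);
        }
        return 0.5 + sum * h / 3.0;
    }

    /** Quantile of N(mu, sigma^2); the search is limited to 10 standard deviations. */
    static double quantile(double mu, double sigma, double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = -10.0;
        double hi = 10.0;
        for (int i = 0; i < 80; i++) {
            double mid = 0.5 * (lo + hi);
            if (standardCdf(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return mu + sigma * 0.5 * (lo + hi); // location-scale transform
    }

    public static void main(String[] args) {
        System.out.println(quantile(0.0, 1.0, 0.975)); // about 1.96
    }
}
```

Feeding a uniform deviate on (0, 1) into such a quantile function is the inverse CDF method of generating Gaussian deviates.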
      • M

        public Mixture.Component M​(double[] x,
                                   double[] posteriori)
        Description copied from interface: ExponentialFamily
        The M step in the EM algorithm, which depends on the specific distribution.
        Specified by:
        M in interface ExponentialFamily
        Parameters:
        x - the input data for estimation
        posteriori - the a posteriori probability of each data point.
        Returns:
        the (unnormalized) weight of this distribution in the mixture.
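For a Gaussian component, the standard M step re-estimates the mean and variance as posterior-weighted averages, and the sum of the posteriors serves as the unnormalized mixture weight. The sketch below shows that textbook update in a hypothetical standalone class. It returns a plain array rather than a Mixture.Component, and it is an assumption that the library follows the same update.

```java
/** Hypothetical sketch of the textbook M step for one Gaussian mixture component. */
final class GaussianMStepSketch {
    /** Returns {weight, mu, sigma}, where weight is the sum of the posteriors. */
    static double[] mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) {
            throw new IllegalArgumentException("x and posteriori must have equal length");
        }
        double weight = 0.0;
        double mu = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mu += x[i] * posteriori[i];
        }
        mu /= weight; // posterior-weighted mean

        double variance = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            variance += d * d * posteriori[i];
        }
        variance /= weight; // posterior-weighted variance

        return new double[] {weight, mu, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        double[] r = mStep(new double[] {1, 2, 3, 4}, new double[] {1, 1, 0, 0});
        // prints weight=2.0 mu=1.5 sigma=0.5
        System.out.println("weight=" + r[0] + " mu=" + r[1] + " sigma=" + r[2]);
    }
}
```

Dividing each component's weight by the sum over all components gives the normalized mixing proportions.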