Class GaussianDistribution
- java.lang.Object
  - smile.stat.distribution.AbstractDistribution
    - smile.stat.distribution.GaussianDistribution
- All Implemented Interfaces: java.io.Serializable, Distribution, ExponentialFamily
public class GaussianDistribution extends AbstractDistribution implements ExponentialFamily
The normal distribution, or Gaussian distribution, is a continuous probability distribution that describes data clustering around a mean. The graph of its probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. It is often used to model variables that tend to cluster around the mean.
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then the linear transform aX + b (for real numbers a ≠ 0 and b) is also normally distributed. If X1 and X2 are two independent normal random variables, then their linear combination is also normally distributed. The converse also holds: if X1 and X2 are independent and their sum X1 + X2 is normally distributed, then both X1 and X2 must be normal, a result known as Cramér's theorem. Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with the maximum entropy.
The central limit theorem states that, under certain fairly common conditions, the sum of a large number of random variables is approximately normally distributed. For example, if X1, …, Xn is a sequence of iid random variables, each with mean μ and variance σ² but otherwise arbitrary distributions, then the central limit theorem states that
√n ((1/n) Σ Xi − μ) → N(0, σ²) in distribution as n → ∞.
The theorem will hold even if the summands Xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
Therefore, certain other distributions can be approximated by the normal distribution, for example:
- The binomial distribution B(n, p) is approximately normal N(np, np(1-p)) for large n and for p not too close to zero or one.
- The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
- The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
- The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
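The central limit theorem behind these approximations can be checked numerically. The following is a minimal simulation sketch using only the Java standard library; the class and method names (CltSketch, scaledMeans) are hypothetical and not part of Smile. It draws iid Uniform(0, 1) variables, for which μ = 0.5 and σ² = 1/12, and examines the scaled deviation √n (mean − μ).

```java
import java.util.Random;

/** Illustrative simulation of the central limit theorem; not part of Smile. */
final class CltSketch {
    /**
     * Returns `replicates` draws of sqrt(n) * (mean(X_1..X_n) - mu), where the
     * X_i are iid Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
     */
    static double[] scaledMeans(int n, int replicates, long seed) {
        Random rng = new Random(seed);
        double[] z = new double[replicates];
        for (int r = 0; r < replicates; r++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += rng.nextDouble();
            }
            z[r] = Math.sqrt(n) * (sum / n - 0.5);
        }
        return z;
    }

    /** Arithmetic mean of the values. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Variance with a 1/n divisor. */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / x.length;
    }

    public static void main(String[] args) {
        double[] z = scaledMeans(30, 20000, 42L);
        System.out.printf("mean     = %.4f (theory 0)%n", mean(z));
        System.out.printf("variance = %.4f (theory %.4f)%n", variance(z), 1.0 / 12.0);
    }
}
```

The empirical mean and variance should land close to 0 and 1/12 respectively, even though the summands are uniform rather than normal.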
- Author: Haifeng Li
- See Also: Serialized Form
Constructor Summary
Constructors
- GaussianDistribution(double[] data): Constructor.
- GaussianDistribution(double mu, double sigma): Constructor.
Method Summary
- double cdf(double x): Cumulative distribution function.
- double entropy(): Shannon entropy of the distribution.
- static GaussianDistribution getInstance()
- double logp(double x): The density at x in log scale, which may prevent underflow.
- Mixture.Component M(double[] x, double[] posteriori): The M step in the EM algorithm, which depends on the specific distribution.
- double mean(): The mean of the distribution.
- int npara(): The number of parameters of the distribution.
- double p(double x): The probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) at x.
- double quantile(double p): The quantile; the probability to the left of quantile(p) is p.
- double rand(): Uses the Box-Muller algorithm to transform uniform deviates from Random.random() into Gaussian deviates.
- double randInverseCDF(): Uses the inverse CDF method to generate a Gaussian deviate.
- double sd(): The standard deviation of the distribution.
- java.lang.String toString()
- double var(): The variance of the distribution.
Methods inherited from class smile.stat.distribution.AbstractDistribution
inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rejection
Constructor Detail
-
GaussianDistribution
public GaussianDistribution(double mu, double sigma)
Constructor.
- Parameters:
mu - mean.
sigma - standard deviation.
-
GaussianDistribution
public GaussianDistribution(double[] data)
Constructor. Mean and standard deviation will be estimated from the data by MLE.
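As an illustration of what estimation by MLE means here, the sketch below computes the textbook maximum likelihood estimates for a Gaussian: the sample mean, and the square root of the mean squared deviation with a 1/n divisor. The class GaussianMleSketch is hypothetical, and whether the library uses 1/n or 1/(n − 1) internally is not stated on this page, so treat the divisor as an assumption.

```java
/** Textbook Gaussian MLE; an illustration, not Smile's implementation. */
final class GaussianMleSketch {
    /** Returns {mean, standard deviation}, using the 1/n (MLE) divisor. */
    static double[] fit(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double sum = 0.0;
        for (double x : data) sum += x;
        double mu = sum / data.length;

        double ss = 0.0;
        for (double x : data) ss += (x - mu) * (x - mu);
        // Assumption: divide by n (MLE), not n - 1 (unbiased estimator).
        double sigma = Math.sqrt(ss / data.length);
        return new double[] { mu, sigma };
    }

    public static void main(String[] args) {
        double[] est = fit(new double[] { 2, 4, 4, 4, 5, 5, 7, 9 });
        System.out.println("mean = " + est[0] + ", sd = " + est[1]); // mean = 5.0, sd = 2.0
    }
}
```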
-
-
Method Detail
-
getInstance
public static GaussianDistribution getInstance()
-
npara
public int npara()
Description copied from interface: Distribution
The number of parameters of the distribution.
- Specified by: npara in interface Distribution
-
mean
public double mean()
Description copied from interface: Distribution
The mean of the distribution.
- Specified by: mean in interface Distribution
-
var
public double var()
Description copied from interface: Distribution
The variance of the distribution.
- Specified by: var in interface Distribution
-
sd
public double sd()
Description copied from interface: Distribution
The standard deviation of the distribution.
- Specified by: sd in interface Distribution
-
entropy
public double entropy()
Description copied from interface: Distribution
Shannon entropy of the distribution.
- Specified by: entropy in interface Distribution
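For a Gaussian, the differential entropy has the closed form ½ ln(2πeσ²), which depends only on σ. The sketch below evaluates that formula in nats; the class GaussianEntropySketch is hypothetical, and the unit (nats versus bits) used by the library is an assumption, since this page does not state it.

```java
/** Closed-form differential entropy of N(mu, sigma^2), in nats. Illustration only. */
final class GaussianEntropySketch {
    static double entropy(double sigma) {
        if (!(sigma > 0.0)) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        // 0.5 * ln(2 * pi * e * sigma^2); the mean does not appear.
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }

    public static void main(String[] args) {
        System.out.println(entropy(1.0)); // approximately 1.4189
        // Doubling sigma adds ln 2 nats of entropy.
        System.out.println(entropy(2.0) - entropy(1.0));
    }
}
```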
-
toString
public java.lang.String toString()
- Overrides: toString in class java.lang.Object
-
rand
public double rand()
Uses the Box-Muller algorithm to transform uniform deviates from Random.random() into Gaussian deviates.
- Specified by: rand in interface Distribution
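For readers unfamiliar with the Box-Muller transform, here is a minimal sketch of its basic trigonometric form, built on java.util.Random. The class BoxMullerSketch is hypothetical; the library may well use a different variant (for example the polar form) or a different uniform source, so this shows the technique rather than Smile's code.

```java
import java.util.Random;

/** Basic (trigonometric) Box-Muller transform. Illustration only. */
final class BoxMullerSketch {
    private final Random rng;
    private final double mu;
    private final double sigma;
    private double spare;      // second standard normal deviate of each pair
    private boolean hasSpare;

    BoxMullerSketch(double mu, double sigma, long seed) {
        if (!(sigma > 0.0)) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        this.mu = mu;
        this.sigma = sigma;
        this.rng = new Random(seed);
    }

    /** Returns one deviate from N(mu, sigma^2). */
    double next() {
        if (hasSpare) {
            hasSpare = false;
            return mu + sigma * spare;
        }
        // nextDouble() is in [0, 1); 1 - u is in (0, 1], so log never sees 0.
        double u1 = 1.0 - rng.nextDouble();
        double u2 = rng.nextDouble();
        double r = Math.sqrt(-2.0 * Math.log(u1));
        double theta = 2.0 * Math.PI * u2;
        spare = r * Math.sin(theta);
        hasSpare = true;
        return mu + sigma * r * Math.cos(theta);
    }

    public static void main(String[] args) {
        BoxMullerSketch g = new BoxMullerSketch(0.0, 1.0, 1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(g.next());
        }
    }
}
```

Each pair of uniform deviates yields two independent standard normal deviates, which is why the sketch caches the second one.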
-
randInverseCDF
public double randInverseCDF()
Uses the inverse CDF method to generate a Gaussian deviate.
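The inverse CDF method draws U uniformly from (0, 1) and returns the x for which cdf(x) = U. The sketch below illustrates the idea with stand-ins that are not the library's algorithms: the CDF uses the Abramowitz and Stegun approximation 7.1.26 of erf (absolute error around 1.5e-7), and the quantile is found by plain bisection. The class InverseCdfSketch and its methods are hypothetical, and accuracy in the extreme tails is limited by the erf approximation.

```java
import java.util.Random;

/** Inverse transform sampling for N(mu, sigma^2). Illustration only. */
final class InverseCdfSketch {
    /** erf via Abramowitz and Stegun 7.1.26 (absolute error about 1.5e-7). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Approximate probability to the left of x. */
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Solves cdf(x) = p by bisection on [mu - 10 sigma, mu + 10 sigma]. */
    static double quantile(double p, double mu, double sigma) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = mu - 10.0 * sigma;
        double hi = mu + 10.0 * sigma;
        for (int i = 0; i < 80; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, mu, sigma) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    /** Draws one deviate by pushing a uniform deviate through the quantile. */
    static double sample(Random rng, double mu, double sigma) {
        double u;
        do {
            u = rng.nextDouble(); // [0, 1); reject exactly 0
        } while (u == 0.0);
        return quantile(u, mu, sigma);
    }

    public static void main(String[] args) {
        System.out.println(quantile(0.975, 0.0, 1.0)); // close to 1.96
        System.out.println(sample(new Random(7L), 0.0, 1.0));
    }
}
```

Bisection is slow compared with a dedicated rational approximation of the quantile, but it makes the relationship between cdf and quantile explicit.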
-
p
public double p(double x)
Description copied from interface: Distribution
The probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) at x.
- Specified by: p in interface Distribution
-
logp
public double logp(double x)
Description copied from interface: Distribution
The density at x in log scale, which may prevent underflow.
- Specified by: logp in interface Distribution
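To see why a log-scale density helps, consider a point far in the tail: the density itself underflows to zero in double precision, while its logarithm remains an ordinary finite number. The sketch below uses the standard Gaussian density formulas; the class LogDensitySketch is hypothetical and is not the library's implementation.

```java
/** Density and log-density of N(mu, sigma^2). Illustration only. */
final class LogDensitySketch {
    /** exp(-z^2 / 2) / (sigma * sqrt(2 pi)), with z = (x - mu) / sigma. */
    static double p(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** The same quantity computed directly in log scale. */
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2.0 * Math.PI);
    }

    public static void main(String[] args) {
        // 40 standard deviations from the mean: exp(-800) underflows.
        System.out.println(p(40.0, 0.0, 1.0));    // 0.0
        System.out.println(logp(40.0, 0.0, 1.0)); // about -800.92
    }
}
```

Taking Math.log of the underflowed density would give negative infinity, which is why likelihood computations usually sum logp values instead of multiplying p values.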
-
cdf
public double cdf(double x)
Description copied from interface: Distribution
Cumulative distribution function. That is, the probability to the left of x.
- Specified by: cdf in interface Distribution
-
quantile
public double quantile(double p)
The quantile; the probability to the left of quantile(p) is p. This is the inverse of cdf. The original algorithm and a Perl implementation can be found at: http://www.math.uio.no/~jacklam/notes/invnorm/index.html
- Specified by: quantile in interface Distribution
-
M
public Mixture.Component M(double[] x, double[] posteriori)
Description copied from interface: ExponentialFamily
The M step in the EM algorithm, which depends on the specific distribution.
- Specified by: M in interface ExponentialFamily
- Parameters:
x - the input data for estimation.
posteriori - the posteriori probability.
- Returns:
- the (unnormalized) weight of this distribution in the mixture.
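For a Gaussian component, the textbook M step is a weighted maximum likelihood fit: the posterior probabilities act as per-sample weights for the mean and variance, and their sum gives the unnormalized component weight. The sketch below shows that computation under those assumptions. GaussianMStepSketch and its Result holder are hypothetical stand-ins, since the structure of Mixture.Component is not described on this page.

```java
/** Weighted MLE update for one Gaussian mixture component. Illustration only. */
final class GaussianMStepSketch {
    /** Hypothetical result holder; not Smile's Mixture.Component. */
    static final class Result {
        final double weight; // unnormalized: sum of posterior probabilities
        final double mu;
        final double sigma;

        Result(double weight, double mu, double sigma) {
            this.weight = weight;
            this.mu = mu;
            this.sigma = sigma;
        }
    }

    static Result mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double weight = 0.0;
        double mu = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mu += posteriori[i] * x[i];
        }
        mu /= weight; // posterior-weighted mean

        double var = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            var += posteriori[i] * d * d;
        }
        var /= weight; // posterior-weighted variance

        return new Result(weight, mu, Math.sqrt(var));
    }

    public static void main(String[] args) {
        Result r = mStep(new double[] { 1, 2, 3, 4 }, new double[] { 1, 1, 0, 0 });
        System.out.println(r.weight + " " + r.mu + " " + r.sigma); // 2.0 1.5 0.5
    }
}
```

In a full EM loop, the caller would divide each component's unnormalized weight by the total over all components to obtain the mixing proportions.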
-
-