Class EM

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Clusterer, DensityBasedClusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler

    public class EM
    extends RandomizableDensityBasedClusterer
    implements NumberOfClustersRequestable, WeightedInstancesHandler
    Simple EM (expectation maximisation) class.

    EM assigns each instance a probability distribution that gives the probability of the instance belonging to each of the clusters. EM can decide how many clusters to create by cross-validation, or you may specify a priori how many clusters to generate.
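    The per-instance distribution can be illustrated for a single numeric attribute. This is a minimal sketch, not the code of this class: MembershipSketch, normalDensity and membership are hypothetical names, and the model is assumed to be one normal distribution per cluster weighted by a cluster prior.

```java
/** Sketch (hypothetical, not part of Weka): cluster membership for one numeric attribute. */
final class MembershipSketch {

    /** Normal density N(x; mean, stdDev). */
    static double normalDensity(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    /** Returns P(cluster | x) for every cluster; the entries sum to 1. */
    static double[] membership(double x, double[] priors, double[] means, double[] stdDevs) {
        double[] p = new double[priors.length];
        double total = 0.0;
        for (int c = 0; c < priors.length; c++) {
            // Unnormalised weight: prior of the cluster times the density of x under it.
            p[c] = priors[c] * normalDensity(x, means[c], stdDevs[c]);
            total += p[c];
        }
        for (int c = 0; c < p.length; c++) {
            p[c] /= total; // normalise so the weights form a distribution
        }
        return p;
    }

    public static void main(String[] args) {
        // Two equally likely clusters centred at 0 and 4 (assumed toy values).
        double[] p = membership(1.0, new double[] {0.5, 0.5},
                                new double[] {0.0, 4.0}, new double[] {1.0, 1.0});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

    An instance closer to one cluster mean receives most of its probability mass from that cluster, while a point midway between two otherwise identical clusters is split evenly.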

    The cross-validation used to determine the number of clusters proceeds as follows:
    1. The number of clusters is set to 1.
    2. The training set is split randomly into 10 folds.
    3. EM is performed 10 times using the 10 folds in the usual cross-validation way.
    4. The log-likelihood is averaged over all 10 results.
    5. If the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step 2.
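    The steps above can be sketched as a simple search loop. This is an illustration under stated assumptions, not the implementation of this class: ClusterCountSketch and selectNumClusters are hypothetical names, the cvLogLikelihood function stands in for steps 2 to 4 (split, run EM per fold, average), the maxClusters guard is added only so the sketch always terminates, and the last cluster count that improved the score is assumed to be the one kept.

```java
import java.util.function.IntToDoubleFunction;

/** Sketch (hypothetical, not part of Weka): pick the cluster count by an increasing search. */
final class ClusterCountSketch {

    /**
     * @param cvLogLikelihood maps a cluster count to its averaged cross-validated log-likelihood
     * @param maxClusters     safety bound on the search (an assumption of this sketch)
     */
    static int selectNumClusters(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int k = 1;                                           // step 1
        double best = cvLogLikelihood.applyAsDouble(k);      // steps 2-4 for k = 1
        while (k < maxClusters) {
            double next = cvLogLikelihood.applyAsDouble(k + 1);
            if (next <= best) {
                break;                                       // no increase: stop searching
            }
            best = next;                                     // step 5: accept and try one more
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        // Toy score that peaks at three clusters.
        IntToDoubleFunction toy = k -> -Math.abs(k - 3);
        System.out.println(selectNumClusters(toy, 20));
    }
}
```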

    The number of folds is fixed at 10 unless the training set contains fewer than 10 instances, in which case the number of folds is set equal to the number of instances.
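    The fold rule amounts to taking the smaller of 10 and the number of instances. A minimal sketch, with FoldCountSketch and numFolds as hypothetical names:

```java
/** Sketch (hypothetical, not part of Weka): number of cross-validation folds. */
final class FoldCountSketch {

    /** 10 folds, or one fold per instance when there are fewer than 10 instances. */
    static int numFolds(int numInstances) {
        return Math.min(10, numInstances);
    }

    public static void main(String[] args) {
        System.out.println(numFolds(150));
        System.out.println(numFolds(7));
    }
}
```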

    Valid options are:

     -N <num>
      number of clusters. If omitted or -1 specified, then
      cross validation is used to select the number of clusters.
     -I <num>
      max iterations.
      (default 100)
     -V
      verbose.
     -M <num>
      minimum allowable standard deviation for normal density
      computation
      (default 1e-6)
     -O
      Display model in old format (good when there are many clusters)
     -S <num>
      Random number seed.
      (default 100)
    Version:
    $Revision: 9988 $
    Author:
    Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • EM

        public EM()
        Constructor.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this clusterer.
        Returns:
        a description of the clusterer suitable for displaying in the explorer/experimenter gui
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -N <num>
          number of clusters. If omitted or -1 specified, then
          cross validation is used to select the number of clusters.
         -I <num>
          max iterations.
          (default 100)
         -V
          verbose.
         -M <num>
          minimum allowable standard deviation for normal density
          computation
          (default 1e-6)
         -O
          Display model in old format (good when there are many clusters)
         -S <num>
          Random number seed.
          (default 100)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableDensityBasedClusterer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
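        The flags listed above can be collected into the string array that setOptions expects. A minimal sketch: EmOptionsSketch and buildOptions are hypothetical helpers, not part of Weka, and only the flag names -N, -I, -M, -S and -V are taken from the option list (-O is left out for brevity).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical, not part of Weka): assemble an option array for EM. */
final class EmOptionsSketch {

    static String[] buildOptions(int numClusters, int maxIterations,
                                 double minStdDev, int seed, boolean verbose) {
        List<String> opts = new ArrayList<>();
        opts.add("-N");
        opts.add(Integer.toString(numClusters));   // -1 requests selection by cross-validation
        opts.add("-I");
        opts.add(Integer.toString(maxIterations));
        opts.add("-M");
        opts.add(Double.toString(minStdDev));
        opts.add("-S");
        opts.add(Integer.toString(seed));
        if (verbose) {
            opts.add("-V");                        // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The documented defaults, with the cluster count left to cross-validation.
        System.out.println(String.join(" ", buildOptions(-1, 100, 1e-6, 100, false)));
    }
}
```

        The resulting array could then be handed to setOptions on an EM instance.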
      • displayModelInOldFormatTipText

        public java.lang.String displayModelInOldFormatTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDisplayModelInOldFormat

        public void setDisplayModelInOldFormat(boolean d)
        Set whether to display model output in the old, original format.
        Parameters:
        d - true if model output is to be shown in the old format
      • getDisplayModelInOldFormat

        public boolean getDisplayModelInOldFormat()
        Get whether to display model output in the old, original format.
        Returns:
        true if model output is to be shown in the old format
      • minStdDevTipText

        public java.lang.String minStdDevTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMinStdDev

        public void setMinStdDev(double m)
        Set the minimum value for standard deviation when calculating normal density. Increasing this value can help prevent arithmetic overflow resulting from multiplying large densities (arising from small standard deviations) when there are many singleton or near-singleton values.
        Parameters:
        m - minimum value for standard deviation
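        The effect of a floor on the standard deviation can be seen numerically. This is a minimal sketch, not the code of this class: MinStdDevSketch, peakDensity and peakProduct are hypothetical names, and the 30 identical attributes with a standard deviation of 1e-12 are assumed toy values.

```java
/** Sketch (hypothetical, not part of Weka): why tiny standard deviations are floored. */
final class MinStdDevSketch {

    /** Peak of a normal density (its value at the mean) after flooring stdDev. */
    static double peakDensity(double stdDev, double minStdDev) {
        double s = Math.max(stdDev, minStdDev);
        return 1.0 / (s * Math.sqrt(2.0 * Math.PI));
    }

    /** Product of the peak density over numAttributes identical attributes. */
    static double peakProduct(double stdDev, double minStdDev, int numAttributes) {
        double product = 1.0;
        for (int a = 0; a < numAttributes; a++) {
            product *= peakDensity(stdDev, minStdDev);
        }
        return product;
    }

    public static void main(String[] args) {
        // Without a floor the product exceeds the range of double.
        System.out.println(peakProduct(1e-12, 0.0, 30));
        // With a floor of 1e-6 the product stays finite.
        System.out.println(peakProduct(1e-12, 1e-6, 30));
    }
}
```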
      • setMinStdDevPerAtt

        public void setMinStdDevPerAtt(double[] m)
      • getMinStdDev

        public double getMinStdDev()
        Get the minimum allowable standard deviation.
        Returns:
        the minimum allowable standard deviation
      • numClustersTipText

        public java.lang.String numClustersTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumClusters

        public void setNumClusters(int n)
                            throws java.lang.Exception
        Set the number of clusters (-1 to select the number by cross-validation).
        Specified by:
        setNumClusters in interface NumberOfClustersRequestable
        Parameters:
        n - the number of clusters
        Throws:
        java.lang.Exception - if n is 0
      • getNumClusters

        public int getNumClusters()
        Get the number of clusters
        Returns:
        the number of clusters.
      • maxIterationsTipText

        public java.lang.String maxIterationsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMaxIterations

        public void setMaxIterations(int i)
                              throws java.lang.Exception
        Set the maximum number of iterations to perform
        Parameters:
        i - the number of iterations
        Throws:
        java.lang.Exception - if i is less than 1
      • getMaxIterations

        public int getMaxIterations()
        Get the maximum number of iterations
        Returns:
        the number of iterations
      • debugTipText

        public java.lang.String debugTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebug

        public void setDebug(boolean v)
        Set debug mode - verbose output
        Parameters:
        v - true for verbose output
      • getDebug

        public boolean getDebug()
        Get debug mode
        Returns:
        true if debug mode is set
      • getClusterModelsNumericAtts

        public double[][][] getClusterModelsNumericAtts()
        Return the normal distributions for the cluster models
        Returns:
        a double[][][] value
      • getClusterPriors

        public double[] getClusterPriors()
        Return the priors for the clusters
        Returns:
        a double[] value
      • toString

        public java.lang.String toString()
        Outputs the generated clusters into a string.
        Overrides:
        toString in class java.lang.Object
        Returns:
        the clusterer in string representation
      • numberOfClusters

        public int numberOfClusters()
                             throws java.lang.Exception
        Returns the number of clusters.
        Specified by:
        numberOfClusters in interface Clusterer
        Specified by:
        numberOfClusters in class AbstractClusterer
        Returns:
        the number of clusters generated for a training dataset.
        Throws:
        java.lang.Exception - if number of clusters could not be returned successfully
      • buildClusterer

        public void buildClusterer(Instances data)
                            throws java.lang.Exception
        Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
        Specified by:
        buildClusterer in interface Clusterer
        Specified by:
        buildClusterer in class AbstractClusterer
        Parameters:
        data - set of instances serving as training data
        Throws:
        java.lang.Exception - if the clusterer has not been generated successfully
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - should contain the following arguments:

        -t training file [-T test file] [-N number of clusters] [-S random seed]