Class SubspaceCluster

  • All Implemented Interfaces:
    java.io.Serializable, OptionHandler, Randomizable, RevisionHandler

    public class SubspaceCluster
    extends ClusterGenerator
    A data generator that produces data points in hyperrectangular subspace clusters.

    Valid options are:

     -h
      Prints this help.
     -o <file>
      The name of the output file, otherwise the generated data is
      printed to stdout.
     -r <name>
      The name of the relation.
     -d
      Whether to print debug information.
     -S
      The seed for the random number generator (default 1).
     -a <num>
      The number of attributes (default 1).
     -c
      Class flag; if set, the cluster is listed in an extra attribute.
     -b <range>
      The indices for boolean attributes.
     -m <range>
      The indices for nominal attributes.
     -P <num>
      The noise rate in percent (default 0.0).
      Can be between 0% and 30%. (Remark: The original 
      algorithm only allows noise up to 10%.)
     -C <cluster-definition>
      A cluster definition of class 'SubspaceClusterDefinition'
      (definition needs to be quoted to be recognized as 
      a single argument).
     
     Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
     
     -A <range>
      Generates randomly distributed instances in the cluster.
     -U <range>
      Generates uniformly distributed instances in the cluster.
     -G <range>
      Generates Gaussian-distributed instances in the cluster.
     -D <num>,<num>
      The attribute min/max (-A and -U) or mean/stddev (-G) for
      the cluster.
     -N <num>..<num>
      The range of number of instances per cluster (default 1..50).
     -I
      Uses integer instead of continuous values (default continuous).
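
The option list above notes that a cluster definition passed with -C must arrive as a single argument. A minimal sketch of assembling such an argument array in plain Java follows. The class name SubspaceClusterArgs, the helper build, and the example definition string are illustrative assumptions and not part of Weka; in particular, reading the <range> of -U as an attribute index is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Weka): builds a command-line style
// option array using only flags documented in the option list.
final class SubspaceClusterArgs {

    // Each cluster definition is added as ONE array element after -C, so the
    // spaces inside the definition do not split it into separate options.
    static String[] build(int numAttributes, double noisePercent,
                          List<String> clusterDefinitions) {
        List<String> args = new ArrayList<>();
        args.add("-a");
        args.add(Integer.toString(numAttributes));
        args.add("-P");
        args.add(Double.toString(noisePercent));
        for (String definition : clusterDefinitions) {
            args.add("-C");
            args.add(definition);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // The definition string is only an example of the documented flags.
        String[] options = build(1, 5.0, Arrays.asList("-U 1 -D 0,1 -N 10..20"));
        for (String option : options) {
            System.out.println("[" + option + "]");
        }
    }
}
```

On a shell command line, quoting the definition has the same effect: the shell passes the quoted text as one element of the argument array.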
    Version:
    $Revision: 1.5 $
    Author:
    Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • UNIFORM_RANDOM

        public static final int UNIFORM_RANDOM
        cluster type: uniform/random
        See Also:
        Constant Field Values
      • TOTAL_UNIFORM

        public static final int TOTAL_UNIFORM
        cluster type: total uniform
        See Also:
        Constant Field Values
      • TAGS_CLUSTERTYPE

        public static final Tag[] TAGS_CLUSTERTYPE
        the tags for the cluster types
      • CONTINUOUS

        public static final int CONTINUOUS
        cluster subtype: continuous
        See Also:
        Constant Field Values
      • TAGS_CLUSTERSUBTYPE

        public static final Tag[] TAGS_CLUSTERSUBTYPE
        the tags for the cluster subtypes
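
To make the uniform/random cluster type and the continuous subtype more concrete, the following sketch draws one point inside a hyperrectangle given per-attribute minimum and maximum values. It is an illustration under stated assumptions, not Weka's implementation: the class HyperrectangleSampler, the method samplePoint, and the rounding used for the integer variant (compare the -I option) are all hypothetical.

```java
import java.util.Random;

// Illustrative sketch only; this is not Weka's implementation.
final class HyperrectangleSampler {

    // Draws one point whose i-th value lies between min[i] and max[i],
    // uniformly at random. When integerValues is true, each value is rounded
    // to the nearest whole number (with non-integer bounds the rounded value
    // could fall slightly outside the bounds).
    static double[] samplePoint(double[] min, double[] max,
                                boolean integerValues, Random random) {
        if (min.length != max.length) {
            throw new IllegalArgumentException("min and max must have the same length");
        }
        double[] point = new double[min.length];
        for (int i = 0; i < min.length; i++) {
            double value = min[i] + random.nextDouble() * (max[i] - min[i]);
            point[i] = integerValues ? Math.rint(value) : value;
        }
        return point;
    }

    public static void main(String[] args) {
        Random random = new Random(1); // fixed seed so runs are repeatable
        double[] p = samplePoint(new double[] {0, 10}, new double[] {5, 20}, false, random);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

Passing a Random created with a fixed seed mirrors the role of the -S option: the same seed yields the same sequence of points.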
    • Constructor Detail

      • SubspaceCluster

        public SubspaceCluster()
        Initializes the generator and sets the number of clusters to 0, since the user has to specify the clusters explicitly.
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this data generator.
        Returns:
        a description of the data generator suitable for displaying in the Explorer/Experimenter GUI
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class ClusterGenerator
        Returns:
        an enumeration of all the available options
      • setOptions

        public void setOptions​(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a list of options for this object.

        The valid options are the same as those listed in the class description above.
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class ClusterGenerator
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
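
The -N and -D values documented for a cluster definition have simple textual formats: <num>..<num> and <num>,<num>. The sketch below shows one way such values could be parsed and validated. It is not Weka's option parser; the class ClusterOptionValues and both method names are hypothetical.

```java
// Illustrative parsing sketch; not Weka's option parser.
final class ClusterOptionValues {

    // Parses "<num>..<num>" (the documented -N format) into {min, max}.
    static int[] parseInstanceRange(String text) {
        String[] parts = text.trim().split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>..<num> but got: " + text);
        }
        int min = Integer.parseInt(parts[0].trim());
        int max = Integer.parseInt(parts[1].trim());
        if (min > max) {
            throw new IllegalArgumentException("range minimum exceeds maximum: " + text);
        }
        return new int[] {min, max};
    }

    // Parses "<num>,<num>" (the documented -D format) into two doubles:
    // min/max for -A and -U, or mean/stddev for -G.
    static double[] parsePair(String text) {
        String[] parts = text.trim().split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>,<num> but got: " + text);
        }
        return new double[] {
            Double.parseDouble(parts[0].trim()),
            Double.parseDouble(parts[1].trim())
        };
    }

    public static void main(String[] args) {
        int[] range = parseInstanceRange("1..50"); // the documented default
        double[] pair = parsePair("0,1.5");
        System.out.println(range[0] + " to " + range[1]);
        System.out.println(pair[0] + " and " + pair[1]);
    }
}
```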
      • setNumAttributes

        public void setNumAttributes​(int numAttributes)
        Sets the number of attributes the dataset should have.
        Overrides:
        setNumAttributes in class ClusterGenerator
        Parameters:
        numAttributes - the new number of attributes
      • numAttributesTipText

        public java.lang.String numAttributesTipText()
        Returns the tip text for this property.
        Overrides:
        numAttributesTipText in class ClusterGenerator
        Returns:
        tip text for this property suitable for displaying in the Explorer/Experimenter GUI
      • getNoiseRate

        public double getNoiseRate()
        Gets the percentage of noise set.
        Returns:
        the percentage of noise set
      • setNoiseRate

        public void setNoiseRate​(double newNoiseRate)
        Sets the percentage of noise set.
        Parameters:
        newNoiseRate - new percentage of noise
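
The -P option text gives the noise rate as a percentage between 0 and 30. The sketch below checks a value against that range and converts it into a number of noise instances. Both steps are assumptions made for illustration: this page does not state whether setNoiseRate enforces the range, nor how the generator turns the percentage into an instance count. The class NoiseRateSketch and its methods are hypothetical.

```java
// Illustrative sketch; not taken from Weka's source.
final class NoiseRateSketch {

    // Upper bound taken from the -P option text.
    static final double MAX_NOISE_PERCENT = 30.0;

    // Returns the value unchanged if it lies in [0, 30], otherwise throws.
    static double checkNoiseRate(double percent) {
        if (Double.isNaN(percent) || percent < 0.0 || percent > MAX_NOISE_PERCENT) {
            throw new IllegalArgumentException(
                "noise rate must be between 0 and " + MAX_NOISE_PERCENT + " percent");
        }
        return percent;
    }

    // Assumption: noise is counted as a percentage of the clustered
    // instances, rounded down to a whole number of instances.
    static int noiseInstances(int clusteredInstances, double percent) {
        return (int) (clusteredInstances * checkNoiseRate(percent) / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(noiseInstances(200, 10.0));
    }
}
```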
      • noiseRateTipText

        public java.lang.String noiseRateTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the Explorer/Experimenter GUI
      • getClusterDefinitions

        public ClusterDefinition[] getClusterDefinitions()
        Returns the currently set clusters.
        Returns:
        the currently set clusters
      • setClusterDefinitions

        public void setClusterDefinitions​(ClusterDefinition[] value)
                                   throws java.lang.Exception
        Sets the clusters to use.
        Parameters:
        value - the clusters to use
        Throws:
        java.lang.Exception - if the clusters are not of the correct class
      • clusterDefinitionsTipText

        public java.lang.String clusterDefinitionsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the Explorer/Experimenter GUI
      • getSingleModeFlag

        public boolean getSingleModeFlag()
        Gets the single mode flag.
        Specified by:
        getSingleModeFlag in class DataGenerator
        Returns:
        true if the method generateExample can be used
      • isBoolean

        public boolean isBoolean​(int index)
        Returns true if the attribute is boolean.
        Parameters:
        index - the index of the attribute
        Returns:
        true if the attribute is boolean
      • isNominal

        public boolean isNominal​(int index)
        Returns true if the attribute is nominal.
        Parameters:
        index - the index of the attribute
        Returns:
        true if the attribute is nominal
      • getNumValues

        public int[] getNumValues()
        Returns the array that stores the number of values for a nominal attribute.
        Returns:
        the array that stores the number of values for a nominal attribute
      • generateExample

        public Instance generateExample()
                                 throws java.lang.Exception
        Generate an example of the dataset.
        Specified by:
        generateExample in class DataGenerator
        Returns:
        the instance generated
        Throws:
        java.lang.Exception - if the format is not defined, or if examples
        cannot be generated one by one because voting is chosen
      • generateExamples

        public Instances generateExamples()
                                   throws java.lang.Exception
        Generate all examples of the dataset.
        Specified by:
        generateExamples in class DataGenerator
        Returns:
        the instances generated
        Throws:
        java.lang.Exception - if the format is not defined
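
The methods getSingleModeFlag, generateExample, and generateExamples suggest a calling pattern: check the flag first, then either request examples one at a time or request the whole dataset at once. The sketch below shows that pattern against a small stand-in interface. ExampleSource, collect, and batchOnly are hypothetical names that only mirror the three method names documented here; they are not Weka types, and the exception message is invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; ExampleSource is a hypothetical stand-in, not a Weka type.
final class SingleModeSketch {

    interface ExampleSource<T> {
        boolean getSingleModeFlag();
        T generateExample() throws Exception;
        List<T> generateExamples() throws Exception;
    }

    // Uses one-by-one generation only when the source reports that it is
    // supported; otherwise falls back to batch generation (count is ignored).
    static <T> List<T> collect(ExampleSource<T> source, int count) throws Exception {
        if (!source.getSingleModeFlag()) {
            return source.generateExamples();
        }
        List<T> examples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            examples.add(source.generateExample());
        }
        return examples;
    }

    // A source that only supports batch generation.
    static <T> ExampleSource<T> batchOnly(final List<T> data) {
        return new ExampleSource<T>() {
            @Override public boolean getSingleModeFlag() {
                return false;
            }
            @Override public T generateExample() throws Exception {
                throw new Exception("examples cannot be generated one by one");
            }
            @Override public List<T> generateExamples() {
                return new ArrayList<>(data);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(batchOnly(Arrays.asList("a", "b", "c")), 2));
    }
}
```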
      • generateFinished

        public java.lang.String generateFinished()
                                          throws java.lang.Exception
        Compiles documentation about the data generation after the generation process.
        Specified by:
        generateFinished in class DataGenerator
        Returns:
        a string with additional information about the generated dataset
        Throws:
        java.lang.Exception - if no input structure has been defined
      • generateStart

        public java.lang.String generateStart()
        Compiles documentation about the data generation before the generation process.
        Specified by:
        generateStart in class DataGenerator
        Returns:
        string with additional information
      • getRevision

        public java.lang.String getRevision()
        Returns the revision string.
        Returns:
        the revision
      • main

        public static void main​(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments for the data producer: