Class MetaCost

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class MetaCost
    extends RandomizableSingleClassifierEnhancer
    implements TechnicalInformationHandler
    This metaclassifier makes its base classifier cost-sensitive using the method specified in

    Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999.

    This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
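The relabeling step described above can be illustrated with a small, self-contained sketch. It is not the Weka implementation: the class `MinExpectedCostSketch` and its methods are hypothetical, and it assumes that `cost[i][j]` is the cost of predicting class `j` when the true class is `i`. It averages the class probabilities from the bagged models, then picks the class with the lowest expected cost.

```java
/** Hypothetical helper, not part of Weka: relabeling by minimum expected cost. */
final class MinExpectedCostSketch {

    /** Averages the class-probability vectors produced by the bagged models. */
    static double[] averageVotes(double[][] perModelProbs) {
        int numClasses = perModelProbs[0].length;
        double[] avg = new double[numClasses];
        for (double[] probs : perModelProbs) {
            for (int c = 0; c < numClasses; c++) {
                avg[c] += probs[c] / perModelProbs.length;
            }
        }
        return avg;
    }

    /**
     * Returns the class j that minimizes sum_i probs[i] * cost[i][j].
     * Assumption: rows of cost are true classes, columns are predicted classes.
     */
    static int minExpectedCostClass(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int j = 0; j < cost[0].length; j++) {
            double expected = 0.0;
            for (int i = 0; i < probs.length; i++) {
                expected += probs[i] * cost[i][j];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = j;
            }
        }
        return best;
    }
}
```

With averaged probabilities of roughly (0.7, 0.3) and a cost of 5 for missing class 1, predicting class 0 has an expected cost of about 1.5 and predicting class 1 about 0.7, so the instance would be relabeled as class 1 even though class 0 is more probable.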

    BibTeX:

     @inproceedings{Domingos1999,
        author = {Pedro Domingos},
        booktitle = {Fifth International Conference on Knowledge Discovery and Data Mining},
        pages = {155-164},
        title = {MetaCost: A general method for making classifiers cost-sensitive},
        year = {1999}
     }
     

    Valid options are:

     -I <num>
      Number of bagging iterations.
      (default 10)
     -C <cost file name>
      File name of a cost matrix to use. If this is not supplied,
      a cost matrix will be loaded on demand. The name of the
      on-demand file is the relation name of the training data
      plus ".cost", and the path to the on-demand file is
      specified with the -N option.
     -N <directory>
      Name of a directory to search for cost files when loading
      costs on demand (default current directory).
     -cost-matrix <matrix>
      The cost matrix in Matlab single line format.
     -P
      Size of each bag, as a percentage of the
      training set size. (default 100)
     -S <num>
      Random number seed.
      (default 1)
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
     -W
      Full name of base classifier.
      (default: weka.classifiers.rules.ZeroR)
     
     Options specific to classifier weka.classifiers.rules.ZeroR:
     
     -D
      If set, classifier is run in debug mode and
      may output additional info to the console
    Options after -- are passed to the designated classifier.
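As a rough illustration of how an option array divides at `--`, the sketch below separates the options meant for MetaCost from those meant for the base classifier. The class `OptionSplitSketch` is hypothetical and only mimics the convention stated above; it does not call any Weka code.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Weka: splits an option array at "--". */
final class OptionSplitSketch {

    /**
     * Returns a two-element array: the options before "--" and the options
     * after it. The second element is empty when no "--" is present.
     */
    static String[][] splitAtDoubleDash(String[] args) {
        int idx = Arrays.asList(args).indexOf("--");
        if (idx < 0) {
            return new String[][] { args.clone(), new String[0] };
        }
        return new String[][] {
            Arrays.copyOfRange(args, 0, idx),
            Arrays.copyOfRange(args, idx + 1, args.length)
        };
    }

    public static void main(String[] unused) {
        String[] args = {
            "-I", "10", "-P", "100", "-S", "1",
            "-W", "weka.classifiers.rules.ZeroR",
            "--", "-D"
        };
        String[][] parts = splitAtDoubleDash(args);
        System.out.println("MetaCost options: " + Arrays.toString(parts[0]));
        System.out.println("Base classifier options: " + Arrays.toString(parts[1]));
    }
}
```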

    Version:
    $Revision: 1.24 $
    Author:
    Len Trigg (len@reeltwo.com)
    See Also:
    Serialized Form
    • Field Detail

      • MATRIX_ON_DEMAND

        public static final int MATRIX_ON_DEMAND
        load cost matrix on demand
        See Also:
        Constant Field Values
      • MATRIX_SUPPLIED

        public static final int MATRIX_SUPPLIED
        use explicit matrix
        See Also:
        Constant Field Values
      • TAGS_MATRIX_SOURCE

        public static final Tag[] TAGS_MATRIX_SOURCE
        Specify possible sources of the cost matrix
    • Constructor Detail

      • MetaCost

        public MetaCost()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing the classifier.
        Returns:
        a description suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -I <num>
          Number of bagging iterations.
          (default 10)
         -C <cost file name>
          File name of a cost matrix to use. If this is not supplied,
          a cost matrix will be loaded on demand. The name of the
          on-demand file is the relation name of the training data
          plus ".cost", and the path to the on-demand file is
          specified with the -N option.
         -N <directory>
          Name of a directory to search for cost files when loading
          costs on demand (default current directory).
         -cost-matrix <matrix>
          The cost matrix in Matlab single line format.
         -P
          Size of each bag, as a percentage of the
          training set size. (default 100)
         -S <num>
          Random number seed.
          (default 1)
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
         -W
          Full name of base classifier.
          (default: weka.classifiers.rules.ZeroR)
         
         Options specific to classifier weka.classifiers.rules.ZeroR:
         
         -D
          If set, classifier is run in debug mode and
          may output additional info to the console
        Options after -- are passed to the designated classifier.

        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableSingleClassifierEnhancer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • costMatrixSourceTipText

        public java.lang.String costMatrixSourceTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getCostMatrixSource

        public SelectedTag getCostMatrixSource()
        Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.
        Returns:
        the cost matrix source.
      • setCostMatrixSource

        public void setCostMatrixSource(SelectedTag newMethod)
        Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.
        Parameters:
        newMethod - the cost matrix location method.
      • onDemandDirectoryTipText

        public java.lang.String onDemandDirectoryTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getOnDemandDirectory

        public java.io.File getOnDemandDirectory()
        Returns the directory that will be searched for cost files when loading on demand.
        Returns:
        The cost file search directory.
      • setOnDemandDirectory

        public void setOnDemandDirectory(java.io.File newDir)
        Sets the directory that will be searched for cost files when loading on demand.
        Parameters:
        newDir - The cost file search directory.
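The on-demand lookup rule (relation name plus ".cost", inside the search directory) can be sketched as follows. The class `OnDemandCostFileSketch` and the relation name used in `main` are hypothetical; this shows only how the path would be formed under that rule, not how Weka loads the file.

```java
import java.io.File;

/** Hypothetical helper, not part of Weka: forms the on-demand cost file path. */
final class OnDemandCostFileSketch {

    /** Joins the search directory with "<relation name>.cost". */
    static File costFileFor(File onDemandDirectory, String relationName) {
        return new File(onDemandDirectory, relationName + ".cost");
    }

    public static void main(String[] unused) {
        // "costs" and "credit" are made-up example values.
        File costFile = costFileFor(new File("costs"), "credit");
        System.out.println(costFile.getPath());
    }
}
```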
      • bagSizePercentTipText

        public java.lang.String bagSizePercentTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getBagSizePercent

        public int getBagSizePercent()
        Gets the size of each bag, as a percentage of the training set size.
        Returns:
        the bag size, as a percentage.
      • setBagSizePercent

        public void setBagSizePercent(int newBagSizePercent)
        Sets the size of each bag, as a percentage of the training set size.
        Parameters:
        newBagSizePercent - the bag size, as a percentage.
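To make the percentage concrete, the sketch below converts a bag size percentage into an instance count and draws a seeded bootstrap sample of indices. The class `BagSizeSketch` is hypothetical, and both the integer truncation and the sampling with replacement are assumptions about how such a bag might be formed, not a statement of what this class does internally.

```java
import java.util.Random;

/** Hypothetical helper, not part of Weka: bag size and seeded bootstrap sampling. */
final class BagSizeSketch {

    /** Number of instances per bag; integer division truncates (an assumption). */
    static int bagSize(int numTrainingInstances, int bagSizePercent) {
        return numTrainingInstances * bagSizePercent / 100;
    }

    /**
     * Draws bagSize indices with replacement from [0, numTrainingInstances).
     * The same seed always yields the same sample.
     * Requires numTrainingInstances > 0.
     */
    static int[] bootstrapIndices(int numTrainingInstances, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int[] indices = new int[bagSize(numTrainingInstances, bagSizePercent)];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = random.nextInt(numTrainingInstances);
        }
        return indices;
    }
}
```

Fixing the seed makes the bags reproducible across runs, which is the usual reason for exposing a random number seed alongside the bag size.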
      • numIterationsTipText

        public java.lang.String numIterationsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumIterations

        public void setNumIterations(int numIterations)
        Sets the number of bagging iterations
        Parameters:
        numIterations - the number of iterations to use
      • getNumIterations

        public int getNumIterations()
        Gets the number of bagging iterations
        Returns:
        the number of bagging iterations
      • costMatrixTipText

        public java.lang.String costMatrixTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getCostMatrix

        public CostMatrix getCostMatrix()
        Gets the misclassification cost matrix.
        Returns:
        the cost matrix
      • setCostMatrix

        public void setCostMatrix(CostMatrix newCostMatrix)
        Sets the misclassification cost matrix.
        Parameters:
        newCostMatrix - the cost matrix
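The -cost-matrix option takes the matrix in Matlab single line format, for example `[0 1; 5 0]`. The sketch below shows one way such a string could be turned into a square array of costs. The class `MatlabMatrixSketch` is hypothetical and handles only plain numeric entries separated by whitespace, with rows separated by semicolons; it is not the parser Weka uses.

```java
/** Hypothetical helper, not part of Weka: parses "[a b; c d]" into a square matrix. */
final class MatlabMatrixSketch {

    static double[][] parse(String text) {
        String body = text.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Expected a matrix enclosed in [ ]: " + text);
        }
        // Drop the brackets, then split rows on ';' and cells on whitespace.
        body = body.substring(1, body.length() - 1).trim();
        String[] rows = body.split(";");
        double[][] matrix = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            String[] cells = rows[r].trim().split("\\s+");
            if (cells.length != rows.length) {
                throw new IllegalArgumentException("Cost matrix must be square: " + text);
            }
            matrix[r] = new double[cells.length];
            for (int c = 0; c < cells.length; c++) {
                matrix[r][c] = Double.parseDouble(cells[c]);
            }
        }
        return matrix;
    }
}
```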
      • buildClassifier

        public void buildClassifier(Instances data)
                             throws java.lang.Exception
        Builds the model of the base learner.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        data - the training data
        Throws:
        java.lang.Exception - if the classifier could not be built successfully
      • distributionForInstance

        public double[] distributionForInstance(Instance instance)
                                         throws java.lang.Exception
        Returns the class distribution for a given instance.
        Overrides:
        distributionForInstance in class Classifier
        Parameters:
        instance - the instance to be classified
        Returns:
        the class distribution for the given instance
        Throws:
        java.lang.Exception - if instance could not be classified successfully
      • toString

        public java.lang.String toString()
        Outputs a representation of this classifier.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a string representation of the classifier
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - should contain the following arguments: -t training file [-T test file] [-c class index]