Package weka.classifiers
Class BVDecomposeSegCVSub
java.lang.Object
    weka.classifiers.BVDecomposeSegCVSub
All Implemented Interfaces:
OptionHandler, RevisionHandler, TechnicalInformationHandler
public class BVDecomposeSegCVSub extends java.lang.Object implements OptionHandler, TechnicalInformationHandler, RevisionHandler
This class performs a bias-variance decomposition of any classifier using the sub-sampled cross-validation procedure specified in (1).
The Kohavi and Wolpert definition of bias and variance is specified in (2).
The Webb definition of bias and variance is specified in (3).
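For reference, the Kohavi and Wolpert (2) terms for zero-one loss at a test point x are given below. Here Y_F denotes the target class and Y_H the class predicted by a learner trained on a randomly drawn training set; the final identity assumes Y_F and Y_H are independent given x.

```latex
\begin{aligned}
\sigma_x^2 &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_F = y \mid x)^2\Bigr) \\
\mathrm{bias}_x^2 &= \tfrac{1}{2}\sum_{y}\bigl(P(Y_F = y \mid x) - P(Y_H = y \mid x)\bigr)^2 \\
\mathrm{variance}_x &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_H = y \mid x)^2\Bigr) \\
\mathrm{error}_x &= \sigma_x^2 + \mathrm{bias}_x^2 + \mathrm{variance}_x
  = 1 - \sum_{y} P(Y_F = y \mid x)\,P(Y_H = y \mid x)
\end{aligned}
```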
(1) Geoffrey I. Webb, Paul Conilione (2002). Estimating bias and variance from data. School of Computer Science and Software Engineering, Victoria, Australia.
(2) Ron Kohavi, David H. Wolpert (1996). Bias Plus Variance Decomposition for Zero-One Loss Functions. In: Machine Learning: Proceedings of the Thirteenth International Conference, 275-283.
(3) Geoffrey I. Webb (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning. 40(2):159-196.

BibTeX:
@misc{Webb2002,
   address = {School of Computer Science and Software Engineering, Victoria, Australia},
   author = {Geoffrey I. Webb and Paul Conilione},
   institution = {Monash University},
   title = {Estimating bias and variance from data},
   year = {2002},
   PDF = {http://www.csse.monash.edu.au/\~webb/Files/WebbConilione04.pdf}
}

@inproceedings{Kohavi1996,
   author = {Ron Kohavi and David H. Wolpert},
   booktitle = {Machine Learning: Proceedings of the Thirteenth International Conference},
   editor = {Lorenza Saitta},
   pages = {275-283},
   publisher = {Morgan Kaufmann},
   title = {Bias Plus Variance Decomposition for Zero-One Loss Functions},
   year = {1996},
   PS = {http://robotics.stanford.edu/\~ronnyk/biasVar.ps}
}

@article{Webb2000,
   author = {Geoffrey I. Webb},
   journal = {Machine Learning},
   number = {2},
   pages = {159-196},
   title = {MultiBoosting: A Technique for Combining Boosting and Wagging},
   volume = {40},
   year = {2000}
}
Valid options are:
-c <class index>  The index of the class attribute. (default last)
-D  Turn on debugging output.
-l <num>  The number of times each instance is classified. (default 10)
-p <proportion of objects in common>  The average proportion of instances common between any two training sets.
-s <seed>  The random number seed used.
-t <name of arff file>  The name of the ARFF file used for the decomposition.
-T <number of instances in training set>  The number of instances in the training set.
-W <classifier class name>  Full class name of the learner used in the decomposition, e.g. weka.classifiers.bayes.NaiveBayes
Options specific to learner weka.classifiers.rules.ZeroR:
-D  If set, classifier is run in debug mode and may output additional info to the console.
Options after -- are passed to the designated sub-learner.
Version:
$Revision: 1.7 $
Author:
Paul Conilione (paulc4321@yahoo.com.au)
Constructor Summary
BVDecomposeSegCVSub()
Method Summary
void decompose()
    Carry out the bias-variance decomposition using the sub-sampled cross-validation method.
java.util.Vector findCentralTendencies(double[] predProbs)
    Finds the central tendency, given the classifications for an instance.
Classifier getClassifier()
    Gets the classifier being analysed.
int getClassifyIterations()
    Gets the number of times an instance is classified.
int getClassIndex()
    Get the index (starting from 1) of the attribute used as the class.
java.lang.String getDataFileName()
    Get the name of the data file used for the decomposition.
boolean getDebug()
    Gets whether debugging is turned on.
double getError()
    Get the calculated error rate.
double getKWBias()
    Get the calculated bias squared according to the Kohavi and Wolpert definition.
double getKWSigma()
    Get the calculated sigma according to the Kohavi and Wolpert definition.
double getKWVariance()
    Get the calculated variance according to the Kohavi and Wolpert definition.
java.lang.String[] getOptions()
    Gets the current settings of the bias-variance decomposition.
double getP()
    Get the proportion of instances that are common between two training sets.
java.lang.String getRevision()
    Returns the revision string.
int getSeed()
    Gets the random number seed.
TechnicalInformation getTechnicalInformation()
    Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
int getTrainSize()
    Get the training size.
double getWBias()
    Get the calculated bias according to the Webb definition.
double getWVariance()
    Get the calculated variance according to the Webb definition.
java.lang.String globalInfo()
    Returns a string describing this object.
java.util.Enumeration listOptions()
    Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
    Test method for this class.
void randomize(int[] index, java.util.Random random)
    Accepts an array of ints and randomises the order of its values, using the supplied random number generator.
void setClassifier(Classifier newClassifier)
    Sets the classifier being analysed.
void setClassifyIterations(int classifyIterations)
    Sets the number of times an instance is classified.
void setClassIndex(int classIndex)
    Sets the index (starting from 1) of the attribute used as the class.
void setDataFileName(java.lang.String dataFileName)
    Sets the name of the dataset file.
void setDebug(boolean debug)
    Sets debugging mode.
void setOptions(java.lang.String[] options)
    Sets the OptionHandler's options using the given list.
void setP(double proportion)
    Set the proportion of instances that are common between two training sets used to train a classifier.
void setSeed(int seed)
    Sets the random number seed.
void setTrainSize(int size)
    Set the training size.
java.lang.String toString()
    Returns a description of the bias-variance decomposition results.
Method Detail
globalInfo
public java.lang.String globalInfo()
Returns a string describing this object.
Returns:
a description of this object suitable for displaying in the Explorer/Experimenter GUI
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
Valid options are:
-c <class index>  The index of the class attribute. (default last)
-D  Turn on debugging output.
-l <num>  The number of times each instance is classified. (default 10)
-p <proportion of objects in common>  The average proportion of instances common between any two training sets.
-s <seed>  The random number seed used.
-t <name of arff file>  The name of the ARFF file used for the decomposition.
-T <number of instances in training set>  The number of instances in the training set.
-W <classifier class name>  Full class name of the learner used in the decomposition, e.g. weka.classifiers.bayes.NaiveBayes
Options specific to learner weka.classifiers.rules.ZeroR:
-D  If set, classifier is run in debug mode and may output additional info to the console.
Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
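The flags accepted by setOptions can also be assembled programmatically. The sketch below builds such an array; the helper name buildOptions, the file name data.arff and every numeric value are hypothetical examples (valid ranges for -T and -p are not stated here), and the Weka calls appear only as comments because they need Weka on the classpath. The NaiveBayes option -K is used purely to show that learner options go after "--".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BVOptionsSketch {

    /**
     * Builds an option array using the flags documented for setOptions.
     * Hypothetical helper: the name and parameter list are illustrative only.
     */
    static String[] buildOptions(String arffFile, int classIndex, int trainSize,
                                 double proportion, int iterations, int seed,
                                 String learner, String... learnerOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-t"); opts.add(arffFile);                      // data file
        opts.add("-c"); opts.add(Integer.toString(classIndex));  // 1-based class index
        opts.add("-T"); opts.add(Integer.toString(trainSize));   // training set size
        opts.add("-p"); opts.add(Double.toString(proportion));   // shared proportion
        opts.add("-l"); opts.add(Integer.toString(iterations));  // classifications per instance
        opts.add("-s"); opts.add(Integer.toString(seed));        // random seed
        opts.add("-W"); opts.add(learner);                       // learner class name
        if (learnerOptions.length > 0) {
            // Everything after "--" is handed to the sub-learner.
            opts.add("--");
            opts.addAll(Arrays.asList(learnerOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("data.arff", 5, 50, 0.5, 10, 1,
                "weka.classifiers.bayes.NaiveBayes", "-K");
        System.out.println(String.join(" ", opts));
        // With Weka on the classpath, the array could then be used as:
        //   BVDecomposeSegCVSub bv = new BVDecomposeSegCVSub();
        //   bv.setOptions(opts);
        //   bv.decompose();
        //   System.out.println(bv.toString());
    }
}
```

Building the array in one place keeps the option order explicit and makes the "--" separator for sub-learner options hard to forget.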
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the bias-variance decomposition.
Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions
setClassifier
public void setClassifier(Classifier newClassifier)
Sets the classifier being analysed.
Parameters:
newClassifier - the Classifier to use.
getClassifier
public Classifier getClassifier()
Gets the classifier being analysed.
Returns:
the classifier being analysed.
setDebug
public void setDebug(boolean debug)
Sets debugging mode.
Parameters:
debug - true if debug output should be printed
getDebug
public boolean getDebug()
Gets whether debugging is turned on.
Returns:
true if debugging output is on
setSeed
public void setSeed(int seed)
Sets the random number seed.
Parameters:
seed - the random number seed
getSeed
public int getSeed()
Gets the random number seed.
Returns:
the random number seed
setClassifyIterations
public void setClassifyIterations(int classifyIterations)
Sets the number of times an instance is classified.
Parameters:
classifyIterations - number of times an instance is classified
getClassifyIterations
public int getClassifyIterations()
Gets the number of times an instance is classified.
Returns:
the maximum number of times an instance is classified
setDataFileName
public void setDataFileName(java.lang.String dataFileName)
Sets the name of the dataset file.
Parameters:
dataFileName - name of dataset file.
getDataFileName
public java.lang.String getDataFileName()
Get the name of the data file used for the decomposition.
Returns:
the name of the data file
getClassIndex
public int getClassIndex()
Get the index (starting from 1) of the attribute used as the class.
Returns:
the index of the class attribute
setClassIndex
public void setClassIndex(int classIndex)
Sets the index (starting from 1) of the attribute used as the class.
Parameters:
classIndex - the index (starting from 1) of the class attribute
getKWBias
public double getKWBias()
Get the calculated bias squared according to the Kohavi and Wolpert definition.
Returns:
the bias squared
getWBias
public double getWBias()
Get the calculated bias according to the Webb definition.
Returns:
the bias
getKWVariance
public double getKWVariance()
Get the calculated variance according to the Kohavi and Wolpert definition.
Returns:
the variance
getWVariance
public double getWVariance()
Get the calculated variance according to the Webb definition.
Returns:
the variance according to Webb
getKWSigma
public double getKWSigma()
Get the calculated sigma according to the Kohavi and Wolpert definition.
Returns:
the sigma
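The Kohavi and Wolpert (2) zero-one-loss terms behind these getters can be illustrated for a single instance. This is a minimal sketch, not the class's internal code: it assumes the observed class label is a noise-free target (probability 1 for the observed class, 0 otherwise), which makes sigma squared zero so that error = bias squared + variance. The names KohaviWolpertSketch, kwTerms and counts are hypothetical, and how decompose() aggregates such terms over instances is not stated in this page.

```java
class KohaviWolpertSketch {

    /**
     * Per-instance Kohavi-Wolpert terms for zero-one loss.
     * counts[y] = number of times the instance was predicted as class y.
     * Assumes a noise-free target, so sigma^2 = 0.
     * Returns {bias^2, variance, error}.
     */
    static double[] kwTerms(int[] counts, int actualClass) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double sumSqH = 0.0;  // sum_y P(Y_H = y)^2
        double sumDiff = 0.0; // sum_y (P(Y_F = y) - P(Y_H = y))^2
        for (int y = 0; y < counts.length; y++) {
            double pH = (double) counts[y] / total;        // predicted distribution
            double pF = (y == actualClass) ? 1.0 : 0.0;    // assumed one-hot target
            sumSqH += pH * pH;
            sumDiff += (pF - pH) * (pF - pH);
        }
        double biasSq = 0.5 * sumDiff;
        double variance = 0.5 * (1.0 - sumSqH);
        // Zero-one error: fraction of predictions that miss the observed class.
        double error = 1.0 - (double) counts[actualClass] / total;
        return new double[] {biasSq, variance, error};
    }

    public static void main(String[] args) {
        // Ten predictions: class 0 twice, class 1 five times, class 2 three times.
        double[] t = kwTerms(new int[] {2, 5, 3}, 1);
        System.out.println("bias^2=" + t[0] + " variance=" + t[1] + " error=" + t[2]);
    }
}
```

With the one-hot target, bias squared plus variance reduces algebraically to one minus the probability of predicting the observed class, which is exactly the error term, so the three returned values should satisfy that identity up to rounding.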
setTrainSize
public void setTrainSize(int size)
Set the training size.
Parameters:
size - the size of the training set
getTrainSize
public int getTrainSize()
Get the training size.
Returns:
the size of the training set
setP
public void setP(double proportion)
Set the proportion of instances that are common between two training sets used to train a classifier.
Parameters:
proportion - the proportion of instances that are common between training sets.
getP
public double getP()
Get the proportion of instances that are common between two training sets.
Returns:
the proportion
getError
public double getError()
Get the calculated error rate.
Returns:
the error rate
decompose
public void decompose() throws java.lang.Exception
Carry out the bias-variance decomposition using the sub-sampled cross-validation method.
Throws:
java.lang.Exception - if the decomposition couldn't be carried out
findCentralTendencies
public java.util.Vector findCentralTendencies(double[] predProbs)
Finds the central tendency, given the classifications for an instance, where the central tendency is defined as the class that was most commonly selected for that instance. For example, suppose instance 'x' can be assigned one of 3 classes, y = {1, 2, 3}, and is classified 10 times: '1' twice, '2' five times and '3' three times. Then the central tendency is '2'.
Note, however, that this method returns a list of all classes that received the highest number of classifications. When several classes tie for the largest number, all of them are returned. For example, if 'x' is classified as '1' 4 times, '2' 4 times and '3' 2 times, then both '1' and '2' are returned.
Parameters:
predProbs - the array of classifications for a single instance.
Returns:
a Vector containing Integer objects which store the class(es) that are the central tendency.
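The tie-keeping rule described above can be sketched in a few lines. This is an illustrative version, not the Weka source: the class name CentralTendencySketch is hypothetical, it returns a List<Integer> rather than the documented java.util.Vector, it uses 0-based class indices, and it assumes the input array holds per-class counts (proportions would work the same way, since only the relative order matters).

```java
import java.util.ArrayList;
import java.util.List;

class CentralTendencySketch {

    /**
     * Returns the 0-based indices of all classes that received the largest
     * number of predictions, in ascending index order.
     */
    static List<Integer> findCentralTendencies(double[] counts) {
        List<Integer> modes = new ArrayList<>();
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > max) {
                // New maximum: discard the classes collected so far.
                max = counts[i];
                modes.clear();
                modes.add(i);
            } else if (counts[i] == max) {
                // Tie with the current maximum: keep this class as well.
                modes.add(i);
            }
        }
        return modes;
    }

    public static void main(String[] args) {
        // Classes '1', '2', '3' from the description map to indices 0, 1, 2.
        System.out.println(findCentralTendencies(new double[] {2, 5, 3})); // [1]
        System.out.println(findCentralTendencies(new double[] {4, 4, 2})); // [0, 1]
    }
}
```

A single pass suffices: each new maximum resets the list and each tie appends to it.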
toString
public java.lang.String toString()
Returns a description of the bias-variance decomposition results.
Overrides:
toString in class java.lang.Object
Returns:
the bias-variance decomposition results as a string
getRevision
public java.lang.String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler
Returns:
the revision
main
public static void main(java.lang.String[] args)
Test method for this class.
Parameters:
args - the command line arguments
randomize
public final void randomize(int[] index, java.util.Random random)
Accepts an array of ints and randomises the order of its values in place, using the supplied random number generator.
Parameters:
index - the array of integers to shuffle
random - the random number generator used for shuffling
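randomize is documented only as shuffling an int array with a java.util.Random. A Fisher-Yates shuffle is one standard way to do this and is assumed in the sketch below; the class name RandomizeSketch is hypothetical, and the permutation produced for a given seed is not claimed to match the Weka method's output. java.util.Collections.shuffle offers the same algorithm but only for a List, which is why a hand-written loop is shown for a primitive array.

```java
import java.util.Arrays;
import java.util.Random;

class RandomizeSketch {

    /** Shuffles index in place with the Fisher-Yates algorithm (assumed). */
    static void randomize(int[] index, Random random) {
        for (int j = index.length - 1; j > 0; j--) {
            // Pick a position in [0, j] and swap its value into slot j.
            int k = random.nextInt(j + 1);
            int tmp = index[j];
            index[j] = index[k];
            index[k] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] b = a.clone();
        randomize(a, new Random(42));
        randomize(b, new Random(42));
        System.out.println(Arrays.toString(a));
        // Two generators built from the same seed yield the same permutation.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Seeding the Random from the -s option is what makes a decomposition run repeatable.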