Package weka.classifiers.rules
Class JRip
- java.lang.Object
  - weka.classifiers.Classifier
    - weka.classifiers.rules.JRip
- All Implemented Interfaces:
  java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
public class JRip extends Classifier implements AdditionalMeasureProducer, WeightedInstancesHandler, TechnicalInformationHandler
This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.
The algorithm is briefly described as follows:
Initialize RS = {}, and for each class, from the least prevalent to the most frequent, DO:
1. Building stage:
Repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.
1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain: p*(log(p/t) - log(P/T)) (see the sketch after this procedure).
1.2. Prune phase:
Incrementally prune each rule, allowing the pruning of any final sequence of antecedents. The pruning metric is (p-n)/(p+n); since this is equivalent to 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that it is 0.5 when p+n is 0).
2. Optimization stage:
After generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. The pruning metric used here is (TP+TN)/(P+N). The smallest possible DL for each variant and for the original rule is then computed, and the variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positives, more rules are generated for the residual positives using the building stage again.
3. Deletion stage:
Delete any rule from the ruleset that would increase the DL of the whole ruleset if it were kept, and add the resulting ruleset to RS.
ENDDO
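To make the two formulas above concrete, here is a minimal sketch (not WEKA's internal code) of the grow-phase information gain and the smoothed prune-phase metric. The use of base-2 logarithms is an assumption; any fixed base only rescales the gain:

    // Sketch of the grow-phase gain and prune-phase metric described above.
    // p, t: positive/total counts covered by the rule after adding a condition;
    // P, T: positive/total counts covered before adding it;
    // n: negative count covered by the rule being pruned.
    public class RipperMetricsSketch {

        static double log2(double x) {
            return Math.log(x) / Math.log(2);
        }

        // Information gain p * (log(p/t) - log(P/T)) used to pick the next condition.
        static double growGain(double p, double t, double P, double T) {
            return p * (log2(p / t) - log2(P / T));
        }

        // Smoothed pruning metric (p + 1) / (p + n + 2); equals 0.5 when p + n == 0.
        static double pruneMetric(double p, double n) {
            return (p + 1.0) / (p + n + 2.0);
        }

        public static void main(String[] args) {
            System.out.println(growGain(20, 25, 50, 200)); // gain of a candidate condition
            System.out.println(pruneMetric(20, 5));        // value of a grown rule on the pruning set
        }
    }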
Note that there appear to be two bugs in the original ripper program that slightly affect the ruleset size and accuracy. This implementation avoids these bugs and is therefore a little different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there can still be trivial differences between this implementation and the original ripper, especially for the audiology data in the UCI repository, which has many classes with few instances.
For details, please see:
William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.
PS. We have compared this implementation with the original ripper implementation in terms of accuracy, ruleset size and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these aspects it appears to be quite comparable to the original ripper implementation. However, we did not consider memory consumption optimization in this implementation.
BibTeX:
@inproceedings{Cohen1995,
  author = {William W. Cohen},
  booktitle = {Twelfth International Conference on Machine Learning},
  pages = {115-123},
  publisher = {Morgan Kaufmann},
  title = {Fast Effective Rule Induction},
  year = {1995}
}
Valid options are:
-F <number of folds>
  Set the number of folds for REP. One fold is used as the pruning set. (default: 3)
-N <min. weights>
  Set the minimal weights of instances within a split. (default: 2.0)
-O <number of runs>
  Set the number of runs of optimizations. (default: 2)
-D
  Set whether to turn on debug mode. (default: false)
-S <seed>
  The seed of randomization. (default: 1)
-E
  Whether NOT to check the error rate >= 0.5 in the stopping criterion. (default: check)
-P
  Whether NOT to use pruning. (default: use pruning)
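The options above can also be set programmatically through the corresponding setters or setOptions; a minimal configuration sketch (the values shown are the documented defaults, weka.jar assumed on the classpath) is:

    import weka.classifiers.rules.JRip;
    import weka.core.Utils;

    public class JRipOptionsSketch {
        public static void main(String[] args) throws Exception {
            JRip ripper = new JRip();
            ripper.setFolds(3);             // -F 3
            ripper.setMinNo(2.0);           // -N 2.0
            ripper.setOptimizations(2);     // -O 2
            ripper.setSeed(1);              // -S 1
            ripper.setUsePruning(true);     // pruning on (no -P)
            ripper.setCheckErrorRate(true); // error-rate check on (no -E)

            // The same settings expressed as an option string.
            ripper.setOptions(Utils.splitOptions("-F 3 -N 2.0 -O 2 -S 1"));
        }
    }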
- Version:
- $Revision: 8119 $
- Author:
- Xin Xu (xx5@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes:
- class JRip.Antd
  The single antecedent in the rule, which is composed of an attribute and the corresponding value.
- class JRip.NominalAntd
  The antecedent with a nominal attribute.
- class JRip.NumericAntd
  The antecedent with a numeric attribute.
- class JRip.RipperRule
  This class implements a single rule that predicts the specified class.
-
Constructor Summary
Constructors:
- JRip()
-
Method Summary
- void buildClassifier(Instances instances)
  Builds Ripper in the order of class frequencies.
- java.lang.String checkErrorRateTipText()
  Returns the tip text for this property.
- java.lang.String debugTipText()
  Returns the tip text for this property.
- double[] distributionForInstance(Instance datum)
  Classifies the test instance with the rule learner and provides the class distribution.
- java.util.Enumeration enumerateMeasures()
  Returns an enumeration of the additional measure names.
- java.lang.String foldsTipText()
  Returns the tip text for this property.
- Capabilities getCapabilities()
  Returns default capabilities of the classifier.
- boolean getCheckErrorRate()
  Gets whether the error rate check is included in the stopping criterion.
- boolean getDebug()
  Gets whether debug information is output to the console.
- int getFolds()
  Gets the number of folds.
- double getMeasure(java.lang.String additionalMeasureName)
  Returns the value of the named measure.
- double getMinNo()
  Gets the minimum total weight of the instances in a rule.
- int getOptimizations()
  Gets the number of optimization runs.
- java.lang.String[] getOptions()
  Gets the current settings of the Classifier.
- java.lang.String getRevision()
  Returns the revision string.
- FastVector getRuleset()
  Gets the ruleset generated by Ripper.
- RuleStats getRuleStats(int pos)
  Gets the statistics of the ruleset in the given position.
- long getSeed()
  Gets the current seed value used in randomizing the data.
- TechnicalInformation getTechnicalInformation()
  Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper reference or book this class is based on.
- boolean getUsePruning()
  Gets whether pruning is performed.
- java.lang.String globalInfo()
  Returns a string describing the classifier.
- java.util.Enumeration listOptions()
  Returns an enumeration describing the available options.
- static void main(java.lang.String[] args)
  Main method.
- java.lang.String minNoTipText()
  Returns the tip text for this property.
- java.lang.String optimizationsTipText()
  Returns the tip text for this property.
- java.lang.String seedTipText()
  Returns the tip text for this property.
- void setCheckErrorRate(boolean d)
  Sets whether the error rate check is included in the stopping criterion.
- void setDebug(boolean d)
  Sets whether debug information is output to the console.
- void setFolds(int fold)
  Sets the number of folds to use.
- void setMinNo(double m)
  Sets the minimum total weight of the instances in a rule.
- void setOptimizations(int run)
  Sets the number of optimization runs.
- void setOptions(java.lang.String[] options)
  Parses a given list of options.
- void setSeed(long s)
  Sets the seed value to use in randomizing the data.
- void setUsePruning(boolean d)
  Sets whether pruning is performed.
- java.lang.String toString()
  Prints all the rules of the rule learner.
- java.lang.String usePruningTipText()
  Returns the tip text for this property.
-
Methods inherited from class weka.classifiers.Classifier
classifyInstance, forName, makeCopies, makeCopy
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing the classifier.
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper reference or book this class is based on.
- Specified by:
getTechnicalInformation
in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options. Valid options are:
-F <number of folds>
  The number of folds for reduced error pruning. One fold is used as the pruning set. (default: 3)
-N <min. weights>
  The minimal weights of instances within a split. (default: 2)
-O <number of runs>
  The number of runs of optimizations. (default: 2)
-D
  Whether to turn on debug mode. (default: false)
-S <seed>
  The seed of randomization used in Ripper. (default: 1)
-E
  Whether NOT to check the error rate >= 0.5 in the stopping criterion. (default: check)
-P
  Whether NOT to use pruning. (default: use pruning)
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class Classifier
- Returns:
- an enumeration of all the available options
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:
-F <number of folds>
  Set the number of folds for REP. One fold is used as the pruning set. (default: 3)
-N <min. weights>
  Set the minimal weights of instances within a split. (default: 2.0)
-O <number of runs>
  Set the number of runs of optimizations. (default: 2)
-D
  Set whether to turn on debug mode. (default: false)
-S <seed>
  The seed of randomization. (default: 1)
-E
  Whether NOT to check the error rate >= 0.5 in the stopping criterion. (default: check)
-P
  Whether NOT to use pruning. (default: use pruning)
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class Classifier
- Parameters:
options
- the list of options as an array of strings
- Throws:
java.lang.Exception
- if an option is not supported
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the Classifier.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class Classifier
- Returns:
- an array of strings suitable for passing to setOptions
-
enumerateMeasures
public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names.
- Specified by:
enumerateMeasures
in interface AdditionalMeasureProducer
- Returns:
- an enumeration of the measure names
-
getMeasure
public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.
- Specified by:
getMeasure
in interface AdditionalMeasureProducer
- Parameters:
additionalMeasureName
- the name of the measure to query for its value
- Returns:
- the value of the named measure
- Throws:
java.lang.IllegalArgumentException
- if the named measure is not supported
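As an illustration, enumerateMeasures and getMeasure can be used together to list whatever additional measures JRip exposes, without hard-coding a measure name. This is only a sketch; it assumes the model passed in has already been built:

    import java.util.Enumeration;
    import weka.classifiers.rules.JRip;

    public class JRipMeasuresSketch {
        // Prints every additional measure of an already-built JRip model.
        public static void printAdditionalMeasures(JRip builtRipper) {
            Enumeration names = builtRipper.enumerateMeasures();
            while (names.hasMoreElements()) {
                String name = (String) names.nextElement();
                System.out.println(name + " = " + builtRipper.getMeasure(name));
            }
        }
    }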
-
foldsTipText
public java.lang.String foldsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setFolds
public void setFolds(int fold)
Sets the number of folds to use.
- Parameters:
fold
- the number of folds
-
getFolds
public int getFolds()
Gets the number of folds.
- Returns:
- the number of folds
-
minNoTipText
public java.lang.String minNoTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMinNo
public void setMinNo(double m)
Sets the minimum total weight of the instances in a rule.
- Parameters:
m
- the minimum total weight of the instances in a rule
-
getMinNo
public double getMinNo()
Gets the minimum total weight of the instances in a rule.
- Returns:
- the minimum total weight of the instances in a rule
-
seedTipText
public java.lang.String seedTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSeed
public void setSeed(long s)
Sets the seed value to use in randomizing the data.
- Parameters:
s
- the new seed value
-
getSeed
public long getSeed()
Gets the current seed value used in randomizing the data.
- Returns:
- the seed value
-
optimizationsTipText
public java.lang.String optimizationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOptimizations
public void setOptimizations(int run)
Sets the number of optimization runs.
- Parameters:
run
- the number of optimization runs
-
getOptimizations
public int getOptimizations()
Gets the number of optimization runs.
- Returns:
- the number of optimization runs
-
debugTipText
public java.lang.String debugTipText()
Returns the tip text for this property.
- Overrides:
debugTipText
in class Classifier
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDebug
public void setDebug(boolean d)
Sets whether debug information is output to the console.
- Overrides:
setDebug
in class Classifier
- Parameters:
d
- whether debug information is output to the console
-
getDebug
public boolean getDebug()
Gets whether debug information is output to the console.
- Overrides:
getDebug
in class Classifier
- Returns:
- whether debug information is output to the console
-
checkErrorRateTipText
public java.lang.String checkErrorRateTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCheckErrorRate
public void setCheckErrorRate(boolean d)
Sets whether the error rate check is included in the stopping criterion.
- Parameters:
d
- whether the error rate check is included in the stopping criterion
-
getCheckErrorRate
public boolean getCheckErrorRate()
Gets whether the error rate check is included in the stopping criterion.
- Returns:
- true if the error rate check is included in the stopping criterion
-
usePruningTipText
public java.lang.String usePruningTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setUsePruning
public void setUsePruning(boolean d)
Sets whether pruning is performed.
- Parameters:
d
- whether pruning is performed
-
getUsePruning
public boolean getUsePruning()
Gets whether pruning is performed.
- Returns:
- true if pruning is performed
-
getRuleset
public FastVector getRuleset()
Gets the ruleset generated by Ripper.
- Returns:
- the ruleset
-
getRuleStats
public RuleStats getRuleStats(int pos)
Gets the statistics of the ruleset in the given position.
- Parameters:
pos
- the position of the stats, assuming correct
- Returns:
- the statistics of the ruleset in the given position
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the classifier.
- Specified by:
getCapabilities
in interface CapabilitiesHandler
- Overrides:
getCapabilities
in class Classifier
- Returns:
- the capabilities of this classifier
- See Also:
Capabilities
-
buildClassifier
public void buildClassifier(Instances instances) throws java.lang.Exception
Builds Ripper in the order of class frequencies. For each class, the ruleset is built in two stages: building and optimization.
- Specified by:
buildClassifier
in class Classifier
- Parameters:
instances
- the training data
- Throws:
java.lang.Exception
- if classifier can't be built successfully
-
distributionForInstance
public double[] distributionForInstance(Instance datum)
Classifies the test instance with the rule learner and provides the class distribution.
- Overrides:
distributionForInstance
in class Classifier
- Parameters:
datum
- the instance to be classified
- Returns:
- the distribution
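A minimal end-to-end sketch of the typical calling sequence, training with buildClassifier and then querying distributionForInstance. The dataset path is a placeholder for any ARFF file with a nominal class attribute, and weka.jar is assumed to be on the classpath:

    import weka.classifiers.rules.JRip;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class JRipUsageSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder path; replace with a real ARFF file.
            Instances data = DataSource.read("weather.nominal.arff");
            data.setClassIndex(data.numAttributes() - 1);

            JRip ripper = new JRip();
            ripper.buildClassifier(data);   // building + optimization stages
            System.out.println(ripper);     // toString() prints the learned rules

            // Predicted class distribution for the first instance.
            double[] dist = ripper.distributionForInstance(data.instance(0));
            for (int i = 0; i < dist.length; i++) {
                System.out.println(data.classAttribute().value(i) + ": " + dist[i]);
            }
        }
    }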
-
toString
public java.lang.String toString()
Prints all the rules of the rule learner.
- Overrides:
toString
in class java.lang.Object
- Returns:
- a textual description of the classifier
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class Classifier
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Main method.
- Parameters:
args
- the options for the classifier
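For example, with weka.jar on the classpath, a typical invocation would be something like java weka.classifiers.rules.JRip -t data.arff, where -t names the training file and the classifier-specific options listed above may be appended.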
-
-