Package smile.regression
Class RegressionTree
- java.lang.Object
  - smile.regression.RegressionTree
- All Implemented Interfaces:
  java.io.Serializable, Regression<double[]>
public class RegressionTree extends java.lang.Object implements Regression<double[]>
Decision tree for regression. A decision tree can be learned by splitting the training set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. Classification and regression tree (CART) techniques have a number of advantages over many alternative techniques:
- Simple to understand and interpret. In most cases, the interpretation of results summarized in a tree is very simple. This simplicity is useful not only for rapid classification of new observations, but can also often yield a much simpler "model" for explaining why observations are classified or predicted in a particular manner.
- Able to handle both numerical and categorical data. Other techniques are usually specialized in analyzing datasets that have only one type of variable.
- Nonparametric and nonlinear. The final results of using tree methods for classification or regression can be summarized in a series of (usually few) logical if-then conditions (tree nodes). Therefore, there is no implicit assumption that the underlying relationships between the predictor variables and the dependent variable are linear, follow some specific nonlinear link function, or are even monotonic in nature. Tree methods are thus particularly well suited for data mining tasks, where there is often little a priori knowledge and no coherent set of theories or predictions regarding which variables are related and how. In such analyses, tree methods can often reveal simple relationships between just a few variables that could easily have gone unnoticed using other analytic techniques.
Some techniques such as bagging, boosting, and random forest use more than one decision tree for their analysis.
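The recursive-partitioning idea above can be sketched with its building block: a single split chosen to minimize the squared error (variance) of the response on each side, which CART repeats on every derived subset. The following is an illustrative sketch with hypothetical names, not Smile's implementation.

```java
// Minimal sketch of one recursive-partitioning step for regression:
// try each threshold on a feature, keep the one with the smallest
// total sum of squared errors around the per-side means.
public class StumpSketch {
    // Sum of squared errors of y restricted to one side of the mask.
    static double sse(double[] y, boolean[] mask, boolean side) {
        double sum = 0; int n = 0;
        for (int i = 0; i < y.length; i++) if (mask[i] == side) { sum += y[i]; n++; }
        if (n == 0) return 0;
        double mean = sum / n, err = 0;
        for (int i = 0; i < y.length; i++) if (mask[i] == side) err += (y[i] - mean) * (y[i] - mean);
        return err;
    }

    // Returns the threshold on feature x that minimizes total squared error.
    static double bestSplit(double[] x, double[] y) {
        double best = Double.POSITIVE_INFINITY, bestT = x[0];
        for (double t : x) {
            boolean[] left = new boolean[x.length];
            for (int i = 0; i < x.length; i++) left[i] = x[i] <= t;
            double err = sse(y, left, true) + sse(y, left, false);
            if (err < best) { best = err; bestT = t; }
        }
        return bestT;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 10, 11, 12};
        double[] y = {1.0, 1.1, 0.9, 5.0, 5.1, 4.9};
        System.out.println(bestSplit(x, y)); // prints 3.0: the boundary of the left cluster
    }
}
```

A real tree then recurses into each side until maxNodes leaves are reached or a node shrinks below nodeSize instances.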
- Author:
- Haifeng Li
- See Also:
  GradientTreeBoost, RandomForest, Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | RegressionTree.NodeOutput | An interface to calculate node output.
static class | RegressionTree.Trainer | Trainer for regression tree.
-
Constructor Summary
Constructors
- RegressionTree(double[][] x, double[] y, int maxNodes)
- RegressionTree(double[][] x, double[] y, int maxNodes, int nodeSize)
- RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes)
- RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize)
- RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize, int[] samples, RegressionTree.NodeOutput output)
- RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes)
- RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize)
- RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output)
- RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output, double[] monotonicRegression)
- RegressionTree(AttributeDataset data, int maxNodes)
- RegressionTree(AttributeDataset data, int maxNodes, int nodeSize)
- RegressionTree(AttributeDataset data, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output)
- RegressionTree(AttributeDataset data, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output, double[] monotonicRegression)
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
java.lang.String | dot() | Returns the graphic representation in Graphviz dot format.
smile.regression.RegressionTree.Node | getRoot() | Returns the root node.
double[] | importance() | Returns the variable importance.
int | maxDepth() | Returns the maximum depth of the tree, i.e. the number of nodes along the longest path from the root node down to the farthest leaf node.
double | predict(double[] x) | Predicts the dependent variable of an instance.
double | predict(int[] x) | Predicts the dependent variable of an instance with sparse binary features.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface smile.regression.Regression
predict
-
Constructor Detail
-
RegressionTree
public RegressionTree(double[][] x, double[] y, int maxNodes)
Constructor. Learns a regression tree with at most the given number of leaves. All attributes are assumed to be numeric.
- Parameters:
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
-
RegressionTree
public RegressionTree(double[][] x, double[] y, int maxNodes, int nodeSize)
Constructor. Learns a regression tree with at most the given number of leaves. All attributes are assumed to be numeric.
- Parameters:
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
-
RegressionTree
public RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes)
Constructor. Learns a regression tree with at most the given number of leaves.
- Parameters:
  attributes - the attribute properties.
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
-
RegressionTree
public RegressionTree(AttributeDataset data, int maxNodes)
Constructor. Learns a regression tree for random forest and gradient tree boosting.
- Parameters:
  data - the dataset.
  maxNodes - the maximum number of leaf nodes in the tree.
-
RegressionTree
public RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize)
Constructor. Learns a regression tree with at most the given number of leaves.
- Parameters:
  attributes - the attribute properties.
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
-
RegressionTree
public RegressionTree(AttributeDataset data, int maxNodes, int nodeSize)
Constructor. Learns a regression tree for random forest and gradient tree boosting.- Parameters:
data
- the dataset.maxNodes
- the maximum number of leaf nodes in the tree.nodeSize
- the number of instances in a node below which the tree will not split, setting nodeSize = 5 generally gives good results.
-
RegressionTree
public RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output)
Constructor. Learns a regression tree for random forest and gradient tree boosting.
- Parameters:
  attributes - the attribute properties.
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
  mtry - the number of input variables to pick to split on at each node; p/3 generally gives good performance, where p is the number of variables.
  order - the index of training values in ascending order. Note that only numeric attributes need be sorted.
  samples - the sample set of instances for stochastic learning; samples[i] should be 0 or 1 to indicate if the instance is used for training.
  output - the calculator of node output.
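The samples parameter described above drives stochastic learning by masking instances in or out of a given tree, as in bagging. A minimal sketch of that masking, with a hypothetical helper rather than Smile's internals:

```java
// Sketch of stochastic sampling via a 0/1 mask: samples[i] == 1 keeps
// instance i for this tree, samples[i] == 0 drops it.
public class SamplesSketch {
    static double[][] subset(double[][] x, int[] samples) {
        int n = 0;
        for (int s : samples) if (s == 1) n++;   // count retained instances
        double[][] out = new double[n][];
        int j = 0;
        for (int i = 0; i < x.length; i++) if (samples[i] == 1) out[j++] = x[i];
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1}, {2}, {3}, {4}};
        int[] samples = {1, 0, 1, 0};            // keep instances 0 and 2
        System.out.println(subset(x, samples).length); // prints 2
    }
}
```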
-
RegressionTree
public RegressionTree(AttributeDataset data, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output)
Constructor. Learns a regression tree for random forest and gradient tree boosting.
- Parameters:
  data - the dataset.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
  mtry - the number of input variables to pick to split on at each node; p/3 generally gives good performance, where p is the number of variables.
  order - the index of training values in ascending order. Note that only numeric attributes need be sorted.
  samples - the sample set of instances for stochastic learning; samples[i] should be 0 or 1 to indicate if the instance is used for training.
  output - the calculator of node output.
-
RegressionTree
public RegressionTree(AttributeDataset data, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output, double[] monotonicRegression)
-
RegressionTree
public RegressionTree(Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output, double[] monotonicRegression)
-
RegressionTree
public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes)
Constructor. Learns a regression tree on sparse binary samples.
- Parameters:
  numFeatures - the number of sparse binary features.
  x - the training instances of sparse binary features.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
-
RegressionTree
public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize)
Constructor. Learns a regression tree on sparse binary samples.
- Parameters:
  numFeatures - the number of sparse binary features.
  x - the training instances of sparse binary features.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
-
RegressionTree
public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize, int[] samples, RegressionTree.NodeOutput output)
Constructor. Learns a regression tree on sparse binary samples.
- Parameters:
  numFeatures - the number of sparse binary features.
  x - the training instances.
  y - the response variable.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
  samples - the sample set of instances for stochastic learning; samples[i] should be 0 or 1 to indicate if the instance is used for training.
  output - the calculator of node output.
-
-
Method Detail
-
importance
public double[] importance()
Returns the variable importance. Every time a node is split on a variable, the impurity criterion for the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over the tree gives a simple measure of variable importance.
- Returns:
  the variable importance.
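The accumulation described above can be sketched for a single split: the contribution credited to the split variable is the parent's sum of squares minus that of its children. A hypothetical two-variable example, not Smile's implementation:

```java
// Sketch of variable importance as accumulated impurity decrease:
// each split adds (parent SSE - left SSE - right SSE) to the slot of
// the variable it split on.
public class ImportanceSketch {
    // Sum of squared errors around the mean of y.
    static double sse(double[] y) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double err = 0;
        for (double v : y) err += (v - mean) * (v - mean);
        return err;
    }

    public static void main(String[] args) {
        double[] parent = {1, 1, 5, 5};
        double[] left = {1, 1}, right = {5, 5};
        double[] importance = new double[2];      // one slot per variable
        // A perfect split on variable 0 removes all of the parent's impurity.
        importance[0] += sse(parent) - (sse(left) + sse(right));
        System.out.println(importance[0]);        // prints 16.0
    }
}
```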
-
predict
public double predict(double[] x)
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
  predict in interface Regression<double[]>
- Parameters:
  x - the instance.
- Returns:
  the predicted value of the dependent variable.
-
predict
public double predict(int[] x)
Predicts the dependent variable of an instance with sparse binary features.
- Parameters:
  x - the instance.
- Returns:
  the predicted value of the dependent variable.
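With sparse binary features, the instance is given as the indices of the features that are 1, so a node's "is feature f set?" test becomes a membership check. A hypothetical one-split sketch of that traversal, not Smile's internal Node class:

```java
import java.util.Arrays;

// Sketch of prediction with sparse binary features: x lists the indices
// of the features that are 1, in ascending order; the split test checks
// whether the split feature's index is present.
public class SparsePredictSketch {
    static double predict(int[] x, int splitFeature, double leafIfSet, double leafIfUnset) {
        boolean set = Arrays.binarySearch(x, splitFeature) >= 0; // x must be sorted
        return set ? leafIfSet : leafIfUnset;
    }

    public static void main(String[] args) {
        int[] x = {2, 7, 11};                        // features 2, 7, 11 are 1
        System.out.println(predict(x, 7, 3.5, 0.5)); // prints 3.5
        System.out.println(predict(x, 4, 3.5, 0.5)); // prints 0.5
    }
}
```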
-
maxDepth
public int maxDepth()
Returns the maximum depth of the tree, i.e. the number of nodes along the longest path from the root node down to the farthest leaf node.
-
dot
public java.lang.String dot()
Returns the graphic representation in Graphviz dot format. Try http://viz-js.com/ to visualize the returned string.
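For orientation, the kind of string a dot export produces looks like the following. The node labels here are hypothetical and Smile's actual output may differ in detail; the sketch only illustrates valid Graphviz dot for a one-split tree.

```java
// Sketch of Graphviz dot output for a regression stump: one split node,
// two leaf nodes, and labeled branch edges.
public class DotSketch {
    static String dot() {
        return "digraph RegressionTree {\n"
             + " node [shape=box];\n"
             + " 0 [label=\"x[3] <= 2.5\"];\n"
             + " 1 [label=\"predict = 1.0\"];\n"
             + " 2 [label=\"predict = 5.0\"];\n"
             + " 0 -> 1 [label=\"true\"];\n"
             + " 0 -> 2 [label=\"false\"];\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(dot()); // paste into http://viz-js.com/ to render
    }
}
```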
-
getRoot
public smile.regression.RegressionTree.Node getRoot()
Returns the root node.
- Returns:
  the root node.