Class DecisionTree
source code
object --+
         |
        DecisionTree
Learning algorithm for a decision tree model
for classification or regression.
EXPERIMENTAL: This is an experimental API.
              It will probably be modified for Spark v1.2.
Example usage:
>>> from numpy import array
>>> import sys
>>> from pyspark.mllib.regression import LabeledPoint
>>> from pyspark.mllib.tree import DecisionTree
>>> from pyspark.mllib.linalg import SparseVector
>>>
>>> data = [
...     LabeledPoint(0.0, [0.0]),
...     LabeledPoint(1.0, [1.0]),
...     LabeledPoint(1.0, [2.0]),
...     LabeledPoint(1.0, [3.0])
... ]
>>> categoricalFeaturesInfo = {} # no categorical features
>>> model = DecisionTree.trainClassifier(sc.parallelize(data), numClasses=2,
...                                      categoricalFeaturesInfo=categoricalFeaturesInfo)
>>> sys.stdout.write(model)
DecisionTreeModel classifier
  If (feature 0 <= 0.5)
   Predict: 0.0
  Else (feature 0 > 0.5)
   Predict: 1.0
>>> model.predict(array([1.0])) > 0
True
>>> model.predict(array([0.0])) == 0
True
>>> sparse_data = [
...     LabeledPoint(0.0, SparseVector(2, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(2, {1: 1.0})),
...     LabeledPoint(0.0, SparseVector(2, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(2, {1: 2.0}))
... ]
>>>
>>> model = DecisionTree.trainRegressor(sc.parallelize(sparse_data),
...                                     categoricalFeaturesInfo=categoricalFeaturesInfo)
>>> model.predict(array([0.0, 1.0])) == 1
True
>>> model.predict(array([0.0, 0.0])) == 0
True
>>> model.predict(SparseVector(2, {1: 1.0})) == 1
True
>>> model.predict(SparseVector(2, {1: 0.0})) == 0
True
  
    | Inherited from object:__delattr__,__format__,__getattribute__,__hash__,__init__,__new__,__reduce__,__reduce_ex__,__repr__,__setattr__,__sizeof__,__str__,__subclasshook__ | 
    |  | 
        
          | trainClassifier(data,
        numClasses,
        categoricalFeaturesInfo,
        impurity="gini",
        maxDepth=4,
        maxBins=100) Train a DecisionTreeModel for classification.
 | source code |  | 
    |  | 
        
          | trainRegressor(data,
        categoricalFeaturesInfo,
        impurity="variance",
        maxDepth=4,
        maxBins=100) Train a DecisionTreeModel for regression.
 | source code |  | 
  
    | Inherited from object:__class__ | 
| 
  | trainClassifier(data,
        numClasses,
        categoricalFeaturesInfo,
        impurity="gini",
        maxDepth=4,
        maxBins=100)
    Static Method
 | source code |  
Train a DecisionTreeModel for classification.
:param data: Training data: RDD of LabeledPoint.
             Labels are integers {0,1,...,numClasses}.
:param numClasses: Number of classes for classification.
:param categoricalFeaturesInfo: Map from categorical feature index
                                to number of categories.
                                Any feature not in this map
                                is treated as continuous.
:param impurity: Supported values: "entropy" or "gini"
:param maxDepth: Max depth of tree.
                 E.g., depth 0 means 1 leaf node.
                 Depth 1 means 1 internal node + 2 leaf nodes.
:param maxBins: Number of bins used for finding splits at each node.
:return: DecisionTreeModel
   | 
 
| 
  | trainRegressor(data,
        categoricalFeaturesInfo,
        impurity="variance",
        maxDepth=4,
        maxBins=100)
    Static Method
 | source code |  
Train a DecisionTreeModel for regression.
:param data: Training data: RDD of LabeledPoint.
             Labels are real numbers.
:param categoricalFeaturesInfo: Map from categorical feature index
                                to number of categories.
                                Any feature not in this map
                                is treated as continuous.
:param impurity: Supported values: "variance"
:param maxDepth: Max depth of tree.
                 E.g., depth 0 means 1 leaf node.
                 Depth 1 means 1 internal node + 2 leaf nodes.
:param maxBins: Number of bins used for finding splits at each node.
:return: DecisionTreeModel
   |