QuantileDiscretizer

java.lang.Object
- org.apache.spark.ml.PipelineStage
  - org.apache.spark.ml.Estimator<Bucketizer>
    - org.apache.spark.ml.feature.QuantileDiscretizer

All Implemented Interfaces:

java.io.Serializable, Logging, Params, Identifiable
```
public final class QuantileDiscretizer
extends Estimator<Bucketizer>
```
:: Experimental :: `QuantileDiscretizer` takes a column of continuous features and outputs a column of binned categorical features. The bin ranges are chosen by taking a sample of the data and dividing it into roughly equal parts. The lowest and highest bin bounds are -Infinity and +Infinity, so every real value falls into some bin. The estimator attempts to find `numBuckets` partitions from the data sample, but it may find fewer, depending on the sampled values (for example, when many sampled values are identical). Fitting it produces a `Bucketizer` that applies the chosen splits.
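To make the split-selection idea concrete, here is a minimal plain-Java sketch. It is not Spark's implementation: the class `QuantileSplitsSketch` and its `computeSplits` method are hypothetical, and picking evenly spaced order statistics from a sorted sample is an assumed, simplified way of "dividing it into roughly equal parts". It shows how the outer bounds become -Infinity and +Infinity, and why repeated sample values can leave fewer than `numBuckets` buckets.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical illustration only, not Spark's algorithm: derives bucket
 * boundaries from a sample by taking evenly spaced order statistics and
 * wrapping them in -Infinity / +Infinity.
 */
final class QuantileSplitsSketch {

    static double[] computeSplits(double[] sample, int numBuckets) {
        if (numBuckets < 2) {
            throw new IllegalArgumentException("numBuckets must be >= 2");
        }
        if (sample.length == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);

        // A TreeSet keeps boundaries ordered and silently drops duplicates;
        // duplicate candidates are why fewer than numBuckets buckets can result.
        TreeSet<Double> boundaries = new TreeSet<>();
        boundaries.add(Double.NEGATIVE_INFINITY);
        for (int i = 1; i < numBuckets; i++) {
            // Position of the i-th interior quantile within the sorted sample.
            int idx = (int) ((long) i * sorted.length / numBuckets);
            boundaries.add(sorted[Math.min(idx, sorted.length - 1)]);
        }
        boundaries.add(Double.POSITIVE_INFINITY);

        double[] splits = new double[boundaries.size()];
        int k = 0;
        for (double b : boundaries) {
            splits[k++] = b;
        }
        return splits;
    }
}
```

For a sample of 1 through 8 with four buckets, this sketch yields the splits -Infinity, 3, 5, 7, +Infinity. For a sample such as six 1s and two 2s, the duplicate candidates collapse and only three buckets remain.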

See Also:
Serialized Form

Constructor Summary

Constructors
Constructor and Description

QuantileDiscretizer()

QuantileDiscretizer(java.lang.String uid)

Method Summary

Methods
Modifier and Type	Method and Description
`QuantileDiscretizer`	`copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params.
`Bucketizer`	`fit(DataFrame dataset)` Fits a model to the input data.
`int`	`getNumBuckets()`
`static QuantileDiscretizer`	`load(java.lang.String path)`
`static int`	`minSamplesRequired()`
`IntParam`	`numBuckets()` Maximum number of buckets (quantiles, or categories) into which data points are grouped.
`QuantileDiscretizer`	`setInputCol(java.lang.String value)`
`QuantileDiscretizer`	`setNumBuckets(int value)`
`QuantileDiscretizer`	`setOutputCol(java.lang.String value)`
`QuantileDiscretizer`	`setSeed(long value)`
`StructType`	`transformSchema(StructType schema)` :: DeveloperApi :: Derives the output schema from the input schema.
`java.lang.String`	`uid()` An immutable unique ID for the object and its derivatives.

Methods inherited from class org.apache.spark.ml.Estimator
fit, fit, fit, fit

Methods inherited from class org.apache.spark.ml.PipelineStage
transformSchema

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - QuantileDiscretizer
```
public QuantileDiscretizer(java.lang.String uid)
```
  - QuantileDiscretizer
```
public QuantileDiscretizer()
```
- Method Detail
  - minSamplesRequired
```
public static int minSamplesRequired()
```
  - load
```
public static QuantileDiscretizer load(java.lang.String path)
```
  - uid
```
public java.lang.String uid()
```
    Description copied from interface: Identifiable
    
    An immutable unique ID for the object and its derivatives.
    
    Specified by:
    
    uid in interface Identifiable
    
    Returns:
    (undocumented)
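The no-argument constructor has to produce a uid on its own. The sketch below assumes a common scheme, a type prefix joined to a random suffix. The class `UidSketch`, the `"quantileDiscretizer"` prefix used in the example, and the 12-character suffix are illustrative assumptions, not the documented behavior of this class.

```java
import java.util.UUID;

/**
 * Hypothetical sketch of uid generation: a type prefix, an underscore,
 * and a short random suffix taken from a UUID. Format is an assumption.
 */
final class UidSketch {
    static String randomUid(String prefix) {
        String u = UUID.randomUUID().toString();
        // Keep only the last 12 characters of the UUID as the random suffix.
        return prefix + "_" + u.substring(u.length() - 12);
    }
}
```

Once generated, the uid would be stored and never changed, which is what "immutable" in the description above refers to.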
  - setNumBuckets
```
public QuantileDiscretizer setNumBuckets(int value)
```
  - setInputCol
```
public QuantileDiscretizer setInputCol(java.lang.String value)
```
  - setOutputCol
```
public QuantileDiscretizer setOutputCol(java.lang.String value)
```
  - setSeed
```
public QuantileDiscretizer setSeed(long value)
```
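`setSeed` fixes the randomness used when sampling the data. As a hedged illustration, the plain-Java sketch below performs seeded Bernoulli sampling. `SeededSampleSketch` and its `minSamples` parameter are hypothetical; the parameter merely stands in for a minimum sample size such as `minSamplesRequired()` might report, and Spark's own sampler is not reproduced here. The point is that, in this sketch, the same seed over the same input yields the same sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of seeded Bernoulli sampling (not Spark's sampler).
 * minSamples is an assumed minimum sample size.
 */
final class SeededSampleSketch {
    static List<Double> sample(double[] data, int minSamples, long seed) {
        // Keep each row with probability min(minSamples / n, 1.0).
        double fraction = Math.min((double) minSamples / data.length, 1.0);
        Random rng = new Random(seed);
        List<Double> kept = new ArrayList<>();
        for (double v : data) {
            // nextDouble() is in [0, 1), so fraction 1.0 keeps every row.
            if (rng.nextDouble() < fraction) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```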
  - transformSchema
```
public StructType transformSchema(StructType schema)
```
    Description copied from class: PipelineStage
    
    :: DeveloperApi ::
    Derives the output schema from the input schema.
    
    Specified by:
    
    transformSchema in class PipelineStage
    
    Parameters:
    schema - (undocumented)
    
    Returns:
    (undocumented)
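A rough picture of schema derivation for this estimator, under stated assumptions: the sketch models a schema as an ordered name-to-type map instead of `StructType`, and assumes that the input column must hold doubles and that the output column must not already exist. `SchemaSketch` is hypothetical; the exact checks Spark performs are not claimed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical schema-derivation sketch using a name -> type map.
 * Assumes a double input column and a not-yet-present output column.
 */
final class SchemaSketch {
    static Map<String, String> transformSchema(Map<String, String> schema,
                                               String inputCol, String outputCol) {
        if (!"double".equals(schema.get(inputCol))) {
            throw new IllegalArgumentException("Input column " + inputCol + " must be of type double");
        }
        if (schema.containsKey(outputCol)) {
            throw new IllegalArgumentException("Output column " + outputCol + " already exists");
        }
        // Copy so the caller's schema is untouched, then append the bucket column.
        Map<String, String> out = new LinkedHashMap<>(schema);
        out.put(outputCol, "double");
        return out;
    }
}
```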
  - fit
```
public Bucketizer fit(DataFrame dataset)
```
    Description copied from class: Estimator
    
    Fits a model to the input data.
    
    Specified by:
    
    fit in class Estimator<Bucketizer>
    
    Parameters:
    dataset - (undocumented)
    
    Returns:
    (undocumented)
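`fit` returns a `Bucketizer`, which applies the chosen splits to rows. The sketch below assumes the bucket convention where a bucket bounded by splits x and y holds values in [x, y), and the last bucket also includes its upper bound. `BucketAssignSketch` is a hypothetical helper that locates a value's bucket by binary search over strictly increasing splits; it is not the `Bucketizer` source.

```java
import java.util.Arrays;

/**
 * Hypothetical bucket lookup: bucket i covers [splits[i], splits[i + 1]),
 * and the last bucket is also closed on the right. Splits must be strictly
 * increasing.
 */
final class BucketAssignSketch {
    static int bucket(double value, double[] splits) {
        int last = splits.length - 1;
        if (value == splits[last]) {
            return last - 1; // the last bucket includes its upper bound
        }
        int pos = Arrays.binarySearch(splits, value);
        // Exact hit: value is the lower bound of bucket pos.
        // Miss: pos == -(insertionPoint) - 1, and the bucket is the one
        // just below the insertion point.
        int idx = pos >= 0 ? pos : -pos - 2;
        if (idx < 0 || idx >= last) {
            throw new IllegalArgumentException(value + " is outside the split range");
        }
        return idx;
    }
}
```

With splits -Infinity, 3, 5, 7, +Infinity, the value 3.0 lands in bucket 1 and +Infinity lands in bucket 3.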
  - copy
```
public QuantileDiscretizer copy(ParamMap extra)
```
    Description copied from interface: Params
    
    Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.
    
    Specified by:
    
    copy in interface Params
    
    Specified by:
    
    copy in class Estimator<Bucketizer>
    
    Parameters:
    extra - (undocumented)
    
    Returns:
    (undocumented)
    See Also:
    defaultCopy()
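The `copy` contract above (same UID, extra params applied) can be modelled with plain maps. `ParamsCopySketch` is a hypothetical stand-in for `Params` and `ParamMap`; in this sketch the copy keeps the uid, entries in `extra` override existing values, and the original instance is left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of copy(ParamMap extra) using a plain Map for params.
 */
final class ParamsCopySketch {
    final String uid;
    final Map<String, Object> params;

    ParamsCopySketch(String uid, Map<String, Object> params) {
        this.uid = uid;
        this.params = new HashMap<>(params); // defensive copy
    }

    ParamsCopySketch copy(Map<String, Object> extra) {
        Map<String, Object> merged = new HashMap<>(params);
        merged.putAll(extra); // extra params win over existing values
        return new ParamsCopySketch(uid, merged); // same uid as the original
    }
}
```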
  - numBuckets
```
public IntParam numBuckets()
```
    Maximum number of buckets (quantiles, or categories) into which data points are grouped. Must be >= 2. Default: 2.
    
    Returns:
    (undocumented)
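The `numBuckets` contract (must be >= 2, default 2) can be sketched as a small fluent setter/getter pair. `NumBucketsSketch` is hypothetical, and it validates eagerly inside the setter; where exactly Spark enforces the constraint is not claimed here.

```java
/**
 * Hypothetical sketch of the numBuckets parameter: must be >= 2, defaults to 2.
 */
final class NumBucketsSketch {
    static final int DEFAULT_NUM_BUCKETS = 2;
    private Integer numBuckets; // null means "not set"

    NumBucketsSketch setNumBuckets(int value) {
        if (value < 2) {
            throw new IllegalArgumentException("numBuckets must be >= 2, got " + value);
        }
        this.numBuckets = value;
        return this; // fluent, in the style of the setters listed above
    }

    int getNumBuckets() {
        // Fall back to the default when no value has been set.
        return numBuckets != null ? numBuckets : DEFAULT_NUM_BUCKETS;
    }
}
```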
  - getNumBuckets
```
public int getNumBuckets()
```
