PrefixSpan
class pyspark.mllib.fpm.PrefixSpan
- A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in Jian Pei et al. (2001) [1].
- New in version 1.6.0.
- [1] Jian Pei et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” Proceedings 17th International Conference on Data Engineering, Heidelberg, Germany, 2001, pp. 215-224, doi: https://doi.org/10.1109/ICDE.2001.914830
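A minimal usage sketch may help show how the class is typically driven. It assumes an active SparkContext is passed in as `sc`; the helper name `mine_frequent_sequences` and the toy `SEQUENCES` data are hypothetical, and the parameter values are chosen only for illustration.

```python
# Toy input: each element is one sequence, and each sequence is a list of itemsets.
SEQUENCES = [
    [["a", "b"], ["c"]],
    [["a"], ["c", "b"], ["a", "b"]],
    [["a", "b"], ["e"]],
    [["f"]],
]


def mine_frequent_sequences(sc, sequences, min_support=0.5, max_pattern_length=5):
    """Hypothetical helper: run PrefixSpan.train and return (pattern, freq) pairs."""
    # Imported lazily so the helper can be defined without Spark installed.
    from pyspark.mllib.fpm import PrefixSpan

    # Distribute the sequences as an RDD (2 partitions is an arbitrary choice).
    rdd = sc.parallelize(sequences, 2)
    model = PrefixSpan.train(
        rdd, minSupport=min_support, maxPatternLength=max_pattern_length
    )
    # freqSequences() is an RDD of FreqSequence(sequence, freq) named tuples.
    return sorted((fs.sequence, fs.freq) for fs in model.freqSequences().collect())
```

Calling the helper requires a running context, for example `sc = SparkContext("local[2]", "prefixspan-sketch")` followed by `mine_frequent_sequences(sc, SEQUENCES)`.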
- Methods
- train(data[, minSupport, maxPatternLength, …]): Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- Methods Documentation
classmethod train(data: pyspark.rdd.RDD[List[List[T]]], minSupport: float = 0.1, maxPatternLength: int = 10, maxLocalProjDBSize: int = 32000000) → pyspark.mllib.fpm.PrefixSpanModel[T]
- Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- New in version 1.6.0.
- Parameters
- data : pyspark.RDD
- The input data set; each element contains a sequence of itemsets.
- minSupport : float, optional
- The minimal support level of a sequential pattern. Any pattern that appears at least (minSupport * size-of-the-dataset) times will be output. (default: 0.1)
- maxPatternLength : int, optional
- The maximal length of a sequential pattern. Only frequent patterns whose length does not exceed maxPatternLength will be output. (default: 10)
- maxLocalProjDBSize : int, optional
- The maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run. (default: 32000000)
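To make the meaning of minSupport concrete, the following pure-Python sketch counts how many input sequences contain a given pattern and compares that count with a threshold. It is not Spark's implementation: the function names are hypothetical, and the inclusive threshold `ceil(minSupport * N)` is an assumption about how the cutoff is applied.

```python
import math


def contains_pattern(sequence, pattern):
    """True if each itemset of `pattern` is a subset of some itemset of
    `sequence`, with the matched itemsets appearing in the same order."""
    pos = 0
    for itemset in pattern:
        needed = set(itemset)
        # Greedily advance to the earliest itemset that covers `needed`.
        while pos < len(sequence) and not needed.issubset(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(sequences, pattern):
    """Number of sequences that contain `pattern`."""
    return sum(contains_pattern(seq, pattern) for seq in sequences)


def is_frequent(sequences, pattern, min_support=0.1):
    """Assumption: a pattern is frequent when its support reaches
    ceil(min_support * number_of_sequences)."""
    min_count = math.ceil(min_support * len(sequences))
    return support(sequences, pattern) >= min_count


sequences = [
    [["a", "b"], ["c"]],
    [["a"], ["c", "b"], ["a", "b"]],
    [["a", "b"], ["e"]],
    [["f"]],
]
print(support(sequences, [["a"], ["c"]]))           # 2
print(is_frequent(sequences, [["f"]], 0.5))         # False
```

With four sequences and a minSupport of 0.5, the assumed cutoff is two sequences, so `[["a"], ["c"]]` qualifies while `[["f"]]` does not.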
 