pyspark.streaming.DStream.countByValueAndWindow¶
- 
DStream.countByValueAndWindow(windowDuration: int, slideDuration: int, numPartitions: Optional[int] = None) → pyspark.streaming.dstream.DStream[Tuple[T, int]][source]¶
- Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream. - Parameters
- windowDurationint
- width of the window; must be a multiple of this DStream’s batching interval 
- slideDurationint
- sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval 
- numPartitionsint, optional
- number of partitions of each RDD in the new DStream.