pyspark.streaming.DStream.reduceByWindow

DStream.reduceByWindow(reduceFunc, invReduceFunc, windowDuration, slideDuration)
Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.

If invReduceFunc is not None, the reduction is done incrementally using the old window's reduced value:

1. "reduce" the new values that entered the window (e.g., adding new counts)
2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)

This is more efficient than when invReduceFunc is None.

Parameters
reduceFunc : function
    associative and commutative reduce function
invReduceFunc : function
    inverse reduce function of reduceFunc; such that for all y, and invertible x: invReduceFunc(reduceFunc(x, y), x) = y
windowDuration : int
    width of the window; must be a multiple of this DStream's batching interval
slideDuration : int
    sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval