pyspark.Accumulator
class pyspark.Accumulator(aid: int, value: T, accum_param: pyspark.accumulators.AccumulatorParam[T])[source]

A shared variable that can be accumulated, i.e., one that has a commutative and associative "add" operation. Worker tasks on a Spark cluster can add values to an Accumulator with the += operator, but only the driver program is allowed to access its value, using the value attribute. Updates from the workers are propagated automatically to the driver program.

While SparkContext supports accumulators for primitive data types such as int and float, users can also define accumulators for custom types by providing a custom AccumulatorParam object; refer to its doctest for an example, and see the sketch at the end of this section.

Examples

    >>> a = sc.accumulator(1)
    >>> a.value
    1
    >>> a.value = 2
    >>> a.value
    2
    >>> a += 5
    >>> a.value
    7
    >>> sc.accumulator(1.0).value
    1.0
    >>> sc.accumulator(1j).value
    1j
    >>> rdd = sc.parallelize([1,2,3])
    >>> def f(x):
    ...     global a
    ...     a += x
    ...
    >>> rdd.foreach(f)
    >>> a.value
    13
    >>> b = sc.accumulator(0)
    >>> def g(x):
    ...     b.add(x)
    ...
    >>> rdd.foreach(g)
    >>> b.value
    6

Reading the value from a worker task raises an error:

    >>> rdd.map(lambda x: a.value).collect()
    Traceback (most recent call last):
        ...
    Py4JJavaError: ...

Assigning to the value from a worker task also raises an error:

    >>> def h(x):
    ...     global a
    ...     a.value = 7
    ...
    >>> rdd.foreach(h)
    Traceback (most recent call last):
        ...
    Py4JJavaError: ...

Creating an accumulator for an unsupported type without a custom AccumulatorParam raises a TypeError:

    >>> sc.accumulator([1.0, 2.0, 3.0])
    Traceback (most recent call last):
        ...
    TypeError: ...

Methods

    add(term)
        Adds a term to this accumulator's value.

Attributes

    value
        Get the accumulator's value; only usable in the driver program.
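Custom accumulator types go through AccumulatorParam, which supplies a zero(value) method producing an identity element and an addInPlace(value1, value2) method combining two partial results. The following is a minimal sketch rather than a canonical implementation: it assumes a live SparkContext named sc, and the VectorAccumulatorParam name and list-of-floats payload are chosen purely for illustration.

    from pyspark.accumulators import AccumulatorParam

    class VectorAccumulatorParam(AccumulatorParam):
        def zero(self, value):
            # Identity element: a zero vector with the same length as the initial value
            return [0.0] * len(value)

        def addInPlace(self, val1, val2):
            # Combine two partial results element-wise; mutating val1 in place is allowed
            for i in range(len(val1)):
                val1[i] += val2[i]
            return val1

    # Driver side: create the accumulator with the custom AccumulatorParam
    va = sc.accumulator([1.0, 2.0, 3.0], VectorAccumulatorParam())

    # Worker side: tasks may only add to the accumulator, never read it
    def add_to_vector(x):
        global va
        va += [x, x, x]

    sc.parallelize([1.0, 2.0, 3.0]).foreach(add_to_vector)

    # Back on the driver, the accumulated value is readable
    print(va.value)  # [7.0, 8.0, 9.0]

The same split as for primitive accumulators applies: worker tasks only call += (which routes through addInPlace), and only the driver reads va.value.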