pyspark.sql.functions.sum_distinct

pyspark.sql.functions.sum_distinct(col: ColumnOrName) → pyspark.sql.column.Column
- Aggregate function: returns the sum of distinct values in the expression.
- New in version 3.2.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col : Column or str
- target column to compute on.
- Returns
- Column
- the column for computed results. 
 
- Examples

>>> from pyspark.sql.functions import col, sum_distinct
>>> df = spark.createDataFrame([(None,), (1,), (1,), (2,)], schema=["numbers"])
>>> df.select(sum_distinct(col("numbers"))).show()
+---------------------+
|sum(DISTINCT numbers)|
+---------------------+
|                    3|
+---------------------+
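The example above returns 3 because the null is ignored and the duplicate 1 is counted once. A minimal plain-Python sketch of that behaviour follows. It only models the result for a single column of integers and is not how Spark computes it; the helper name `sum_distinct_model` is hypothetical and not part of PySpark.

```python
from typing import Iterable, Optional


def sum_distinct_model(values: Iterable[Optional[int]]) -> Optional[int]:
    """Hypothetical helper: mimic SUM(DISTINCT x) for one column of ints."""
    # Nulls (None) are skipped, and the set collapses duplicate values.
    distinct = {v for v in values if v is not None}
    # As with SQL SUM, no non-null input yields NULL rather than 0.
    if not distinct:
        return None
    return sum(distinct)


# Same input as the doctest above: None is dropped, 1 is counted once.
result = sum_distinct_model([None, 1, 1, 2])
print(result)
```

The model returns `None` for a column that contains only nulls, which mirrors the SQL convention that `SUM` over no non-null rows is NULL.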