pyspark.RDD.distinct

RDD.distinct(numPartitions: Optional[int] = None) → pyspark.rdd.RDD[T]
- Return a new RDD containing the distinct elements in this RDD.
- New in version 0.7.0.
- Parameters
- numPartitions : int, optional
- the number of partitions in the new RDD
 
- Returns
- RDD: a new RDD containing the distinct elements
- See also
- Examples
>>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
[1, 2, 3]
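The example wraps the result in sorted() because the order of elements after distinct() should not be relied on. The following is a rough pure-Python model of why that is, assuming elements are routed to partitions by hash and de-duplicated within each partition. The function distinct_sketch and its num_partitions argument are hypothetical names for illustration only; this does not use Spark and is not the library's implementation.

```python
from typing import Hashable, Iterable, List, Optional, TypeVar

T = TypeVar("T", bound=Hashable)


def distinct_sketch(items: Iterable[T], num_partitions: Optional[int] = None) -> List[List[T]]:
    """Hypothetical model of a hash-partitioned distinct (not Spark code)."""
    # Assumption for this sketch: fall back to one partition when none is given.
    n = num_partitions if num_partitions is not None else 1
    if n < 1:
        raise ValueError("num_partitions must be at least 1")
    # One dict per partition; dict keys drop repeats and keep first-seen order.
    partitions = [dict() for _ in range(n)]
    for item in items:
        # Equal elements hash to the same partition, so duplicates always meet.
        partitions[hash(item) % n][item] = None
    return [list(p) for p in partitions]


parts = distinct_sketch([1, 1, 2, 3], num_partitions=2)
flat = [x for part in parts for x in part]
print(parts)         # [[2], [1, 3]]
print(sorted(flat))  # [1, 2, 3]
```

Reading the partitions back in order yields 2, 1, 3 rather than the input order, which is why the documented example sorts the collected result before comparing it.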