pyspark.RDD.intersection
RDD.intersection(other)
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.

Notes

This method performs a shuffle internally.

Examples

>>> rdd1 = sc.parallelize([1, 10, 2, 3, 4, 5])
>>> rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
>>> rdd1.intersection(rdd2).collect()
[1, 2, 3]
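
The deduplication behaviour can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch of the semantics only, not PySpark's implementation: `intersection_sketch` is a hypothetical helper, and plain lists stand in for RDDs. It records, per distinct value, whether that value was seen on each side, then keeps the values seen on both, which is roughly what grouping by key across the two inputs achieves.

```python
from collections import defaultdict


def intersection_sketch(left, right):
    """Return the distinct values present in both iterables.

    Hypothetical stand-in for the semantics of RDD.intersection;
    plain Python iterables are used instead of RDDs.
    """
    # value -> [seen in left, seen in right]
    seen = defaultdict(lambda: [False, False])
    for value in left:
        seen[value][0] = True
    for value in right:
        seen[value][1] = True
    # Each value is a dict key, so it appears at most once in the output,
    # no matter how often it occurred in either input.
    return [v for v, (in_left, in_right) in seen.items() if in_left and in_right]


# Duplicates in the inputs (2 on the left, 3 on the right) do not
# produce duplicates in the result.
result = intersection_sketch([1, 10, 2, 2, 3, 4, 5], [1, 6, 2, 3, 3, 7, 8])
print(sorted(result))  # [1, 2, 3]
```

Because the real method shuffles data across partitions, the order of the collected elements should not be relied on in general; sorting the collected list before comparing it, as the sketch does, is a safe habit.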