pyspark.RDD.randomSplit

RDD.randomSplit(weights: Sequence[Union[int, float]], seed: Optional[int] = None) → List[pyspark.rdd.RDD[T]]
Randomly splits this RDD with the provided weights.

New in version 1.3.0.

Parameters
- weights : list
  weights for the splits; normalized if they do not sum to 1
- seed : int, optional
  random seed

Returns
- list
  the split RDDs in a list
Examples

>>> rdd = sc.parallelize(range(500), 1)
>>> rdd1, rdd2 = rdd.randomSplit([2, 3], 17)
>>> len(rdd1.collect() + rdd2.collect())
500
>>> 150 < rdd1.count() < 250
True
>>> 250 < rdd2.count() < 350
True
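The normalization mentioned in the weights parameter can be illustrated in plain Python. This is a sketch of the idea only, not Spark's actual implementation, and the helper name `normalize_weights` is hypothetical:

```python
def normalize_weights(weights):
    """Normalize split weights so they sum to 1, mirroring what
    randomSplit does before sampling (illustrative sketch only)."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("Sum of weights must be positive")
    return [w / total for w in weights]

# The [2, 3] weights from the example above become fractions 0.4 and 0.6,
# so the first split receives roughly 40% of the 500 elements and the
# second roughly 60%.
print(normalize_weights([2, 3]))  # → [0.4, 0.6]
```

Because the splits are sampled randomly per element, the resulting RDD sizes only approximate these fractions, which is why the doctest checks ranges (150–250 and 250–350) rather than exact counts.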