pyspark.RDD.saveAsPickleFile

RDD.saveAsPickleFile(path: str, batchSize: int = 10) → None
Save this RDD as a SequenceFile of serialized objects. The serializer used is pyspark.serializers.CPickleSerializer; the default batch size is 10.

New in version 1.1.0.

Parameters
path : str
    path to the pickled file
batchSize : int, optional, default 10
    the number of Python objects represented as a single Java object
 
Examples

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as d:
...     path = os.path.join(d, "pickle_file")
...
...     # Write a temporary pickled file
...     sc.parallelize(range(10)).saveAsPickleFile(path, 3)
...
...     # Load the pickled file as an RDD
...     sorted(sc.pickleFile(path, 3).collect())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
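To see what batchSize means in practice, the sketch below uses only the standard-library pickle module to mimic the batching idea: every group of batchSize Python objects is serialized together as one blob, which is why each blob corresponds to a single Java object in the SequenceFile. This is an illustration of the concept, not PySpark's actual writer; the helper name `batch` is made up for the example.

```python
import pickle

def batch(items, batch_size=10):
    """Group items into lists of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# With batchSize=3, ten objects are pickled as four blobs:
# [0, 1, 2], [3, 4, 5], [6, 7, 8], [9]
blobs = [pickle.dumps(b) for b in batch(range(10), 3)]
print(len(blobs))  # 4

# Round-trip: unpickle each blob and flatten the batches back out
restored = [x for blob in blobs for x in pickle.loads(blob)]
print(restored)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A larger batchSize means fewer, bigger serialized records (less per-record overhead); a smaller one trades that for finer-grained records.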