pyspark.sql.streaming.DataStreamWriter
class pyspark.sql.streaming.DataStreamWriter(df: DataFrame)
Interface used to write a streaming DataFrame to external storage systems (e.g. file systems, key-value stores). Use DataFrame.writeStream to access this.

New in version 2.0.0.

Changed in version 3.5.0: Supports Spark Connect.

Notes

This API is evolving.

Examples

The example below uses the Rate source, which generates rows continuously. It computes value % 3 for each row and writes the stream to the console. The streaming query is stopped after 3 seconds.

>>> import time
>>> df = spark.readStream.format("rate").load()
>>> df = df.selectExpr("value % 3 as v")
>>> q = df.writeStream.format("console").start()
>>> time.sleep(3)
>>> q.stop()

Methods

foreach(f)
    Sets the output of the streaming query to be processed using the provided writer f.

foreachBatch(func)
    Sets the output of the streaming query to be processed using the provided function.

format(source)
    Specifies the underlying output data source.

option(key, value)
    Adds an output option for the underlying data source.

options(**options)
    Adds output options for the underlying data source.

outputMode(outputMode)
    Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.

partitionBy(*cols)
    Partitions the output by the given columns on the file system.

queryName(queryName)
    Specifies the name of the StreamingQuery that can be started with start().

start([path, format, outputMode, …])
    Streams the contents of the DataFrame to a data source.

toTable(tableName[, format, outputMode, …])
    Starts the execution of the streaming query, which will continually output results to the given table as new data arrives.

trigger(*[, processingTime, once, …])
    Sets the trigger for the stream query.
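To illustrate how these builder methods chain together, the sketch below configures a file sink with an output mode, a checkpoint location, a query name, and a processing-time trigger before starting the query. The application name, query name, and all paths (e.g. /tmp/checkpoints/rate_demo) are hypothetical placeholders chosen for this sketch, not values from this page.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-writer-demo").getOrCreate()

# Rate source: emits (timestamp, value) rows continuously.
df = spark.readStream.format("rate").load().selectExpr("value % 3 as v")

q = (
    df.writeStream
      .format("parquet")                     # underlying output data source
      .outputMode("append")                  # write only new rows each trigger
      .option("checkpointLocation",
              "/tmp/checkpoints/rate_demo")  # hypothetical path; file sinks require a checkpoint
      .queryName("rate_demo")                # hypothetical name, visible via spark.streams
      .trigger(processingTime="10 seconds")  # fire a micro-batch every 10 seconds
      .start("/tmp/output/rate_demo")        # hypothetical output path
)
q.awaitTermination(30)  # block for up to 30 seconds
q.stop()

Note that the writeStream call returns a DataStreamWriter, and each configuration method returns the same writer, which is what makes this fluent chaining possible; only start() (or toTable()) actually launches the StreamingQuery.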
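For foreachBatch, the function you supply receives each micro-batch as an ordinary (non-streaming) DataFrame plus a batch id, so the full batch API is available inside it. A minimal sketch, again with hypothetical paths and names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreach-batch-demo").getOrCreate()
df = spark.readStream.format("rate").load().selectExpr("value % 3 as v")

def write_batch(batch_df, batch_id):
    # batch_df holds one micro-batch as a static DataFrame, so any
    # batch transformation or writer can be used here.
    batch_df.write.mode("append").parquet("/tmp/output/batches")  # hypothetical path

q = (
    df.writeStream
      .foreachBatch(write_batch)
      .option("checkpointLocation",
              "/tmp/checkpoints/foreach_batch_demo")  # hypothetical path
      .start()
)
q.awaitTermination(30)
q.stop()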