pyspark.sql.functions.tuple_union_theta_double

pyspark.sql.functions.tuple_union_theta_double(col1, col2, lgNomEntries=None, mode=None)

Computes the union of a DataSketches TupleSketch that has double summaries and a ThetaSketch.

New in version 4.2.0.

Parameters
col1 : Column or column name

The TupleSketch column with double summaries

col2 : Column or column name

The ThetaSketch column

lgNomEntries : Column or int, optional

The base-2 logarithm of the nominal number of entries. Must be between 4 and 26; defaults to 12.

mode : Column or str, optional

The summary mode: "sum" (default), "min", "max", or "alwaysone"
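
As a rough illustration of the lgNomEntries parameter, the plain-Python sketch below converts the base-2 logarithm into a nominal entry count and checks the documented range of 4 to 26. The helper name nominal_entries is hypothetical and is not part of PySpark; it only mirrors the arithmetic described above.

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return 2**lg_nom_entries after checking the documented range.

    Hypothetical helper for illustration only; it is not a PySpark API.
    """
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    return 1 << lg_nom_entries


print(nominal_entries())    # 4096 (the default, lgNomEntries=12)
print(nominal_entries(4))   # 16
print(nominal_entries(26))  # 67108864
```

Under this reading, the default of 12 corresponds to 4096 nominal entries.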

Returns
Column

The binary representation of the merged TupleSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10.0, 3), (2, 20.0, 4)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_double("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_double(sf.tuple_union_theta_double(df.sketch1, "sketch2"))).show()  # noqa
+---------------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_union_theta_double(sketch1, sketch2, 12, sum))|
+---------------------------------------------------------------------------------+
|                                                                              4.0|
+---------------------------------------------------------------------------------+
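
The mode parameter names four ways of combining double summaries. The plain-Python sketch below shows one plausible reading of each mode for a single pair of summaries that share a key, based on the mode names and on how the DataSketches library describes its double-summary modes. The function combine_summaries is hypothetical and not part of PySpark, and this page does not state which summary value Spark assigns to keys that come only from the ThetaSketch, so that case is not modeled here.

```python
def combine_summaries(a: float, b: float, mode: str = "sum") -> float:
    """Combine two double summaries for the same key.

    Hypothetical helper illustrating the documented mode names;
    it is an assumption about semantics, not Spark's implementation.
    """
    if mode == "sum":
        return a + b        # add the two summaries
    if mode == "min":
        return min(a, b)    # keep the smaller summary
    if mode == "max":
        return max(a, b)    # keep the larger summary
    if mode == "alwaysone":
        return 1.0          # ignore the inputs and store 1.0
    raise ValueError(f"unknown mode: {mode!r}")


for m in ("sum", "min", "max", "alwaysone"):
    print(m, combine_summaries(10.0, 20.0, m))
```

For the summaries 10.0 and 20.0 this prints 30.0, 10.0, 20.0, and 1.0 for the four modes, in that order.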