pyspark.pandas.groupby.DataFrameGroupBy.describe
DataFrameGroupBy.describe() → pyspark.pandas.frame.DataFrame
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Note
Unlike pandas, the percentiles in pandas-on-Spark are based upon approximate percentile computation because computing percentiles across a large dataset is extremely expensive.

Returns
- DataFrame
- Summary statistics for each group of the DataFrame provided.
 
See also
- DataFrame.count
- DataFrame.max
- DataFrame.min
- DataFrame.mean
- DataFrame.std

Examples

>>> df = ps.DataFrame({'a': [1, 1, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> df
   a  b  c
0  1  4  7
1  1  5  8
2  3  6  9

Describing a DataFrame. By default only numeric fields are returned.

>>> described = df.groupby('a').describe()
>>> described.sort_index()
      b                                             c
  count mean       std  min  25%  50%  75%  max count mean       std  min  25%  50%  75%  max
a
1   2.0  4.5  0.707107  4.0  4.0  4.0  5.0  5.0   2.0  7.5  0.707107  7.0  7.0  7.0  8.0  8.0
3   1.0  6.0       NaN  6.0  6.0  6.0  6.0  6.0   1.0  9.0       NaN  9.0  9.0  9.0  9.0  9.0
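
The note about approximate percentiles can be checked against the output above by running the same data through plain pandas, which computes exact, linearly interpolated percentiles. The following is a minimal sketch, assuming pandas is installed alongside pyspark. The names `pdf`, `exact`, `b_stats` and `exact_quartiles` are illustrative only and are not part of the pandas-on-Spark API.

```python
import pandas as pd

# Same data as the example above, but in plain pandas (exact percentiles).
pdf = pd.DataFrame({'a': [1, 1, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
exact = pdf.groupby('a').describe()

# The result has two-level columns: (original column, statistic).
b_stats = exact['b']

# Group a == 1 holds b = [4, 5]; pandas interpolates linearly between them.
exact_quartiles = b_stats.loc[1, ['25%', '50%', '75%']].tolist()
print(exact_quartiles)  # [4.25, 4.5, 4.75]

# Statistics that do not involve percentiles match the output shown above.
print(exact[('b', 'mean')].tolist())  # [4.5, 6.0]
```

For the same group, the pandas-on-Spark output above reports 4.0, 4.0 and 5.0 for the 25%, 50% and 75% columns, so only the percentile columns should be expected to differ from pandas. The count, mean, std, min and max columns are not affected by the approximation.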