pyspark.pandas.DataFrame.idxmin

DataFrame.idxmin(axis=0)

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.
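The semantics described above (first occurrence wins on ties, missing values are skipped) can be illustrated with a minimal plain-Python sketch. The helper name idxmin_column is hypothetical and is not part of pyspark.pandas. It models the behaviour for a single column only, and returning None when every value is missing is a simplification made for this sketch.

```python
import math


def idxmin_column(labels, values):
    """Return the label of the first minimum in values, skipping None/NaN.

    Hypothetical helper for illustration only; not the library's code.
    """
    best_label = None
    best_value = None
    for label, value in zip(labels, values):
        # Skip missing entries (None or a float NaN).
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # Strict "<" keeps the earliest label when several values tie.
        if best_value is None or value < best_value:
            best_label, best_value = label, value
    # Simplification: None is returned if every value is missing.
    return best_label


index = [0, 1, 2, 3]
print(idxmin_column(index, [300, 200, 400, 200]))            # 1
print(idxmin_column(index, [float("nan"), 2.0, None, 1.0]))  # 3
```

The first call mirrors column 'c' from the examples on this page: 200 appears at labels 1 and 3, and the earlier label is returned.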

Note

This API collects all rows that contain a minimum value using to_pandas(), on the assumption that the number of such rows is usually small.
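The strategy in this note can be sketched in plain Python as three steps: aggregate the per-column minimums, keep only the rows that hold a minimum in at least one column, and resolve the first matching label on that small collected set. This is an illustrative model under those assumptions, not the actual pyspark.pandas implementation. The list of dicts stands in for a distributed DataFrame, and the "index" key is a hypothetical stand-in for the index column.

```python
# Stand-in for a distributed DataFrame: one dict per row.
rows = [
    {"index": 0, "a": 1, "b": 4.0, "c": 300},
    {"index": 1, "a": 2, "b": 2.0, "c": 200},
    {"index": 2, "a": 3, "b": 3.0, "c": 400},
    {"index": 3, "a": 2, "b": 1.0, "c": 200},
]
columns = ["a", "b", "c"]

# Step 1: a single aggregation pass yields the minimum of each column.
mins = {col: min(row[col] for row in rows) for col in columns}

# Step 2: keep only rows that hold a minimum in at least one column.
# In the note above, this is the subset that gets collected locally.
candidates = [
    row for row in rows
    if any(row[col] == mins[col] for col in columns)
]

# Step 3: on the small collected set, take the first matching label per column.
result = {
    col: next(row["index"] for row in candidates if row[col] == mins[col])
    for col in columns
}

print(len(candidates))  # 3
print(result)           # {'a': 0, 'b': 3, 'c': 1}
```

Only three of the four rows survive the filter here. When many rows share a column's minimum, the collected subset grows accordingly, which is the cost the note refers to.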

Parameters

axis: {0 or ‘index’, 1 or ‘columns’}, default 0. The axis to use. With 0 or ‘index’, return the index label of the minimum for each column; with 1 or ‘columns’, return the column label of the minimum for each row.

Returns

Series

See also

Series.idxmin

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf
   a    b    c
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmin()
a    0
b    3
c    1
dtype: int64

For axis=1, return the column label of the minimum value in each row:

>>> psdf.idxmin(axis=1)
0    a
1    a
2    a
3    b
dtype: object

For a DataFrame with MultiIndex columns:

>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
   a    b    c
   x    y    z
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmin()
a  x    0
b  y    3
c  z    1
dtype: int64