pyspark.pandas.DataFrame.dropna

- DataFrame.dropna(axis: Union[int, str] = 0, how: str = 'any', thresh: Optional[int] = None, subset: Union[Any, Tuple[Any, ...], List[Union[Any, Tuple[Any, ...]]], None] = None, inplace: bool = False) → Optional[pyspark.pandas.frame.DataFrame]
- Remove missing values.

- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- Determine whether rows or columns which contain missing values are removed.
- 0, or ‘index’ : Drop rows which contain missing values.
- 1, or ‘columns’ : Drop columns which contain missing values.
 
- how : {‘any’, ‘all’}, default ‘any’
- Determine whether a row or column is removed when it has at least one NA value, or only when all of its values are NA.
- ‘any’ : If any NA values are present, drop that row or column.
- ‘all’ : If all values are NA, drop that row or column.
 
- thresh : int, optional
- Keep only the rows or columns that have at least this many non-NA values.
- subset : array-like, optional
- Labels along the other axis to consider. For example, if you are dropping rows, this is the list of columns to check for missing values.
- inplace : bool, default False
- If True, perform the operation in place and return None.
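The row-dropping rules described by how, thresh, and subset can be sketched in plain Python. This is only an illustration of the documented semantics, not the pyspark.pandas implementation: keep_row, is_missing, and surviving_labels are hypothetical helper names, and the sketch assumes that None and NaN count as missing and that thresh, when given, takes precedence over how.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_row(row, how="any", thresh=None, subset=None):
    """Return True if a row (a dict of column -> value) would be kept."""
    # Only the columns named in `subset` are inspected; default is all columns.
    columns = list(row) if subset is None else list(subset)
    non_missing = sum(not is_missing(row[c]) for c in columns)
    if thresh is not None:
        # Assumption: thresh, when given, takes precedence over `how`.
        return non_missing >= thresh
    if how == "any":
        # Dropped if any inspected value is missing.
        return non_missing == len(columns)
    if how == "all":
        # Dropped only if every inspected value is missing.
        return non_missing > 0
    raise ValueError(f"invalid how: {how!r}")


# Sample data keyed by index label.
rows = {
    0: {"name": "Alfred", "toy": None, "born": None},
    1: {"name": "Batman", "toy": "Batmobile", "born": "1940-04-25"},
    2: {"name": "Catwoman", "toy": "Bullwhip", "born": None},
}


def surviving_labels(**kwargs):
    """Index labels of the rows that keep_row would retain."""
    return [label for label, row in rows.items() if keep_row(row, **kwargs)]


print(surviving_labels())                          # default: how='any'
print(surviving_labels(how="all"))
print(surviving_labels(thresh=2))
print(surviving_labels(subset=["name", "born"]))
```

Under these assumptions, the default call keeps only label 1, how='all' keeps all three labels, and thresh=2 keeps labels 1 and 2.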
 
- Returns
- DataFrame or None
- DataFrame with NA entries dropped from it, or None if inplace=True.
 
- See also
- DataFrame.drop
- Drop specified labels from columns. 
- DataFrame.isnull
- Indicate missing values. 
- DataFrame.notnull
- Indicate existing (non-missing) values. 
- Examples

    >>> import pyspark.pandas as ps
    >>> df = ps.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
    ...                    "toy": [None, 'Batmobile', 'Bullwhip'],
    ...                    "born": [None, "1940-04-25", None]},
    ...                   columns=['name', 'toy', 'born'])
    >>> df
           name        toy        born
    0    Alfred       None        None
    1    Batman  Batmobile  1940-04-25
    2  Catwoman   Bullwhip        None

- Drop the rows where at least one element is missing.

    >>> df.dropna()
         name        toy        born
    1  Batman  Batmobile  1940-04-25

- Drop the columns where at least one element is missing.

    >>> df.dropna(axis='columns')
           name
    0    Alfred
    1    Batman
    2  Catwoman

- Drop the rows where all elements are missing.

    >>> df.dropna(how='all')
           name        toy        born
    0    Alfred       None        None
    1    Batman  Batmobile  1940-04-25
    2  Catwoman   Bullwhip        None

- Keep only the rows with at least 2 non-NA values.

    >>> df.dropna(thresh=2)
           name        toy        born
    1    Batman  Batmobile  1940-04-25
    2  Catwoman   Bullwhip        None

- Define in which columns to look for missing values.

    >>> df.dropna(subset=['name', 'born'])
         name        toy        born
    1  Batman  Batmobile  1940-04-25

- Keep the DataFrame with valid entries in the same variable.

    >>> df.dropna(inplace=True)
    >>> df
         name        toy        born
    1  Batman  Batmobile  1940-04-25