pyspark.pandas.Series
class pyspark.pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
- pandas-on-Spark Series that corresponds to pandas Series logically. This holds Spark Column internally.
- Variables
- _internal – an internal immutable Frame to manage metadata. 
- _psdf – Parent’s pandas-on-Spark DataFrame 
 
- Parameters
- data : array-like, dict, or scalar value, pandas Series
- Contains data stored in Series. If data is a dict, argument order is maintained for Python 3.6 and later. Note that if data is a pandas Series, other arguments should not be used.
- index : array-like or Index (1d)
- Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
- dtype : numpy.dtype or None
- If None, dtype will be inferred.
- copy : boolean, default False
- Copy input data.
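The parameter descriptions above can be illustrated with a minimal sketch. It assumes a working Spark runtime; as a convenience that is not part of the documented API, the snippet falls back to plain pandas when pyspark cannot be imported or started, relying on the fact that the pandas-on-Spark Series corresponds to the pandas Series logically. The variable names are illustrative only.

```python
# Assumption: fall back to plain pandas when pyspark (or a JVM) is unavailable,
# so the sketch still runs; the constructor arguments used here are the same.
try:
    import pyspark.pandas as ps
    ps.Series([0])  # forces a Spark session to start; raises if it cannot
except Exception:
    import pandas as ps

# No index given: labels default to 0, 1, 2, ...
s_default = ps.Series([10, 20, 30], name="x")

# Non-unique index labels are allowed.
s_dup = ps.Series([1, 2, 3], index=["a", "a", "b"])

# dict plus index: the index decides which labels appear.
# "a" is dropped, and "c" has no dict entry, so its value is missing (NaN).
s_dict = ps.Series({"a": 1.0, "b": 2.0}, index=["b", "c"])

# to_dict() gives {label -> value}, which does not depend on row order.
default_items = s_default.to_dict()
dup_is_unique = bool(s_dup.index.is_unique)
dict_items = s_dict.to_dict()

print(default_items)
print(dup_is_unique)
print(dict_items)
```

Results are compared through to_dict() rather than by position because row order is generally not something to rely on in a distributed backend.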
 
- Methods
- abs() – Return a Series/DataFrame with absolute numeric value of each element.
- add(other) – Return Addition of series and other, element-wise (binary operator +).
- add_prefix(prefix) – Prefix labels with string prefix.
- add_suffix(suffix) – Suffix labels with string suffix.
- agg(func) – Aggregate using one or more operations over the specified axis.
- aggregate(func) – Aggregate using one or more operations over the specified axis.
- align(other[, join, axis, copy]) – Align two objects on their axes with the specified join method.
- all([axis]) – Return whether all elements are True.
- any([axis]) – Return whether any element is True.
- append(to_append[, ignore_index, …]) – Concatenate two or more Series.
- apply(func[, args]) – Invoke function on values of Series.
- argmax() – Return int position of the largest value in the Series.
- argmin() – Return int position of the smallest value in the Series.
- argsort() – Return the integer indices that would sort the Series values.
- asof(where) – Return the last row(s) without any NaNs before where.
- astype(dtype) – Cast a pandas-on-Spark object to a specified dtype.
- at_time(time[, asof, axis]) – Select values at particular time of day (example: 9:30AM).
- backfill([axis, inplace, limit]) – Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
- between(left, right[, inclusive]) – Return boolean Series equivalent to left <= series <= right.
- between_time(start_time, end_time[, …]) – Select values between particular times of the day (example: 9:00-9:30 AM).
- bfill([axis, inplace, limit]) – Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
- bool() – Return the bool of a single element in the current object.
- clip([lower, upper]) – Trim values at input threshold(s).
- combine_first(other) – Combine Series values, choosing the calling Series’s values first.
- compare(other[, keep_shape, keep_equal]) – Compare to another Series and show the differences.
- copy([deep]) – Make a copy of this object’s indices and data.
- corr(other[, method]) – Compute correlation with other Series, excluding missing values.
- count([axis, numeric_only]) – Count non-NA cells for each column.
- cummax([skipna]) – Return cumulative maximum over a DataFrame or Series axis.
- cummin([skipna]) – Return cumulative minimum over a DataFrame or Series axis.
- cumprod([skipna]) – Return cumulative product over a DataFrame or Series axis.
- cumsum([skipna]) – Return cumulative sum over a DataFrame or Series axis.
- describe([percentiles]) – Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
- diff([periods]) – First discrete difference of element.
- div(other) – Return Floating division of series and other, element-wise (binary operator /).
- divide(other) – Return Floating division of series and other, element-wise (binary operator /).
- divmod(other) – Return Integer division and modulo of series and other, element-wise (binary operator divmod).
- dot(other) – Compute the dot product between the Series and the columns of other.
- drop([labels, index, level]) – Return Series with specified index labels removed.
- drop_duplicates([keep, inplace]) – Return Series with duplicate values removed.
- droplevel(level) – Return Series with requested index level(s) removed.
- dropna([axis, inplace]) – Return a new Series with missing values removed.
- eq(other) – Compare if the current value is equal to the other.
- equals(other) – Compare if the current value is equal to the other.
- expanding([min_periods]) – Provide expanding transformations.
- explode() – Transform each element of a list-like to a row.
- factorize([sort, na_sentinel]) – Encode the object as an enumerated type or categorical variable.
- ffill([axis, inplace, limit]) – Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
- fillna([value, method, axis, inplace, limit]) – Fill NA/NaN values.
- filter([items, like, regex, axis]) – Subset rows or columns of dataframe according to labels in the specified index.
- first(offset) – Select first periods of time series data based on a date offset.
- first_valid_index() – Retrieves the index of the first valid value.
- floordiv(other) – Return Integer division of series and other, element-wise (binary operator //).
- ge(other) – Compare if the current value is greater than or equal to the other.
- get(key[, default]) – Get item from object for given key (DataFrame column, Panel slice, etc.).
- get_dtype_counts() – Return counts of unique dtypes in this object.
- groupby(by[, axis, as_index, dropna]) – Group DataFrame or Series using a Series of columns.
- gt(other) – Compare if the current value is greater than the other.
- head([n]) – Return the first n rows.
- hist([bins]) – Draw one histogram of the DataFrame’s columns.
- idxmax([skipna]) – Return the row label of the maximum value.
- idxmin([skipna]) – Return the row label of the minimum value.
- isin(values) – Check whether values are contained in Series or Index.
- isna() – Detect missing values.
- isnull() – Detect missing values.
- item() – Return the first element of the underlying data as a Python scalar.
- items() – This is an alias of iteritems.
- iteritems() – Lazily iterate over (index, value) tuples.
- keys() – Return alias for index.
- kurt([axis, numeric_only]) – Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
- kurtosis([axis, numeric_only]) – Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
- last(offset) – Select final periods of time series data based on a date offset.
- last_valid_index() – Return index for last non-NA/null value.
- le(other) – Compare if the current value is less than or equal to the other.
- lt(other) – Compare if the current value is less than the other.
- mad() – Return the mean absolute deviation of values.
- map(arg) – Map values of Series according to input correspondence.
- mask(cond[, other]) – Replace values where the condition is True.
- max([axis, numeric_only]) – Return the maximum of the values.
- mean([axis, numeric_only]) – Return the mean of the values.
- median([axis, numeric_only, accuracy]) – Return the median of the values for the requested axis.
- min([axis, numeric_only]) – Return the minimum of the values.
- mod(other) – Return Modulo of series and other, element-wise (binary operator %).
- mode([dropna]) – Return the mode(s) of the dataset.
- mul(other) – Return Multiplication of series and other, element-wise (binary operator *).
- multiply(other) – Return Multiplication of series and other, element-wise (binary operator *).
- ne(other) – Compare if the current value is not equal to the other.
- nlargest([n]) – Return the largest n elements.
- notna() – Detect existing (non-missing) values.
- notnull() – Detect existing (non-missing) values.
- nsmallest([n]) – Return the smallest n elements.
- nunique([dropna, approx, rsd]) – Return number of unique elements in the object.
- pad([axis, inplace, limit]) – Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
- pct_change([periods]) – Percentage change between the current and a prior element.
- pipe(func, *args, **kwargs) – Apply func(self, *args, **kwargs).
- pop(item) – Return item and drop from series.
- pow(other) – Return Exponential power of series and other, element-wise (binary operator **).
- prod([axis, numeric_only, min_count]) – Return the product of the values.
- product([axis, numeric_only, min_count]) – Return the product of the values.
- quantile([q, accuracy]) – Return value at the given quantile.
- radd(other) – Return Reverse Addition of series and other, element-wise (binary operator +).
- rank([method, ascending]) – Compute numerical data ranks (1 through n) along axis.
- rdiv(other) – Return Reverse Floating division of series and other, element-wise (binary operator /).
- rdivmod(other) – Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).
- reindex([index, fill_value]) – Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
- reindex_like(other) – Return a Series with matching indices as other object.
- rename([index]) – Alter Series name.
- rename_axis([mapper, index, inplace]) – Set the name of the axis for the index or columns.
- repeat(repeats) – Repeat elements of a Series.
- replace([to_replace, value, regex]) – Replace values given in to_replace with value.
- reset_index([level, drop, name, inplace]) – Generate a new DataFrame or Series with the index reset.
- rfloordiv(other) – Return Reverse Integer division of series and other, element-wise (binary operator //).
- rmod(other) – Return Reverse Modulo of series and other, element-wise (binary operator %).
- rmul(other) – Return Reverse Multiplication of series and other, element-wise (binary operator *).
- rolling(window[, min_periods]) – Provide rolling transformations.
- round([decimals]) – Round each value in a Series to the given number of decimals.
- rpow(other) – Return Reverse Exponential power of series and other, element-wise (binary operator **).
- rsub(other) – Return Reverse Subtraction of series and other, element-wise (binary operator -).
- rtruediv(other) – Return Reverse Floating division of series and other, element-wise (binary operator /).
- sample([n, frac, replace, random_state]) – Return a random sample of items from an axis of object.
- sem([axis, ddof, numeric_only]) – Return unbiased standard error of the mean over requested axis.
- shift([periods, fill_value]) – Shift Series/Index by desired number of periods.
- skew([axis, numeric_only]) – Return unbiased skew normalized by N-1.
- sort_index([axis, level, ascending, …]) – Sort object by labels (along an axis).
- sort_values([ascending, inplace, na_position]) – Sort by the values.
- squeeze([axis]) – Squeeze 1 dimensional axis objects into scalars.
- std([axis, ddof, numeric_only]) – Return sample standard deviation.
- sub(other) – Return Subtraction of series and other, element-wise (binary operator -).
- subtract(other) – Return Subtraction of series and other, element-wise (binary operator -).
- sum([axis, numeric_only, min_count]) – Return the sum of the values.
- swapaxes(i, j[, copy]) – Interchange axes and swap values axes appropriately.
- swaplevel([i, j, copy]) – Swap levels i and j in a MultiIndex.
- tail([n]) – Return the last n rows.
- take(indices) – Return the elements in the given positional indices along an axis.
- to_clipboard([excel, sep]) – Copy object to the system clipboard.
- to_csv([path, sep, na_rep, columns, header, …]) – Write object to a comma-separated values (csv) file.
- to_dataframe([name]) – Convert Series to DataFrame.
- to_dict([into]) – Convert Series to {label -> value} dict or dict-like object.
- to_excel(excel_writer[, sheet_name, na_rep, …]) – Write object to an Excel sheet.
- to_frame([name]) – Convert Series to DataFrame.
- to_json([path, compression, num_files, …]) – Convert the object to a JSON string.
- to_latex([buf, columns, col_space, header, …]) – Render an object to a LaTeX tabular environment table.
- to_list() – Return a list of the values.
- to_markdown([buf, mode]) – Print Series or DataFrame in Markdown-friendly format.
- to_numpy() – A NumPy ndarray representing the values in this DataFrame or Series.
- to_pandas() – Return a pandas Series.
- to_string([buf, na_rep, float_format, …]) – Render a string representation of the Series.
- tolist() – Return a list of the values.
- transform(func[, axis]) – Call func producing the same type as self with transformed values and that has the same axis length as input.
- transpose(*args, **kwargs) – Return the transpose, which is by definition self.
- truediv(other) – Return Floating division of series and other, element-wise (binary operator /).
- truncate([before, after, axis, copy]) – Truncate a Series or DataFrame before and after some index value.
- unique() – Return unique values of Series object.
- unstack([level]) – Unstack, a.k.a.
- update(other) – Modify Series in place using non-NA values from passed Series.
- value_counts([normalize, sort, ascending, …]) – Return a Series containing counts of unique values.
- var([axis, ddof, numeric_only]) – Return unbiased variance.
- where(cond[, other]) – Replace values where the condition is False.
- xs(key[, level]) – Return cross-section from the Series.
- Attributes
- T – Return the transpose, which is by definition self.
- at – Access a single value for a row/column label pair.
- axes – Return a list of the row axis labels.
- dtype – Return the dtype object of the underlying data.
- dtypes – Return the dtype object of the underlying data.
- empty – Returns true if the current object is empty.
- hasnans – Return True if it has any missing values.
- iat – Access a single value for a row/column pair by integer position.
- iloc – Purely integer-location based indexing for selection by position.
- index – The index (axis labels) Column of the Series.
- is_monotonic – Return boolean if values in the object are monotonically increasing.
- is_monotonic_decreasing – Return boolean if values in the object are monotonically decreasing.
- is_monotonic_increasing – Return boolean if values in the object are monotonically increasing.
- is_unique – Return boolean if values in the object are unique.
- loc – Access a group of rows and columns by label(s) or a boolean Series.
- name – Return name of the Series.
- ndim – Return an int representing the number of array dimensions.
- shape – Return a tuple of the shape of the underlying data.
- size – Return an int representing the number of elements in this object.
- values – Return a Numpy representation of the DataFrame or the Series.
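A few of the listed methods are easiest to tell apart side by side. The sketch below contrasts isna() with notna() and where() with mask() on a small Series. It assumes a working Spark runtime; as a convenience that is not part of the documented API, it falls back to plain pandas when pyspark cannot be imported or started. The sample values and variable names are illustrative only.

```python
# Assumption: fall back to plain pandas when pyspark (or a JVM) is unavailable,
# so the sketch still runs; the methods used here exist in both libraries.
try:
    import pyspark.pandas as ps
    ps.Series([0])  # forces a Spark session to start; raises if it cannot
except Exception:
    import pandas as ps

s = ps.Series([1.0, None, 3.0, 4.0], name="v")

# isna() flags missing values; notna() flags existing ones.
missing = s.isna().to_dict()
present = s.notna().to_dict()

# where() keeps values where the condition is True and replaces the rest;
# mask() is the opposite and replaces values where the condition is True.
kept = s.where(s > 2).to_dict()
masked = s.mask(s > 2).to_dict()

# count() ignores missing values.
n_present = int(s.count())

print(missing)
print(present)
print(kept)
print(masked)
print(n_present)
```

Each result is converted with to_dict() so that it is keyed by index label and can be inspected without relying on row order.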