pyspark.pandas.Series

class pyspark.pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

A pandas-on-Spark Series that logically corresponds to a pandas Series. It holds a Spark Column internally.

Variables

_internal – an internal immutable Frame to manage metadata.
_psdf – the parent pandas-on-Spark DataFrame.

Parameters

data (array-like, dict, scalar value, or pandas Series): Contains the data stored in the Series. If data is a dict, insertion order is maintained for Python 3.6 and later. Note that if data is a pandas Series, the other arguments should not be used.
index (array-like or Index (1d)): Values must be hashable and have the same length as data. Non-unique index values are allowed. Defaults to RangeIndex (0, 1, 2, …, n - 1) if not provided. If both a dict and an index sequence are used, the index overrides the keys found in the dict.
dtype (numpy.dtype or None): If None, the dtype is inferred.
copy (boolean, default False): Copy the input data.

Methods

`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other)	Return addition of series and other, element-wise (binary operator +).
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`agg`(func)	Aggregate using one or more operations over the specified axis.
`aggregate`(func)	Aggregate using one or more operations over the specified axis.
`align`(other[, join, axis, copy])	Align two objects on their axes with the specified join method.
`all`([axis])	Return whether all elements are True.
`any`([axis])	Return whether any element is True.
`append`(to_append[, ignore_index, …])	Concatenate two or more Series.
`apply`(func[, args])	Invoke function on values of Series.
`argmax`()	Return int position of the largest value in the Series.
`argmin`()	Return int position of the smallest value in the Series.
`argsort`()	Return the integer indices that would sort the Series values.
`asof`(where)	Return the last row(s) without any NaNs before where.
`astype`(dtype)	Cast a pandas-on-Spark object to a specified dtype `dtype`.
`at_time`(time[, asof, axis])	Select values at particular time of day (example: 9:30AM).
`backfill`([axis, inplace, limit])	Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
`between`(left, right[, inclusive])	Return boolean Series equivalent to left <= series <= right.
`between_time`(start_time, end_time[, …])	Select values between particular times of the day (example: 9:00-9:30 AM).
`bfill`([axis, inplace, limit])	Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
`bool`()	Return the bool of a single element in the current object.
`clip`([lower, upper])	Trim values at input threshold(s).
`combine_first`(other)	Combine Series values, choosing the calling Series’s values first.
`compare`(other[, keep_shape, keep_equal])	Compare to another Series and show the differences.
`copy`([deep])	Make a copy of this object’s indices and data.
`corr`(other[, method])	Compute correlation with other Series, excluding missing values.
`count`([axis, numeric_only])	Count non-NA cells for each column.
`cummax`([skipna])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([skipna])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([skipna])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([skipna])	Return cumulative sum over a DataFrame or Series axis.
`describe`([percentiles])	Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding `NaN` values.
`diff`([periods])	First discrete difference of element.
`div`(other)	Return floating division of series and other, element-wise (binary operator /).
`divide`(other)	Return floating division of series and other, element-wise (binary operator /).
`divmod`(other)	Return integer division and modulo of series and other, element-wise (binary operator divmod).
`dot`(other)	Compute the dot product between the Series and the columns of other.
`drop`([labels, index, level])	Return Series with specified index labels removed.
`drop_duplicates`([keep, inplace])	Return Series with duplicate values removed.
`droplevel`(level)	Return Series with requested index level(s) removed.
`dropna`([axis, inplace])	Return a new Series with missing values removed.
`eq`(other)	Compare if the current value is equal to the other.
`equals`(other)	Compare if the current value is equal to the other.
`expanding`([min_periods])	Provide expanding transformations.
`explode`()	Transform each element of a list-like to a row.
`factorize`([sort, na_sentinel])	Encode the object as an enumerated type or categorical variable.
`ffill`([axis, inplace, limit])	Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
`fillna`([value, method, axis, inplace, limit])	Fill NA/NaN values.
`filter`([items, like, regex, axis])	Subset rows or columns of dataframe according to labels in the specified index.
`first`(offset)	Select first periods of time series data based on a date offset.
`first_valid_index`()	Retrieves the index of the first valid value.
`floordiv`(other)	Return integer division of series and other, element-wise (binary operator //).
`ge`(other)	Compare if the current value is greater than or equal to the other.
`get`(key[, default])	Get item from object for given key (DataFrame column, Panel slice, etc.).
`get_dtype_counts`()	Return counts of unique dtypes in this object.
`groupby`(by[, axis, as_index, dropna])	Group DataFrame or Series using a Series of columns.
`gt`(other)	Compare if the current value is greater than the other.
`head`([n])	Return the first n rows.
`hist`([bins])	Draw one histogram of the DataFrame’s columns.
`idxmax`([skipna])	Return the row label of the maximum value.
`idxmin`([skipna])	Return the row label of the minimum value.
`isin`(values)	Check whether values are contained in Series or Index.
`isna`()	Detect missing values.
`isnull`()	Detect missing values.
`item`()	Return the first element of the underlying data as a Python scalar.
`items`()	This is an alias of `iteritems`.
`iteritems`()	Lazily iterate over (index, value) tuples.
`keys`()	Return alias for index.
`kurt`([axis, numeric_only])	Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
`kurtosis`([axis, numeric_only])	Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
`last`(offset)	Select final periods of time series data based on a date offset.
`last_valid_index`()	Return index for last non-NA/null value.
`le`(other)	Compare if the current value is less than or equal to the other.
`lt`(other)	Compare if the current value is less than the other.
`mad`()	Return the mean absolute deviation of values.
`map`(arg)	Map values of Series according to input correspondence.
`mask`(cond[, other])	Replace values where the condition is True.
`max`([axis, numeric_only])	Return the maximum of the values.
`mean`([axis, numeric_only])	Return the mean of the values.
`median`([axis, numeric_only, accuracy])	Return the median of the values for the requested axis.
`min`([axis, numeric_only])	Return the minimum of the values.
`mod`(other)	Return modulo of series and other, element-wise (binary operator %).
`mode`([dropna])	Return the mode(s) of the dataset.
`mul`(other)	Return multiplication of series and other, element-wise (binary operator *).
`multiply`(other)	Return multiplication of series and other, element-wise (binary operator *).
`ne`(other)	Compare if the current value is not equal to the other.
`nlargest`([n])	Return the largest n elements.
`notna`()	Detect existing (non-missing) values.
`notnull`()	Detect existing (non-missing) values.
`nsmallest`([n])	Return the smallest n elements.
`nunique`([dropna, approx, rsd])	Return number of unique elements in the object.
`pad`([axis, inplace, limit])	Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
`pct_change`([periods])	Percentage change between the current and a prior element.
`pipe`(func, *args, **kwargs)	Apply func(self, *args, **kwargs).
`pop`(item)	Return item and drop from series.
`pow`(other)	Return exponential power of series and other, element-wise (binary operator **).
`prod`([axis, numeric_only, min_count])	Return the product of the values.
`product`([axis, numeric_only, min_count])	Return the product of the values.
`quantile`([q, accuracy])	Return value at the given quantile.
`radd`(other)	Return reverse addition of series and other, element-wise (binary operator +).
`rank`([method, ascending])	Compute numerical data ranks (1 through n) along axis.
`rdiv`(other)	Return reverse floating division of series and other, element-wise (binary operator /).
`rdivmod`(other)	Return reverse integer division and modulo of series and other, element-wise (binary operator rdivmod).
`reindex`([index, fill_value])	Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
`reindex_like`(other)	Return a Series with matching indices as other object.
`rename`([index])	Alter Series name.
`rename_axis`([mapper, index, inplace])	Set the name of the axis for the index or columns.
`repeat`(repeats)	Repeat elements of a Series.
`replace`([to_replace, value, regex])	Replace values given in to_replace with value.
`reset_index`([level, drop, name, inplace])	Generate a new DataFrame or Series with the index reset.
`rfloordiv`(other)	Return reverse integer division of series and other, element-wise (binary operator //).
`rmod`(other)	Return reverse modulo of series and other, element-wise (binary operator %).
`rmul`(other)	Return reverse multiplication of series and other, element-wise (binary operator *).
`rolling`(window[, min_periods])	Provide rolling transformations.
`round`([decimals])	Round each value in a Series to the given number of decimals.
`rpow`(other)	Return reverse exponential power of series and other, element-wise (binary operator **).
`rsub`(other)	Return reverse subtraction of series and other, element-wise (binary operator -).
`rtruediv`(other)	Return reverse floating division of series and other, element-wise (binary operator /).
`sample`([n, frac, replace, random_state])	Return a random sample of items from an axis of object.
`sem`([axis, ddof, numeric_only])	Return unbiased standard error of the mean over requested axis.
`shift`([periods, fill_value])	Shift Series/Index by desired number of periods.
`skew`([axis, numeric_only])	Return unbiased skew normalized by N-1.
`sort_index`([axis, level, ascending, …])	Sort object by labels (along an axis).
`sort_values`([ascending, inplace, na_position])	Sort by the values.
`squeeze`([axis])	Squeeze 1-dimensional axis objects into scalars.
`std`([axis, ddof, numeric_only])	Return sample standard deviation.
`sub`(other)	Return subtraction of series and other, element-wise (binary operator -).
`subtract`(other)	Return subtraction of series and other, element-wise (binary operator -).
`sum`([axis, numeric_only, min_count])	Return the sum of the values.
`swapaxes`(i, j[, copy])	Interchange axes and swap values axes appropriately.
`swaplevel`([i, j, copy])	Swap levels i and j in a MultiIndex.
`tail`([n])	Return the last n rows.
`take`(indices)	Return the elements in the given positional indices along an axis.
`to_clipboard`([excel, sep])	Copy object to the system clipboard.
`to_csv`([path, sep, na_rep, columns, header, …])	Write object to a comma-separated values (csv) file.
`to_dataframe`([name])	Convert Series to DataFrame.
`to_dict`([into])	Convert Series to {label -> value} dict or dict-like object.
`to_excel`(excel_writer[, sheet_name, na_rep, …])	Write object to an Excel sheet.
`to_frame`([name])	Convert Series to DataFrame.
`to_json`([path, compression, num_files, …])	Convert the object to a JSON string.
`to_latex`([buf, columns, col_space, header, …])	Render an object to a LaTeX tabular environment table.
`to_list`()	Return a list of the values.
`to_markdown`([buf, mode])	Print Series or DataFrame in Markdown-friendly format.
`to_numpy`()	A NumPy ndarray representing the values in this DataFrame or Series.
`to_pandas`()	Return a pandas Series.
`to_string`([buf, na_rep, float_format, …])	Render a string representation of the Series.
`tolist`()	Return a list of the values.
`transform`(func[, axis])	Call `func` producing the same type as self with transformed values and that has the same axis length as input.
`transpose`(*args, **kwargs)	Return the transpose, which is by definition self.
`truediv`(other)	Return floating division of series and other, element-wise (binary operator /).
`truncate`([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.
`unique`()	Return unique values of Series object.
`unstack`([level])	Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame.
`update`(other)	Modify Series in place using non-NA values from passed Series.
`value_counts`([normalize, sort, ascending, …])	Return a Series containing counts of unique values.
`var`([axis, ddof, numeric_only])	Return unbiased variance.
`where`(cond[, other])	Replace values where the condition is False.
`xs`(key[, level])	Return cross-section from the Series.

Attributes

`T`	Return the transpose, which is by definition self.
`at`	Access a single value for a row/column label pair.
`axes`	Return a list of the row axis labels.
`dtype`	Return the dtype object of the underlying data.
`dtypes`	Return the dtype object of the underlying data.
`empty`	Return True if the current object is empty.
`hasnans`	Return True if it has any missing values.
`iat`	Access a single value for a row/column pair by integer position.
`iloc`	Purely integer-location based indexing for selection by position.
`index`	The index (axis labels) Column of the Series.
`is_monotonic`	Return boolean if values in the object are monotonically increasing.
`is_monotonic_decreasing`	Return boolean if values in the object are monotonically decreasing.
`is_monotonic_increasing`	Return boolean if values in the object are monotonically increasing.
`is_unique`	Return boolean if values in the object are unique.
`loc`	Access a group of rows and columns by label(s) or a boolean Series.
`name`	Return name of the Series.
`ndim`	Return an int representing the number of array dimensions.
`shape`	Return a tuple of the shape of the underlying data.
`size`	Return an int representing the number of elements in this object.
`values`	Return a NumPy representation of the DataFrame or the Series.
