xpark.dataset.aggregate.Std

class xpark.dataset.aggregate.Std(on: str | None = None, ddof: int = 1, ignore_nulls: bool = True, alias_name: str | None = None)

Defines standard deviation aggregation.

Uses Welford’s online algorithm, which computes the standard deviation in a single pass and is numerically stable. Results may differ slightly from those of libraries such as NumPy or pandas, which use a two-pass algorithm, but are generally more accurate.

See: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm
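
To make the single-pass idea concrete, the following is a minimal sketch of Welford's update rule in plain Python. It is not xpark's implementation; the function name `welford_std` and the choice to return NaN when `count - ddof` is not positive are assumptions made for illustration.

```python
import math
import statistics


def welford_std(values, ddof=1):
    """Single-pass standard deviation using Welford's update rule.

    Hypothetical helper for illustration; not part of xpark.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / count     # update the running mean
        m2 += delta * (x - mean)  # uses both the old and the new mean
    if count - ddof <= 0:
        return float("nan")       # assumption: undefined for too few values
    return math.sqrt(m2 / (count - ddof))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
single_pass = welford_std(data)
reference = statistics.stdev(data)  # standard-library sample std for comparison
```

On well-conditioned data such as this, `single_pass` and `reference` agree to within floating-point rounding; small differences can appear in the last digits on larger or more skewed inputs.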

Example

import xpark
from xpark.dataset.aggregate import Std

ds = xpark.dataset.from_range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda x: x % 3)
# Schema: {'id': int64, 'group_key': int64}

# Calculating the standard deviation per group:
result = ds.groupby("group_key").aggregate(Std(on="id")).take_all()
# result: [{'group_key': 0, 'std(id)': ...},
#          {'group_key': 1, 'std(id)': ...},
#          {'group_key': 2, 'std(id)': ...}]

Parameters:

on – The name of the column to calculate standard deviation on.
ddof – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. Default is 1.
ignore_nulls – Whether to ignore null values. Default is True.
alias_name – Optional name for the resulting column.
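
The effect of ddof can be checked by hand. The sketch below uses only the Python standard library and the divisor N - ddof described above; the helper name `std_with_ddof` is hypothetical and the code does not call xpark.

```python
import math

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)
mean = sum(values) / n
# Sum of squared deviations from the mean (32.0 for this data).
sum_sq_dev = sum((v - mean) ** 2 for v in values)


def std_with_ddof(ddof):
    """Standard deviation with divisor N - ddof (illustrative helper)."""
    return math.sqrt(sum_sq_dev / (n - ddof))


sample_std = std_with_ddof(1)      # divisor N - 1, matching the default ddof=1
population_std = std_with_ddof(0)  # divisor N
```

With ddof=1 the divisor is smaller, so the sample estimate is always at least as large as the population value computed with ddof=0.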

Methods

`aggregate_block`(block)	Aggregates data within a single block.
`combine`(current_accumulator, new)	Combines a new partial aggregation result with the current accumulator.
`finalize`(accumulator)	Transforms the final accumulated state into the desired output.
`get_agg_name`()	Returns the aggregation name (e.g., 'sum', 'mean', 'count').
`get_target_column`()
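
The three-step flow of the methods listed above (per-block aggregation, combining partial results, finalizing) can be sketched with standalone functions. This is an assumption about how such an aggregation could be structured, not xpark's actual code: the accumulator layout `(count, mean, m2)`, the use of plain lists as blocks, and the extra `ddof` argument on `finalize` are all hypothetical. The merge step uses the standard pairwise formula for combining two sets of Welford statistics.

```python
import math
import statistics


def aggregate_block(block):
    """Reduce one block (a plain list here) to (count, mean, m2)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in block:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return count, mean, m2


def combine(current_accumulator, new):
    """Merge two partial (count, mean, m2) results into one."""
    n_a, mean_a, m2_a = current_accumulator
    n_b, mean_b, m2_b = new
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def finalize(accumulator, ddof=1):
    """Turn the merged state into a standard deviation."""
    count, _, m2 = accumulator
    if count - ddof <= 0:
        return None  # assumption: no result when there are too few values
    return math.sqrt(m2 / (count - ddof))


# Three blocks of unequal size, as might result from partitioning a dataset.
blocks = [[0, 3, 6, 9], [12, 15], [18, 21, 24]]
acc = (0, 0.0, 0.0)
for block in blocks:
    acc = combine(acc, aggregate_block(block))
distributed_std = finalize(acc)
```

Because the merge is exact up to rounding, `distributed_std` matches the standard deviation computed over all values at once, regardless of how the data is split into blocks.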
