xpark.dataset.aggregate.Sum
- class xpark.dataset.aggregate.Sum(on: str | None = None, ignore_nulls: bool = True, alias_name: str | None = None)
Defines sum aggregation.
Example
import xpark
from xpark.dataset.aggregate import Sum

ds = xpark.dataset.from_range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda x: x % 3)
# Schema: {'id': int64, 'group_key': int64}

# Summing the "id" column across all rows (no grouping is applied here):
result = ds.aggregate(Sum(on="id"))
# result: {'sum(id)': 4950}
- Parameters:
on – The name of the numerical column to sum. Must be provided, even though the signature defaults to None.
ignore_nulls – Whether to ignore null values during summation. If True (default), nulls are skipped. If False, the sum will be null if any value in the group is null.
alias_name – Optional name for the resulting column.
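The effect of ignore_nulls described above can be illustrated in plain Python. This is a minimal sketch of the documented semantics only, not xpark's implementation; the helper name sum_with_nulls is hypothetical, and the return value for an empty or all-null input (0 here) is an assumption, since the text above does not specify it.

```python
from typing import Iterable, Optional


def sum_with_nulls(
    values: Iterable[Optional[float]], ignore_nulls: bool = True
) -> Optional[float]:
    """Hypothetical helper mirroring the documented ignore_nulls behavior."""
    total = 0
    for value in values:
        if value is None:
            if ignore_nulls:
                # Nulls are skipped when ignore_nulls is True.
                continue
            # With ignore_nulls=False, a single null makes the sum null.
            return None
        total += value
    # Assumption: an empty or all-null input yields 0 in this sketch.
    return total


skipped = sum_with_nulls([1, None, 2])                         # nulls skipped
propagated = sum_with_nulls([1, None, 2], ignore_nulls=False)  # null result
print(skipped, propagated)
```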
Methods
aggregate_block(block) – Aggregates data within a single block.
combine(current_accumulator, new) – Combines a new partial aggregation result with the current accumulator.
finalize(accumulator) – Transforms the final accumulated state into the desired output.
get_agg_name() – Returns the aggregation name (e.g., 'sum', 'mean', 'count').
get_target_column()
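How the aggregate_block, combine, and finalize steps might fit together for a sum can be sketched with plain Python lists standing in for blocks. The three functions below are hypothetical stand-ins that borrow the method names listed above; they are not xpark's code, and the real block type and accumulator representation may differ.

```python
from functools import reduce
from typing import List


def aggregate_block(block: List[int]) -> int:
    # Partial sum over one block (a plain list in this sketch).
    return sum(block)


def combine(current_accumulator: int, new: int) -> int:
    # Merge a new partial sum into the running accumulator.
    return current_accumulator + new


def finalize(accumulator: int) -> int:
    # For a sum, the accumulated state is already the final output.
    return accumulator


# Assumption: the 100 rows of from_range(100) are split into three blocks.
blocks = [list(range(0, 40)), list(range(40, 70)), list(range(70, 100))]
partials = [aggregate_block(block) for block in blocks]
total = finalize(reduce(combine, partials, 0))
print(total)  # 4950, matching the 'sum(id)' value in the example
```

Because addition is associative and commutative, the partial sums can be combined in any order, which is what makes a block-wise design like this one suitable for parallel execution.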