xpark.dataset.aggregate.Unique#

class xpark.dataset.aggregate.Unique(on: str | None = None, ignore_nulls: bool = False, alias_name: str | None = None, encode_lists: bool | ListEncodingMode | None = None)#

Defines unique aggregation.

Example

import xpark
from xpark.dataset.aggregate import Unique

ds = xpark.dataset.from_range(100)
ds = ds.add_column("group_key", lambda x: x % 3)

# Compute the unique values per group:
result = ds.groupby("group_key").aggregate(Unique(on="id")).take_all()
# result: [{'group_key': 0, 'unique(id)': ...},
#          {'group_key': 1, 'unique(id)': ...},
#          {'group_key': 2, 'unique(id)': ...}]
Parameters:
  • on – The name of the column from which to collect unique values.

  • ignore_nulls – Whether to ignore null values when collecting unique items. Defaults to False (nulls are included).

  • alias_name – Optional name for the resulting column.

  • encode_lists – If True, each list's elements are collected individually. If False, each whole list is treated as a single value. Defaults to False. Note that this is a top-level flatten, not a recursive one.
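
The ignore_nulls and encode_lists semantics can be illustrated with a plain-Python sketch. The helper below is hypothetical (it is not xpark's implementation); it only mirrors the documented parameter behavior:

```python
def unique_values(values, ignore_nulls=False, encode_lists=False):
    """Illustrative sketch of Unique's parameter semantics (not xpark code)."""
    items = []
    for v in values:
        if encode_lists and isinstance(v, list):
            # top-level flatten only: collect the list's elements
            items.extend(v)
        else:
            # treat the whole list as a single value
            # (converted to a tuple so it is hashable)
            items.append(tuple(v) if isinstance(v, list) else v)
    if ignore_nulls:
        items = [v for v in items if v is not None]
    # deduplicate while preserving first-seen order
    return list(dict.fromkeys(items))

print(unique_values([1, 2, 2, None], ignore_nulls=True))        # [1, 2]
print(unique_values([[1, 2], [1, 2], [3]], encode_lists=True))  # [1, 2, 3]
print(unique_values([[1, 2], [1, 2]]))                          # [(1, 2)]
```

With encode_lists=False the two identical lists collapse to a single value, whereas encode_lists=True deduplicates their elements instead.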

Methods

aggregate_block(block)

Aggregates data within a single block.

combine(current_accumulator, new)

Combines a new partial aggregation result with the current accumulator.

finalize(accumulator)

Transforms the final accumulated state into the desired output.

get_agg_name()

Returns the aggregation name (e.g., 'sum', 'mean', 'count'), used to build the default output column name.

get_target_column()

Returns the name of the target column (the on argument).
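
The methods above follow a standard partial-aggregation protocol: each block is reduced to a partial result, partial results are merged pairwise, and the merged accumulator is converted into the final output. A minimal sketch of that lifecycle for a unique aggregation, assuming rows are plain dicts (this is an illustration, not xpark's implementation):

```python
class UniqueSketch:
    """Illustrative partial-aggregation lifecycle for a Unique aggregate."""

    def __init__(self, on):
        self.on = on

    def aggregate_block(self, block):
        # Reduce one block (a list of row dicts) to a set of values
        # from the target column.
        return {row[self.on] for row in block}

    def combine(self, current_accumulator, new):
        # Merge a new partial result into the running accumulator.
        return current_accumulator | new

    def finalize(self, accumulator):
        # Convert the accumulated set into the output value.
        return sorted(accumulator)

    def get_agg_name(self):
        return "unique"

    def get_target_column(self):
        return self.on


agg = UniqueSketch(on="id")
part1 = agg.aggregate_block([{"id": 1}, {"id": 2}, {"id": 2}])
part2 = agg.aggregate_block([{"id": 2}, {"id": 3}])
print(agg.finalize(agg.combine(part1, part2)))  # [1, 2, 3]
```

Because combine merges sets, blocks can be aggregated independently and in any order, which is what makes the aggregation parallelizable across blocks.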