xpark.dataset.aggregate.Quantile

class xpark.dataset.aggregate.Quantile(on: str | None = None, q: float = 0.5, ignore_nulls: bool = True, alias_name: str | None = None)

Defines a quantile aggregation: computes the q-th quantile of the values in the column named by on.

Example

import xpark
from xpark.dataset.aggregate import Quantile

ds = xpark.dataset.from_range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda x: x % 3)
# Schema: {'id': int64, 'group_key': int64}

# Calculating the 50th percentile (median) per group:
result = ds.groupby("group_key").aggregate(Quantile(q=0.5, on="id")).take_all()
# result: [{'group_key': 0, 'quantile(id)': ...},
#          {'group_key': 1, 'quantile(id)': ...},
#          {'group_key': 2, 'quantile(id)': ...}]

Parameters:
  • on – The name of the column to calculate the quantile on. Must be provided, even though the signature defaults it to None.

  • q – The quantile to compute, which must be between 0 and 1 inclusive. For example, q=0.5 computes the median.

  • ignore_nulls – Whether to ignore null values. Default is True.

  • alias_name – Optional name for the resulting column.
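To make the q and ignore_nulls parameters concrete, here is a minimal pure-Python sketch of a quantile computation. It does not use xpark: the helper name quantile_sketch is hypothetical, the linear interpolation between neighbouring ranks is an assumption (this page does not say which interpolation rule Quantile applies), and returning None when nulls are present but not ignored is likewise assumed.

```python
import math
from typing import Optional, Sequence


def quantile_sketch(
    values: Sequence[Optional[float]],
    q: float = 0.5,
    ignore_nulls: bool = True,
) -> Optional[float]:
    """Hypothetical helper, not part of xpark."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1 inclusive")

    # Assumption: with ignore_nulls=False, any null makes the result null.
    if not ignore_nulls and any(v is None for v in values):
        return None

    data = sorted(v for v in values if v is not None)
    if not data:
        return None

    # Assumption: linear interpolation between the two closest ranks.
    pos = (len(data) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(data[lo])
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


median_even = quantile_sketch([1, 2, 3, 4], q=0.5)
median_with_null = quantile_sketch([1, None, 3], q=0.5)
strict_nulls = quantile_sketch([1, None, 3], q=0.5, ignore_nulls=False)
```

Under these assumptions, the ids with group_key 0 in the example above (0, 3, ..., 99) would have a median of 49.5. The value xpark actually reports may differ if it uses another interpolation rule.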

Methods

aggregate_block(block)

Aggregates data within a single block.

combine(current_accumulator, new)

Combines a new partial aggregation result with the current accumulator.

finalize(accumulator)

Transforms the final accumulated state into the desired output.

get_agg_name()

Returns the aggregation name (e.g., 'sum', 'mean', 'count').

get_target_column()
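
The aggregate_block, combine, and finalize methods listed above suggest a three-stage flow: build a partial result per block, merge partial results, then turn the merged state into the output. The sketch below illustrates that flow with a stand-in class. QuantileSketch is hypothetical and only mirrors the method names on this page; representing a block as a plain list, keeping every value as the accumulator, and interpolating linearly in finalize are all assumptions, not xpark's implementation.

```python
from typing import List, Optional


class QuantileSketch:
    """Hypothetical stand-in that mirrors the method names listed above."""

    def __init__(self, q: float = 0.5, ignore_nulls: bool = True):
        self.q = q
        self.ignore_nulls = ignore_nulls

    def aggregate_block(self, block: List[Optional[float]]) -> List[Optional[float]]:
        # Partial state for one block: the values it contributes.
        if self.ignore_nulls:
            return [v for v in block if v is not None]
        return list(block)

    def combine(self, current_accumulator: list, new: list) -> list:
        # Merge a new partial result into the running accumulator.
        return current_accumulator + new

    def finalize(self, accumulator: list) -> Optional[float]:
        # Assumption: an empty or null-containing state yields None.
        if not accumulator or any(v is None for v in accumulator):
            return None
        data = sorted(accumulator)
        # Assumption: linear interpolation between the two closest ranks.
        pos = (len(data) - 1) * self.q
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# Three blocks of one column, processed independently and then merged.
blocks = [[5, 1, None], [3], [4, 2]]
agg = QuantileSketch(q=0.5)

acc: list = []
for block in blocks:
    acc = agg.combine(acc, agg.aggregate_block(block))

median = agg.finalize(acc)
```

Keeping all values in the accumulator is the simplest way to get an exact quantile, at the cost of memory that grows with the data. A real engine may use a different accumulator.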