xpark.dataset.aggregate.Min

class xpark.dataset.aggregate.Min(on: str | None = None, ignore_nulls: bool = True, alias_name: str | None = None, zero_factory: Callable[[], ray.data.aggregate.SupportsRichComparisonType] = <function Min.<lambda>>)

Defines a min aggregation, which computes the minimum value of a column.

Example

import xpark
from xpark.dataset.aggregate import Min

ds = xpark.dataset.from_range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda x: x % 3)
# Schema: {'id': int64, 'group_key': int64}

# Finding the minimum value per group:
result = ds.groupby("group_key").aggregate(Min(on="id")).take_all()
# result: [{'group_key': 0, 'min(id)': 0},
#          {'group_key': 1, 'min(id)': 1},
#          {'group_key': 2, 'min(id)': 2}]

Parameters:
  • on – The name of the column whose minimum value is computed. Must be provided, even though the signature defaults to None.

  • ignore_nulls – Whether to ignore null values. If True (default), nulls are skipped. If False, the minimum for a group is null if any value in that group is null; this holds for most data types, while others follow type-specific rules for comparing against nulls.

  • alias_name – Optional name for the resulting column.

  • zero_factory – A callable that returns the initial “zero” value for the accumulator. For a float column, for example, this is lambda: float("+inf"), which is also the default.
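
The interaction between ignore_nulls and zero_factory can be illustrated with a small plain-Python sketch. This is not xpark's implementation: the helper name min_with_nulls is hypothetical, nulls are modeled as None, and returning the zero value for an empty input is an assumption of the sketch rather than documented behavior.

```python
from datetime import date


def min_with_nulls(values, ignore_nulls=True, zero_factory=lambda: float("+inf")):
    """Hypothetical helper: fold values into a minimum, starting from the zero value."""
    acc = zero_factory()  # initial accumulator; must compare with the column's values
    for v in values:
        if v is None:
            if ignore_nulls:
                continue  # skip nulls entirely
            return None  # any null makes the whole result null
        acc = min(acc, v)
    return acc


# Nulls are skipped by default.
skipped = min_with_nulls([3, None, 1])

# With ignore_nulls=False, a single null yields a null result.
propagated = min_with_nulls([3, None, 1], ignore_nulls=False)

# A non-numeric column needs a zero value of a comparable type.
earliest = min_with_nulls(
    [date(2024, 1, 2), date(2023, 5, 1)],
    zero_factory=lambda: date.max,
)
```

The last call shows why zero_factory is configurable: float("+inf") cannot be compared with date objects, so the accumulator has to start from a value of the same type that is greater than or equal to every possible input.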

Methods

aggregate_block(block)

Aggregates data within a single block.

combine(current_accumulator, new)

Combines a new partial aggregation result with the current accumulator.

finalize(accumulator)

Transforms the final accumulated state into the desired output.

get_agg_name()

Returns the name of the aggregation (e.g., 'sum', 'mean', 'count').

get_target_column()
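
As a rough illustration of how aggregate_block, combine, and finalize fit together, the sketch below models blocks as plain Python lists and uses standalone functions that only borrow the method names. It is an assumed, simplified picture of a block-wise min, not the xpark code; distributed_min is a hypothetical driver.

```python
def aggregate_block(block):
    """Partial result for one block: its local minimum, or +inf if the block is empty."""
    return min(block) if block else float("+inf")


def combine(current_accumulator, new):
    """Merge a new partial result into the running accumulator."""
    return min(current_accumulator, new)


def finalize(accumulator):
    """For a min aggregation the accumulator is already the answer."""
    return accumulator


def distributed_min(blocks):
    """Hypothetical driver: aggregate each block, combine the partials, then finalize."""
    accumulator = float("+inf")  # the "zero" value for a numeric min
    for block in blocks:
        accumulator = combine(accumulator, aggregate_block(block))
    return finalize(accumulator)


result = distributed_min([[7, 3, 9], [4, 8], [5]])
```

Because taking a minimum is associative and commutative, the partial results can be combined in any order, which is what allows each block to be processed independently.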