xpark.dataset.aggregate.ZeroPercentage

class xpark.dataset.aggregate.ZeroPercentage(on: str, ignore_nulls: bool = True, alias_name: str | None = None)

Calculates the percentage of zero values in a numeric column.

The result is a float between 0.0 and 100.0: 0.0 means the column contains no zeros, and 100.0 means every counted value is zero. By default, null values are ignored, so the percentage is taken over non-null values only. With ignore_nulls=False, null values are counted in the denominator but never as zeros.

Example

import xpark
from xpark.dataset.aggregate import ZeroPercentage

# Create a dataset with some zero values
ds = xpark.dataset.from_items([
    {"value": 0}, {"value": 1}, {"value": 0},
    {"value": 3}, {"value": 0}
])

# Calculate zero value percentage
result = ds.aggregate(ZeroPercentage(on="value"))
# result: 60.0 (3 out of 5 values are zero)

# With null values and ignore_nulls=True (default)
ds = xpark.dataset.from_items([
    {"value": 0}, {"value": None}, {"value": 0},
    {"value": 3}, {"value": 0}
])
result = ds.aggregate(ZeroPercentage(on="value", ignore_nulls=True))
# result: 75.0 (3 out of 4 non-null values are zero)

# Using with groupby
ds = xpark.dataset.from_items([
    {"group": "A", "value": 0}, {"group": "A", "value": 1},
    {"group": "B", "value": 0}, {"group": "B", "value": 0}
])
result = ds.groupby("group").aggregate(ZeroPercentage(on="value")).take_all()
# result: [{'group': 'A', 'zero_pct(value)': 50.0},
#          {'group': 'B', 'zero_pct(value)': 100.0}]

Parameters:
  • on – The name of the column to calculate zero value percentage on. Must be a numeric column.

  • ignore_nulls – Whether to ignore null values when calculating the percentage. If True (default), null values are excluded from both numerator and denominator. If False, null values are included in the denominator but not the numerator.

  • alias_name – Optional name for the resulting column. If not provided, defaults to “zero_pct({column_name})”.
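The effect of ignore_nulls described above can be modeled in plain Python. This is a minimal sketch of the documented semantics, not xpark's implementation: the helper name zero_percentage is hypothetical, and returning None for an empty denominator is an assumption, since the source does not say what the aggregation returns in that case.

```python
from typing import Optional, Sequence


def zero_percentage(
    values: Sequence[Optional[float]], ignore_nulls: bool = True
) -> Optional[float]:
    """Plain-Python model of the documented semantics (hypothetical helper)."""
    # Nulls never count as zeros, so the numerator is the same in both modes.
    zeros = sum(1 for v in values if v is not None and v == 0)
    # The denominator drops nulls only when ignore_nulls is True.
    if ignore_nulls:
        total = sum(1 for v in values if v is not None)
    else:
        total = len(values)
    # Assumption: an empty denominator yields None (not specified by the source).
    if total == 0:
        return None
    return 100.0 * zeros / total


data = [0, None, 0, 3, 0]
print(zero_percentage(data, ignore_nulls=True))   # 75.0 (3 of 4 non-null values)
print(zero_percentage(data, ignore_nulls=False))  # 60.0 (3 of 5 rows)
```

The same five rows give 75.0 or 60.0 depending only on whether the null row is counted in the denominator.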

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

aggregate_block(block)

Aggregates data within a single block.

combine(current_accumulator, new)

Combines a new partial aggregation result with the current accumulator.

finalize(accumulator)

Transforms the final accumulated state into the desired output.
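The three methods above follow a common partial-aggregation pattern. One way such an aggregation might work is sketched below with standalone functions that mirror the method names. The (zero_count, counted_rows) accumulator, the list-of-values blocks, and the None result for an empty input are all assumptions made for illustration; the source does not describe xpark's internal accumulator or block format.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical accumulator layout: (zero_count, counted_rows).
Accumulator = Tuple[int, int]


def aggregate_block(
    block: Iterable[Optional[float]], ignore_nulls: bool = True
) -> Accumulator:
    """Reduce one block of values to a partial (zeros, counted) pair."""
    zeros = 0
    counted = 0
    for v in block:
        if v is None:
            # Nulls enter the denominator only when they are not ignored.
            if not ignore_nulls:
                counted += 1
            continue
        counted += 1
        if v == 0:
            zeros += 1
    return zeros, counted


def combine(current: Accumulator, new: Accumulator) -> Accumulator:
    """Merge two partial results by adding their counts element-wise."""
    return current[0] + new[0], current[1] + new[1]


def finalize(acc: Accumulator) -> Optional[float]:
    """Turn the merged counts into a percentage (None if nothing was counted)."""
    zeros, counted = acc
    return 100.0 * zeros / counted if counted else None


blocks = [[0, 1, None], [0, 3, 0]]
acc: Accumulator = (0, 0)
for b in blocks:
    acc = combine(acc, aggregate_block(b))
print(finalize(acc))  # 60.0 (3 zeros out of 5 non-null values)
```

Keeping raw counts in the accumulator and dividing only in finalize means blocks of different sizes combine correctly, which averaging per-block percentages would not guarantee.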

get_agg_name()

Returns the aggregation name (e.g., 'sum', 'mean', 'count').

get_target_column()