xpark.dataset.TextPredicateEval#
- class xpark.dataset.TextPredicateEval(predicate: str, /, *, base_url: str, model: str, api_key: str = 'NOT_SET', max_qps: int | None = None, max_retries: int = 0, fallback_response: bool | None = None, **kwargs: dict[str, Any])[source]#
TextPredicateEval processor evaluates whether input texts satisfy a given predicate condition.
This processor uses a Large Language Model (LLM) to determine if each text in a column matches the specified predicate, returning True or False for each input.
- Parameters:
predicate – The predicate to evaluate.
base_url – The base URL of the LLM server.
model – The request model name.
api_key – The request API key.
max_qps – The maximum number of requests per second.
max_retries – The maximum number of retries per request in the event of failures. Failed requests are retried with exponential backoff, up to this maximum number of retries.
fallback_response – The response value to return when the LLM request fails. If set to None, the exception will be raised instead.
**kwargs – Keyword arguments to pass to the openai.AsyncClient.chat.completions.create API.
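The `max_retries` and `fallback_response` parameters above describe a retry-then-fallback policy: retry with exponential backoff, and on final failure either return the fallback value or re-raise. A minimal, self-contained sketch of that policy (a hypothetical helper, not part of xpark's API):

```python
import time


def call_with_retries(request, max_retries=0, fallback_response=None, base_delay=0.01):
    """Retry `request()` with exponential backoff up to `max_retries` times.

    On final failure, return `fallback_response`, or re-raise the last
    exception when `fallback_response` is None -- mirroring the parameter
    semantics documented above. Hypothetical illustration only.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            if attempt == max_retries:
                if fallback_response is None:
                    raise
                return fallback_response
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
```

For example, a request that fails twice and then succeeds completes normally with `max_retries=3`, while a request that always fails returns the fallback (or raises when no fallback is set).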
Examples
import os

from xpark.dataset import TextPredicateEval, from_items
from xpark.dataset.expressions import col

ds = from_items(["The iconic tower in the capital of France is illuminated with lights."])
ds = ds.with_column(
    "eval",
    TextPredicateEval(
        predicate="The text describes Paris",
        model="deepseek-v3-0324",
        base_url=os.getenv("LLM_ENDPOINT"),
        api_key=os.getenv("LLM_API_KEY"),
    )
    .options(num_workers={"IO": 1}, batch_size=1)
    .with_column(col("item")),
)
print(ds.take_all())
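The `max_qps` parameter above caps the request rate to the LLM server. A minimal client-side throttle with the same effect can be sketched by spacing calls at least `1 / max_qps` seconds apart (a hypothetical helper, not xpark's actual implementation):

```python
import time


class QpsLimiter:
    """Sketch of a max_qps throttle: allow at most `max_qps` acquisitions
    per second by enforcing a minimum interval between calls."""

    def __init__(self, max_qps):
        self.interval = 1.0 / max_qps
        self._next_allowed = 0.0  # monotonic timestamp of next permitted call

    def acquire(self):
        # Sleep until the next permitted timestamp, then schedule the one after.
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `acquire()` before each request keeps the steady-state rate at or below `max_qps`; five acquisitions at `max_qps=100` therefore take at least about 40 ms.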
Methods
- __call__(texts) – Call self as a function.
- options(**kwargs)
- with_column(texts)

- __call__(texts: pa.ChunkedArray) → pa.Array#

Call self as a function.

- options(**kwargs: Unpack[ExprUDFOptions]) → Self#
- with_column(texts: pa.ChunkedArray) → pa.Array#