xpark.dataset.TextTranslate

class xpark.dataset.TextTranslate(to_lang: str = 'en-US', /, *, base_url: str, model: str, from_lang: str = 'AUTO_DETECT', api_key: str = 'NOT_SET', max_qps: int | None = None, max_retries: int = 0, fallback_response: str | None = None, **kwargs: dict[str, Any])

TextTranslate processor responsible for translating the text into the target language.

Parameters:

to_lang – The target language to translate to. Default is “en-US”. Specify the language using either a BCP 47 language tag or an ISO 639-1 code. The set of supported languages depends on the capabilities of the model.
base_url – The base URL of the LLM server.
model – The request model name.
from_lang – The source language of the input text. Default is “AUTO_DETECT”.
api_key – The request API key.
max_qps – The maximum number of requests per second.
max_retries – The maximum number of retries per request in the event of failure. Requests are retried with exponential backoff up to this maximum. Default is 0.
fallback_response – The response value to return when the LLM request fails. If set to None, the exception will be raised instead.
**kwargs – Keyword arguments to pass to the openai.AsyncClient.chat.completions.create API.
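
The way max_retries and fallback_response interact can be pictured with the sketch below. It is not xpark's implementation: call_with_retries, base_delay, and sleep are hypothetical names introduced here, and this page does not state the actual backoff schedule. The sketch only assumes that a failed request is retried up to max_retries times with a doubling delay, and that the fallback is returned (or the exception re-raised) once retries are exhausted.

```python
import asyncio


async def call_with_retries(request, *, max_retries=0, fallback_response=None,
                            base_delay=0.5, sleep=asyncio.sleep):
    """Await request() at most 1 + max_retries times with exponential backoff.

    base_delay and sleep are hypothetical knobs for this sketch,
    not parameters of TextTranslate.
    """
    attempt = 0
    while True:
        try:
            return await request()
        except Exception:
            if attempt >= max_retries:
                # Retries exhausted: return the fallback if one was given,
                # otherwise let the original exception propagate.
                if fallback_response is not None:
                    return fallback_response
                raise
            # The delay doubles on each retry: base_delay, 2x, 4x, ...
            await sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With max_retries=2, a request that fails twice and then succeeds returns its result after two waits. A request that always fails returns fallback_response if one is set, and otherwise raises the last exception.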

Examples

import os

from xpark.dataset.expressions import col
from xpark.dataset import TextTranslate, from_items

ds = from_items(["Today is a good day."])
ds = ds.with_column(
    "translated",
    TextTranslate(
        model="deepseek-v3-0324",
        base_url=os.getenv("LLM_ENDPOINT"),
        api_key=os.getenv("LLM_API_KEY"),
    )
    .options(num_workers={"IO": 1}, batch_size=1)
    .with_column(col("item")),
)

print(ds.take_all())

Methods

`__call__`(texts)	Call self as a function.
`options`(**kwargs)
`with_column`(texts)

__call__(texts: pa.ChunkedArray) → pa.Array

Call self as a function.

options(**kwargs: Unpack[ExprUDFOptions]) → Self

with_column(texts: pa.ChunkedArray) → pa.Array
