Processors

Data Processors

Text

xpark.dataset.TextFuzzyDedup([...])

Text fuzzy deduplication using MinHashLSH.
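To illustrate the general technique named above, here is a minimal, self-contained sketch of MinHash signatures with LSH banding in plain Python. It does not use or reproduce the xpark API: the function names (`shingles`, `minhash_signature`, `candidate_pairs`, `fuzzy_dedup`) and the parameters (`num_perm`, `bands`, `threshold`) are hypothetical, chosen only for this example, and the processor's actual options may differ.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=64):
    """Keep the minimum hash per seeded hash function (seeds stand in for permutations)."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """LSH banding: documents that share any identical band become candidates."""
    rows = len(signatures[0]) // bands
    buckets = defaultdict(list)
    for doc_id, sig in enumerate(signatures):
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(ids, 2))
    return pairs


def fuzzy_dedup(docs, threshold=0.8, num_perm=64, bands=16):
    """Keep the first document of each near-duplicate pair (hypothetical helper)."""
    sigs = [minhash_signature(shingles(d), num_perm) for d in docs]
    dropped = set()
    for i, j in sorted(candidate_pairs(sigs, bands)):
        if i in dropped or j in dropped:
            continue
        if estimated_jaccard(sigs[i], sigs[j]) >= threshold:
            dropped.add(j)
    return [d for idx, d in enumerate(docs) if idx not in dropped]
```

Banding trades precision for speed: only documents that collide in at least one band are compared, so most dissimilar pairs are never scored.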

xpark.dataset.TextExactSubstringDedup([...])

Implements a scalable, distributed exact substring deduplication algorithm using a divide-and-conquer strategy.
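As a rough illustration of what exact substring deduplication does, the sketch below removes any span of at least `min_len` characters that already appeared in an earlier document. It is a single-machine, hash-set simplification written for this example, not the distributed divide-and-conquer algorithm the processor implements; the function name `exact_substring_dedup` and the `min_len` parameter are hypothetical.

```python
def exact_substring_dedup(docs, min_len=20):
    """Drop characters covered by a min_len-character window seen in an earlier document.

    Repeats inside the same document are left alone, because a document's
    windows are only recorded after it has been processed.
    """
    seen = set()
    result = []
    for doc in docs:
        windows = [doc[i:i + min_len] for i in range(len(doc) - min_len + 1)]
        duplicate = [False] * len(doc)
        for start, window in enumerate(windows):
            if window in seen:
                # Mark every character of the repeated window for removal.
                for pos in range(start, start + min_len):
                    duplicate[pos] = True
        seen.update(windows)
        result.append("".join(ch for ch, dup in zip(doc, duplicate) if not dup))
    return result


if __name__ == "__main__":
    corpus = [
        "alpha beta gamma delta epsilon",
        "zzz alpha beta gamma delta yyy",
    ]
    print(exact_substring_dedup(corpus, min_len=10))
```

Storing every window in memory does not scale to large corpora, which is why production implementations typically rely on suffix arrays or partitioned processing instead.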

Audio

xpark.dataset.AudioCompute(*args, **kwargs)

Image

xpark.dataset.ImageCompute(*args, **kwargs)

Video

xpark.dataset.VideoCompute(*args, **kwargs)

AI Processors

Text

xpark.dataset.TextEmbedding([_local_model, ...])

Text embedding processor for CPU, GPU, and remote HTTP requests.

xpark.dataset.TextSummarize(*[, max_words, ...])

TextSummarize processor provides a highly condensed summary of the text.

xpark.dataset.TextMask(labels, /, *, ...[, ...])

TextMask processor replaces sensitive information in the original text with [MASKED] according to the labels.

xpark.dataset.TextGenerate(*, base_url, model)

TextGenerate processor generates content based on the input parameters.

xpark.dataset.TextClassify(labels, /, *, ...)

TextClassify processor extracts the single label string that best matches the text content from the given labels.

xpark.dataset.TextFixGrammar(*, base_url, model)

TextFixGrammar processor corrects grammar mistakes in the input text using an LLM.

xpark.dataset.TextSimilarity(target, /, *[, ...])

TextSimilarity processor calculates the similarity between texts using an LLM.

xpark.dataset.TextTranslate([to_lang, ...])

TextTranslate processor translates the text into the target language.

xpark.dataset.TextSentiment(*[, sentiments, ...])

TextSentiment processor performs sentiment analysis on text.

xpark.dataset.TextExtract(labels, /, *[, ...])

TextExtract processor extracts structured information from text based on user-defined

xpark.dataset.TextPredicateEval(predicate, ...)

TextPredicateEval processor evaluates whether input texts satisfy a given predicate condition.

Image

xpark.dataset.ImageNSFWScore([_local_model])

Image NSFW score calculation processor for CPU and GPU.

xpark.dataset.ImageTextSimilarityScore(text)

Image-text similarity score calculation processor for CPU and GPU.

xpark.dataset.ImageAestheticScore([...])

Image aesthetic score calculation processor for CPU and GPU.

Audio

xpark.dataset.SpeechToText([_local_model, ...])

Speech-to-text processor for CPU, GPU, and remote HTTP requests.