# TokenSearcherTextCleaner

Task for removing uninformative data from text. By default, this task uses [**TokenSearcherPredictor**](https://utca.knowledgator.com/predictors/tokensearcherpredictor). For more details, see:

{% content-ref url="../predictors/tokensearcherpredictor" %}
[tokensearcherpredictor](https://utca.knowledgator.com/predictors/tokensearcherpredictor)
{% endcontent-ref %}

Subclass of [**NERTask**](https://utca.knowledgator.com/task#nertask).

## Module: [implementation](https://utca.knowledgator.com/framework-structure#implementation).tasks

## Methods and properties

Main methods and properties

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**predictor**</mark>**&#x20;(**[**Predictor**](https://utca.knowledgator.com/predictors/predictor)**\[Any, Any], optional):** Predictor used by the task. If None, the default [**TokenSearcherPredictor**](https://utca.knowledgator.com/predictors/tokensearcherpredictor) is used. Defaults to None.
* <mark style="color:orange;">**preprocess**</mark>**&#x20;(Optional\[**[**Component**](https://utca.knowledgator.com/core/component)**], optional):** Component executed before the predictor. If None, the default component is used. Defaults to None.\
  \
  Default component: \
  [**TokenSearcherTextCleanerPreprocessor**](#tokensearchertextcleanerpreprocessor)
* <mark style="color:orange;">**postprocess**</mark>**&#x20;(Optional\[**[**Component**](https://utca.knowledgator.com/core/component)**], optional):** Component executed after the predictor. If None, the default component is used. Defaults to None.\
  \
  Default component: \
  [**TokenSearcherTextCleanerPostprocessor**](#tokensearchertextcleanerpostprocessor)
* <mark style="color:orange;">**input\_class**</mark>**&#x20;(Type\[**[**Input**](https://utca.knowledgator.com/core/schemas#input)**], optional):** Class for input validation. Defaults to [**TokenSearcherTextCleanerInput**](#tokensearchertextcleanerinput)**.**
* <mark style="color:orange;">**output\_class**</mark>**&#x20;(Type\[**[**NEROutputType**](https://utca.knowledgator.com/task#neroutputtype)**], optional):** Class for output validation. Defaults to [**TokenSearcherTextCleanerOutput**](#tokensearchertextcleaneroutput)**.**
* <mark style="color:orange;">**name**</mark>**&#x20;(Optional\[str], optional):** Name for identification. If None, the class name is used. Defaults to None.
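The fallback rules above (a `None` argument is replaced by a default component, and a missing `name` by the class name) can be sketched in plain Python. This is a minimal illustration under assumptions: `SketchTextCleaner` and the `Default*` stand-in classes are hypothetical placeholders, not UTCA's actual classes, and the real task also accepts `input_class` and `output_class`, which are omitted here.

```python
from typing import Any, Optional


class DefaultPredictor:
    """Hypothetical stand-in for the default TokenSearcherPredictor."""


class DefaultPreprocessor:
    """Hypothetical stand-in for TokenSearcherTextCleanerPreprocessor."""


class DefaultPostprocessor:
    """Hypothetical stand-in for TokenSearcherTextCleanerPostprocessor."""


class SketchTextCleaner:
    """Illustrates how None arguments fall back to defaults (not the real task)."""

    def __init__(
        self,
        predictor: Optional[Any] = None,
        preprocess: Optional[Any] = None,
        postprocess: Optional[Any] = None,
        name: Optional[str] = None,
    ) -> None:
        # Each argument left as None is replaced by its default component.
        self.predictor = predictor if predictor is not None else DefaultPredictor()
        self.preprocess = preprocess if preprocess is not None else DefaultPreprocessor()
        self.postprocess = postprocess if postprocess is not None else DefaultPostprocessor()
        # Without an explicit name, the class name is used.
        self.name = name if name is not None else type(self).__name__


task = SketchTextCleaner()
print(task.name)  # SketchTextCleaner
```

Passing any component explicitly (for example a custom postprocessor) simply bypasses the corresponding default.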

***

## <mark style="color:green;">TokenSearcherTextCleanerInput</mark>

Subclass of [**IOModel**](https://utca.knowledgator.com/core/schemas#iomodel)**.**

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**text**</mark>**&#x20;(str):** Text to clean.

***

## <mark style="color:green;">TokenSearcherTextCleanerOutput</mark>

Subclass of [**NEROutput**](https://utca.knowledgator.com/task#neroutput). Type of [**NEROutput**](https://utca.knowledgator.com/task#neroutput)**\[**[**Entity**](https://utca.knowledgator.com/objects#entity)**].**

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**text**</mark>**&#x20;(str):** Input text.
* <mark style="color:orange;">**cleaned\_text**</mark>**&#x20;(Optional\[str], optional):** Cleaned text. None if **clean** is set to False in the [default postprocessor](#tokensearchertextcleanerpostprocessor).
* <mark style="color:orange;">**output**</mark>**&#x20;(List\[**[**Entity**](https://utca.knowledgator.com/objects#entity)**]):** Uninformative data found in the text.
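The shape of the output can be sketched with dataclasses. The field names of `SketchEntity` (`span`, `start`, `end`, `score`) are assumptions made for illustration and may not match the real **Entity** class; `SketchCleanerOutput` is likewise a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchEntity:
    """Hypothetical entity: a span of text with character offsets and a score."""

    span: str
    start: int  # inclusive character offset
    end: int  # exclusive character offset
    score: float


@dataclass
class SketchCleanerOutput:
    """Hypothetical stand-in for TokenSearcherTextCleanerOutput."""

    text: str
    output: List[SketchEntity] = field(default_factory=list)
    cleaned_text: Optional[str] = None  # stays None when clean=False


result = SketchCleanerOutput(
    text="Buy now!!! The meeting is at 10.",
    output=[SketchEntity(span="Buy now!!! ", start=0, end=11, score=0.9)],
)
```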

***

## <mark style="color:green;">TokenSearcherTextCleanerPreprocessor</mark>

Creates the model prompt from the provided text. Subclass of [**Action**](https://utca.knowledgator.com/core/action). Type of [**Action**](https://utca.knowledgator.com/core/action)**\[Dict\[str, Any], Dict\[str, Any]].**

***

### <mark style="color:blue;">execute</mark>

#### Arguments:

* <mark style="color:orange;">**input\_data**</mark>**&#x20;(Dict\[str, Any]):** \
  Expected keys:
  * <mark style="color:red;">**"text"**</mark>**&#x20;(str):** Text to process;

#### Returns:

* **Dict\[str, Any]:** \
  Expected keys:
  * <mark style="color:red;">**"inputs"**</mark>**&#x20;(List\[str]):** Model inputs;
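A minimal sketch of this step, assuming a single prompt per input text. `PROMPT_TEMPLATE` and `sketch_preprocess` are hypothetical; the instruction wording that the real preprocessor sends to the model is not documented on this page.

```python
from typing import Any, Dict

# Hypothetical instruction; the prompt UTCA actually uses is not documented here.
PROMPT_TEMPLATE = "Find all uninformative text fragments: {text}"


def sketch_preprocess(input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build model inputs from input_data["text"] (illustrative only)."""
    text = input_data["text"]
    # Return the prompt(s) under "inputs", matching the documented return keys.
    return {"inputs": [PROMPT_TEMPLATE.format(text=text)]}


prepared = sketch_preprocess({"text": "Subscribe to our newsletter! Q3 revenue rose 4%."})
```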

***

## <mark style="color:green;">TokenSearcherTextCleanerPostprocessor</mark>

Formats the output and, if specified, removes uninformative data from the text. Subclass of [**Action**](https://utca.knowledgator.com/core/action). Type of [**Action**](https://utca.knowledgator.com/core/action)**\[Dict\[str, Any], Dict\[str, Any]].**

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**clean**</mark>**&#x20;(bool, optional):** Whether to remove uninformative data from the text. Defaults to False.
* <mark style="color:orange;">**threshold**</mark>**&#x20;(float, optional):** Score threshold for predicted uninformative data. Defaults to 0.
* <mark style="color:orange;">**name**</mark>**&#x20;(Optional\[str], optional):** Name for identification. If None, the class name is used. Defaults to None.
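One plausible reading of **threshold** is a minimum score: predictions scoring below it are dropped before formatting. The sketch below assumes that reading, an inclusive comparison, and a `"score"` key on each prediction; `filter_by_threshold` is a hypothetical helper, not part of UTCA.

```python
from typing import Any, Dict, List


def filter_by_threshold(
    predictions: List[Dict[str, Any]], threshold: float = 0.0
) -> List[Dict[str, Any]]:
    """Keep predictions whose "score" is at least the threshold (assumed semantics)."""
    return [p for p in predictions if p["score"] >= threshold]


preds = [
    {"span": "Click here", "start": 0, "end": 10, "score": 0.92},
    {"span": "Weather", "start": 11, "end": 18, "score": 0.31},
]
kept = filter_by_threshold(preds, threshold=0.5)
```

Under this reading, the default of 0 keeps every prediction with a non-negative score, so raising the threshold trades recall for precision.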

***

### <mark style="color:blue;">execute</mark>

#### Arguments:

* <mark style="color:orange;">**input\_data**</mark>**&#x20;(Dict\[str, Any]):** \
  Expected keys:
  * <mark style="color:red;">**"output"**</mark>**&#x20;(List\[List\[Dict\[str, Any]]]):** Model output;
  * <mark style="color:red;">**"inputs"**</mark>**&#x20;(List\[str]):** Model inputs;
  * <mark style="color:red;">**"text"**</mark>**&#x20;(str): P**rocessed text;

#### Returns:

* **Dict\[str, Any]:** \
  Expected keys:
  * <mark style="color:red;">**"text"**</mark>**&#x20;(str): P**rocessed text;
  * <mark style="color:red;">**"output"**</mark>**&#x20;(List\[**[**Entity**](https://utca.knowledgator.com/objects#entity)**]):** uninformative data;
  * <mark style="color:red;">**"cleaned\_text"**</mark>**&#x20;(Optional\[str], optional)**: Cleaned text. Equals to None, if **clean** was set to False.
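When **clean** is True, the cleaned text can be produced by cutting the detected spans out of the original text. The sketch below assumes each entity carries character offsets under `"start"` and `"end"` (end exclusive) and that the nested model output has already been flattened into a single list; `remove_spans` and `sketch_postprocess` are hypothetical helpers, not UTCA's implementation. Overlapping spans are merged so no character is removed twice.

```python
from typing import Any, Dict, List, Optional


def remove_spans(text: str, entities: List[Dict[str, Any]]) -> str:
    """Return text with every [start, end) span removed; overlaps are merged."""
    pieces: List[str] = []
    cursor = 0  # first character not yet copied or skipped
    for ent in sorted(entities, key=lambda e: e["start"]):
        start, end = ent["start"], ent["end"]
        if start > cursor:
            # Keep the informative text between the previous span and this one.
            pieces.append(text[cursor:start])
        cursor = max(cursor, end)
    pieces.append(text[cursor:])
    return "".join(pieces)


def sketch_postprocess(
    text: str, entities: List[Dict[str, Any]], clean: bool = False
) -> Dict[str, Any]:
    """Shape the result like the documented return value (illustrative only)."""
    cleaned: Optional[str] = remove_spans(text, entities) if clean else None
    return {"text": text, "output": entities, "cleaned_text": cleaned}


text = "Click here. The report is due Monday."
ents = [{"span": "Click here. ", "start": 0, "end": 12, "score": 0.9}]
result = sketch_postprocess(text, ents, clean=True)
```

Removing spans by offsets, rather than by string replacement, avoids deleting identical text elsewhere in the document that was not flagged.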

***

