TokenSearcherTextCleaner

Task for text cleaning

Task for removing uninformative data from text. This task uses TokenSearcherPredictor by default. For more details, see:

TokenSearcherPredictor

Subclass of NERTask.

Module: implementation.tasks

Methods and properties

Main methods and properties


__init__

Arguments:




TokenSearcherTextCleanerInput

Subclass of IOModel.


__init__

Arguments:

  • text (str): Text to clean.




TokenSearcherTextCleanerOutput

Subclass of NEROutput. Type of NEROutput[Entity].


__init__

Arguments:

  • text (str): Input text.

  • cleaned_text (Optional[str], optional): Cleaned text. None if clean was set to False in the default postprocessor.

  • output (List[Entity]): Uninformative data.




TokenSearcherTextCleanerPreprocessor

Create a prompt from the provided text. Subclass of Action. Type of Action[Dict[str, Any], Dict[str, Any]].


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "text" (str): Text to process;

Returns:

  • Dict[str, Any]: Expected keys:

    • "inputs" (List[str]): Model inputs;
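The input and output contract above can be illustrated with a minimal sketch. It does not use the library: the function name build_inputs and the prompt wording in PROMPT_TEMPLATE are hypothetical, chosen only to show how a dictionary with a "text" key could be turned into a dictionary with an "inputs" list. The actual prompt used by TokenSearcherTextCleanerPreprocessor may differ.

```python
from typing import Any, Dict

# Hypothetical prompt wording; the template used by the library is not
# documented here and may differ.
PROMPT_TEMPLATE = "Find the uninformative parts of the following text:\n{text}"


def build_inputs(input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the raw text in a prompt and return it under the "inputs" key."""
    prompt = PROMPT_TEMPLATE.format(text=input_data["text"])
    # One input text produces a single-element list of model inputs.
    return {"inputs": [prompt]}


prepared = build_inputs({"text": "Breaking news! Subscribe now."})
print(prepared["inputs"][0])
```

Returning a list keeps the shape compatible with a model that accepts batches, even though only one text is processed at a time.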




TokenSearcherTextCleanerPostprocessor

Format output and clean text if specified. Subclass of Action. Type of Action[Dict[str, Any], Dict[str, Any]].


__init__

Arguments:

  • clean (bool): Remove uninformative data from text. Defaults to False.

  • threshold (float): Score threshold for uninformative data. Defaults to 0.

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "output" (List[List[Dict[str, Any]]]): Model output;

    • "inputs" (List[str]): Model inputs;

    • "text" (str): Processed text;

Returns:

  • Dict[str, Any]: Expected keys:

    • "text" (str): Processed text;

    • "output" (List[Entity]): Uninformative data;

    • "cleaned_text" (Optional[str], optional): Cleaned text. None if clean was set to False.
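The behaviour described by clean and threshold can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the names postprocess and remove_spans are hypothetical, each model output item is assumed to be a dictionary with "start", "end", "span" and "score" keys, offsets are assumed to be character positions relative to the original text, and a span is assumed to be kept when its score is greater than or equal to the threshold.

```python
from typing import Any, Dict, List


def remove_spans(text: str, spans: List[Dict[str, Any]]) -> str:
    """Return text with every [start, end) span cut out; overlaps are merged."""
    pieces, cursor = [], 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] > cursor:
            pieces.append(text[cursor:span["start"]])
        cursor = max(cursor, span["end"])
    pieces.append(text[cursor:])
    return "".join(pieces)


def postprocess(
    input_data: Dict[str, Any], clean: bool = False, threshold: float = 0.0
) -> Dict[str, Any]:
    # Assumption: spans scoring at or above the threshold are kept.
    kept = [s for s in input_data["output"][0] if s["score"] >= threshold]
    text = input_data["text"]
    return {
        "text": text,
        "output": kept,
        # cleaned_text stays None unless cleaning was requested.
        "cleaned_text": remove_spans(text, kept) if clean else None,
    }


text = "Great article. Click here to subscribe! Thanks."
noise = "Click here to subscribe!"
start = text.index(noise)
data = {
    "text": text,
    "inputs": [text],  # simplification: the prompt is taken to equal the text
    "output": [[
        {"start": start, "end": start + len(noise), "span": noise, "score": 0.93},
        {"start": 0, "end": 5, "span": "Great", "score": 0.12},
    ]],
}
result = postprocess(data, clean=True, threshold=0.5)
print(result["cleaned_text"])
```

With threshold=0.5 only the high-scoring span is treated as uninformative and removed; with the defaults (clean=False, threshold=0) both spans are returned and "cleaned_text" is None.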


