TransformersTextEmbedding

Text embedding task

Subclass of Task.

Module: implementation.tasks

Default predictor

This task uses TransformersModel by default with this configuration:

from transformers import AutoModel

# Load the default embedding model from the Hugging Face Hub.
model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
predictor = TransformersModel(
    TransformersModelConfig(
        model=model
    ),
    input_class=TransformersEmbeddingInput,
    output_class=TransformersEmbeddingOutput,
)


Methods and properties

Main methods and properties


__init__

Arguments:

  • predictor (Predictor[Any, Any], optional): Predictor used by the task. If None, the default predictor is used. Defaults to None.

  • preprocess (Optional[Component], optional): Component executed before the predictor. If None, the default component is used. Defaults to None. Default component: EmbeddingPreprocessor. When the default chain is used, EmbeddingPreprocessor uses the AutoTokenizer from the predictor's model.

  • postprocess (Optional[Component], optional): Component executed after the predictor. If None, the default component is used. Defaults to None. Default component: EmbeddingPostprocessor | ConvertEmbeddingsToNumpyArrays.

  • input_class (Type[Input], optional): Class for input validation. Defaults to TextEmbeddingInput.

  • output_class (Type[Output], optional): Class for output validation. Defaults to TextEmbeddingOutput.

  • name (Optional[str], optional): Name used for identification. If None, the class name is used. Defaults to None.
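
The fallback behaviour of these arguments can be pictured with a small stand-in. This is only a sketch under stated assumptions: SketchTask and the three default_* functions are hypothetical names invented for illustration, not part of the library, and the toy functions merely mimic the dictionary keys documented for EmbeddingPreprocessor and EmbeddingPostprocessor.

```python
from typing import Any, Callable, Dict, Optional

# Each step maps one dictionary to another, mirroring Action[Dict, Dict].
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def default_preprocess(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for EmbeddingPreprocessor: "texts" -> "encodings".
    return {"encodings": [text.lower().split() for text in data["texts"]]}


def default_predictor(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the model: one fake 1-d vector per token.
    hidden = [[[float(len(tok))] for tok in enc] for enc in data["encodings"]]
    return {"last_hidden_state": hidden}


def default_postprocess(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the default postprocess chain:
    # keep the first token's vector of every sequence.
    return {"embeddings": [seq[0] for seq in data["last_hidden_state"]]}


class SketchTask:
    """Illustrative only: every argument left as None falls back to a default."""

    def __init__(
        self,
        predictor: Optional[Step] = None,
        preprocess: Optional[Step] = None,
        postprocess: Optional[Step] = None,
        name: Optional[str] = None,
    ) -> None:
        self.predictor = predictor if predictor is not None else default_predictor
        self.preprocess = preprocess if preprocess is not None else default_preprocess
        self.postprocess = postprocess if postprocess is not None else default_postprocess
        # As documented for `name`: the class name is used when None is passed.
        self.name = name if name is not None else type(self).__name__

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # preprocess -> predictor -> postprocess
        return self.postprocess(self.predictor(self.preprocess(data)))
```

Calling SketchTask() with no arguments runs the full default chain, while passing, for example, postprocess=lambda d: d swaps out only that stage and leaves the other defaults in place.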




TextEmbeddingInput

Subclass of IOModel.


__init__

Arguments:

  • texts (List[str]): Texts to process.




TextEmbeddingOutput

Subclass of IOModel.


__init__

Arguments:

  • embeddings (Any): Embeddings computed for the input texts.




EmbeddingPreprocessor

Prepare model input. Subclass of Action. Type of Action[Dict[str, Any], Dict[str, Any]].


__init__

Arguments:

  • tokenizer (Tokenizer): Tokenizer.

  • name (Optional[str], optional): Name used for identification. If None, the class name is used. Defaults to None.


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "texts" (List[str]): Texts to process;

Returns:

  • Dict[str, Any]: Expected keys:

    • "encodings" (Any): Model inputs;




EmbeddingPostprocessor

Process model output. Subclass of Action. Type of Action[Dict[str, Any], Dict[str, Any]].


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "last_hidden_state" (Any): Model output;

Returns:

  • Dict[str, Any]: Expected keys:

    • "embeddings" (Any);
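
This page does not say how "last_hidden_state" is reduced to "embeddings". As a minimal sketch, assuming the approach recommended for BGE models (take the first, [CLS], token's vector and L2-normalize it), the step could look like the following. The function name cls_pool_and_normalize is hypothetical, and plain nested lists stand in for tensors; the actual EmbeddingPostprocessor may pool differently.

```python
import math
from typing import Any, Dict, List


def cls_pool_and_normalize(input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: CLS pooling followed by L2 normalization.

    Expects "last_hidden_state" shaped [batch][tokens][hidden] as nested lists.
    """
    embeddings: List[List[float]] = []
    for sequence in input_data["last_hidden_state"]:
        cls_vector = sequence[0]  # vector of the first token
        norm = math.sqrt(sum(x * x for x in cls_vector))
        if norm > 0:
            embeddings.append([x / norm for x in cls_vector])
        else:
            # Leave an all-zero vector unchanged to avoid dividing by zero.
            embeddings.append(list(cls_vector))
    return {"embeddings": embeddings}
```

With unit-length vectors, the dot product of two embeddings equals their cosine similarity, which is why normalization is a common final step for retrieval-style models.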




ConvertEmbeddingsToNumpyArrays

Convert embeddings to numpy arrays. Subclass of Action. Type of Action[Dict[str, Any], Dict[str, Any]].


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "embeddings" (Any);

Returns:

  • Dict[str, Any]: Expected keys:

    • "embeddings" (Any);
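
As an illustration of the conversion, here is a minimal sketch that assumes the incoming embeddings are either a PyTorch tensor or nested Python lists. The function name convert_embeddings_to_numpy is hypothetical, and returning one 2-D array (rather than a list of 1-D arrays) is also an assumption; the library's action may differ on both points.

```python
from typing import Any, Dict

import numpy as np


def convert_embeddings_to_numpy(input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the "embeddings" -> NumPy conversion."""
    embeddings = input_data["embeddings"]
    # PyTorch tensors must be detached from the graph and moved to the CPU
    # before they can be viewed as NumPy arrays.
    if hasattr(embeddings, "detach"):
        embeddings = embeddings.detach().cpu().numpy()
    return {"embeddings": np.asarray(embeddings)}
```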


