TransformersVisualQandA

Visual Q&A task

Subclass of Task.

Module: implementation.tasks

Default predictor

This task uses TransformersModel by default with this configuration:

model = AutoModelForImageClassification.from_pretrained(
    "dandelin/vilt-b32-finetuned-vqa"
)
predictor=TransformersModel(
    TransformersModelConfig(
        model=model
    ),
    input_class=TransformersImageModelInput,
    output_class=TransformersLogitsOutput,
)


Methods and properties

Main methods and properties


__init__

Arguments:




TransformersVisualQandAOutput

Subclass of IOModel.


__init__

Arguments:

  • answer (Optional[Tuple[str, float]])




TransformersVisualQandAMultianswerOutput

Subclass of IOModel.


__init__

Arguments:

  • answers (Dict[str, float])




VisualQandAPreprocessor

Prepares the model input. Subclass of Action, typed as Action[Dict[str, Any], Dict[str, Any]].


__init__

Arguments:

  • processor (Processor): Feature extractor.

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "image" (Image.Image): Image to analyze;

    • "question" (str): Question to answer;

Returns:

  • Dict[str, Any]: Expected keys:

    • "input_ids" (Any);

    • "token_type_ids" (Any);

    • "attention_mask" (Any);

    • "pixel_values" (Any);

    • "pixel_mask" (Any);
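
The mapping from the input keys to the returned keys can be sketched as follows. This is a minimal illustration, not the library's implementation: `preprocess` and `FakeProcessor` are hypothetical names, and the sketch assumes the processor is callable with `images` and `text` keyword arguments and returns a mapping, as Hugging Face's `ViltProcessor` does.

```python
from typing import Any, Callable, Dict


def preprocess(processor: Callable[..., Dict[str, Any]],
               input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: turn {"image", "question"} into model inputs."""
    # Assumption: the processor accepts `images` and `text` keyword arguments.
    encoding = processor(images=input_data["image"], text=input_data["question"])
    # Copy into a plain dict so downstream actions see ordinary keys.
    return dict(encoding)


class FakeProcessor:
    """Stand-in for a real processor, used only to make the sketch runnable."""

    def __call__(self, images: Any, text: str) -> Dict[str, Any]:
        tokens = text.split()
        return {
            "input_ids": list(range(len(tokens))),
            "token_type_ids": [0] * len(tokens),
            "attention_mask": [1] * len(tokens),
            "pixel_values": images,
            "pixel_mask": [1],
        }


result = preprocess(FakeProcessor(), {"image": [[0.0]], "question": "What is shown?"})
print(sorted(result))
```

With a real processor, the values would typically be tensors or arrays rather than the plain lists used by the stand-in.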




VisualQandASingleAnswerPostprocessor

Processes the model output. Subclass of VisualQandAMultianswerPostprocessor.


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "logits" (Any): Model output;

Returns:

  • Dict[str, Any]: Expected keys:

    • "answer" (Optional[Tuple[str, float]]): The highest-scoring answer if its score is greater than or equal to the threshold; otherwise None.
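
The selection rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `pick_single_answer` is a hypothetical helper, and it assumes the label scores have already been computed as a label-to-score mapping.

```python
from typing import Dict, Optional, Tuple


def pick_single_answer(answers: Dict[str, float],
                       threshold: float = 0.0) -> Optional[Tuple[str, float]]:
    """Hypothetical sketch: return the best (label, score) pair, or None."""
    if not answers:
        return None
    # Take the label with the highest score.
    label, score = max(answers.items(), key=lambda item: item[1])
    # Keep it only if the score reaches the threshold (inclusive).
    return (label, score) if score >= threshold else None


scores = {"cat": 0.91, "dog": 0.07, "bird": 0.02}
confident = pick_single_answer(scores, threshold=0.5)   # best score passes
rejected = pick_single_answer(scores, threshold=0.95)   # best score is too low
```

Returning None rather than a low-confidence answer lets the caller distinguish "no reliable answer" from a weak guess.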




VisualQandAMultianswerPostprocessor

Processes the model output. Subclass of Action, typed as Action[Dict[str, Any], Dict[str, Any]].


__init__

Arguments:

  • labels (List[str]): Labels for classification.

  • threshold (float): Score threshold for labels. Defaults to 0.

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "logits" (Any): Model output;

Returns:

  • Dict[str, Any]: Expected keys:

    • "answers" (Dict[str, float]): Classified labels and scores.
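
One way to picture the step from logits to the "answers" dictionary is sketched below. Everything here is an assumption made for illustration: `classify` is a hypothetical function, the scores are computed with an element-wise sigmoid (the library may normalize differently), and the threshold is treated as inclusive.

```python
import math
from typing import Dict, List, Sequence


def classify(logits: Sequence[float], labels: List[str],
             threshold: float = 0.0) -> Dict[str, float]:
    """Hypothetical sketch: map one row of logits to {label: score}."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    # Assumption: each score is an independent sigmoid probability.
    scores = (1.0 / (1.0 + math.exp(-x)) for x in logits)
    # Keep only labels whose score reaches the threshold.
    return {label: s for label, s in zip(labels, scores) if s >= threshold}


answers = classify([2.0, 0.0, -3.0], ["yes", "no", "maybe"], threshold=0.5)
```

With the default threshold of 0, every label is kept, since sigmoid scores are never negative.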


