PDF document processing
This example shows a multimodal pipeline (one that processes visual or structured data as well as text) for processing a PDF document.
The complete source code for this example can be found in programs.multimodal.pdf_document_processing.
Imports that will be used
from typing import Any, Dict, List
import logging
import pathlib
PATH = pathlib.Path(__file__).parent.resolve()
from utca.core import (
Evaluator,
ForEach,
SetMemory,
MemorySetInstruction,
GetMemory,
MemoryGetInstruction,
Log,
Flush,
AddData,
ExecuteFunction,
)
from utca.implementation.datasources.pdf import (
PDFRead, PDFExtractTexts, PDFExtractImages, PDFFindTables
)
from utca.implementation.tasks import (
TransformersTextSummarization,
TransformersDocumentQandA
)
Utility functions for custom logic
Pipelines
ExecutionSchema for processing visual data:
The TransformersDocumentQandA task is used to process visual data because it handles the structured content typically found in documents well. For the default parameters, see TransformersDocumentQandA.
The set_name method is utilized to enhance the clarity and structure of step-by-step execution logging.
ExecutionSchema for processing images:
This pipeline extracts images from pages, processes them, and saves their descriptions in memory for later formatting. The process_visual_data pipeline is executed for each image found.
The set_name method is utilized to enhance the clarity and structure of step-by-step execution logging.
ExecutionSchema for processing tables:
Like the image_processing pipeline, this pipeline extracts tables from pages, processes them, and saves their descriptions in memory for later formatting. The process_visual_data pipeline is executed for each table found.
The set_name method is utilized to enhance the clarity and structure of step-by-step execution logging.
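The image and table pipelines share one shape: for every page, take the extracted items, run process_visual_data on each item, and keep the resulting descriptions in memory under a key. The sketch below shows that data flow in plain Python rather than with utca components. The names `describe_items` and `fake_describe`, the memory keys, and the `{page: [descriptions]}` layout are assumptions made for illustration; they are not part of the utca API.

```python
from typing import Any, Callable, Dict, List


def describe_items(
    pages: Dict[int, List[Any]],
    describe: Callable[[Any], str],
    memory: Dict[str, Any],
    memory_key: str,
) -> None:
    """Run `describe` on every item of every page and store the results.

    `pages` maps a page number to the items (images or tables) found on it.
    Results are stored in `memory[memory_key]` as {page_number: [description, ...]}.
    """
    memory[memory_key] = {
        page: [describe(item) for item in items]
        for page, items in pages.items()
    }


def fake_describe(item: Any) -> str:
    """Hypothetical stand-in for the process_visual_data pipeline."""
    return f"description of {item}"


# One shared memory, filled by two passes: one for images, one for tables.
memory: Dict[str, Any] = {}
describe_items({1: ["img_a", "img_b"], 2: []}, fake_describe, memory, "image_descriptions")
describe_items({1: ["table_a"], 2: []}, fake_describe, memory, "table_descriptions")
```

Keeping the page number as the key means a later formatting step can line up image descriptions, table descriptions, and summaries for the same page.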
ExecutionSchema for text summarization:
This pipeline extracts text from pages, processes it with the TransformersTextSummarization task, and saves the text summaries in memory for later formatting.
The set_name method is utilized to enhance the clarity and structure of step-by-step execution logging.
ExecutionSchema for main pipeline:
The main pipeline combines the pipelines described above.
Run program
The pipeline is wrapped in an Evaluator, and logging_level is provided so that messages are logged:
Inputs
"path_to_file": path to the file, which should be located in programs.multimodal.pdf_document_processing.
"pages": the pages to process.
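As a rough illustration of the two input keys, the sketch below assembles an input dictionary in plain Python. The helper `build_input`, the file name `example.pdf`, and the page numbers are placeholders. That "pages" is a list of integers is also an assumption, as is any particular page-numbering convention, so check the example source before relying on either.

```python
import pathlib
from typing import Any, Dict, List


def build_input(base_dir: pathlib.Path, file_name: str, pages: List[int]) -> Dict[str, Any]:
    """Assemble an input dictionary with the two keys described above."""
    return {
        "path_to_file": str(base_dir / file_name),
        "pages": pages,
    }


# "example.pdf" and the page numbers are placeholders, not files shipped with the example.
program_input = build_input(pathlib.Path("."), "example.pdf", [1, 2])
```

In the example itself, the directory would presumably be the PATH constant defined with the imports, so that the file is resolved relative to the program's own folder.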
Results
The results should include formatted output containing descriptions for images and tables, as well as text summaries for each page.
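One way such a per-page report could be assembled is sketched below in plain Python. The function `format_results`, the `{page: ...}` input layout, and the line labels are hypothetical; the formatting used by the example itself may differ.

```python
from typing import Dict, List


def format_results(
    summaries: Dict[int, str],
    image_descriptions: Dict[int, List[str]],
    table_descriptions: Dict[int, List[str]],
) -> str:
    """Build a per-page text report from summaries and descriptions."""
    lines: List[str] = []
    for page in sorted(summaries):
        lines.append(f"Page {page}")
        lines.append(f"Summary: {summaries[page]}")
        # Pages without images or tables simply contribute no extra lines.
        for text in image_descriptions.get(page, []):
            lines.append(f"Image: {text}")
        for text in table_descriptions.get(page, []):
            lines.append(f"Table: {text}")
    return "\n".join(lines)


report = format_results(
    {1: "Quarterly overview."},
    {1: ["A bar chart."]},
    {1: []},
)
print(report)
```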