Actions for handling PDF documents
Module: implementation.datasources.pdf
PDFRead
Read PDF document. Subclass of Action. Type of Action[Dict[str, Any], Dict[int, Page]]
execute
Arguments:
input_data (Dict[str, Any]): Expected keys:
"path_to_file" (str): Path to PDF file;
"pages" (List[int], optional): Pages to read. If not provided, the complete document is read;
Returns:
Dict[int, Page]: PDF document pages;
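As an illustration of the input described above, the following sketch assembles the input_data dictionary for PDFRead. The helper build_read_input and the file name report.pdf are hypothetical and not part of the module. Whether page numbers start at 0 or 1 is not stated here, so the values used are placeholders.

```python
from typing import Any, Dict, List, Optional


def build_read_input(path_to_file: str,
                     pages: Optional[List[int]] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble the input_data dict for PDFRead."""
    input_data: Dict[str, Any] = {"path_to_file": path_to_file}
    # "pages" is optional; leaving it out requests the complete document.
    if pages is not None:
        input_data["pages"] = list(pages)
    return input_data


whole_document = build_read_input("report.pdf")
selected_pages = build_read_input("report.pdf", pages=[0, 2])

# With the module available, the dict would presumably be passed as
# PDFRead().execute(selected_pages); the constructor call is an assumption.
```

Omitting the "pages" key, rather than passing an empty list, keeps the "read complete document" case unambiguous.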
PDFExtractTexts
Extract texts from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, str]]
__init__
Arguments:
tables (bool, optional): If True, include text from tables. Defaults to True.
name (Optional[str], optional): Name used for identification. If None, the class name is used. Defaults to None.
execute
Arguments:
input_data (Dict[int, Page]): PDF document pages.
Returns:
Dict[int, str]: Extracted texts.
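The type signatures suggest that the output of PDFRead (Dict[int, Page]) can be passed directly to PDFExtractTexts. The sketch below shows that composition with stand-in classes only: Action, StubRead and StubExtractTexts are hypothetical stubs written for this example, and plain strings stand in for Page objects. It does not reproduce the module's real implementation.

```python
from typing import Any, Dict, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class Action(Generic[In, Out]):
    """Stand-in for the Action base class (assumed shape: one execute method)."""

    def execute(self, input_data: In) -> Out:
        raise NotImplementedError


class StubRead(Action[Dict[str, Any], Dict[int, str]]):
    """Stub reader: returns fake page contents keyed by page number."""

    def execute(self, input_data: Dict[str, Any]) -> Dict[int, str]:
        fake_document = {0: "  first page ", 1: " second page  "}
        wanted = input_data.get("pages")
        if wanted is None:
            return dict(fake_document)
        return {number: fake_document[number] for number in wanted}


class StubExtractTexts(Action[Dict[int, str], Dict[int, str]]):
    """Stub extractor: strips surrounding whitespace from each fake page."""

    def execute(self, input_data: Dict[int, str]) -> Dict[int, str]:
        return {number: page.strip() for number, page in input_data.items()}


# The output type of the first action matches the input type of the second.
pages = StubRead().execute({"path_to_file": "report.pdf", "pages": [1]})
texts = StubExtractTexts().execute(pages)
```

The same pattern would apply to PDFFindTables, PDFExtractTables and PDFExtractImages, which also take Dict[int, Page] as input.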
PDFFindTables
Find tables on pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Table]]]
execute
Arguments:
input_data (Dict[int, Page]): PDF document pages.
Returns:
Dict[int, List[Table]]: Found tables.
PDFExtractTables
Extract tables from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Table]]]
execute
Arguments:
input_data (Dict[int, Page]): PDF document pages.
Returns:
Dict[int, Any]: Extracted tables.
PDFExtractImages
Extract images from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Image.Image]]]
execute
Arguments:
input_data (Dict[int, Page]): PDF document pages.
Returns:
Dict[int, List[Image.Image]]: Extracted images.
PDFWrite
Write PDF file. Subclass of Action. Type of Action[Dict[str, Any], None]
execute
Arguments:
input_data (Dict[str, Any]): Data to process. Expected keys:
"path_to_file" (str): Path to PDF file;
"page_width" (float, optional): Page width in cm;
"page_height" (float, optional): Page height in cm;
"x_padding" (float, optional): x padding in cm;
"y_padding" (float, optional): y padding in cm;
"text" (str): Text to write;
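A minimal sketch of assembling the input_data dictionary for PDFWrite follows. The helper build_write_input, the file name summary.pdf, the A4 page size and the 2 cm padding are illustrative choices, not documented defaults of the module. All lengths are in centimetres, as the key descriptions state.

```python
from typing import Any, Dict

# A4 page size in centimetres; chosen for illustration, not a documented default.
A4_WIDTH_CM = 21.0
A4_HEIGHT_CM = 29.7


def build_write_input(path_to_file: str,
                      text: str,
                      page_width: float = A4_WIDTH_CM,
                      page_height: float = A4_HEIGHT_CM,
                      x_padding: float = 2.0,
                      y_padding: float = 2.0) -> Dict[str, Any]:
    """Hypothetical helper: assemble the input_data dict for PDFWrite."""
    # Basic sanity checks added for this sketch; the module may validate differently.
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    if x_padding < 0 or y_padding < 0:
        raise ValueError("padding must not be negative")
    return {
        "path_to_file": path_to_file,
        "page_width": page_width,
        "page_height": page_height,
        "x_padding": x_padding,
        "y_padding": y_padding,
        "text": text,
    }


write_input = build_write_input("summary.pdf", "Hello, PDF!")
```

Since the action's type is Action[Dict[str, Any], None], execute is expected to return nothing; the result is the file written at "path_to_file".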