PDF

Actions for handling PDF documents

Module: implementation.datasources.pdf



PDFRead

Read a PDF document. Subclass of Action. Type of Action[Dict[str, Any], Dict[int, Page]]


execute

Arguments:

  • input_data (Dict[str, Any]): Expected keys:

    • "path_to_file" (str): Path to PDF file;

    • "pages" (List[int], optional): Pages to read. If not provided, the complete document is read;

Returns:

  • Dict[int, Page]: PDF document pages;
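The optional "pages" key selects a subset of the document. Below is a minimal stdlib sketch of that selection logic. It assumes 0-based page indices, because the documentation does not say whether numbering starts at 0 or 1. `select_pages` is a hypothetical helper, not part of the module, and plain strings stand in for `Page` objects.

```python
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_pages(all_pages: Sequence[T], pages: Optional[List[int]] = None) -> Dict[int, T]:
    """Return the requested pages keyed by page index.

    Hypothetical helper: assumes 0-based indices; PDFRead itself may number pages differently.
    """
    if pages is None:
        # No "pages" key: keep the complete document.
        pages = list(range(len(all_pages)))
    return {index: all_pages[index] for index in pages}


# Input dictionary in the shape PDFRead.execute expects ("report.pdf" is a placeholder path).
input_data = {"path_to_file": "report.pdf", "pages": [0, 2]}

# Plain strings stand in for real Page objects here.
selected = select_pages(["page A", "page B", "page C"], input_data.get("pages"))
print(selected)  # {0: 'page A', 2: 'page C'}
```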




PDFExtractTexts

Extract texts from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, str]]


__init__

Arguments:

  • tables (bool, optional): If True, include text from tables. Defaults to True.

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
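The `name` fallback described above can be sketched with a small generic base class. `Action` here is a hypothetical stand-in written for illustration, not the library's own class. `UpperCaseTexts` is a toy subclass that exists only to show the naming behaviour.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, Optional, TypeVar

InputT = TypeVar("InputT")
OutputT = TypeVar("OutputT")


class Action(ABC, Generic[InputT, OutputT]):
    """Hypothetical stand-in for the library's Action base class."""

    def __init__(self, name: Optional[str] = None) -> None:
        # Use the explicit name if given, otherwise fall back to the class name.
        self.name = name if name is not None else type(self).__name__

    @abstractmethod
    def execute(self, input_data: InputT) -> OutputT:
        ...


class UpperCaseTexts(Action[Dict[int, str], Dict[int, str]]):
    """Toy action used only to demonstrate naming."""

    def execute(self, input_data: Dict[int, str]) -> Dict[int, str]:
        return {page: text.upper() for page, text in input_data.items()}


print(UpperCaseTexts().name)              # UpperCaseTexts
print(UpperCaseTexts(name="shout").name)  # shout
```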


execute

Arguments:

  • input_data (Dict[int, Page]): PDF document pages.

Returns:

  • Dict[int, str]: Extracted texts.
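The following stdlib sketch shows how the `tables` switch changes the output. `StubPage` and `extract_texts` are hypothetical. The real action works on parsed PDF pages, and the documentation does not say how it orders page text and table text. Appending table text after the page text is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubPage:
    """Hypothetical page stand-in: running text and table text kept separately."""
    body_text: str
    table_texts: List[str]


def extract_texts(pages: Dict[int, StubPage], tables: bool = True) -> Dict[int, str]:
    """Mimic the documented `tables` switch of PDFExtractTexts."""
    texts: Dict[int, str] = {}
    for number, page in pages.items():
        parts = [page.body_text]
        if tables:
            # Append table text only when the caller asked for it.
            parts.extend(page.table_texts)
        texts[number] = "\n".join(parts)
    return texts


pages = {1: StubPage("Quarterly summary", ["Q1 | 10", "Q2 | 12"])}
print(extract_texts(pages))                # {1: 'Quarterly summary\nQ1 | 10\nQ2 | 12'}
print(extract_texts(pages, tables=False))  # {1: 'Quarterly summary'}
```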




PDFFindTables

Find tables on pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Table]]]


execute

Arguments:

  • input_data (Dict[int, Page]): PDF document pages.

Returns:

  • Dict[int, List[Table]]: Found tables.
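The result maps each page to a list of tables, so downstream code typically iterates over those lists. The sketch below uses a hypothetical `StubTable` in place of the real `Table` type. It assumes that pages without tables may appear with an empty list, and it works equally well if such pages are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StubTable:
    """Hypothetical table stand-in holding only a bounding box (x0, top, x1, bottom)."""
    bbox: Tuple[float, float, float, float]


def pages_with_tables(found: Dict[int, List[StubTable]]) -> List[int]:
    """Return page numbers that contain at least one detected table, in ascending order."""
    return sorted(page for page, tables in found.items() if tables)


found = {
    1: [],
    2: [StubTable((50.0, 100.0, 300.0, 220.0))],
    3: [StubTable((40.0, 60.0, 500.0, 200.0)), StubTable((40.0, 250.0, 500.0, 400.0))],
}
print(pages_with_tables(found))                               # [2, 3]
print({page: len(tables) for page, tables in found.items()})  # {1: 0, 2: 1, 3: 2}
```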




PDFExtractTables

Extract tables from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Table]]]


execute

Arguments:

  • input_data (Dict[int, Page]): PDF document pages.

Returns:

  • Dict[int, Any]: Extracted tables.
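The documented return type is only `Dict[int, Any]`. This sketch assumes that each page maps to a list of tables, and that each table is a list of rows whose cells are strings or `None`. That is the shape pdfplumber's `Page.extract_tables()` produces. Under those assumptions, the standard library can write the tables out as CSV. `table_to_csv` is a hypothetical helper.

```python
import csv
import io
from typing import List, Optional

# Assumed table shape: rows of cells, where a cell may be None (empty).
TableRows = List[List[Optional[str]]]


def table_to_csv(table: TableRows) -> str:
    """Serialize one extracted table to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        # Replace missing cells with empty strings before writing.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


extracted = {1: [[["Item", "Price"], ["Tea", "2.50"], ["Cake", None]]]}
for page, tables in extracted.items():
    for index, table in enumerate(tables):
        print(f"page {page}, table {index}:")
        print(table_to_csv(table), end="")
```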




PDFExtractImages

Extract images from pages. Subclass of Action. Type of Action[Dict[int, Page], Dict[int, List[Image.Image]]]
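Each value in the output is a list of `PIL.Image.Image` objects, and each of those provides a `save(path)` method. The hypothetical helper below writes every image to disk using an assumed `page{N}_img{K}.png` naming scheme. The demo uses a recording stand-in instead of real images, so it runs without Pillow installed.

```python
import os
import tempfile
from typing import Any, Dict, List


def save_images(images: Dict[int, List[Any]], out_dir: str, fmt: str = "png") -> List[str]:
    """Save each image as page{N}_img{K}.{fmt} in out_dir and return the written paths.

    Any object with a save(path) method works, e.g. PIL.Image.Image.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    for page in sorted(images):
        for index, image in enumerate(images[page]):
            path = os.path.join(out_dir, f"page{page}_img{index}.{fmt}")
            image.save(path)
            written.append(path)
    return written


class RecordingImage:
    """Stand-in for PIL.Image.Image that records the target path instead of writing a file."""

    def __init__(self) -> None:
        self.saved_to: List[str] = []

    def save(self, path: str) -> None:
        self.saved_to.append(path)


demo = {2: [RecordingImage(), RecordingImage()], 1: [RecordingImage()]}
with tempfile.TemporaryDirectory() as out_dir:
    names = [os.path.basename(p) for p in save_images(demo, out_dir)]
print(names)  # ['page1_img0.png', 'page2_img0.png', 'page2_img1.png']
```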


execute

Arguments:

  • input_data (Dict[int, Page]): PDF document pages.

Returns:

  • Dict[int, List[Image.Image]]: Extracted images.




PDFWrite

Write text to a PDF file. Subclass of Action. Type of Action[Dict[str, Any], None]


execute

Arguments:

  • input_data (Dict[str, Any]): Data to process. Expected keys:

    • "path_to_file" (str): Path to the output PDF file;

    • "page_width" (float, optional): Page width in cm;

    • "page_height" (float, optional): Page height in cm;

    • "x_padding" (float, optional): Horizontal (x) padding in cm;

    • "y_padding" (float, optional): Vertical (y) padding in cm;

    • "text" (str): Text to write;
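The size and padding keys are given in centimetres, but PDF page geometry is measured in points (1/72 inch, so 1 cm ≈ 28.35 pt). The sketch below shows that conversion and computes the area left for text. It assumes the padding is applied on both opposite edges. `text_area_pt` is hypothetical, and the A4 values are only an example input.

```python
from typing import Tuple

POINTS_PER_CM = 72 / 2.54  # 1 inch = 2.54 cm = 72 PostScript points


def cm_to_pt(value_cm: float) -> float:
    """Convert centimetres to PDF points."""
    return value_cm * POINTS_PER_CM


def text_area_pt(page_width: float, page_height: float, x_padding: float, y_padding: float) -> Tuple[float, float]:
    """Width and height (in points) left for text, assuming padding applies to both opposite edges."""
    width = page_width - 2 * x_padding
    height = page_height - 2 * y_padding
    if width <= 0 or height <= 0:
        raise ValueError("padding leaves no room for text")
    return cm_to_pt(width), cm_to_pt(height)


# Example input for PDFWrite.execute: an A4 page (21.0 x 29.7 cm) with 2 cm padding.
input_data = {
    "path_to_file": "out.pdf",
    "page_width": 21.0,
    "page_height": 29.7,
    "x_padding": 2.0,
    "y_padding": 2.0,
    "text": "Hello, PDF!",
}
width_pt, height_pt = text_area_pt(
    input_data["page_width"], input_data["page_height"], input_data["x_padding"], input_data["y_padding"]
)
print(round(width_pt, 1), round(height_pt, 1))  # 481.9 728.5
```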


