chore: move logger error to debug when pdfminer extract fails #3028

Merged: 5 commits, May 31, 2024

@yuming-long yuming-long commented May 15, 2024

Summary

We are seeing the logger error "Invalid dictionary construct" on the hosted APIs. This PR moves that log call to debug level. Partitioning still continues when pdfminer text extraction fails, exactly as before; the failure is simply no longer logged as an error.

Test

I was able to reproduce the logger error with an internal-only file (please DM me if needed). The error trace looks like this:

 File "/Users/yumingl/develops/unstructured/unstructured/partition/pdf.py", line 709, in _process_pdfminer_pages
    annotation_list = get_uris(page.annots, height, coordinate_system, page_number)
  File "/Users/yumingl/develops/unstructured/unstructured/partition/pdf.py", line 1049, in get_uris
    resolved_annots = annots.resolve()
...

We also cannot repair the PDF structure inside get_uris (the failure is not at the page level), so this exception is moved to debug level.
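The behavior described above can be sketched in isolation. This is a minimal illustration, not the code in unstructured/partition/pdf.py: the logger name, `extract_text_or_fallback`, and `failing_extract` are hypothetical names introduced here, and the fallback stands in for whatever partitioning does when pdfminer extraction yields nothing.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("pdf_partition_sketch")


def extract_text_or_fallback(extract, fallback):
    """Run `extract`; on any failure, log at debug level and use `fallback`."""
    try:
        return extract()
    except Exception as e:
        # The failure is non-fatal, so it is recorded at debug level
        # (previously error level) and processing continues.
        logger.debug(e)
        return fallback()


def failing_extract():
    # Stand-in for a pdfminer extraction that fails on a malformed PDF.
    raise ValueError("Invalid dictionary construct")


# The caller still gets a usable result even though extraction failed.
result = extract_text_or_fallback(failing_extract, lambda: [])
```

The point of the pattern is that the control flow is unchanged; only the severity of the log record differs.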

@@ -269,7 +269,7 @@ def partition_pdf_or_image(
     isinstance(el, Text) and el.text.strip() for el in extracted_elements
 )
 except Exception as e:
-    logger.error(e)
+    logger.debug(e)

@cragwolfe cragwolfe May 30, 2024

Should the line below still be a warning, or info?
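For context on the choice of level, here is a small sketch of how Python's standard logging threshold decides which calls are emitted. It assumes a logger configured at WARNING, which may not match how the hosted APIs are actually configured; the logger name `level_demo` is hypothetical.

```python
import io
import logging

# Capture output in memory so the effect of each level is visible.
stream = io.StringIO()
demo_logger = logging.getLogger("level_demo")  # hypothetical name
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False  # keep records out of the root logger

# At WARNING, a debug record is below the threshold and is dropped.
demo_logger.setLevel(logging.WARNING)
demo_logger.debug("Invalid dictionary construct")
at_warning = stream.getvalue()

# At DEBUG, the same call is emitted.
demo_logger.setLevel(logging.DEBUG)
demo_logger.debug("Invalid dictionary construct")
at_debug = stream.getvalue()
```

Under this assumption, a message logged at debug stays available to anyone who lowers the threshold, while a warning or info message would surface at correspondingly less verbose settings.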

@yuming-long yuming-long added this pull request to the merge queue May 31, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch May 31, 2024
@yuming-long yuming-long added this pull request to the merge queue May 31, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 31, 2024
@yuming-long yuming-long added this pull request to the merge queue May 31, 2024
Merged via the queue into main with commit 4a96d54 May 31, 2024
46 checks passed
@yuming-long yuming-long deleted the yuming/move_logger_error_from_text_extract branch May 31, 2024 18:34

3 participants