Cannot decrypt PDF missing 'ID' in trailer #594
Comments
My PR needs more work; I will look into how PDF trailers work. |
from PyPDF2 import PdfFileReader
from PyPDF2.generic import ArrayObject, ByteStringObject, NameObject

with open('encrypted_doc_no_id.pdf', 'rb') as fp:
    reader = PdfFileReader(fp)
    print(reader.trailer)
    # Work around the missing '/ID' entry by adding an array of two empty byte strings.
    reader.trailer[NameObject('/ID')] = ArrayObject([ByteStringObject(b''), ByteStringObject(b'')])
    print(reader.trailer)
    # Decrypt with an empty user password.
    reader.decrypt('')
    print(reader.getDocumentInfo())
    # Page indices are zero-based, so this is the second page.
    page = reader.getPage(1)
    print(page.extractText())

runs to completion and successfully decrypts the PDF.
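To avoid overwriting a valid ID on well-formed files, the workaround above could be guarded so it only applies when the trailer has '/Encrypt' but no '/ID'. A minimal sketch, not what the PR does: `needs_empty_id` and `add_empty_id` are hypothetical helper names, and the code only assumes the trailer behaves like a dict keyed by name strings.

```python
def needs_empty_id(trailer):
    """Return True when the trailer declares encryption but has no '/ID' entry."""
    return '/Encrypt' in trailer and '/ID' not in trailer


def add_empty_id(trailer, id_key, empty_id):
    """Insert empty_id under id_key only if the trailer needs it.

    Returns True if the trailer was modified, False otherwise.
    """
    if needs_empty_id(trailer):
        trailer[id_key] = empty_id
        return True
    return False


# Illustration with a plain dict standing in for a parsed trailer.
trailer = {'/Size': 12, '/Encrypt': {'/Filter': '/Standard'}}
print(add_empty_id(trailer, '/ID', [b'', b'']))  # trailer is modified
print(add_empty_id(trailer, '/ID', [b'', b'']))  # second call is a no-op
```

With PyPDF2 the call would presumably look like `add_empty_id(reader.trailer, NameObject('/ID'), ArrayObject([ByteStringObject(b''), ByteStringObject(b'')]))`, placed before `reader.decrypt('')`.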
|
I've submitted a PR and await review. Thanks! |
Any update on this? I'd love to see this merged as it causes issues with some PDFs importing in paperless. |
I can reproduce this:
|
Closed by #595 |
Bug report
A malformed PDF with an 'Encrypt' key but no 'ID' key in its trailer raises a KeyError on decryption. Such PDFs open without issue in other viewers, e.g. evince. This is somewhat similar to other cases where a malformed PDF causes a KeyError even though the file has no otherwise fatal defect.
produces
Looking at the debug statements with
and running the above code confirms there is no 'ID' key in trailer
I'll submit a PR.
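For anyone who wants to check a file for this condition without PyPDF2, here is a rough sketch that scans the raw bytes of the last classic trailer for '/Encrypt' and '/ID'. It is a heuristic only: `encrypt_without_id` is a hypothetical name, the function does no real PDF parsing, and it returns None for files that have no `trailer` keyword (for example, files using cross-reference streams).

```python
import re


def encrypt_without_id(pdf_bytes):
    """Heuristically report whether the last trailer has /Encrypt but no /ID.

    Returns True or False when a classic 'trailer' section is found,
    and None when no such section exists in the data.
    """
    start = pdf_bytes.rfind(b"trailer")
    if start == -1:
        return None  # no classic trailer; this heuristic cannot tell
    end = pdf_bytes.find(b"startxref", start)
    section = pdf_bytes[start:end if end != -1 else len(pdf_bytes)]
    # The negative lookahead avoids matching longer names such as /IDTree.
    has_encrypt = re.search(rb"/Encrypt(?![A-Za-z0-9])", section) is not None
    has_id = re.search(rb"/ID(?![A-Za-z0-9])", section) is not None
    return has_encrypt and not has_id


# Synthetic trailer for illustration, not taken from a real file.
sample = b"trailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\nstartxref\n123\n%%EOF"
print(encrypt_without_id(sample))
```

On a real file this would be called as `encrypt_without_id(open(path, 'rb').read())`.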