Fix regression in page layout that sometimes returned text lines out of order #659
@pietermarsman @mikkkee take a look, please.
This LGTM, and thanks for the detailed description of the change - very helpful!
One small change needed before this will get merged:
Could you add an entry to the CHANGELOG.md to reflect your fix?
LGTM
The py36 build failure here is definitely not the fault of this PR. Comparing https://github.com/pdfminer/pdfminer.six/runs/3846904701 (successful build) and https://github.com/pdfminer/pdfminer.six/pull/659/checks?check_run_id=3847247616 (unsuccessful build), there seem to be different versions of Python 3.6 and pip being fetched, which in turn causes cryptography to try to build Rust code.
Thanks again!
* develop:
  Check blackness in github actions (pdfminer#711)
  Changed `log.info` to `log.debug` in six files (pdfminer#690)
  Update README.md batch for Continuous integration
  Update actions.yml so that it will run for all PR's
  Update development tools: travis ci to github actions, tox to nox, nose to pytest (pdfminer#704)
  Added feature: page labels (pdfminer#680)
  Remove obsolete returns (pdfminer#707)
  Revert "Remove obsolete returns"
  Remove obsolete returns
  Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True (pdfminer#684)
  Use logger.warn instead of warnings.warn if warning cannot be prevented by user (pdfminer#673)
  Change log.info into log.debug to make pdfinterp.py less verbose
  Fix regression in page layout that sometimes returned text lines out of order (pdfminer#659)
  export type annotations in package (pdfminer#679)
  fix typos in PR template (pdfminer#681)
  pdf2txt: clean up construction of LAParams from arguments (pdfminer#682)
  Fixes jbig2 writer to write valid jb2 files
  Add support for JPEG2000 image encoding
  Added test case for CCITTFaxDecoder (pdfminer#700)
  Attempt to handle decompression error on some broken PDF files (pdfminer#637)
This fixes issue #658: a regression in text line layout (LTLayoutContainer.group_textboxes) that caused it to merge lines of text into text boxes out of order, by merging two lines as adjacent elements even when there was a third line in between the two.
PR #315 changed the first element of each distance tuple from an int 0 or 1, where 0 was used for initial entries and 1 for subsequently-added groups, to a bool `is_first` where True was used for initial entries. This change broke the desired sort order, because 0 < 1 but True > False, and that was fixed by the followup commit 2bee7d8 inverting the meaning of the bool, but that commit missed adding a `not` where the bool (now correctly called `skip_isany`) was used in an if expression. The fix is just to add that `not`.
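To make the ordering argument concrete, here is a minimal, self-contained sketch. It is not pdfminer code: the tuple layout `(flag, distance, label)`, the helper `pop_order`, and the two `defers_*` functions are hypothetical stand-ins that only mirror the description above.

```python
import heapq

# Hypothetical distance entries: (flag, distance, label). Not pdfminer's real tuples.
# Original int encoding: 0 = initial entry, 1 = subsequently-added group.
int_entries = [(1, 0.5, "added later"), (0, 2.0, "initial")]

# Bool encoding as described for PR #315: is_first=True for initial entries.
is_first_entries = [(False, 0.5, "added later"), (True, 2.0, "initial")]

# Inverted bool as described for commit 2bee7d8: skip_isany=False for initial entries.
skip_isany_entries = [(True, 0.5, "added later"), (False, 2.0, "initial")]


def pop_order(entries):
    """Return the labels in the order a min-heap would pop the entries."""
    heap = list(entries)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def defers_buggy(skip_isany, line_between):
    """Guard with the missing `not`: initial entries are never deferred."""
    return skip_isany and line_between


def defers_fixed(skip_isany, line_between):
    """Guard with the `not`: an initial entry with a line in between is deferred."""
    return (not skip_isany) and line_between


print(pop_order(int_entries))         # initial entry pops first
print(pop_order(is_first_entries))    # order flipped, because True > False
print(pop_order(skip_isany_entries))  # initial entry pops first again
print(defers_buggy(False, True), defers_fixed(False, True))
```

In this sketch, an initial pair (`skip_isany` is False) with a third line between its two members is merged immediately by the guard without `not`, while the guard with `not` defers it, which is the behavior the fix restores.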