Update documentation for boxes_flow, allow None #396

Merged
merged 5 commits into from
Mar 26, 2020

Conversation

jstockwin
Member

Pull request

Closes #395

  • Allows passing boxes_flow as None to disable the advanced layout analysis

  • Updates the documentation to this effect (if you want me to call it something other than "advanced layout analysis", let me know!)

  • I've added a private __validate method to LAParams to check that boxes_flow is either None or between -1 and +1. Happy to remove this if you think it's excessive.

  • I've NOT added any other checks about any other params to __validate. I'm happy to do this, but there don't seem to be any other specified ranges in the docs, so I'd just be checking the types, really.
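The range check described in the bullets above could look roughly like the following. This is only a minimal sketch and not the code from this PR: the class name LayoutParamsSketch is hypothetical, it models only boxes_flow, the default of 0.5 is an assumption, and it uses a single-underscore _validate rather than the __validate mentioned above.

```python
from typing import Optional


class LayoutParamsSketch:
    """Hypothetical stand-in for LAParams; only boxes_flow is modelled."""

    def __init__(self, boxes_flow: Optional[float] = 0.5) -> None:
        # The default of 0.5 is an assumption made for this sketch.
        self.boxes_flow = boxes_flow
        self._validate()

    def _validate(self) -> None:
        # None is allowed and means "skip the advanced layout analysis".
        if self.boxes_flow is None:
            return
        # bool is a subclass of int, so reject it explicitly.
        is_number = isinstance(self.boxes_flow, (int, float))
        if isinstance(self.boxes_flow, bool) or not is_number:
            raise TypeError("boxes_flow should be None or a number")
        # Any numeric value must lie in the closed interval [-1, 1].
        if not -1 <= self.boxes_flow <= 1:
            raise ValueError("boxes_flow should be between -1 and +1")
```

With this sketch, LayoutParamsSketch(boxes_flow=None) and LayoutParamsSketch(boxes_flow=-1) are accepted, while LayoutParamsSketch(boxes_flow=1.5) raises ValueError.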

How Has This Been Tested?

I checked before the change that passing None breaks, and after that it works (using extract_pages). The speed changes a lot without the extra layout analysis, so I could tell it was working. tox passes.

Checklist

  • I have added tests that prove my fix is effective or that my feature
    works
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial
    version
  • I have updated the README.md or I have verified that this
    is not necessary
  • I have updated the readthedocs documentation or I
    verified that this is not necessary
  • I have added a concise human-readable description of the change to
    CHANGELOG.md

Member

@pietermarsman pietermarsman left a comment

Thanks for this beautiful small improvement. It is these things that will make it a pleasure to work with pdfminer.six.

I've added a bunch of minor requests.

@jstockwin jstockwin force-pushed the fix-395-boxes-flow branch from 36ce049 to 20bf807 Compare March 26, 2020 09:56
@jstockwin
Member Author

@pietermarsman Thanks for the review - all makes sense.
I've rebased the first commit to resolve conflicts in CONTRIBUTING.MD, and then added a new commit which addresses your comments.
Please take another look when you get a chance.

@jstockwin jstockwin requested a review from pietermarsman March 26, 2020 09:58
Member

@pietermarsman pietermarsman left a comment

I'll merge it tonight.

If you have time, I have two tiny suggestions.

@jstockwin
Member Author

I've just noticed there are a few other places that the documentation needs updating - don't merge until I've fixed that :)

@jstockwin
Member Author

Okay, now this should be good 👍

@pietermarsman
Member

Wow: one of the tests failed because apparently tox doesn't work with Python 3.4 anymore. In this PR we can solve it by specifying an older version of tox.

I'll see if we can drop support for python 3.4 in the future.

@jstockwin
Member Author

Agreed, except I think this project depends on tox_travis, which just installs tox>=2.

This installed the latest version of tox, which is v3.14.6. Tox deprecated Python 3.4 in v3.14.1, but the deprecation says it won't actually get removed until the next major release. (tox changelog)

I don't think specifying a specific version of tox_travis will fix this, as it's the version of tox that matters... I guess I'll try pinning tox to a specific version and hopefully that'll help? Really I think these are issues for tox_travis/tox, though.

That said, I agree dropping python 3.4 support is a good idea.

@jstockwin
Member Author

Okay, that worked - tests are now green ✔️

@pietermarsman pietermarsman merged commit e55560f into pdfminer:develop Mar 26, 2020
@jstockwin jstockwin deleted the fix-395-boxes-flow branch March 27, 2020 08:45
Successfully merging this pull request may close these issues.

LAParams boxes_flow documentation inconsistency
2 participants