Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch tokens deletion in cleartokens command #969

Merged

Conversation

merito
Copy link
Contributor

@merito merito commented Apr 24, 2021

Fixes #651

Description of the Change

Use batches when deleting expired tokens using cleartokens management command. Introduce two setting variables to configure batch size and interval between deletions. The defaults were tested on quite weak machines and ~1.5M deleted tokens. It allows to run cleartokens without any downtime.

Checklist

  • PR only contains one change (considered splitting up PR)
  • unit-test added
  • documentation updated
  • CHANGELOG.md updated (only for user relevant changes)
  • author name in AUTHORS

@merito merito marked this pull request as ready for review April 24, 2021 23:03
@codecov
Copy link

codecov bot commented Apr 25, 2021

Codecov Report

Merging #969 (725c3c9) into master (e4c98c7) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #969      +/-   ##
==========================================
+ Coverage   96.64%   96.67%   +0.03%     
==========================================
  Files          31       31              
  Lines        1756     1775      +19     
==========================================
+ Hits         1697     1716      +19     
  Misses         59       59              
Impacted Files Coverage Δ
oauth2_provider/settings.py 100.00% <ø> (ø)
oauth2_provider/models.py 98.76% <100.00%> (+0.07%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4c98c7...725c3c9. Read the comment docs.

auvipy
auvipy previously requested changes Oct 19, 2021
Copy link
Contributor

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit and integration tests for the changed logics are needed. also need explanation why atomic transacion is not needed here.

oauth2_provider/models.py Show resolved Hide resolved
oauth2_provider/models.py Show resolved Hide resolved
@MattBlack85
Copy link
Contributor

I am a bit worried about the performance that this may have on large postrgesql databases because of the .count() and id__in, can we measure and report a run against a big table to see how it performs?

@merito
Copy link
Contributor Author

merito commented Oct 22, 2021

I am a bit worried about the performance that this may have on large postrgesql databases because of the .count() and id__in, can we measure and report a run against a big table to see how it performs?

Note: If you only need to determine the number of records in the set (and don’t need the actual objects), it’s much more efficient to handle a count at the database level using SQL’s SELECT COUNT(*). Django provides a count() method for precisely this reason.

source: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#when-querysets-are-evaluated

Performance is the reason why this PR emerges. I've tried to remove ~1.5M of stale tokens and I've failed because very high RAM usage. When I'd introduced batching it passed without any peaks in resources of machine on postgres installed on very basic AWS instance. Because AccessToken model is simple the default 10000 items batch size is the best balance between speed and low resource usage.

Copy link
Member

@n2ygk n2ygk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed follow up on this PR. It looks good. Please rebase and resolve conflicts. I'm targeting the next minor release for this. Thanks! (I'm afraid to check our production DB to see how many expired tokens are in it;-)

@n2ygk n2ygk added this to the 1.7.0 milestone Dec 22, 2021
Do not check for merge conflicts in AUTHORS file, because ======= in 2nd line triggers the error.
@merito merito force-pushed the feature-improve-cleartokens-performance branch from e4428bc to 2340bdf Compare December 22, 2021 17:14
@merito
Copy link
Contributor Author

merito commented Dec 22, 2021

I had to exclude AUTHORS from check-merge-conflict, because it causes

Check for merge conflicts................................................Failed
- hook id: check-merge-conflict
- exit code: 1

Merge conflict string "=======
" found in AUTHORS:2

You can keep this change or change separator in AUTHORS file.

@merito merito requested a review from auvipy December 22, 2021 17:33
.pre-commit-config.yaml Outdated Show resolved Hide resolved
@n2ygk
Copy link
Member

n2ygk commented Dec 23, 2021

I think I'll approach this by fixing AUTHORS separately as the problem doesn't "belong" to this PR.

Also, since I'm targeting 1.7.0 for this as it's more than a patch, let's hold it off a bit until 1.6.1. gets published.

@n2ygk
Copy link
Member

n2ygk commented Dec 23, 2021

pre-commit/pre-commit-hooks#100 describes the problems with RST underline.

@n2ygk n2ygk dismissed auvipy’s stale review January 1, 2022 16:48

change made but not marked resolved.

@n2ygk n2ygk merged commit c42423c into jazzband:master Jan 1, 2022
@n2ygk
Copy link
Member

n2ygk commented Jan 1, 2022

@auvipy no test existed prior to this PR so let's accept it as is and put writing a test for the command on the backlog.

@@ -101,6 +101,8 @@
# Whether to re-create OAuthlibCore on every request.
# Should only be required in testing.
"ALWAYS_RELOAD_OAUTHLIB_CORE": False,
"CLEAR_EXPIRED_TOKENS_BATCH_SIZE": 10000,
"CLEAR_EXPIRED_TOKENS_BATCH_INTERVAL": 0.1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merito Sorry this is after the fact but wouldn't a default value of 0 be best, especially since the sleep is always executed even if the batch is tiny.
https://github.com/merito/django-oauth-toolkit/blob/725c3c9d8927379c9808abd1badb4fcd9ff1cbaa/oauth2_provider/models.py#L636

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cleartokens management command could have some performance improvements
5 participants