Rate limiter for parallel prehandling #8184

Merged
merged 29 commits into from
Oct 19, 2023

Conversation

CalvinNeo
Member

@CalvinNeo CalvinNeo commented Oct 11, 2023

What problem does this PR solve?

Issue Number: close #8081

Problem Summary:

In the previous PR, we introduced parallel prehandling for a single big region. However, we believe there are also cases with a few (more than one) ongoing big snapshots. In these cases, the second snapshot can't benefit from parallel prehandling.

What is changed and how it works?

The idea is to introduce a parallel limit equal to snap-handle-pool-size. Every subtask of a parallel prehandling task takes one parallel unit. If a prehandling task needs more parallel units than are left, the task sleeps until some other prehandling subtask finishes.
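
For illustration only, here is a minimal sketch of such a unit-based limiter. It is not the code in this PR: the class name `ParallelUnitLimiter`, its methods, and the choice to clamp oversized requests to the limit are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical sketch; the class and method names are not taken from the PR.
class ParallelUnitLimiter
{
public:
    explicit ParallelUnitLimiter(size_t limit_)
        : limit(limit_)
        , available(limit_)
    {}

    // Blocks until `wanted` units are free, then takes them.
    // Assumption: a request larger than the whole limit is clamped,
    // so it cannot wait forever. Returns the number of units taken.
    size_t acquire(size_t wanted)
    {
        std::unique_lock<std::mutex> lock(mut);
        const size_t units = std::min(wanted, limit);
        cv.wait(lock, [&] { return available >= units; });
        available -= units;
        return units;
    }

    // Non-blocking variant: returns false instead of sleeping.
    bool tryAcquire(size_t wanted)
    {
        std::lock_guard<std::mutex> lock(mut);
        if (wanted > available)
            return false;
        available -= wanted;
        return true;
    }

    // Called when subtasks finish; wakes any task sleeping in acquire().
    void release(size_t units)
    {
        {
            std::lock_guard<std::mutex> lock(mut);
            assert(available + units <= limit); // never give back more than was taken
            available += units;
        }
        cv.notify_all();
    }

    size_t freeUnits()
    {
        std::lock_guard<std::mutex> lock(mut);
        return available;
    }

private:
    const size_t limit;
    size_t available;
    std::mutex mut;
    std::condition_variable cv;
};
```

In this sketch, a prehandling task would call `acquire(n)` before spawning `n` subtasks and `release(n)` (or `release(1)` per subtask) as they finish, so a second big snapshot sleeps only while the pool is fully occupied.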

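| item | master | this pr |
| --- | --- | --- |
| splits | 1 (always) | 4+3+3 |
| lines | - | 10451012*3 |
| interval between two ddls | - | 2m38s |

| item | master | this pr |
| --- | --- | --- |
| splits | 1 (always) | 4+4+4 |
| lines | - | 20276455*3 |
| interval between two ddls | - | 2m42s |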

Note that much of this time is spent waiting before the first snapshot arrives.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 11, 2023
f
Signed-off-by: CalvinNeo <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 13, 2023
@guo-shaoge
Contributor

/run-integration-test

@CalvinNeo
Member Author

/run-integration-test

@CalvinNeo
Member Author

/run-unit-test

Contributor

@JaySon-Huang JaySon-Huang left a comment

Rest LGTM

@CalvinNeo
Member Author

/run-all-tests

size_t total_concurrency = 0;
if (proxy_config.valid)
{
total_concurrency = proxy_config.snap_handle_pool_size;
Contributor

What is the default value of snap_handle_pool_size?

Contributor

Would it be too big when falling through to std::thread::hardware_concurrency() below?

Member Author

It is in https://github.com/pingcap/tidb-engine-ext/blob/521fd9dbc55e58646045d88f91c3c35db50b5981/proxy_components/proxy_server/src/config.rs#L50

The idea here is that in the raftstore-v2 scenario, many tables have only one snapshot, so while the only region of such a table is still waiting on its snapshot, TiFlash can't actually serve anything for that table.

However, I think I can adopt the previous strategy, since it is less aggressive.
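
For readers following this thread, the fall-through being discussed can be sketched roughly as below. This is only an illustration under assumptions: `ProxyConfigSketch` and `chooseTotalConcurrency` are hypothetical names, and the condition that triggers the fall-through (a zero or unavailable pool size) is a guess, since the quoted snippet is truncated. The one library fact relied on is that `std::thread::hardware_concurrency()` may return 0 when the value cannot be determined.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for the proxy configuration; only the two fields
// visible in the quoted snippet are modeled.
struct ProxyConfigSketch
{
    bool valid = false;
    size_t snap_handle_pool_size = 0;
};

// Picks the total number of parallel units.
// `hardware_threads` is a parameter only so the fall-through is easy to exercise.
size_t chooseTotalConcurrency(
    const ProxyConfigSketch & proxy_config,
    size_t hardware_threads = std::thread::hardware_concurrency())
{
    size_t total_concurrency = 0;
    if (proxy_config.valid)
        total_concurrency = proxy_config.snap_handle_pool_size;

    // Assumed fall-through: without a usable pool size, use the core count.
    // On a large machine this can be far above the snapshot pool size,
    // which is the concern raised in the review comment.
    if (total_concurrency == 0)
        total_concurrency = hardware_threads;

    // hardware_concurrency() may report 0; keep at least one unit.
    return std::max<size_t>(total_concurrency, 1);
}
```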

Contributor

@JaySon-Huang JaySon-Huang left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Oct 17, 2023
@JaySon-Huang
Contributor

JaySon-Huang commented Oct 17, 2023

/run-integration-test

The tidb-ci tests returned exit code 1; let's try rerunning them.
https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/14712/pipeline/147

path: /home/jenkins/agent/workspace/tiflash-ghpr-integration-tests/tests/tidb-ci
— Print Message
<1s
TAG=b176a5923180e2c167fe5dc348b2253433c46aff BRANCH=master ./run.sh
— Shell Script
5s
script returned exit code 1

z
Signed-off-by: CalvinNeo <[email protected]>
a
Signed-off-by: CalvinNeo <[email protected]>
@CalvinNeo
Member Author

/run-integration-test

Signed-off-by: CalvinNeo <[email protected]>
@CalvinNeo CalvinNeo requested a review from JinheLin October 18, 2023 10:18
@CalvinNeo
Member Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot added the lgtm label Oct 19, 2023
@ti-chi-bot
Contributor

ti-chi-bot bot commented Oct 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Oct 19, 2023
@ti-chi-bot
Contributor

ti-chi-bot bot commented Oct 19, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-10-17 09:37:53.088687977 +0000 UTC m=+1736270.675798107: ☑️ agreed by JaySon-Huang.
  • 2023-10-19 03:36:43.418718677 +0000 UTC m=+1887401.005828807: ☑️ agreed by JinheLin.

@JaySon-Huang
Contributor

/run-all-tests

@ti-chi-bot ti-chi-bot bot merged commit b77eca9 into pingcap:master Oct 19, 2023
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel prehandle snapshot to speed up catch up with TiKV large region
4 participants