Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: replace busy-wait channel receives with receiveTimeout #448

Closed
wants to merge 2 commits into from

Conversation

dnut
Copy link
Contributor

@dnut dnut commented Dec 17, 2024

Every thread in sig that receives on a channel utilizes a CPU core at 100% from busy-waiting in the receive loop. This PR eliminates the busy-waiting by switching from tryReceive to receiveTimeout in those places. After this change, I see a dramatic decrease in CPU usage while running sig.

I did not remove tryReceive from defer statements or unit tests. It does not seem appropriate to use blocking calls in those contexts if non-blocking calls are already known to be sufficient.

Handling error.Closed

Unlike tryReceive, receiveTimeout can return an error to indicate that the channel is closed. I needed to make a decision about how to handle this error. To replicate the same behavior as tryReceive, the error should simply be ignored with catch null in every scenario. But I didn't take this approach in most cases.

error.Closed provides new information that we didn't have before, and we can use that information to our advantage on a case-by-case basis. If that information can lead us to a definitive conclusion that it would be pointless to remain in a particular context, then I've changed the code to exit from that context. I've done this conservatively: the control flow will remain in a context if there is any possibility of doing more useful work in that context.

I used the following where appropriate:

  • catch return: There is no more meaningful work that can be done in this function if the channel is closed.
  • catch break: The current scope has nothing else to do, but there is some additional work that can be done in the function after exiting the current scope.
  • catch null: There is meaningful work that can be done within the current scope even if the channel is closed. Often this is because we collect a batch of items from the channel before processing any of them.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

Benchmark suite Current: 4d824e9 Previous: 40bc433 Ratio
benchmarkGossipService(5k_ping_msgs) 100825920 nanos 1851663 nanos 54.45
benchmarkGossipService(5k_push_msgs) 100750542 nanos 918622 nanos 109.68
benchmarkGossipService(1k_pull_resp_msgs) 200575254 nanos 632798 nanos 316.97
benchmarkPullRequests(1k_data_1k_pull_reqs) 200546996 nanos 610015 nanos 328.76
benchmarkPullRequests(10k_data_1k_pull_reqs) 200568262 nanos 13089478 nanos 15.32
BlockstoreReader.getDataShred - Sequential(_) 2291906 nanos 759764 nanos 3.02
BlockstoreReader.slotRangeConnected(_) 996 millis 455 millis 2.19

This comment was automatically generated by workflow using github-action-benchmark.

@dnut
Copy link
Contributor Author

dnut commented Jan 3, 2025

closing since this has some performance issues, and some other improvements should occur before blocking receive is used. for example #465

@dnut dnut closed this Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

1 participant