Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message Execution Profiling #10892

Closed
11 tasks done
Tracked by #10888
jennijuju opened this issue May 19, 2023 · 5 comments
Closed
11 tasks done
Tracked by #10888

Message Execution Profiling #10892

jennijuju opened this issue May 19, 2023 · 5 comments
Assignees

Comments

@jennijuju
Copy link
Member

jennijuju commented May 19, 2023

User Story

Given that message execution must happen within the block time for the network to be healthy, it is very important for everyone interested in the health of the Filecoin network to know with each lotus release:

  1. How much time is being spent overall in the message execution loop of lotus
  2. How much time each part of message execution is taking

This will give users, developers and Filecoin community members confidence that releases are going to maintain the health of the network, and give objective data towards what parts of the lotus software need to be optimized.

Acceptance Criteria

Technical Breakdown

Tasks

Preview Give feedback

Deliverables

Preview Give feedback

Future work

  • Add message execution profiling to a github action that happens automatically
  • Updating the release process to include generation of this visualization for future releases
@jennijuju
Copy link
Member Author

(fvm

@jennijuju jennijuju changed the title message execution profiling Message Execution Profiling Jul 12, 2023
@snissn
Copy link
Contributor

snissn commented Jul 13, 2023

@arajasek notes:
message by message,
message type
block store operations
randomness
lookback state

@snissn snissn self-assigned this Jul 17, 2023
@snissn
Copy link
Contributor

snissn commented Jul 18, 2023

Here's some preliminary data:

go tool pprof -focus=ApplyMessage -top -relative_percentages pprof.out 
File: lotus
Build ID: b9028b4c098ba63f80f4e3063575d3fb97a9d346
Type: cpu
Time: Jul 17, 2023 at 1:16pm (HST)
Duration: 602.58s, Total samples = 660ms ( 0.11%)
Active filters:
   focus=ApplyMessage
Showing nodes accounting for 660ms, 100% of 660ms total
      flat  flat%   sum%        cum   cum%
     370ms 56.06% 56.06%      390ms 59.09%  runtime.cgocall
      30ms  4.55% 60.61%       30ms  4.55%  runtime.heapBitsSetType
      30ms  4.55% 65.15%      100ms 15.15%  runtime.mallocgc
      30ms  4.55% 69.70%       90ms 13.64%  runtime.newobject
      20ms  3.03% 72.73%      550ms 83.33%  github.com/filecoin-project/lotus/chain/vm.(*FVM).ApplyMessage
      10ms  1.52% 74.24%       50ms  7.58%  github.com/filecoin-project/filecoin-ffi/cgo.(*_Ctype_struct_Result_FvmMachineExecuteResponse).destroy
      10ms  1.52% 75.76%      420ms 63.64%  github.com/filecoin-project/filecoin-ffi/cgo.FvmMachineExecuteMessage
      10ms  1.52% 77.27%       10ms  1.52%  github.com/filecoin-project/go-state-types/big.(*Int).MarshalCBOR
      10ms  1.52% 78.79%       70ms 10.61%  github.com/filecoin-project/lotus/chain/vm.(*executionToken).Done
      10ms  1.52% 80.30%       20ms  3.03%  go.opencensus.io/stats/view.recordMeasurement
      10ms  1.52% 81.82%       10ms  1.52%  internal/reflectlite.(*rtype).Comparable
      10ms  1.52% 83.33%       10ms  1.52%  runtime.(*consistentHeapStats).release
      10ms  1.52% 84.85%       10ms  1.52%  runtime.(*itabTableType).find
      10ms  1.52% 86.36%       10ms  1.52%  runtime.(*mheap).allocSpan
      10ms  1.52% 87.88%       10ms  1.52%  runtime.casgstatus
      10ms  1.52% 89.39%       10ms  1.52%  runtime.cgoIsGoPointer
      10ms  1.52% 90.91%       10ms  1.52%  runtime.futex
      10ms  1.52% 92.42%       10ms  1.52%  runtime.gcmarknewobject
      10ms  1.52% 93.94%       20ms  3.03%  runtime.getitab
      10ms  1.52% 95.45%       10ms  1.52%  runtime.lock2
      10ms  1.52% 96.97%       20ms  3.03%  runtime.mapassign_faststr
      10ms  1.52% 98.48%       10ms  1.52%  runtime.save
      10ms  1.52%   100%       10ms  1.52%  time.Since

go tool pprof -svg -focus=ApplyMessage -relative_percentages pprof.out

profile001

this was created by starting with a fresh install of lotus, downloading the latest mainnet dump, then running this command:

timeout 10m  ./lotus daemon   --pprof pprof.out

@Stebalien
Copy link
Member

Looks like:

  • Cbor serialization on message execution to get the "chain length" is taking longer than it really needs to. I wonder if there's something we can do there.
  • Metrics recording when finishing execution looks like it's taking 10% of the time? That really seems unnecessary.
  • Crossing back into the FFI to "destroy" responses is taking ~7% of the time. We can probably fix that?

Overall, it looks like there may be room for a 30% (likely more like 20% speedup).

@snissn
Copy link
Contributor

snissn commented Aug 11, 2023

This work required no changes to lotus! The code for running these executions is available here:

https://github.com/FILCAT/lotus-ec2-tools/tree/main/tasks/profile_message_execution

in order to run the test suite, one simply needs to set up the lotus-ec2-tools repo, including aws credential configuration and then run

LOTUS_GIT_BRANCH=master INSTANCE_TYPE='i4i.2xlarge' ./run_task.sh profile_message_execution

and a spot instance will be created, execute the profiling code, and then update slack with a ipfs link with your results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants