Remove type instabilities and allocations #36

Merged: 1 commit merged on Dec 29, 2022

efaulhaber (Member) commented on Dec 28, 2022

Closes #29.

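Here are some benchmarks (all run on an AMD Ryzen Threadripper 3990X). What used to be "main loop" in the timer output is now called "container interaction"; likewise, "update neighborhood searches" is now "update nhs" and "update particle containers" is now "update containers".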

Rectangular Tank

One Thread

On main

julia> include("examples/fluid/rectangular_tank_2d.jl");
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 2.0  Time steps: 857 (accepted), 857 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────────
                 Pixie.jl                          Time                    Allocations      
                                          ───────────────────────   ────────────────────────
             Tot / % measured:                 39.8s /  99.7%           3.54GiB /  98.9%    

 Section                          ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────────
 rhs!                              7.82k    39.7s  100.0%  5.08ms   3.51GiB  100.0%   470KiB
   main loop                       7.82k    37.9s   95.5%  4.85ms   2.96GiB   84.4%   397KiB
   update neighborhood searches    7.82k    1.49s    3.7%   190μs    545MiB   15.2%  71.4KiB
   update particle containers      7.82k    299ms    0.8%  38.3μs   14.6MiB    0.4%  1.91KiB
   reset ∂u/∂t                     7.82k   12.2ms    0.0%  1.56μs     0.00B    0.0%    0.00B
   ~rhs!~                          7.82k   3.49ms    0.0%   446ns   2.94KiB    0.0%    0.38B
 ───────────────────────────────────────────────────────────────────────────────────────────

julia> using BenchmarkTools

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 977 samples with 1 evaluation.
 Range (min … max):  4.986 ms …  15.583 ms  ┊ GC (min … max): 0.00% … 66.52%
 Time  (median):     5.097 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.119 ms ± 581.831 μs  ┊ GC (mean ± σ):  0.62% ±  3.69%

        ▃▂▄  ▁▁            ▂▁▅▅▅▇▇█▄▅▂ ▁                       
  ▃▃▅▆▅████▇▇███▇▃▅▃▃▃▂▃▅▆██████████████▇▄▃▄▁▃▂▃▃▃▄▅▄▄▃▆▄▃▃▃▃ ▄
  4.99 ms         Histogram: frequency by time        5.21 ms <

 Memory estimate: 470.27 KiB, allocs estimate: 20866.

This PR

julia> include("examples/fluid/rectangular_tank_2d.jl");
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 2.0  Time steps: 857 (accepted), 857 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ────────────────────────────────────────────────────────────────────────────────────
              Pixie.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              32.6s /  99.6%            135MiB /  71.4%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       7.82k    32.4s  100.0%  4.15ms   96.4MiB  100.0%  12.6KiB
   container interaction    7.82k    30.8s   94.9%  3.94ms     0.00B    0.0%    0.00B
   update nhs               7.82k    1.32s    4.1%   170μs   96.4MiB  100.0%  12.6KiB
   update containers        7.82k    307ms    0.9%  39.3μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              7.82k   12.5ms    0.0%  1.60μs     0.00B    0.0%    0.00B
   ~rhs!~                   7.82k   3.27ms    0.0%   419ns   2.94KiB    0.0%    0.38B
 ────────────────────────────────────────────────────────────────────────────────────

julia> using BenchmarkTools

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 1205 samples with 1 evaluation.
 Range (min … max):  4.115 ms …  4.535 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.146 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.148 ms ± 16.251 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁▂▃▄▇▇▆▇▇▄█▃▆▂▄▅▃▂ ▂▄▂▁▁                 
  ▂▁▃▁▂▁▃▃▃▃▃▅▄▅▄▆▇▆█████████████████████████▆▆▇▆▄▅▅▆▄▃▄▃▃▃▃ ▅
  4.11 ms        Histogram: frequency by time        4.18 ms <

 Memory estimate: 12.56 KiB, allocs estimate: 1.

24 Threads

On main

julia> include("examples/fluid/rectangular_tank_2d.jl");
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 2.0  Time steps: 857 (accepted), 857 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────────
                 Pixie.jl                          Time                    Allocations      
                                          ───────────────────────   ────────────────────────
             Tot / % measured:                 7.39s /  82.3%            759MiB /  90.4%    

 Section                          ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────────
 rhs!                              7.82k    6.08s  100.0%   778μs    686MiB  100.0%  89.9KiB
   main loop                       7.82k    3.83s   63.0%   490μs    114MiB   16.7%  15.0KiB
   update neighborhood searches    7.82k    1.73s   28.4%   221μs    557MiB   81.2%  72.9KiB
   update particle containers      7.82k    367ms    6.0%  47.0μs   14.9MiB    2.2%  1.95KiB
   reset ∂u/∂t                     7.82k    119ms    2.0%  15.2μs     0.00B    0.0%    0.00B
   ~rhs!~                          7.82k   33.9ms    0.6%  4.33μs   2.94KiB    0.0%    0.38B
 ───────────────────────────────────────────────────────────────────────────────────────────

julia> using BenchmarkTools

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 8701 samples with 1 evaluation.
 Range (min … max):  465.160 μs …   6.153 ms  ┊ GC (min … max): 0.00% … 75.16%
 Time  (median):     558.602 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   568.133 μs ± 105.738 μs  ┊ GC (mean ± σ):  0.28% ±  1.40%

                            ██▂                                  
  ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▂▂▁▁▂▆███▇▅▅▅▅▆▄▄▅▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂ ▃
  465 μs           Histogram: frequency by time          664 μs <

 Memory estimate: 78.89 KiB, allocs estimate: 117.

This PR

julia> include("examples/fluid/rectangular_tank_2d.jl");
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 2.0  Time steps: 857 (accepted), 857 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ────────────────────────────────────────────────────────────────────────────────────
              Pixie.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              5.00s /  92.8%            147MiB /  73.8%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       7.82k    4.64s  100.0%   593μs    109MiB  100.0%  14.2KiB
   container interaction    7.82k    2.84s   61.3%   364μs   6.68MiB    6.2%     896B
   update nhs               7.82k    1.40s   30.2%   179μs    102MiB   93.8%  13.4KiB
   update containers        7.82k    332ms    7.2%  42.5μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              7.82k   37.0ms    0.8%  4.74μs     0.00B    0.0%    0.00B
   ~rhs!~                   7.82k   27.4ms    0.6%  3.50μs   2.94KiB    0.0%    0.38B
 ────────────────────────────────────────────────────────────────────────────────────

julia> using BenchmarkTools

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 8924 samples with 1 evaluation.
 Range (min … max):  547.782 μs …  2.295 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     556.438 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   558.088 μs ± 25.513 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁▂▄▄▆█▇█▆▅▅▃▃▃▁                                     
  ▁▁▁▂▃▃▃▄▅▆█████████████████▇▇▆▆▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▄
  548 μs          Histogram: frequency by time          575 μs <

 Memory estimate: 13.44 KiB, allocs estimate: 6.
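A compact way to summarize the rectangular-tank numbers is the ratio of median `rhs!` times (main divided by this PR). The values in this sketch are copied by hand from the `@benchmark` outputs above; the snippet only does the division, and the names `medians` and `speedup` are illustrative.

```julia
# Median `rhs!` times copied from the @benchmark outputs above, in milliseconds.
medians = (
    one_thread = (main = 5.097, pr = 4.146),
    threads_24 = (main = 0.558602, pr = 0.556438),
)

# Speedup of this PR relative to main (> 1 means this PR is faster).
speedup(m) = m.main / m.pr

for (name, m) in pairs(medians)
    println(name, ": ", round(speedup(m); digits = 3), "x")
end
# one_thread: 1.229x
# threads_24: 1.004x
```

With 24 threads the medians are nearly identical; there, the difference shows up mainly as a narrower spread (σ of 25.513 μs vs 105.738 μs) and zero GC time, consistent with the drop from 117 to 6 allocations per call.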

Oscillating Beam

Here, the difference is small because there is only one type of container. The type instabilities only occur when different types of containers are coupled in one simulation.
Note that the remaining allocations in the timer output come from the timers themselves (about 0.026 bytes per timer call).
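The general Julia mechanism behind this can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `FluidContainer`, `SolidContainer`, `total`, `pick`, and `totals` are hypothetical names. Indexing a tuple of differently typed containers with a runtime index only tells the compiler that the result is a `Union` of the element types (and, with more than a few types, an abstract type that forces runtime dispatch), whereas recursing over the tuple lets every call specialize on a concrete type. With a single container type, as in the beam example, the tuple is homogeneous and the problem does not arise.

```julia
using Test  # provides @inferred

# Hypothetical container types, deliberately holding different element types.
struct FluidContainer
    density::Vector{Float64}
end

struct SolidContainer
    mass::Vector{Int}
end

total(c::FluidContainer) = sum(c.density)
total(c::SolidContainer) = sum(c.mass)

# Runtime index into a heterogeneous tuple: the inferred result type is
# Union{FluidContainer, SolidContainer}, not a single concrete type.
pick(containers, i) = containers[i]

# Recursion over the tuple: each call sees the concrete type of first(containers).
totals(::Tuple{}) = ()
totals(containers::Tuple) = (total(first(containers)), totals(Base.tail(containers))...)

containers = (FluidContainer([1.0, 2.0]), SolidContainer([3, 4]))

totals(containers)             # (3.0, 7), inferred as Tuple{Float64, Int}
@inferred totals(containers)   # passes
# @inferred pick(containers, 1) would throw, since only the Union is inferred.
```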

One Thread

On main

julia> pixie_include("examples/solid/oscillating_beam_2d.jl", tspan=(0.0, 1.0));
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 1.0  Time steps: 2973 (accepted), 3168 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ─────────────────────────────────────────────────────────────────────────────────────────────
                  Pixie.jl                           Time                    Allocations      
                                            ───────────────────────   ────────────────────────
              Tot / % measured:                  33.9s /  99.9%           12.9MiB /  84.6%    

 Section                            ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────────────
 rhs!                                28.6k    33.9s  100.0%  1.19ms   10.9MiB  100.0%     400B
   update particle containers        28.6k    24.4s   71.9%   853μs   2.18MiB   20.0%    80.1B
     precompute pk1 stress tensor    28.6k    24.3s   71.8%   851μs     0.00B    0.0%    0.00B
     update current coordinates      28.6k   28.2ms    0.1%   987ns     0.00B    0.0%    0.00B
     ~update particle containers~    28.6k   6.29ms    0.0%   220ns   2.18MiB   20.0%    80.1B
   main loop                         28.6k    9.50s   28.0%   332μs   6.54MiB   60.0%     240B
   ~rhs!~                            28.6k   7.84ms    0.0%   274ns   2.94KiB    0.0%    0.11B
   reset ∂u/∂t                       28.6k   3.50ms    0.0%   122ns     0.00B    0.0%    0.00B
   update neighborhood searches      28.6k   1.92ms    0.0%  67.0ns   2.18MiB   20.0%    80.0B
 ─────────────────────────────────────────────────────────────────────────────────────────────

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 4189 samples with 1 evaluation.
 Range (min … max):  1.175 ms … 1.266 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.192 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.192 ms ± 5.722 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▁▂▂▃▆▆▇▇█▇▇▆▇▇▇▆▆▅▆▅▃▂▁▁              
  ▂▁▂▁▂▁▂▃▃▃▅▅▆▆█▇▇▇▆▅█████████████████████████▇▇▆▆▆▅▄▅▄▄▃▃ ▅
  1.18 ms        Histogram: frequency by time        1.2 ms <

 Memory estimate: 400 bytes, allocs estimate: 10.

This PR

julia> pixie_include("examples/solid/oscillating_beam_2d.jl", tspan=(0.0, 1.0));
[...]
────────────────────────────────────────────────────────────────────────────────────────────────────
Pixie simulation finished.  Final time: 1.0  Time steps: 2973 (accepted), 3168 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ────────────────────────────────────────────────────────────────────────────────────
              Pixie.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              33.0s /  99.9%           1.66MiB /   0.2%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       28.6k    32.9s  100.0%  1.15ms   3.67KiB  100.0%    0.13B
   update containers        28.6k    24.4s   74.0%   853μs      752B   20.0%    0.03B
     precompute pk1         28.6k    24.4s   74.0%   853μs     0.00B    0.0%    0.00B
     ~update containers~    28.6k   15.0ms    0.0%   524ns      752B   20.0%    0.03B
   container interaction    28.6k    8.55s   26.0%   299μs     0.00B    0.0%    0.00B
   ~rhs!~                   28.6k   7.87ms    0.0%   276ns   2.94KiB   80.0%    0.11B
   reset ∂u/∂t              28.6k   3.50ms    0.0%   123ns     0.00B    0.0%    0.00B
   update nhs               28.6k    632μs    0.0%  22.1ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

julia> @benchmark Pixie.rhs!($(copy(sol[end])), $(copy(sol[end])), $semi, $0.0)
BenchmarkTools.Trial: 4336 samples with 1 evaluation.
 Range (min … max):  1.142 ms … 1.206 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.151 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.152 ms ± 4.077 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▁▄▅▇██▇▇▇▆▄▅▃▁▁▁▁                            
  ▂▂▂▂▁▁▂▂▂▃▃▃▅▇█████████████████████▇▇▇▇▆▆▆▅▅▄▄▄▃▄▃▃▃▃▃▃▃▃ ▅
  1.14 ms        Histogram: frequency by time       1.16 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

efaulhaber mentioned this pull request on Dec 28, 2022
efaulhaber marked this pull request as ready for review on December 28, 2022 22:49
efaulhaber requested a review from LasNikas on December 28, 2022 22:49

LasNikas (Collaborator) left a comment

Good job!
The rhs! looks way cleaner now.

Comment on lines +37 to +38
# This NHS will never be used, so we just return an empty NHS.
# To keep actions on the tuple of NHS type-stable, we return something of the same type as the other NHS.
Collaborator reply:
Just FYI: I'll need the NHS for the BoundaryParticleContainer in the Crespo BC.
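The design choice in the quoted code comment (returning an empty NHS of the same type rather than, say, `nothing`) can be sketched as follows. `GridNHS`, `empty_nhs`, and `get_nhs` are hypothetical stand-ins, not Pixie.jl code. When every element of the tuple has the same concrete type, the tuple is an `NTuple`, and indexing it with a runtime index is inferred as that concrete type; a placeholder of a different type turns the element type into a `Union`.

```julia
using Test  # provides @inferred

# Hypothetical stand-in for a neighborhood search (NHS).
struct GridNHS{ELTYPE}
    search_radius::ELTYPE
    neighbor_lists::Vector{Vector{Int}}
end

# An "empty" NHS of the same concrete type, for a container whose NHS is never queried.
empty_nhs(::Type{GridNHS{ELTYPE}}) where {ELTYPE} =
    GridNHS{ELTYPE}(zero(ELTYPE), Vector{Int}[])

get_nhs(searches, i) = searches[i]

real_nhs = GridNHS(0.1, [[2], [1]])

# Same concrete type everywhere: NTuple{2, GridNHS{Float64}}.
homogeneous = (real_nhs, empty_nhs(typeof(real_nhs)))

# Placeholder of a different type: Tuple{GridNHS{Float64}, Nothing}.
mixed = (real_nhs, nothing)

@inferred get_nhs(homogeneous, 2)   # passes, inferred as GridNHS{Float64}
# @inferred get_nhs(mixed, 1) would throw, since only a Union with Nothing is inferred.
```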

LasNikas merged commit 0832426 into trixi-framework:main on Dec 29, 2022
efaulhaber deleted the performance1 branch on May 19, 2023 13:57
Linked issue: wrap_array allocates