-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for plots with millions of points to render, without requirement of 32bit index support in imgui #41
Conversation
Test code:
This version is also slightly faster than current master (around 3% faster). This code uses knowledge of how PrimReserve function works, which is not a good thing and probably ImGui itself needs to get refactored in this area to better support scenarios when widgets push a ton of stuff to render. Edit: Edit: |
At a high level, can you explain what this is doing before dig into the code. Also, |
The problem with original code is with PrimReserve - you can't use it for more points than current index size can fit. If you try to do that, you get triangle mess on the screen. In short new logic checks how many points you can fit in current draw command and does as many different reservations as needed to add more points without violating index size restrictions. I'll fix constexpr, it's only there to remove visual studio compiler warning about constant conditions |
Got it, sounds like a great addition. I'll get around to merging this in a few days. Thank you for the work! |
I hadn't noticed this before. We can't merge code that relies on an open ImGui PR, so I suggest we wait and see what their response to the issue is. Otherwise, my initial feedback is to:
static void(*render_fn[32])(Getter & getter, ImDrawList & DrawList, int count, int offset, float line_weight, ImU32 col_line, int y_axis) =
{
&RenderLineStrip<0>,&RenderLineStrip<1>,&RenderLineStrip<2>,&RenderLineStrip<3>, &RenderLineStrip<4>,&RenderLineStrip<5>,&RenderLineStrip<6>,&RenderLineStrip<7>,
&RenderLineStrip<8>,&RenderLineStrip<9>,&RenderLineStrip<10>,&RenderLineStrip<11>, &RenderLineStrip<12>,&RenderLineStrip<13>,&RenderLineStrip<14>,&RenderLineStrip<15>,
&RenderLineStrip<16>,&RenderLineStrip<17>,&RenderLineStrip<18>,&RenderLineStrip<19>,&RenderLineStrip<20>,&RenderLineStrip<21>,&RenderLineStrip<22>,&RenderLineStrip<23>,
&RenderLineStrip<24>,&RenderLineStrip<25>,&RenderLineStrip<26>,&RenderLineStrip<27>,&RenderLineStrip<28>,&RenderLineStrip<29>,&RenderLineStrip<30>,&RenderLineStrip<31>
};
int which_fn_to_use = (RenderLineStrip_Offset & -(offset != 0))
| (RenderLineStrip_Extents & -(gp.FitThisFrame != false))
| (RenderLineStrip_LogX & -(HasFlag(plot->XAxis.Flags, ImPlotAxisFlags_LogScale) != false))
| (RenderLineStrip_LogY & -(HasFlag(plot->YAxis[y_axis].Flags, ImPlotAxisFlags_LogScale) != false))
| (RenderLineStrip_Cull & -(cull != false)); |
You also need ImGui's PR 3232 ocornut/imgui#3232
<ocornut/imgui#3232> for this to render without
issues
I shouldn't have mentioned it in the first place. Existing master already
has issues which PR3232 fixes, so it's not going to be worse than it
already is
1. regarding constexpr - without it compiler generates a bunch of useless
warnings. I don't like warnings and I also don't like muting warnings
either. I do empty it out with define for pre c++17 code. Is there a
problem with constexpr now ?
2 and 3 - Any Proposals ? RenderLineStrip could be split into 2 , separate
one for AA version. Then a few flags I consider relatively useless: cull -
just always use culling, what's the purpose of not using it? Offset flag is
completely useless imho, and if it is needed, it can be achieved using
callback interface of ImPlot. "Fit this frame" flag is relatively
inexpensive and can always be computed (and then results discarded when not
needed). Then you'd get with 4 combinations (LogX, LogY) - sounds ok ?
Regards,
Sergey
…On Fri, May 29, 2020 at 2:35 PM Evan Pezent ***@***.***> wrote:
This code uses knowledge of how PrimReserve function works, which is not a
good thing and probably ImGui itself needs to get refactored in this area
to better support scenarios when widgets push a ton of stuff to render.
Edit:
You also need ImGui's PR 3232 ocornut/imgui#3232
<ocornut/imgui#3232> for this to render without
issues
I hadn't noticed this before. We can't merge code that relies on an open
ImGui PR, so I suggest we wait and see what their response to the issue is.
Otherwise, my initial feedback is to:
1. remove if constexpr entirely (could template specializations be
used here?)
2. break up RenderLineStrip into smaller parts
3. this will be difficult to follow and maintain:
static void(*render_fn[32])(Getter & getter, ImDrawList & DrawList, int count, int offset, float line_weight, ImU32 col_line, int y_axis) =
{
&RenderLineStrip<0>,&RenderLineStrip<1>,&RenderLineStrip<2>,&RenderLineStrip<3>, &RenderLineStrip<4>,&RenderLineStrip<5>,&RenderLineStrip<6>,&RenderLineStrip<7>,
&RenderLineStrip<8>,&RenderLineStrip<9>,&RenderLineStrip<10>,&RenderLineStrip<11>, &RenderLineStrip<12>,&RenderLineStrip<13>,&RenderLineStrip<14>,&RenderLineStrip<15>,
&RenderLineStrip<16>,&RenderLineStrip<17>,&RenderLineStrip<18>,&RenderLineStrip<19>,&RenderLineStrip<20>,&RenderLineStrip<21>,&RenderLineStrip<22>,&RenderLineStrip<23>,
&RenderLineStrip<24>,&RenderLineStrip<25>,&RenderLineStrip<26>,&RenderLineStrip<27>,&RenderLineStrip<28>,&RenderLineStrip<29>,&RenderLineStrip<30>,&RenderLineStrip<31>
};
int which_fn_to_use = (RenderLineStrip_Offset & -(offset != 0))
| (RenderLineStrip_Extents & -(gp.FitThisFrame != false))
| (RenderLineStrip_LogX & -(HasFlag(plot->XAxis.Flags, ImPlotAxisFlags_LogScale) != false))
| (RenderLineStrip_LogY & -(HasFlag(plot->YAxis[y_axis].Flags, ImPlotAxisFlags_LogScale) != false))
| (RenderLineStrip_Cull & -(cull != false));
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFDDORXBB7XZKG6XFKZSNLRT6TX3ANCNFSM4NHC3GKA>
.
|
I saw the empty define, and I'm not a fan of it. I'm sure there are other ways to write this code warning free without constexpr.
Fair point, this was previously in place because our culling scheme was imperfect and didn't always look right. This was fixed, and so we can probably just always cull now
Are you suggesting we remove offset from the API? It's not useless -- it's the primary mechanism that allows realtime plots to work efficiently with circular buffers.
Agreed.
Sounds better, but I'm still not seeing the point entirely. Is the goal to move all the Transformer operations into the body of the function? |
Sounds better, but I'm still not seeing the point entirely. Is the goal to
move all the Transformer operations into the body of the function?
Transformers do a lot of operations which do not have to be done per point,
which is also achieved by moving them into RenderLineStrip directly. Then
further speed improvements can be achieved with vectorization, and doing
that across Transformer is not possible atm.
…On Fri, May 29, 2020 at 3:14 PM Evan Pezent ***@***.***> wrote:
1. regarding constexpr - without it compiler generates a bunch of
useless
warnings. I don't like warnings and I also don't like muting warnings
either. I do empty it out with define for pre c++17 code. Is there a
problem with constexpr now ?
I saw the empty define, and I'm not a fan of it. I'm sure there are other
ways to write this code warning free without constexpr.
Then a few flags I consider relatively useless: cull -
just always use culling, what's the purpose of not using it?
Fair point, this was previously in place because our culling scheme was
imperfect and didn't always look right. This was fixed, and so we can
probably just always cull now
Offset flag is completely useless imho, and if it is needed, it can be
achieved using
callback interface of ImPlot.
Are you suggesting we remove offset from the API? It's not useless -- it's
the primary mechanism that allows realtime plots to work efficiently with
circular buffers.
"Fit this frame" flag is relatively inexpensive and can always be computed
(and then results discarded when notneeded).
Agreed.
Then you'd get with 4 combinations (LogX, LogY) - sounds ok ?
Sounds better, but I'm still not seeing the point entirely. Is the goal to
move all the Transformer operations into the body of the function?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFDDOWH2RTDAH5KNOEAUCTRT6YJRANCNFSM4NHC3GKA>
.
|
@epezent - the refactoring steps I described are enough to get this merged in ? Or you don't feel like letting transformers go in RenderLineStrip ? |
I would prefer to keep Transformers separate for ease of maintenance and understanding. However, I don't want to do redundant calculations either. I'm looking at |
In one of your 'log' transformers for example: |
Would caching the memory fetches into member variables of the structs improve performance and vectorization in this case. Also, is it not the case that the compiler will inline the |
Would caching the memory fetches into member variables of the structs
improve performance and vectorization in this case.
You could cache the whole expression - replace division with multiplication
by inverse and following interpolations into few multiply-adds. That's kind
of what I did and I saw a small but non-zero perf improvement (around 2-3
percent).
Also, is it not the case that the compiler will inline the operator() into
the RenderLineStrip function since Transformers are templated arguments?
It would, but .Range.Min is of float type if I'm not mistaken, and compiler
does not know that this place is not going to be overwritten. So I expect
ranges will be refetched all the time and division will not be optimized
out. This of course are just assumptions and you need to actually look at
what code compiler generates. From my experience it is usually worse than
what you expect.
Having said that, before doing anything, code should get profiled first to
make sure you are not write-bandwidth limited. (I think you are not, but I
did not spend much time on this).
…On Sun, May 31, 2020 at 5:32 PM Evan Pezent ***@***.***> wrote:
Would caching the memory fetches into member variables of the structs
improve performance and vectorization in this case.
Also, is it not the case that the compiler will inline the operator()
into the RenderLineStrip function since Transformers are templated
arguments?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFDDOVL5UZJWC3UWG2UHDTRUJ2CPANCNFSM4NHC3GKA>
.
|
Would you care to try some of this (caching inside of Transform structs, etc.) in a separate PR? |
I can get transforms back and just leave the 16bit workaround logic - this
will be minimal changes. Current perf is enough for my needs for now, I'm
more concerned about accuracy of the rendering atm..
…On Sun, May 31, 2020 at 5:56 PM Evan Pezent ***@***.***> wrote:
Would you care to try some of this (caching inside of Transform structs,
etc.) in a separate PR?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFDDOSJNTE6IZEDL76OJDTRUJ4ZXANCNFSM4NHC3GKA>
.
|
Transformers are back. I also noticed you have added RenderLineFill function, which should suffer from same issues as RenderLineStrip and I assume based on this commit you can figure out how to fix RenderLineFill as well. |
Before I merge, is this dependent on a particular version of ImGui or PR? |
no
…On Tue, Jun 2, 2020 at 12:38 AM Evan Pezent ***@***.***> wrote:
Before I merge, is this dependent on a particular version of ImGui or PR?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#41 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFDDOQON2SRRO26SXUCFQTRUQUW3ANCNFSM4NHC3GKA>
.
|
I have a couple questions: 1) int cnt = (int)ImMin(size_t(prims), (((size_t(1) << sizeof(ImDrawIdx) * 8) - 1 - DrawList._VtxCurrentIdx) / 4)); Is the 4 at the very end the vertex count per segment, or is it related to something else? 2) if (cnt >= ImMin(64, prims)) What is the meaning of the number 64 here? 3) cnt = (int)ImMin(size_t(prims), (((size_t(1) << sizeof(ImDrawIdx) * 8) - 1 - 0/*DrawList._VtxCurrentIdx*/) / 4)); For |
1 - yes, as each new segment will up the index by 4 |
Also, please check "Allow edits from maintainers" so that I can push my edits so far. |
Ok, how do I do both of these things :) ? Zooming doesn't work for me on a benchmark plot |
Comment out this line:
For the second, it should be on the right side of this page for you. |
I see can these problems , but with a fix applied the problem goes away. You should also see the same problem without any of my changes. With your original code try pan the plot so that it's completely hidden (Culled out), and you'll see same garbage lines. I've enabled checkbox to allow you to commit. |
I see. So, essentially this PR is incomplete without the fix on ImGui's side. I see two short term solutions:
|
3rd option would be to apply same fix to ImGui but on the ImPlot side. Not sure which out of the 3 solutions is ugliest. I don't understand why that 2 line fix doesn't get merged either. |
They are all pretty ugly. 😄 Option 4 is to somehow count the number of culled segments before PrimReserve. The naive approach would be to have two loops; one that counts the number of culls need for PrimReserve, and one that does the rendering. That would require two calls to Getter/Transformer for every point though. The best alternative I can come up with is to do an inverse transformer operation on the plot bounds instead, and then check the untransformed points against this. That saves one call to Transformer per point, but still requires two calls to Getter (which isn't that expensive). Thoughts? |
I suggest to wait until my PR fixing this issue gets merged in or it gets fixed some other way. I don't see any way to upvote an importance of a PR other than complaining in the comment section. I've poked them once again, and at this point I don't feel like spending any more time on other workarounds. |
I'm curious what the cost of doing a PrimReserve for each point that passes the cull check would be. It might have overhead the first time through, but not too bad thereafter. |
it's not difficult to try it out and see how much the fps drops. Though doing it per segment is not a scalable approach. I like current idea of preallocating space because it has a lot of optimization potential. |
Though not optimal, I think we could temporarily go with either the 4th or 5th option until ImGui is fixed. |
Two Loops (based off master branch code): int will_render = 0;
const ImVec2 uv = DrawList._Data->TexUvWhitePixel;
for (int i1 = i_start; i1 != i_end; i1 = i1 + 1 < count ? i1 + 1 : i1 + 1 - count) {
ImVec2 p2 = transformer(getter(i1));
if (!cull || gp.BB_Grid.Overlaps(ImRect(ImMin(p1, p2), ImMax(p1, p2))))
will_render++;
p1 = p2;
}
ImVec2 p1 = transformer(getter(offset));
DrawList.PrimReserve(will_render * 6, will_render * 4);
for (int i1 = i_start; i1 != i_end; i1 = i1 + 1 < count ? i1 + 1 : i1 + 1 - count) {
ImVec2 p2 = transformer(getter(i1));
if (!cull || gp.BB_Grid.Overlaps(ImRect(ImMin(p1, p2), ImMax(p1, p2))))
RenderLine(DrawList, p1, p2, line_weight, col_line, uv);
p1 = p2;
} |
I've tested latest master and problem seem to have been fixed. |
Sounds good. I need to rework it a bit to accommodate the other plots which use PrimReserve directly, and then I will merge this. |
@sergeyn, I made two substantial edits:
I think everything still works as expected in the demo, and indeed this fixed 16-bit indices when used with Omar's latest commit to ImGui. Would you mind giving it a thorough review one last time before we merge? Thanks again for your hard work! PS: Compared with master, I see an increase in FPS on the benchmark plot from 390 to 430 when using 16-bit indices and your rendering algorithm! |
minor cleanup
I've added 'static' as you don't want to have integer division when not necessary. |
This may reduce performance, but I was able to get implot to run out of vertices with a relatively small dataset of a few tens of thousands of points. The upstream is working on this: epezent/implot#41
Fixes issues when using 16-bit indices. Slight performance gains. Refactors how offsetting data works in the implementation. Improves Offset and Stride demo.
No description provided.