Consider marking split_at as #[inline] #1301
Comments
Conventional wisdom says it's already generic, so the compiler can already inline it. But that depends a lot on the optimization mode. Is being generic no longer enough?
Seems so; at least, I'm observing a case where ArrayViewMut::split_at is not inlined as of Rust 1.70.
It sounds like you use rayon::iter::split. Note that ndarray already implements split-based parallelism natively using rayon; for that, see Zip and parallel in the docs. :slightly_smiling_face:
Being generic implies being available for cross-crate inlining by necessity, which is the same as putting |
I am doing a stencil computation (every output point is a weighted sum of a window of input points) and got the impression that this particular computation pattern is not supported by ndarray's built-in parallelism facilities.
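For readers unfamiliar with the term, here is a minimal sketch of a stencil computation in plain Rust. It is not the code from this thread: the function name `stencil_1d`, the 1D layout, and the choice to skip boundary handling are all assumptions made for illustration, and it uses only the standard library rather than ndarray.

```rust
/// Hypothetical 1D stencil: each output point is a weighted sum of a
/// window of input points. Boundaries are not padded, so the output has
/// `input.len() - weights.len() + 1` elements.
fn stencil_1d(input: &[f64], weights: &[f64]) -> Vec<f64> {
    assert!(!weights.is_empty() && weights.len() <= input.len());
    input
        .windows(weights.len())
        .map(|window| {
            // Weighted sum over the current window.
            window
                .iter()
                .zip(weights)
                .map(|(x, w)| x * w)
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let input = [1.0, 2.0, 4.0, 8.0, 16.0];
    let weights = [0.25, 0.5, 0.25];
    println!("{:?}", stencil_1d(&input, &weights));
}
```

Because each output window overlaps its neighbours, a parallel version has to split the input and output at different offsets, which is one reason such a pattern may be written with manual splitting.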
Here are some links with info (they were useful to me and could be to others)
With that said, adding |
Indeed, I regularly set codegen-units = 1 in my performance-sensitive projects in order to work around lost inlining of important std methods like That being said, setting codegen-units = 1 does come at the cost of a major degradation in compilation time (as does fat LTO, which is often necessary because in my experience the inlining heuristics of thin LTO are not as good). Therefore, when I think that my use case for an In my opinion, caller-side inline directives as recently introduced by clang would often be better, but alas we don't have those in Rust. I opened an internals thread to see if there would be interest in them. Putting on my library maintainer hat, I will usually add
Out of curiosity, @HadrienG2, are you using any sort of An example program where the (non)presence of (Further general discussion is in the irlo thread; this is what's immediately relevant here.)
General wisdom has in fact been to let the inlining heuristics do their job by default, so if a function is generic (thus, at least historically, already necessarily monomorphized into each CGU that uses it and available to inlining) then it shouldn't also be marked Developers have historically been really bad at intuiting good inlining heuristics, to the point that C/C++ compilers typically ignore the The inlining heuristics are pretty good at their job. We should prefer not to adjust them until observing evidence that they aren't sufficient for a specific case, and to not adjust them ahead of time off of "inlining is probably outsizedly beneficial here" vibes. In the typical case, the optimizer knows more accurately than you when stuff should be inlined. (Not always! Which is why the hints exist. But usually.) |
@CAD97 Since you also asked your question about compiler flags in the irlo thread, I replied to it there. The short answer is that no, I am not using -Zshare-generics or similar, just the standard rustc v1.70 stable configuration. The code is public; I will extract some reproducers of pathological behavior once I have something better than a cellphone at hand. I guess I should rather post these on irlo than here, since they are probably not of particular interest to an ndarray audience. Regarding the wisdom of applying If rustc wants me to trust it, it will need to get good enough to earn my trust; right now it's unfortunately not quite there yet in the presence of multiple crates and multiple codegen units, the latter of which is both the default configuration (i.e. what my users will see unless they or I do something to change it) and highly preferable for compile times (i.e. not something I would like to change myself if I can avoid it).
There's a bunch of method calls used between ArrayViewMut::split_at and RawViewMut::split_at that look like small methods that should be
It's not really the same thing but I think it's fun and I want to share the information: we have definition-side inline directives on closures and that almost looks like caller side. I've experimented with using those instead of macro based unrolling (matrixmultiply). It looks like this in use: |
In my use case, it was specifically
As far as I understand inlining directives, this would only force rustc to inline the closure, and not the other functions transitively called by the closure. But maybe @CAD97 can comment on this. |
In recursive domain decomposition (the easiest if not most performant way to parallelize iteration using rayon::iter::split() and optimize for cache locality at all cache levels), split_at can easily become a bottleneck. Letting it be inlined into the caller greatly reduces its performance impact.
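To illustrate the pattern, here is a minimal sketch of recursive domain decomposition with an `#[inline]`-hinted split method. Everything in it is an assumption for illustration: `ViewMut` is a hypothetical stand-in for a mutable view over a slice (not ndarray's `ArrayViewMut`), `for_each_block` is a hypothetical helper, and the recursion runs sequentially where a real program would hand the two halves to rayon.

```rust
/// Hypothetical stand-in for a mutable view type with a `split_at` method.
struct ViewMut<'a, T> {
    data: &'a mut [T],
}

impl<'a, T> ViewMut<'a, T> {
    /// `#[inline]` is only a hint to the optimizer, not a guarantee.
    /// It is the kind of annotation this issue asks for.
    #[inline]
    fn split_at(self, mid: usize) -> (ViewMut<'a, T>, ViewMut<'a, T>) {
        let (left, right) = self.data.split_at_mut(mid);
        (ViewMut { data: left }, ViewMut { data: right })
    }
}

/// Recursively halve the view until each block has at most `max_block`
/// elements, then run `f` on every block. Sequential sketch: a parallel
/// version would process `left` and `right` on different threads.
fn for_each_block<T>(view: ViewMut<'_, T>, max_block: usize, f: &mut impl FnMut(&mut [T])) {
    assert!(max_block > 0, "a zero block size would never terminate");
    if view.data.len() <= max_block {
        f(view.data);
    } else {
        let mid = view.data.len() / 2;
        // Called once per recursion level, so its cost adds up.
        let (left, right) = view.split_at(mid);
        for_each_block(left, max_block, &mut *f);
        for_each_block(right, max_block, &mut *f);
    }
}

fn main() {
    let mut data: Vec<u32> = (0..10).collect();
    let mut block_lens = Vec::new();
    for_each_block(ViewMut { data: &mut data[..] }, 3, &mut |block: &mut [u32]| {
        block_lens.push(block.len());
        for x in block.iter_mut() {
            *x *= 2;
        }
    });
    println!("block lengths: {:?}", block_lens);
    println!("data: {:?}", data);
}
```

Since the split runs at every level of the recursion, a call that is not inlined pays function-call overhead each time, which is the cost described above.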