vector_algorithms.cpp: minmax for 64-bit elements: replace ugly x86 workaround with a nice one #4661
Thanks, this is great! 😻 I pushed further changes to centralize the logic, please meow if you have concerns.
This centralization is already done in #4659; doing it again here would just create more conflicts.
Oh, you also did the implementation of
I'm drowning in PRs right now, so I think I'm going to have to clear out the backlog in multiple batches (in between investigating a non-STL bug that I can't weasel out of investigating forever). I'd like to land this PR first, then resolve conflicts in #4659.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for noticing how to make this code way more elegant! 🪄 😻 💚
This piece:
STL/stl/src/vector_algorithms.cpp
Lines 1116 to 1123 in 8dc4faa
works around the oddity of not having _mm_cvtsi128_si64 on 32-bit x86.

It has been problematic: min/max/minmax_element for 64-bit types on x86 #2821

I have discovered a nicer workaround! If we spill the reg into the stack, the spill will optimize away. On 32-bit with at least /arch:SSE2 it even produces better code than the existing workaround.

Demo: https://godbolt.org/z/ErGWz8GYT

It still does the actual spill on /arch:IA32. But given that this path is executed only once per function call (there are no intermediate reductions for 64-bit elements), and there's a plan to lift to /arch:SSE2, I think that's fine.
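
The spill idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name low_lane_i64 is hypothetical, and writing the spill as an unaligned store to a local array is an assumption about one way to express it. It assumes an x86 target with SSE2 available.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper (the name is not from the PR): returns the low 64-bit
// lane of an SSE2 register without using _mm_cvtsi128_si64, which is not
// available when targeting 32-bit x86.
inline std::int64_t low_lane_i64(const __m128i v) noexcept {
    // "Spill" the whole register into a local array on the stack...
    std::int64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), v);
    // ...and read back lane 0. With optimizations enabled, the compiler is
    // expected (not guaranteed) to elide the round trip through memory.
    return lanes[0];
}
```

For example, low_lane_i64(_mm_set_epi64x(7, -42)) returns -42, because the second argument of _mm_set_epi64x is the low lane. Whether the store is actually optimized away depends on the compiler and the /arch setting, as the description notes for /arch:IA32.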