Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vector_algorithms.cpp: minmax for 64-bit elements: replace ugly x86 workaround with a nice one #4661

Merged
merged 7 commits into from
May 20, 2024

Conversation

AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented May 7, 2024

This: piece

static uint64_t _Get_any_u(const __m128i _Cur) noexcept {
#ifdef _M_IX86
return (static_cast<uint64_t>(static_cast<uint32_t>(_mm_extract_epi32(_Cur, 1))) << 32)
| static_cast<uint64_t>(static_cast<uint32_t>(_mm_cvtsi128_si32(_Cur)));
#else // ^^^ x86 / x64 vvv
return static_cast<uint64_t>(_mm_cvtsi128_si64(_Cur));
#endif // ^^^ x64 ^^^
}

works around the oddity of not having _mm_cvtsi128_si64 on 32-bit x86

It has been problematic:

I have discovered a nicer workaround!

If we spill the reg into the stack, the spill will optimize away.
On 32-bit with at least /arch:SSE2 it even produces better code than the existing workaround.
Demo: https://godbolt.org/z/ErGWz8GYT

It still does the actual spill on /arch:IA32. But given that this path is executed only once per function call (there are no intermediate reductions for 64-bit elements), and there's a plan to lift to /arch:SSE2, I think that's fine.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner May 7, 2024 17:36
@StephanTLavavej StephanTLavavej added the enhancement Something can be improved label May 7, 2024
@StephanTLavavej StephanTLavavej self-assigned this May 7, 2024
stl/src/vector_algorithms.cpp Outdated Show resolved Hide resolved
stl/src/vector_algorithms.cpp Show resolved Hide resolved
stl/src/vector_algorithms.cpp Outdated Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

Thanks, this is great! 😻 I pushed further changes to centralize the logic, please meow if you have concerns.

@StephanTLavavej StephanTLavavej removed their assignment May 12, 2024
@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented May 13, 2024

This centralization is already done in #4659 , there would just be more conflicts, after doing again here

@AlexGuteniev
Copy link
Contributor Author

Oh, you also did the implementation of _Get_any via _Get_v_pos. Didn't think about it, bu seems it will be no worse, as 0 is expected to constant propagate.

@StephanTLavavej
Copy link
Member

I'm drowning in PRs right now, so I think I'm going to have to clear out the backlog in multiple batches (in between investigating a non-STL bug that I can't weasel out of investigating forever). I'd like to land this PR first, then resolve conflicts in #4659.

@StephanTLavavej StephanTLavavej self-assigned this May 17, 2024
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 910275c into microsoft:main May 20, 2024
39 checks passed
@StephanTLavavej
Copy link
Member

Thanks for noticing how to make this code way more elegant! 🪄 😻 💚

@AlexGuteniev AlexGuteniev deleted the fake_spill branch May 21, 2024 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Something can be improved
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants