Manually vectorize for at least SSE4.2 #4550

Merged 1 commit on Apr 9, 2024

AlexGuteniev
Contributor

Resolves #4536

Also checked for places that can take advantage of the new assumptions, and improved them:

  • Replaced _mm_set1_epi8 with _mm_shuffle_epi8 using an all-zero shuffle mask, a slight improvement for find and count on 1-byte elements
  • Simplified and improved __std_bitset_to_string_*
    • Used _mm_shuffle_epi8 to populate a __m128i variable instead of the previous sequence of instructions
    • Used _mm_blendv_epi8 to avoid the xor

I didn't measure the results. Measuring on a modern machine would require artificially altering __isa_enabled, and the numbers still would not be relevant.

Also removed the SSE level of each instruction from the comments. Kept the __popcnt comment, though.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 1, 2024 10:45
@StephanTLavavej StephanTLavavej added the enhancement Something can be improved label Apr 1, 2024
@StephanTLavavej StephanTLavavej self-assigned this Apr 1, 2024
@StephanTLavavej StephanTLavavej removed their assignment Apr 3, 2024
@StephanTLavavej
Member

Perfect! 😻

@StephanTLavavej StephanTLavavej self-assigned this Apr 8, 2024
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 8346265 into microsoft:main Apr 9, 2024
35 checks passed
@StephanTLavavej
Member

Thanks for dramatically reducing our risk here, and slightly improving perf for modern processors! 😻 🎉 🚀

@AlexGuteniev AlexGuteniev deleted the sse42 branch April 10, 2024 03:26
@OlafvdSpek

> Thanks for dramatically reducing our risk here, and slightly improving perf for modern processors! 😻 🎉 🚀

What risk does this reduce?

@AlexGuteniev
Contributor Author

> What risk does this reduce?

See the description of the issue closed by this PR (#4536).

Successfully merging this pull request may close these issues.

vector_algorithms.cpp: Remove the distinction between SSE2 and SSE4.2