Vectorize find_first_of for 8 and 16 bit elements with SSE4.2 pcmpestri #4466
As we discovered on Discord a while ago, some people call std::find_first_of with a single-element needle. This should be forwarded to std::find (or memchr or something), even if vectorization is otherwise disabled.
I remember this, but it is unrelated enough that I think it better fits a separate PR.
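For illustration, the single-element forwarding discussed above could look roughly like the sketch below. The wrapper name `find_first_of_fwd` is hypothetical and this is not the STL's implementation; it assumes forward iterators for the needle and only shows the idea of dispatching to `std::find` when the needle has exactly one element.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical wrapper (not part of the STL or of this PR):
// forwards a one-element needle to std::find, otherwise defers to
// std::find_first_of. Assumes It2 is at least a forward iterator.
template <class It1, class It2>
It1 find_first_of_fwd(It1 first, It1 last, It2 s_first, It2 s_last) {
    if (s_first != s_last && std::next(s_first) == s_last) {
        // Exactly one needle element: a plain find is sufficient.
        return std::find(first, last, *s_first);
    }
    // Empty or multi-element needle: the general algorithm.
    return std::find_first_of(first, last, s_first, s_last);
}
```

Both paths return the same iterator for the same input, so the forwarding would only change how the search is performed, not what it finds.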
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for vectorizing more algorithms! 🚀 ⏩ 🎉
Suggested by @Alcaro in #4415 (comment)
Vectorized for an element set that fits in an SSE register, that is, when the "needle" length is up to 16 for 8-bit elements or up to 8 for 16-bit elements.
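As a rough sketch of the technique for the 8-bit case (not the PR's actual code), the needle can be loaded once into an SSE register and each 16-byte chunk of the haystack compared against it with `_mm_cmpestri` in "equal any" mode. The function name `find_first_of_u8` and the index-based interface are assumptions made for this example. The SSE4.2 path is only compiled when the compiler defines `__SSE4_2__` (for example with `-msse4.2` on GCC or Clang); otherwise the scalar loop is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE4_2__)
#include <nmmintrin.h> // SSE4.2 intrinsics, including _mm_cmpestri
#endif

#if defined(__SSE4_2__)
// Index (0..15) of the first byte of `chunk` equal to any byte of `set`,
// or 16 if there is none. Explicit lengths make padding bytes ignored.
inline int first_match(__m128i set, int set_len, __m128i chunk, int chunk_len) {
    return _mm_cmpestri(set, set_len, chunk, chunk_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT);
}
#endif

// Hypothetical helper: returns the index of the first haystack byte that
// equals any needle byte, or hay_len if there is no such byte.
std::size_t find_first_of_u8(const std::uint8_t* hay, std::size_t hay_len,
                             const std::uint8_t* needle, std::size_t needle_len) {
#if defined(__SSE4_2__)
    if (needle_len > 0 && needle_len <= 16) { // needle fits one SSE register
        alignas(16) std::uint8_t buf[16] = {};
        std::memcpy(buf, needle, needle_len);
        const __m128i set = _mm_load_si128(reinterpret_cast<const __m128i*>(buf));
        const int set_len = static_cast<int>(needle_len);

        std::size_t i = 0;
        for (; i + 16 <= hay_len; i += 16) { // full 16-byte chunks
            const __m128i chunk =
                _mm_loadu_si128(reinterpret_cast<const __m128i*>(hay + i));
            const int idx = first_match(set, set_len, chunk, 16);
            if (idx != 16) {
                return i + static_cast<std::size_t>(idx);
            }
        }
        if (i < hay_len) { // tail: copy to a padded buffer, pass the real length
            alignas(16) std::uint8_t tail[16] = {};
            std::memcpy(tail, hay + i, hay_len - i);
            const __m128i chunk = _mm_load_si128(reinterpret_cast<const __m128i*>(tail));
            const int idx = first_match(set, set_len, chunk, static_cast<int>(hay_len - i));
            if (idx != 16) {
                return i + static_cast<std::size_t>(idx);
            }
        }
        return hay_len;
    }
#endif
    // Scalar fallback: needle too long for one register, or no SSE4.2.
    for (std::size_t i = 0; i < hay_len; ++i) {
        for (std::size_t j = 0; j < needle_len; ++j) {
            if (hay[i] == needle[j]) {
                return i;
            }
        }
    }
    return hay_len;
}
```

The 16-bit variant would follow the same shape with `_SIDD_UWORD_OPS` and 8 elements per register, which is where the limit of 8 for 16-bit elements comes from.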
Possible future work:
- Use find instead for 1-element cases, as @Alcaro suggested;
- basic_string::find_first_of. Certainly not the general case with user-provided char traits, but maybe for standard cases;
Benchmark result
The first non-type template parameter in the benchmark results is the position where the value is found; the "haystack" length is twice that.
The second non-type template parameter in the benchmark results is the "needle" size.
Before:
After:
Explanation:
- bm<uint8_t, 9, 3> row: the vectorization is engaged;
- bm<uint8_t, 22, 5> already shows significant improvement;
- bm<uint16_t, 1011, 11> falls back to the scalar algorithm due to the "needle" length not fitting in an SSE register.
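The needle-size condition behind these rows can be sketched as a small compile-time predicate. The name `needle_fits_sse_register` is hypothetical, and the sketch models only the needle length; any haystack-size condition the PR may apply is not represented here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical predicate (not a name from the PR): true when a needle of
// needle_len elements of type T fits in one 16-byte SSE register.
template <class T>
constexpr bool needle_fits_sse_register(std::size_t needle_len) {
    static_assert(sizeof(T) == 1 || sizeof(T) == 2,
                  "only 8-bit and 16-bit elements are considered");
    return needle_len * sizeof(T) <= 16;
}

// Needle sizes taken from the benchmark rows above.
static_assert(needle_fits_sse_register<std::uint8_t>(3), "3 bytes fit");
static_assert(needle_fits_sse_register<std::uint8_t>(5), "5 bytes fit");
static_assert(!needle_fits_sse_register<std::uint16_t>(11), "22 bytes do not fit");
```

With 16-bit elements, 11 needle elements occupy 22 bytes, which is consistent with that row taking the scalar path.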