Help the compiler vectorize std::iota
#4627
I admit I can still vectorize this better than the compiler.
I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed.
I pushed changes because I thought the following case was broken.
We probably need coverage.
…'t need to be C++20 `consteval`. They're always stored in `constexpr` variables.
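The reasoning in this commit message rests on a core-language rule: the initializer of a `constexpr` variable must be a constant expression, so a plain `constexpr` function called there is already evaluated at compile time. A minimal sketch, assuming C++20; `all_bits_set` is a hypothetical helper, not one of the functions the commit refers to.

```cpp
#include <climits>

// Hypothetical helper, declared constexpr rather than consteval.
template <class T>
constexpr T all_bits_set() noexcept {
    return static_cast<T>(~T{0});
}

// A constexpr variable's initializer must be a constant expression, so this
// call is evaluated at compile time even though the function isn't consteval.
constexpr unsigned int uint_all_ones = all_bits_set<unsigned int>();
static_assert(uint_all_ones == UINT_MAX);
```

`consteval` would additionally make any call outside constant evaluation ill-formed; that extra restriction buys nothing when every call site already initializes a `constexpr` variable.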
…ons. The `_Ugly` functions aren't marked `_EXPORT_STD`.
This matches how we implement `_Is_standard_unsigned_integer` in `<__msvc_bit_utils.hpp>`.
This answers the question of whether `static_cast<_Ty>(_Size)` is value-preserving.
Good catch! After talking on Discord, I've replaced the increasingly complicated helper function with C++20 …

Allocating 2 gigs on a 32-bit system would be problematic, so I think the current level of test coverage should be fine. I reran the benchmarks to verify that the optimization is still effective:
…rt` in `_Min_limit`/`_Max_limit`. These are internal helpers, and the "public" `_In_range` validates the user-provided types.
🔢 🔢 🔢
Based on the microsoft#4627 fixup that extracts `_In_range`. This also makes some places use `_In_range` directly, but mostly `_Max_limit` is used.
Ranges version of microsoft#4627
The compiler can vectorize the `iota` algorithm, but this is very fragile; one of the issues is reported as DevCom-10593477.

I can do `ranges::iota` as well, if this is considered an acceptable approach.

Benchmark results are with the default benchmark options; I think that means SSE2, not AVX2.
Before:
After: