Why is gl_SubgroupSize restricted to be a power of 2? #45
Comments
While most of the subgroup operations would probably still be well-defined with non-power-of-two sizes, some common algorithms (such as butterfly reduction) assume the size is a power of two and would have undefined behavior if run with a non-power-of-two size. So we might be able to relax this a bit, but we would need to do so carefully.

I'm having trouble finding a similar restriction in the SPIR-V or Vulkan specs. If there's no restriction in SPIR-V or Vulkan, then IMO there shouldn't be one in GLSL, though its absence from those specs is also likely to be an oversight.
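To make the butterfly-reduction concern concrete, here is a minimal CPU-side sketch in Python rather than a real shader. The function name `butterfly_sum` is hypothetical, and modelling an out-of-range partner read as `None` is an assumption used to stand in for "undefined"; actual hardware behavior is not specified in this thread.

```python
def butterfly_sum(values):
    """Simulate an XOR-butterfly all-reduce (sum) across one subgroup.

    Each list index plays the role of an invocation ID. Reading from a
    partner index that does not exist is modelled as None (an assumption
    standing in for an undefined value), and None poisons every result
    that depends on it.
    """
    n = len(values)
    lanes = list(values)
    offset = 1
    while offset < n:
        nxt = []
        for i in range(n):
            partner = i ^ offset  # butterfly partner for this step
            if partner < n and lanes[i] is not None and lanes[partner] is not None:
                nxt.append(lanes[i] + lanes[partner])
            else:
                nxt.append(None)  # partner missing or already undefined
        lanes = nxt
        offset <<= 1
    return lanes


if __name__ == "__main__":
    print(butterfly_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # every lane holds 36
    print(butterfly_sum([1, 2, 3, 4, 5, 6]))        # every lane ends up None
```

With eight lanes every invocation finishes with the full sum. With six lanes, lanes 4 and 5 look for partners 6 and 7 in the second step, and the undefined values then spread to every lane in the third step.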
Though I suppose the reality right now is that if GLSL requires a power of two, then no Vulkan implementation would choose to set VkPhysicalDeviceSubgroupProperties.subgroupSize to a non-power-of-two value, since that would prevent it from supporting GLSL subgroups.
If we were to relax this, we'd have to be particularly careful about what happens with ClusteredReduce operations.
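One way to see why clustered operations need care is to list which invocations would fall into each cluster. The sketch below is a Python illustration only; `cluster_members` is a hypothetical helper, and the idea that a trailing cluster would simply be short is an assumption, since the thread does not say what a relaxed spec would define.

```python
def cluster_members(subgroup_size, cluster_size):
    """Return the invocation IDs in each cluster of a subgroup.

    Cluster sizes are required to be powers of two; the subgroup size is
    deliberately left unconstrained here to show the awkward case.
    """
    if cluster_size < 1 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster size must be a power of two")
    return [
        list(range(start, min(start + cluster_size, subgroup_size)))
        for start in range(0, subgroup_size, cluster_size)
    ]


if __name__ == "__main__":
    print(cluster_members(8, 4))  # two full clusters
    print(cluster_members(6, 4))  # one full cluster and one partial cluster
```

With a subgroup size of 6 and a cluster size of 4, invocations 4 and 5 end up in a cluster with only two members, so a relaxed spec would have to say what a reduction over that partial cluster returns.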
This has been discussed internally, but I don't think I can copy any of that discussion here without permission. Labelling as Resolving Inside Khronos for now.
The restriction was intended to be in Vulkan as well, but it was left out of the spec. It'll be added back in a future Vulkan spec release. It is useful for some subgroup algorithms to know that subgroup sizes are a power of two, and this is common in hardware implementations. Other implementations can provide power-of-two-sized subgroups either by breaking large subgroups into smaller power-of-two sizes, or by padding up to a power-of-two size with inactive invocations.
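The two workarounds described above can be sketched as simple arithmetic on the hardware lane count. This is an illustration in Python, not how any particular driver works: the function names are hypothetical, and choosing the largest power of two that divides the lane count (so that every subgroup has the same size) is an assumption.

```python
def split_uniform(lane_count):
    """Break lane_count lanes into equal power-of-two subgroups.

    Returns (subgroup_size, subgroup_count), using the largest power of
    two that divides lane_count.
    """
    if lane_count < 1:
        raise ValueError("lane_count must be positive")
    size = lane_count & -lane_count  # isolates the lowest set bit
    return size, lane_count // size


def pad_up(lane_count):
    """Pad lane_count up to the next power of two.

    Returns (subgroup_size, inactive_invocations).
    """
    if lane_count < 1:
        raise ValueError("lane_count must be positive")
    size = 1 << (lane_count - 1).bit_length()
    return size, size - lane_count


if __name__ == "__main__":
    print(split_uniform(48))  # (16, 3): three subgroups of 16
    print(pad_up(48))         # (64, 16): one subgroup of 64, 16 lanes inactive
```

Splitting keeps every lane busy but shrinks the subgroup, while padding keeps one large subgroup at the cost of permanently inactive invocations.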
* subgroup 64 version with subgroup add. 15% faster scalable version tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45)
* force 16 sequential threads per block
* make 16 subgroup size a constant
We have a target that executes a non-power-of-2 number of threads in lock-step. It would be natural to make a subgroup have one member per thread, but we are not able to because GL_KHR_shader_subgroup states:
Why is there a restriction that it must be a power of 2? Is there any real need for that?
(As far as I can see, neither SPIR-V nor Vulkan has this restriction; it is only in GLSL. So in theory we could have shaders written in other languages running on Vulkan with a non-power-of-2 subgroup size.)