Why is gl_SubgroupSize restricted to be a power of 2? #45

stevec611 · 2018-12-07T14:44:57Z

We have a target that executes a non-power-of-2 number of threads in lock-step. It would be natural to make a subgroup have one member per thread, but we are not able to because GL_KHR_shader_subgroup states:

If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupSize> is the number of invocations within a subgroup, and its
value is always a power of 2.

Why is there a restriction that it must be a power of 2? Is there any real need for that?

(As far as I can see, neither SPIR-V nor Vulkan has this restriction; it is only in GLSL, so we could in theory have shaders written in other languages running on Vulkan with a non-power-of-2 subgroup size.)

jeffbolznv · 2018-12-07T15:24:39Z

While most of the subgroup operations would probably still be well-defined with non-power-of-two sizes, there are some common algorithms (like butterfly reduction) that assume the size is a power of two and would have undefined behavior if run with a non-power-of-two size. So we might be able to relax this a bit, but would need to do so carefully.
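To see why butterfly reduction depends on the size, here is a small Python model of an XOR-shuffle reduction in the style of GLSL's `subgroupShuffleXor`. It is not GLSL and not any implementation's actual behavior. Each lane combines its value with lane `i ^ mask` for `mask = 1, 2, 4, ...`. With a power-of-two size every partner exists, but with 6 lanes, lane 4 looks for lane 6, which is outside the subgroup. The function names are hypothetical, and an out-of-range partner is modeled as an error because the real result would be undefined.

```python
# Hypothetical model of a butterfly (XOR-shuffle) reduction over one
# subgroup, in the style of GLSL's subgroupShuffleXor. Not a GPU API.

def shuffle_xor(values, mask):
    """For each lane i, return the value held by lane i ^ mask.

    A partner outside the subgroup has no defined value, so it is
    reported as an error rather than given a made-up result.
    """
    size = len(values)
    out = []
    for i in range(size):
        partner = i ^ mask
        if partner >= size:
            raise IndexError(
                f"lane {i} needs lane {partner}, but the subgroup has {size} lanes"
            )
        out.append(values[partner])
    return out


def butterfly_sum(values):
    """Sum over all lanes in log2(size) XOR-shuffle steps.

    Every lane ends up holding the total, but only when the
    subgroup size is a power of two.
    """
    values = list(values)
    mask = 1
    while mask < len(values):
        partners = shuffle_xor(values, mask)
        values = [a + b for a, b in zip(values, partners)]
        mask <<= 1
    return values


print(butterfly_sum(range(1, 9)))  # [36, 36, 36, 36, 36, 36, 36, 36]

try:
    butterfly_sum(range(6))  # 6 lanes: not a power of two
except IndexError as err:
    print(err)  # lane 4 needs lane 6, but the subgroup has 6 lanes
```

The first XOR step (mask 1) happens to work for 6 lanes; it is the second step that reaches past the end, so the failure only shows up partway through the algorithm.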

I'm having trouble finding a similar restriction in the SPIR-V or Vulkan specs. If there's no restriction in SPIR-V or Vulkan, then IMO there shouldn't be one in GLSL, though its absence there is also likely to be an oversight.

stevec611 · 2018-12-10T11:02:44Z

> If there's no restriction in SPIR-V or Vulkan then IMO there shouldn't be in GLSL

Though I suppose the reality right now is that if GLSL requires a power of two, no Vulkan implementation would choose to set VkPhysicalDeviceSubgroupProperties.subgroupSize to a non-power-of-two value, since that would prevent it from supporting GLSL subgroups.

nhaehnle · 2019-08-21T08:10:28Z

If we were to relax this, we'd have to be particularly careful about what happens with ClusteredReduce operations.
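A rough Python model of the clustered-reduce problem, assuming (as GLSL's `subgroupClustered*` functions and SPIR-V's `ClusteredReduce` group operation require) that the cluster size is a power of two. With a hypothetical subgroup size of 12 and a cluster size of 8, the second cluster only has lanes 8 to 11. A definition would then have to say whether that partial cluster is reduced over 4 lanes, is undefined, or is disallowed. This sketch simply sums whatever lanes are present, which is one possible choice and not a specified behavior. Note also that no power-of-two cluster size covers all 12 lanes. The function names are illustrative only.

```python
# Hypothetical model of a clustered add (compare GLSL's subgroupClusteredAdd)
# over a subgroup whose size may not be a power of two.

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def clustered_sum(values, cluster_size):
    """Lane i receives the sum over lanes j with j // cluster_size == i // cluster_size.

    A cluster cut short by the end of the subgroup is summed over the
    lanes that exist; that is one possible choice, not a specified rule.
    """
    if not is_power_of_two(cluster_size):
        raise ValueError("cluster size must be a power of two")
    if cluster_size > len(values):
        raise ValueError("cluster size exceeds the subgroup size")
    result = []
    for i in range(len(values)):
        start = (i // cluster_size) * cluster_size
        result.append(sum(values[start:start + cluster_size]))
    return result


def partial_clusters(subgroup_size, cluster_size):
    """List (first_lane, lanes_present) for every cluster that is cut short."""
    short = []
    for start in range(0, subgroup_size, cluster_size):
        present = min(cluster_size, subgroup_size - start)
        if present < cluster_size:
            short.append((start, present))
    return short


print(clustered_sum([1] * 16, 8))  # every lane gets 8
print(clustered_sum([1] * 12, 8))  # lanes 0-7 get 8, lanes 8-11 get only 4
print(partial_clusters(12, 8))     # [(8, 4)]
```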

gnl21 · 2019-08-21T09:40:46Z

This has been discussed internally, but I don't think I can copy any of that discussion here without permission. Labelling as Resolving Inside Khronos for now.

gnl21 · 2019-08-21T16:33:35Z

The restriction was intended to be in Vulkan as well but it was left out of the spec. It'll be added back into a future Vulkan spec release.

It is useful for some subgroup algorithms to know that subgroup sizes are a power of two and this is common in hardware implementations. Other implementations can provide subgroups that are power of two sized by either breaking large subgroups into smaller power of two sizes, or by padding up to a power of two size with inactive invocations.
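A small Python sketch of the two strategies described above, for a hypothetical target that runs 48 threads in lock-step. It assumes every subgroup on a device has the same size (Vulkan reports a single `subgroupSize` in `VkPhysicalDeviceSubgroupProperties`). Under that assumption, splitting uses the largest power of two that divides the hardware width, while padding rounds up to the next power of two and leaves the extra lanes inactive. The helper names are illustrative only.

```python
# Hypothetical helpers comparing the two layouts for a non-power-of-two
# hardware width. Assumes all subgroups on the device share one size.

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (n >= 1)."""
    p = 1
    while p < n:
        p <<= 1
    return p


def largest_power_of_two_divisor(n: int) -> int:
    """Largest power of two that divides n exactly (its lowest set bit)."""
    return n & -n


def pad_layout(hw_width: int) -> tuple[int, int]:
    """Pad up: one subgroup per hardware group, extra lanes left inactive.

    Returns (subgroup_size, inactive_lanes).
    """
    size = next_power_of_two(hw_width)
    return size, size - hw_width


def split_layout(hw_width: int) -> tuple[int, int]:
    """Split down: several equal power-of-two subgroups per hardware group.

    Returns (subgroup_size, subgroups_per_hw_group).
    """
    size = largest_power_of_two_divisor(hw_width)
    return size, hw_width // size


for width in (48, 24, 7):
    print(width, "pad:", pad_layout(width), "split:", split_layout(width))
# 48 pad: (64, 16) split: (16, 3)
# 24 pad: (32, 8) split: (8, 3)
# 7 pad: (8, 1) split: (1, 7)
```

For an odd width such as 7, splitting degenerates to seven one-lane subgroups, which suggests padding may be the more practical choice there; which strategy a given implementation uses is up to it.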

A commit that references this issue lists these changes:
* subgroup 64 version with subgroup add. 15% faster scalable version tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45)
* force 16 sequential threads per block
* make 16 subgroup size a constant

pdaniell-nv added the Vulkan label (Functionality applies to Vulkan API) Dec 11, 2018

gnl21 added the Resolving Inside Khronos label Aug 21, 2019

gnl21 removed the Resolving Inside Khronos label Aug 21, 2019

gnl21 closed this as completed Aug 21, 2019
