Why is gl_SubgroupSize restricted to be a power of 2? #45

Closed
stevec611 opened this issue Dec 7, 2018 · 5 comments
Labels: Vulkan (Functionality applies to Vulkan API)

Comments

@stevec611
stevec611 commented Dec 7, 2018

We have a target that executes a non-power-of-2 number of threads in lock-step. It would be natural to make a subgroup have one member per thread, but we are not able to because GL_KHR_shader_subgroup states:

If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupSize> is the number of invocations within a subgroup, and its
value is always a power of 2.

Why is there a restriction that it must be a power of 2? Is there any real need for that?

(As far as I can see, neither SPIR-V nor Vulkan has this restriction; it is only in GLSL. So in theory, shaders written in other languages could run on Vulkan with a non-power-of-2 subgroup size.)

@jeffbolznv
Contributor

While most of the subgroup operations would probably still be well-defined with non-power-of-two sizes, there are some common algorithms (like butterfly reduction) that assume the size is a power of two and would have undefined behavior if run with a non-power-of-two size. So we might be able to relax this a bit, but would need to do so carefully.

I'm having trouble finding a similar restriction in the SPIR-V or Vulkan specs. If there's no restriction in SPIR-V or Vulkan then IMO there shouldn't be in GLSL, though it is also likely to be an oversight.
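
To illustrate the butterfly-reduction concern above, here is a minimal Python sketch that simulates an XOR butterfly sum across the lanes of one subgroup. It is not taken from any specification or driver; the function name `butterfly_reduce` and the choice to raise an error are assumptions made for illustration. In a real shader, reading from a partner lane that does not exist would be expected to yield an undefined value rather than an exception.

```python
def butterfly_reduce(values):
    """Simulate an XOR-butterfly sum over one subgroup.

    `values` holds one input per lane. Returns the per-lane results.
    Hypothetical helper: raises IndexError where a real shader would
    read an undefined value from a lane outside the subgroup.
    """
    size = len(values)
    lanes = list(values)
    offset = 1
    while offset < size:
        next_lanes = []
        for lane in range(size):
            partner = lane ^ offset  # butterfly exchange partner
            if partner >= size:
                raise IndexError(
                    f"lane {lane} would read lane {partner}, "
                    f"but the subgroup only has {size} lanes"
                )
            next_lanes.append(lanes[lane] + lanes[partner])
        lanes = next_lanes
        offset *= 2
    return lanes


# Power-of-two size: every lane ends up with the full sum.
print(butterfly_reduce([1, 2, 3, 4, 5, 6, 7, 8]))

# Non-power-of-two size: some lane's partner falls outside the subgroup.
try:
    butterfly_reduce([1, 2, 3, 4, 5, 6])
except IndexError as err:
    print("butterfly breaks:", err)
```

With 8 lanes every partner index stays in range. With 6 lanes the first step (offset 1) still works, but at offset 2 lane 4 is paired with lane 6, which does not exist.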

@stevec611
Author

> If there's no restriction in SPIR-V or Vulkan then IMO there shouldn't be in GLSL

Though I suppose the reality right now is that if GLSL requires power-of-two, then no Vulkan implementation would choose to set VkPhysicalDeviceSubgroupProperties.subgroupSize to a non-power-of-two, since that would prevent it from supporting GLSL subgroups.

pdaniell-nv added the Vulkan (Functionality applies to Vulkan API) label Dec 11, 2018

@nhaehnle

If we were to relax this, we'd have to be particularly careful about what happens with ClusteredReduce operations.
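
A rough Python sketch of why clustered reductions need care: it assumes that clusters are consecutive runs of `cluster_size` lanes and that the cluster size itself is a power of two. The helper `clustered_add` is hypothetical, and summing a truncated last cluster is only one possible behavior, chosen here to make the problem visible; it is not what any specification defines.

```python
def clustered_add(values, cluster_size):
    """Simulate a clustered sum: each lane gets the sum of its cluster.

    Assumption for illustration: clusters are consecutive groups of
    `cluster_size` lanes, and a final partial cluster sums only the
    lanes that actually exist.
    """
    if cluster_size < 1 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster_size must be a power of two")
    results = []
    for lane in range(len(values)):
        start = (lane // cluster_size) * cluster_size
        # Slicing silently truncates at the end of the subgroup.
        results.append(sum(values[start:start + cluster_size]))
    return results


# Subgroup of 8, clusters of 4: two full clusters.
print(clustered_add([1] * 8, 4))

# Subgroup of 6, clusters of 4: the second cluster has only 2 lanes.
print(clustered_add([1] * 6, 4))
```

When the subgroup size is a power of two, every power-of-two cluster size up to the subgroup size divides it evenly. With a size such as 6, the last cluster is partial, and a relaxed rule would have to say what those lanes observe.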

@gnl21
Contributor

gnl21 commented Aug 21, 2019

This has been discussed internally, but I don't think I can copy any of that discussion here without permission. Labelling as Resolving Inside Khronos for now.

@gnl21
Contributor

gnl21 commented Aug 21, 2019

The restriction was intended to be in Vulkan as well but it was left out of the spec. It'll be added back into a future Vulkan spec release.

It is useful for some subgroup algorithms to know that subgroup sizes are a power of two, and power-of-two sizes are common in hardware implementations. Other implementations can provide power-of-two-sized subgroups either by breaking large subgroups into smaller power-of-two sizes, or by padding up to a power-of-two size with inactive invocations.
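
The two options in the comment above can be sketched as simple arithmetic. This is a minimal Python illustration under assumptions, not a description of how any driver works: the function names are hypothetical, and the splitting variant assumes every resulting subgroup must have the same size, so it picks the largest power of two that divides the hardware lane count.

```python
def pad_to_power_of_two(hw_lanes):
    """Pad up: return (subgroup_size, inactive_lanes).

    subgroup_size is the smallest power of two >= hw_lanes; the extra
    lanes would be permanently inactive invocations.
    """
    if hw_lanes < 1:
        raise ValueError("hw_lanes must be positive")
    size = 1 << (hw_lanes - 1).bit_length()
    return size, size - hw_lanes


def split_into_equal_subgroups(hw_lanes):
    """Split down: return (subgroup_size, subgroup_count).

    Assumption: all subgroups share one size, so use the largest power
    of two that divides hw_lanes (its lowest set bit).
    """
    if hw_lanes < 1:
        raise ValueError("hw_lanes must be positive")
    size = hw_lanes & -hw_lanes
    return size, hw_lanes // size


# A hypothetical target that runs 48 threads in lock-step.
print(pad_to_power_of_two(48))         # (64, 16)
print(split_into_equal_subgroups(48))  # (16, 3)
```

Padding wastes lanes (16 of 64 in this example), while splitting exposes a smaller subgroup than the hardware executes together. Which trade-off is better depends on the target.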

gnl21 closed this as completed Aug 21, 2019

0cc4m pushed a commit to ggerganov/llama.cpp that referenced this issue Nov 30, 2024
* subgroup 64 version with subgroup add. 15% faster

scalable version

tested for subgroup sizes 16-128

* check for subgroup multiple of 16 and greater than 16

* subgroup sizes are always a power of 2 (KhronosGroup/GLSL#45)

* force 16 sequential threads per block

* make 16 subgroup size a constant