Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naga crash compiling for spir-v backend: function call with pointer to field in a struct #6875

Open
rcoreilly opened this issue Jan 8, 2025 · 8 comments
Labels
api: vulkan Issues with Vulkan area: naga back-end Outputs of naga shader conversion lang: SPIR-V Vulkan's Shading Language naga Shader Translator type: bug Something isn't working

Comments

@rcoreilly
Copy link

Description
naga crashes on valid code that works fine on the metal backend, when compiling for spir-v backend, for a function call with an argument that is a pointer to a field in a struct:

fn OuterFunc(ps: ptr<function,ParamStruct>) {
   Func(&((*ps).Sub));
}

with the error message:

internal error: entered unreachable code: Expression [2] is not cached!

I don't yet know if the OuterFunc context is important or not.

Repro steps
paste the attached shader code into hello_compute example and run.

Expected vs observed behavior
not crashing vs. crashing :)

Extra materials

The crash is in naga/src/back/spv/block.rs:2674 on latest trunk as of today a8a9173 :

                Statement::Call {
                    function: local_function,
                    ref arguments,
                    result,
                } => {
                    let id = self.gen_id();
                    self.temp_list.clear();
                    for &argument in arguments {
2674:                        self.temp_list.push(self.cached[argument]);
                    }

The same crash occurs in 23.0.1 as well.

Here's a shader .wgsl file to reproduce:

struct SubStruct {
	Tau: f32,
	Dt: f32,
	pad:  f32,
	pad1: f32,
}
struct ParamStruct {
	Sub: SubStruct,
}

@group(0) @binding(0)
var<storage, read> Params: array<ParamStruct>;

@compute @workgroup_size(64, 1, 1)
fn main() {
	var params = Params[0];
	OuterFunc(&params);
}

fn OuterFunc(ps: ptr<function,ParamStruct>) {
 	Func(&((*ps).Sub)); // this is what causes the crash
}

fn Func(ps: ptr<function,SubStruct>) {
}

Platform

This occurs on linux building for an NVIDIA gpu (I got the same crash across two very different ones) using the vulkan backend.

This is a blocker bug for me: my wgsl code (auto-generated from Go source) is full of such expressions, which implement method calls in Go. Fortunately it has been working fine on my primary development platform (mac), but now it prevents running on large compute servers. Hopefully it is an easy fix!?

@cwfitzgerald cwfitzgerald added type: bug Something isn't working api: vulkan Issues with Vulkan area: naga back-end Outputs of naga shader conversion naga Shader Translator lang: SPIR-V Vulkan's Shading Language labels Jan 8, 2025
@rcoreilly
Copy link
Author

I just confirmed that this simpler case also causes the crash:

struct SubStruct {
	Tau: f32,
	Dt: f32,
	pad:  f32,
	pad1: f32,
}
struct ParamStruct {
	Sub: SubStruct,
}

@group(0) @binding(0)
var<storage, read> Params: array<ParamStruct>;

@compute @workgroup_size(64, 1, 1)
fn main() {
	var params = Params[0];
 	Func(&(params.Sub)); // this is what causes the crash
}

fn Func(ps: ptr<function,SubStruct>) {
}

@Vipitis
Copy link

Vipitis commented Jan 8, 2025

might be the same as #4517 ?

@rcoreilly
Copy link
Author

Yeah and also #4827 which was another dupe, and perhaps another listed in that one too. Given that this issue has been around since 2022, and keeps showing up, hopefully it can be fixed, but I guess the fact that it hasn't suggests that it might be tricky?

@Vipitis
Copy link

Vipitis commented Jan 8, 2025

there is also #4540 which isn't exactly the same but maybe related. That issue has some more links and maintainer comments saying it's essentially intended for the current spec.

I am tracking this issue because it's a case where the code passes validation and then causes a panic further down. So I am not necessarily looking for a solution or workaround, but rather have it be caught during validation.

@rcoreilly
Copy link
Author

Yikes, I missed that aspect of the spec! Also, tint validates the same code as well, so I guess both are missing the enforcement. I guess the metal backend uses pointers natively so it just works?

In my use-case, all such pointers-to-fields are on read-only data structures. Coming from a "standard" programming language perspective (C, Go), the only reason I'm using a pointer here is to avoid needless copying of the data structures. But I have a vague understanding that somehow I could just pass a non-pointer "value" of the sub-struct and nothing would actually be copied in the end?

@cwfitzgerald
Copy link
Member

There's no guarentee, and arrays love to throw backend compilers off, but it should. But definitely monitor your occupancy

@rcoreilly
Copy link
Author

In addition to fixing the validation, probably the most relevant point of this issue at this point is about the docs in the spec: I just read through the section on memory views: https://www.w3.org/TR/WGSL/#memory-views and could not find the critical constraint. As far as I could tell, it only appears in the definition of the unrestricted pointer_parameters language extension, in the negative form it seems, in that it the extension removes this constraint that otherwise obtains, but it literally reads like it is imposing this constraint.

Each argument of pointer type to a user-defined function must have the same memory view as its root identifier.

The section on Alias Analysis https://www.w3.org/TR/WGSL/#alias-analysis seems to provide the relevant definitions but doesn't appear to explicitly state or provide an example of this constraint.

This nice succinct tour does provide a clear statement: https://google.github.io/tour-of-wgsl/types/pointers/passing_pointers/

@rcoreilly
Copy link
Author

For the record here, per #6688, not using pointers and just passing direct values of sub-structs appears to work just fine in terms of overall performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: vulkan Issues with Vulkan area: naga back-end Outputs of naga shader conversion lang: SPIR-V Vulkan's Shading Language naga Shader Translator type: bug Something isn't working
Projects
Status: Todo
Development

No branches or pull requests

3 participants