Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard __restrict usage for CUDA #5061

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

StephanTLavavej
Copy link
Member

@StephanTLavavej StephanTLavavej commented Nov 1, 2024

Followup to #4958.

Fixes VSO-2295101 "[RWC][prod/fe][Regression] Cutlass failed with error C2146: syntax error: missing ')' before identifier '_Dest'". It appears that CUDA 12.4 doesn't understand __restrict, so when it attempts to split code between the device (GPU) and host (MSVC) compilers, it mangles what's sent to MSVC.

Apparently this is only an issue when the template is instantiated, which is why it wasn't found by our CUDA test coverage (that includes all headers and does nothing with them). It was only encountered in our Real World Code test suite where we wisely harnessed NVIDIA's Cutlass project.

I am slightly nervous that bad kitties have grabbed _RESTRICT but we're following that form for _LIKELY etc. and I don't want to pre-emptively avoid issues when we have all legitimate rights to this identifier.

Fixes AB#2295101.

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Nov 1, 2024
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner November 1, 2024 18:51
@CaseyCarter
Copy link
Member

CaseyCarter commented Nov 1, 2024

It's _LIKELY that someone else who considers themselves to be "implementation" is using _RESTRICT, but I suppose we may as well try. We should consider consistently using a common prefix (_STL_ seems the clear candidate) for our macro definitions. (Not for this PR.)

@CaseyCarter CaseyCarter removed their assignment Nov 1, 2024
@@ -2016,5 +2016,11 @@ compiler option, or define _ALLOW_RTCc_IN_STL to suppress this error.
#define _CONST_CALL_OPERATOR const
#endif // ^^^ !defined(__cpp_static_call_operator) ^^^

#ifdef __CUDACC__ // TRANSITION, CUDA 12.4 doesn't recognize __restrict
#define _RESTRICT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CUDA knows __restrict__, see here.

@StephanTLavavej StephanTLavavej self-assigned this Nov 7, 2024
@StephanTLavavej
Copy link
Member Author

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 4b697a8 into microsoft:main Nov 8, 2024
39 checks passed
@StephanTLavavej StephanTLavavej deleted the cuda-restrict branch November 8, 2024 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants