Alpine ARM32 only: running dotnet new hangs with sustained high CPU usage after installing .NET SDK 9.0.100-preview.3.24175.24 #100536
Comments
@ChenhuiYuan01 is it only dotnet new that gets stuck or other commands as well? CC @MiYanni @joeloff if this is a templating issue. |
@marcpopMSFT |
This issue is also repro on 9.0.100-preview.4.24178.10 from https://github.com/dotnet/installer?tab=readme-ov-file. |
Seems like we're blocked on Alpine ARM, but I don't know how critical that platform is. @nkolev92 @dsplaisted from the above list, it kind of looks like any command that hits NuGet is hanging. dotnet new doesn't build but does do a restore after creating the project. Thoughts on this one? |
I assume this is alpine arm64 only. If so, based on telemetry data I would not consider this a blocker. I checked the data and the arm64 alpine usage is <1% of all alpine usage and alpine usage is <1% of linux usage (from the data we have access to). |
It could be something with restore, or with MSBuild. I would guess that it's something to do with the network access that restore does. We could do |
@marcpopMSFT @dsplaisted This is blocking Alpine Arm32 official .NET container images. Observing this with the same
FROM arm32v7/alpine:3.19
ENV \
    # Configure web servers to bind to port 8080 when present
    ASPNETCORE_HTTP_PORTS=8080 \
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true \
    # Do not generate certificate
    DOTNET_GENERATE_ASPNET_CERTIFICATE=false \
    # Do not show first run text
    DOTNET_NOLOGO=true \
    # SDK version
    DOTNET_SDK_VERSION=9.0.100-preview.3.24175.24 \
    # Set the invariant mode since ICU package isn't included
    DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=true \
    # Enable correct mode for dotnet watch (only mode supported in a container)
    DOTNET_USE_POLLING_FILE_WATCHER=true \
    # Skip extraction of XML docs - generally not useful within an image/container - helps performance
    NUGET_XMLDOC_MODE=skip \
    # PowerShell telemetry for docker image usage
    POWERSHELL_DISTRIBUTION_CHANNEL=PSDocker-DotnetSDK-Alpine-3.19-arm32
RUN apk add --upgrade --no-cache \
        ca-certificates-bundle \
        \
        # .NET dependencies
        libgcc \
        libssl3 \
        libstdc++ \
        zlib \
        curl
# Install .NET SDK
RUN wget -O dotnet.tar.gz https://dotnetbuilds.azureedge.net/public/Sdk/$DOTNET_SDK_VERSION/dotnet-sdk-$DOTNET_SDK_VERSION-linux-musl-arm.tar.gz \
    && dotnet_sha512='c0a702b295f275b135d7fc845322b71ff9298fc35771fa0ae4118e5766d3d2c16a3658757a7b8cc41ec095a8c532b58322c91c1239ba851d48ba30480932fb95' \
    && echo "$dotnet_sha512  dotnet.tar.gz" | sha512sum -c - \
    && mkdir -p /usr/share/dotnet \
    && tar -oxzf dotnet.tar.gz -C /usr/share/dotnet \
    && rm dotnet.tar.gz \
    && ln -s /usr/share/dotnet/dotnet /usr/bin/dotnet
# Run any arbitrary .NET command
RUN dotnet --version \
    && dotnet help
It can be built with the command
I used an arm64 machine in DTL to repro it. It also hangs under Docker qemu emulation on an amd64 machine. Related: dotnet/dotnet-docker#5309 |
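Since the failure here is a hang rather than a crash, a reproduction script has to impose its own time limit, otherwise it will spin forever just like the build does. The following is a minimal sketch of such a check in Python, meant to be run in whatever environment has the SDK under test installed. The helper name `check_hang` and the time limits are assumptions made for illustration; nothing like it appears in the thread.

```python
import subprocess


def check_hang(cmd, limit_seconds):
    """Run cmd and return "hang" if it exceeds the limit, else its exit code."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the
        # CPU-bound process is not left running in the background.
        return "hang"
    return completed.returncode
```

With this helper, `check_hang(["dotnet", "help"], 60)` would be expected to return `"hang"` on an affected Alpine arm32 install, going by the reports above, and `0` on a working SDK. The 60-second limit is arbitrary and would need to be much more generous under qemu emulation.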
@marcpopMSFT is this blocking .NET 9 Preview 3 release? |
@lbussell if you hit this, did you happen to be able to collect a dump or attach a debugger to get a callstack? Could you try a And @NicoleWang001 no idea what the priority of this specific config is. That's a question for tactics. |
@marcpopMSFT Sorry, my repro was not clear enough. The |
You're saying it hangs when running |
@marcpopMSFT yes it hangs when running I cannot get a dump without some further help with this - |
@lbussell can you install a different version of the SDK that's working in a different location in order to install the dump tool? |
CC @elinor-fung @agocke in case the issue is in the host itself as --help shouldn't do much. |
Here's what I'm trying -
The
Here's the updated Dockerfile I'm using to do this -
FROM mcr.microsoft.com/dotnet/nightly/sdk:9.0-preview-alpine3.19-arm32v7 as installer
RUN dotnet tool install --global dotnet-dump
FROM arm32v7/alpine:3.19
ENV ... same as above
RUN apk add ... same as above
COPY --from=installer [ "/usr/share/dotnet/", "/usr/share/dotnet-working/" ]
COPY --from=installer [ "/root/.dotnet/tools", "/root/.dotnet/tools" ]
# Install .NET SDK
RUN wget -O dotnet.tar.gz https://dotnetbuilds.azureedge.net/public/Sdk/$DOTNET_SDK_VERSION/dotnet-sdk-$DOTNET_SDK_VERSION-linux-musl-arm.tar.gz \
    && mkdir -p /usr/share/dotnet-broken \
    && tar -oxzf dotnet.tar.gz -C /usr/share/dotnet-broken \
    && rm dotnet.tar.gz \
    && ln -s /usr/share/dotnet-working/dotnet /usr/bin/dotnet
|
Per @agocke in tactics, moving to runtime repo. |
@mangod9 can you take a look? not sure where this high cpu may be coming from. also including @elinor-fung |
@ChenhuiYuan01, assuming this is a consistent repro are you able to capture a dump with dotnet-dump when the high cpu is occurring? Thanks |
@mangod9 We could not get a dump either as the dotnet-dump hangs.
|
@NicoleWang001 I would suggest generating a core dump without any .NET tooling (using the native tools bundled with the OS). Then run |
I am experiencing hangs (stuck forever, manually aborted by me after 8 hours) for all
FROM mcr.microsoft.com/dotnet/nightly/sdk:9.0.100-preview.3-alpine3.19-arm32v7
WORKDIR /srv
RUN dotnet help

FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine3.18-arm32v7
WORKDIR /srv
RUN dotnet help

FROM mcr.microsoft.com/dotnet/sdk:7.0-alpine3.18-arm32v7
WORKDIR /srv
RUN dotnet help

FROM mcr.microsoft.com/dotnet/sdk:6.0-alpine3.18-arm32v7
WORKDIR /srv
RUN dotnet help

docker build -f 'arm32v7-test.Dockerfile' --platform 'linux/arm32v7' --pull -t 'arm32v7-test:latest' ./empty/
Tested with a docker host that is Windows 10 x64 (with WSL2 integration to Ubuntu) or an Ubuntu 22.04/23.10 VM (VirtualBox, with a Windows 10 x64 host). So these are environments that use qemu to emulate
BTW, downloading and running the SDK in a custom build (buildroot) qemu busybox image targeting
Here are results from
Differences are the
/cc @richlander |
@lauxjpn the behavior for the 6.0, 7.0, and 8.0 Docker images is unexpected. Can you please post an issue for that in https://github.com/dotnet/dotnet-docker? |
I have done some initial investigations. I have found the issue is related to the new exception handling that was enabled in preview 3 by default. There is some exception that occurs on a secondary thread during the shutdown of the process, and the function that the new exception handling uses to iterate over stack frames keeps returning the same frame over and over. This is actually an explicit frame ( |
There is an edge case during exception handling on arm32 where an active InlinedCallFrame is not popped from the explicit frame list. That later leads to various kinds of failures / crashes. For example, on Alpine arm32, `dotnet help` hangs eating 100% of one CPU core. That happens due to code executing after the exception was handled and its stack overwriting the explicit frame contents. This can only occur when the pinvoke is inlined in a method that calls it inside of a try region with a catch in the same method, and an exception occurs, e.g. due to the target native function or the shared library not existing. What happens is that when we pop the explicit frames, we pop frames that are below the SP of the resume location after the catch. But the InlinedCallFrame is in this case above that SP, as it was created in the prolog of the method. To fix that, we need to pop that frame too. The fix uses the same condition as the old EH was using. Closes dotnet#100536
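The popping rule described above can be illustrated with a toy model. The sketch below is not the runtime's code: the `ExplicitFrame` record, the `pop_explicit_frames` function, the `method_frame_top` parameter, the "OuterFrame" kind, and all addresses are hypothetical, and the extra condition only stands in for "the same condition as the old EH was using", which the thread does not spell out. It assumes a downward-growing stack and a frame list ordered innermost first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExplicitFrame:
    kind: str       # for example "InlinedCallFrame"
    address: int    # stack address where the frame record lives
    active: bool    # True while a P/Invoke call is in flight


def pop_explicit_frames(frames: List[ExplicitFrame], resume_sp: int,
                        method_frame_top: Optional[int] = None) -> List[ExplicitFrame]:
    """Return the frames still linked after resuming in a catch at resume_sp.

    frames is ordered innermost first. The stack grows downward, so records
    at addresses below resume_sp belong to abandoned callees.
    """
    remaining = list(frames)
    # Existing rule: drop every record located below the resume SP.
    while remaining and remaining[0].address < resume_sp:
        remaining.pop(0)
    # Modeled fix (hypothetical condition): also drop an active
    # InlinedCallFrame that lives inside the catching method's own frame.
    if method_frame_top is not None and remaining:
        head = remaining[0]
        if (head.kind == "InlinedCallFrame" and head.active
                and resume_sp <= head.address < method_frame_top):
            remaining.pop(0)
    return remaining


# Toy layout: the catching method's frame spans 0x1000..0x1100 and its
# InlinedCallFrame record was set up in the prolog at 0x10F0.
icf = ExplicitFrame("InlinedCallFrame", 0x10F0, active=True)
outer = ExplicitFrame("OuterFrame", 0x2000, active=False)

stale = pop_explicit_frames([icf, outer], resume_sp=0x1000)
fixed = pop_explicit_frames([icf, outer], resume_sp=0x1000, method_frame_top=0x1100)
print([f.kind for f in stale])   # ['InlinedCallFrame', 'OuterFrame']
print([f.kind for f in fixed])   # ['OuterFrame']
```

In the first call the InlinedCallFrame stays linked because its address is above the resume SP, which mirrors the stale record whose contents later get overwritten by ordinary stack use. In the second call the additional check removes it.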
* Fix missing explicit frame pop on arm32
There is an edge case during exception handling on arm32 where an active InlinedCallFrame is not popped from the explicit frame list. That later leads to various kinds of failures / crashes. For example, on Alpine arm32, `dotnet help` hangs eating 100% of one CPU core. That happens due to code executing after the exception was handled and its stack overwriting the explicit frame contents. This can only occur when the pinvoke is inlined in a method that calls it inside of a try region with a catch in the same method, and an exception occurs, e.g. due to the target native function or the shared library not existing. What happens is that when we pop the explicit frames, we pop frames that are below the SP of the resume location after the catch. But the InlinedCallFrame is in this case above that SP, as it was created in the prolog of the method. To fix that, we need to pop that frame too. The fix uses the same condition as the old EH was using. Closes dotnet#100536
* Remove forcing crossgen and filtering by target arch for the test
* Reflect PR feedback
---------
Co-authored-by: Jan Vorlicek <jan.vorlicek@volny,cz>
Reproduction Steps
Expected:
The project will be created successfully.
Actual Behavior
Project creation is blocked: the command hangs for a long time with high CPU consumption.
Running dotnet new console: hangs for a long time with high CPU consumption.
dotnet --info
.NET SDK:
Version: 9.0.100-preview.3.24175.24
Commit: 09d6f381e6
Workload version: 9.0.100-manifests.77bb7ba9
MSBuild version: 17.10.0-preview-24175-03+89b42a486
Runtime Environment:
OS Name: alpine
OS Version: 3.20
OS Platform: Linux
RID: linux-musl-arm
Base Path: /root/mytest/sdk/9.0.100-preview.3.24175.24/
.NET workloads installed:
There are no installed workloads to display.
Host:
Version: 9.0.0-preview.3.24172.9
Architecture: arm
Commit: 9e6ba1f
.NET SDKs installed:
9.0.100-preview.3.24175.24 [/root/mytest/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 9.0.0-preview.3.24172.13 [/root/mytest/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 9.0.0-preview.3.24172.9 [/root/mytest/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
Not set
global.json file:
Not found
Learn more:
https://aka.ms/dotnet/info