-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/telemetry: crash on Windows when counter file is extended #68358
Comments
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
Investigating briefly, it seems likely that this is due to the unmapping of the existing file before it is remapped: |
Thanks for the investigation. What happens if you delete the munmapFile call? |
In https://go.dev/cl/597279, I tested exactly that (see TestMultipleMaps). Windows seems happy to provide multiple views of the same file. My hypothesis is that the unmap is done only for tests, which fail to clean up if there are any outstanding mappings of the counter file. See also the SetFinalizer call to mappedFile.close - the need for a setfinalizer is that counter.file.current may be read without holding counter.file.mu, and therefore the lifecycle of mappedFiles is unmanaged. I believe the correct solution is to turn |
After a bit more testing (ensuring that other mappings remain valid when one is closed), I believe the solution described in #68358 (comment) is the path forward. Therefore marking this as NeedsFix. I will aim for a fix today. The fix will not be completely trivial, but there are only 7 non-test calls to |
Change https://go.dev/cl/597278 mentions this issue: |
Well, after much iteration I realized that the existing code was very close to unmapping files correctly: we just need to close the mapping after invalidating counters, which must be done after unlocking the So, we're able to (1) remove the unsafe unmapping, (2) remove the runtime.SetFinalizer logic, and (3) resolve that TODO, without introducing additional mechanisms. A satisfactory change. I believe this should be a relatively safe CL to land. |
Changes in this release This release updates the golang.org/x/telemetry dependency to pick up fixes for the following bugs: golang/go#68946: unnecessary downloading of the golang.org/x/telemetry/config module golang/go#68358: a potential crash on Windows when the telemetry counter file is extended golang/go#68311: a potential hanging process if the telemetry file is truncated Additionally, this release changes the gopls/telemetryprompt/accepted counter to be recorded each time the prompt state is checked (golang/go#68770). None of these issues were particularly urgent, but we also wanted to make a gopls release to exercise our recently added release automation (golang/go#57643).
On Windows, unmapping the previous mappedFile in mmap.Mmap caused panics when counter pointers were read concurrent to the remapping. Upon investigation, it appears that the unmapping was only necessary for tests, to ensure that previous mappings were cleaned up (and therefore that test files can be deleted). A call to runtime.SetFinalizer(..., mappedFile.close) appeared to serve a similar purpose, yet for an unknown reason finalizers were never run. Deeper investigation revealed that there was simply one bug in file cleanup (coincidentally already noted in a TODO): after storing the newly mapped file in file.newCounter1 and invalidating counters, we can close the previous mappedFile. Therefore: - fix the cleanup in file.newCounter1 - remove the unmap in mmap.Mmap on windows - remove the now unnecessary 'existing' parameter in mmap APIs - remove the SetFinalizer call - add a test for multiple concurrent mappings of a file - add an end-to-end test for concurrent file extension - change ReadCounter to read by memory mapping the file, in an attempt to avoid msync issues For golang/go#68311 Fixes golang/go#68358 Change-Id: I27b6f4f4939e93f7c76f920d553848bf014be236 Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/597278 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Hyang-Ah Hana Kim <[email protected]>
Discovered while investigating #68311: telemetry counter file extension can cause windows processes to panic. For example: this crash is from a new end-to-end test added in https://go.dev/cl/597278, without any changes to the x/telemetry logic.
https://ci.chromium.org/ui/p/golang/builders/try/x_telemetry-go1.21-windows-amd64/b8742848988042691361/test-results?q=ExactID%3Agolang.org%2Fx%2Ftelemetry%2Finternal%2Fcounter.TestConcurrentExtension+VHash%3Afb236f0bcc2ba24d&clean=&sortby=&groupby=
I don't think this is a theoretical or pathological setup. With the high concurrency of toolchain processes, it seems likely that Windows users will encounter crashes due to this bug. Furthermore, these crashes will only become more likely as we add more instrumentation. Therefore, marking as a release blocker.
CC @golang/telemetry
The text was updated successfully, but these errors were encountered: