cmd/compile: sporadic memory corruption on 386 (32-bit) builders #37881
These 2020-03-16T22:31:39-2fbca94/linux-386-sid
This appears to be a dramatic uptick in memory corruption across the board on the 32-bit builders. Given that it is only appearing on the 32-bit builders, and that we have not (to my knowledge) made any particularly racy or dangerous changes in …
I also can't rule out a compiler bug as the cause, given the number of changes to the rewrite rules so far. (CC @josharian @randall77)
Note that some of the affected builders are also TryBots:
I'm kinda stumped on this one. I've tried to reproduce it, to no avail, and poring over CL 222782 doesn't reveal anything that might cause intermittent issues like this. On a possibly related note, I just mailed a change to add more addressing-mode modifications for amd64. When that goes in, whether or not we start seeing this on amd64 will be illuminating.
I have not followed this closely and I am shooting from the hip here, but perhaps rolling back the rewrite rule changes just to rule them out might help (and then rolling them back in, selectively). That would be cheap in terms of manpower.
Change https://golang.org/cl/224457 mentions this issue:
I think this is another example, this time on a TryBot: https://storage.googleapis.com/go-build-log/899dcded/linux-386_238c26dc.log
Because of the index, these ops can't guarantee faulting if arg0 is nil.

Clean up the PPC64 index ops; they can't take a sym or an offset.

Noticed while debugging #37881. I don't think it is the cause, but I guess there is a chance.

Update #37881

Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224457
Run-TryBot: Keith Randall
TryBot-Result: Gobot Gobot
Reviewed-by: Cherry Zhang
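Why an indexed op can't double as an implicit nil check can be shown with a toy model. This sketch assumes, as a simplification, that only a fixed region just above address 0 is unmapped; guardSize = 4096 is an illustrative value, and faultsOnNil and indexedLoadFaultsOnNil are hypothetical names, not compiler APIs. A load at a small constant offset from a nil base always lands in that region, but with a runtime index the effective address is unbounded.

```go
package main

import "fmt"

// guardSize is an assumed size of the unmapped region starting at address 0.
// Illustrative only; the real threshold is platform- and compiler-specific.
const guardSize = 4096

// faultsOnNil reports whether a load at nil+off is guaranteed to fault,
// i.e. whether the load itself could serve as the nil check.
func faultsOnNil(off int) bool {
	return off >= 0 && off < guardSize
}

// indexedLoadFaultsOnNil models an indexed op that loads from
// arg0 + idx*scale + off. With arg0 == nil the effective address is
// idx*scale + off, which depends on a value known only at run time.
func indexedLoadFaultsOnNil(idx, scale, off int) bool {
	return faultsOnNil(idx*scale + off)
}

func main() {
	fmt.Println(faultsOnNil(8))                      // true: small fixed offset from nil
	fmt.Println(indexedLoadFaultsOnNil(0, 8, 8))     // true for this particular idx
	fmt.Println(indexedLoadFaultsOnNil(1<<20, 8, 8)) // false: address is far past the guard region
}
```

Because the answer changes with idx, the compiler cannot statically mark such an op as faulting on a nil arg0.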
New failure today, so I think CL 224457 has not fixed it:
Change https://golang.org/cl/224837 mentions this issue:
Rolling back portions of CL 222782 to see if that helps issue #37881 any.

Update #37881

Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/224837
Run-TryBot: Keith Randall
TryBot-Result: Gobot Gobot
Reviewed-by: Bryan C. Mills
Looks like that last CL didn't help; there was still a failure:
Change https://golang.org/cl/225057 mentions this issue:
Update #37881

Change-Id: I1f9a3f57f6215a19c31765c257ee78715eab36b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/225057
Run-TryBot: Keith Randall
Reviewed-by: Bryan C. Mills
TryBot-Result: Gobot Gobot
Change https://golang.org/cl/225197 mentions this issue:
Change https://golang.org/cl/225217 mentions this issue:
Change https://golang.org/cl/225218 mentions this issue:
This reverts commit CL 225057.

Reason for revert: Undoing partial reverts of CL 222782

Update #37881

Change-Id: Iee024cab2a580a37a0fc355e0e3c5ad3d8fdaf7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/225197
Reviewed-by: Bryan C. Mills

This reverts commit CL 224837.

Reason for revert: Reverting partial reverts of 222782.

Update #37881

Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826
Reviewed-on: https://go-review.googlesource.com/c/go/+/225217
Reviewed-by: Bryan C. Mills

This reverts commit CL 222782.

Reason for revert: Reverting to see if 386 errors go away

Update #37881

Change-Id: I74f287404c52414db1b6ff1649effa4ed9e5cc0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225218
Reviewed-by: Bryan C. Mills
Change https://golang.org/cl/225798 mentions this issue:
I might have figured out what is wrong. Read the description of the CL above for all the gory details.
Retrying CL 222782, with a fix that will hopefully stop the random crashing.

The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly.

Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds.

For example, strconv.trim used to have this code:
    MOVZX -0x1(BX)(DX*1), BP
    CMPL $0x30, AL
After CL 222782, it had this code:
    LEAL 0(BX)(DX*1), BP
    CMPB $0x30, -0x1(BP)
An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved.

It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues.

This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL.

This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL.

Update #37881

Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall
TryBot-Result: Gobot Gobot
Reviewed-by: Cherry Zhang
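The failure mode described in the CL above can be sketched with a toy model that treats addresses as plain integers. Everything here is hypothetical (object, directOrder, reassociatedOrder, conservativeScan are made-up names); it does not reproduce the compiler's rewrite rules or the runtime's scanner. It only shows that c + (ptr + x) materializes an intermediate address outside the object, and that a conservative scan seeing only that value would not keep the object alive unless another register (BX in the strconv.trim example) still holds a pointer into it.

```go
package main

import "fmt"

// object models a heap object occupying the half-open range [base, base+size).
type object struct {
	name       string
	base, size int
}

// contains reports whether addr points into the object.
func (o object) contains(addr int) bool {
	return addr >= o.base && addr < o.base+o.size
}

// directOrder models ptr + (x + c): x+c was already bounds-checked,
// so the only address ever formed is the final, in-bounds one.
func directOrder(ptr, x, c int) []int {
	return []int{ptr + (x + c)}
}

// reassociatedOrder models the removed rule's c + (ptr + x): the
// intermediate ptr+x is materialized (the LEAL) before c is applied.
func reassociatedOrder(ptr, x, c int) []int {
	mid := ptr + x
	return []int{mid, c + mid}
}

// conservativeScan models scanning an interrupted frame: an object is
// treated as live only if some register value points into it.
func conservativeScan(heap []object, regs []int) map[string]bool {
	live := make(map[string]bool)
	for _, o := range heap {
		for _, r := range regs {
			if o.contains(r) {
				live[o.name] = true
			}
		}
	}
	return live
}

func main() {
	s := object{name: "s", base: 1000, size: 8}
	heap := []object{s}
	x, c := 8, -1 // s[x-1] with x == len(s): x+c == 7 is in bounds

	fmt.Println(directOrder(s.base, x, c))       // [1007]
	fmt.Println(reassociatedOrder(s.base, x, c)) // [1008 1007]

	// Interrupt between LEAL and CMPB: BP holds ptr+x == 1008.
	bp := s.base + x
	fmt.Println(conservativeScan(heap, []int{bp}))         // map[]: s looks dead
	fmt.Println(conservativeScan(heap, []int{bp, s.base})) // map[s:true]: BX still holds ptr
}
```

In a real heap an address just past one object may point into a neighboring object instead; the toy model ignores that and only tracks the one slice.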
Unfortunately, https://build.golang.org/log/b634dbda2b877a9859e65cd5eba87577e9e33dee looks like it may be another instance of this failure after the most recent attempt.
This looks somewhat different. The original failure looks like the GC found (temporary) bad pointers. The new one is a segfault.
Change https://golang.org/cl/226437 mentions this issue:
Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as that may create an ephemeral out-of-bounds pointer.

I have not seen an actual bug caused by this yet, but we've seen them in the 386 port so I'm fixing this issue for amd64 as well.

The load-combining rules needed to be reworked somewhat to still work without the above broken rule.

Update #37881

Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226437
Run-TryBot: Keith Randall
TryBot-Result: Gobot Gobot
Reviewed-by: Cherry Zhang
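For context on the load-combining rules mentioned above, here is the kind of Go source they target: several adjacent byte loads assembled into one wider value. Each b[i+k] is the same s[x+c] shape from the 386 CL, which is why these rules had to avoid forming ptr+i and then adding k. load32 is a hypothetical helper written for illustration; whether a given compiler version merges its loads into a single 32-bit load on 386 or amd64 is an assumption, not something this sketch checks.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32 assembles a little-endian uint32 from the four bytes starting at b[i].
// This byte-by-byte pattern is what load-combining rules look for; the
// compiler may merge the four byte loads into one 32-bit load.
func load32(b []byte, i int) uint32 {
	_ = b[i+3] // early bounds check on the highest index, in the style of encoding/binary
	return uint32(b[i]) | uint32(b[i+1])<<8 | uint32(b[i+2])<<16 | uint32(b[i+3])<<24
}

func main() {
	buf := []byte{0xff, 0x01, 0x02, 0x03, 0x04}
	fmt.Printf("%#x\n", load32(buf, 1))                                // 0x4030201
	fmt.Println(load32(buf, 1) == binary.LittleEndian.Uint32(buf[1:])) // true
}
```

The comparison against binary.LittleEndian.Uint32 only checks that the combined value is correct; it says nothing about which instructions were emitted.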
I'm going to declare this fixed. The original CL with fixed rules is in, both for 386 and amd64.
Various "sweep increased allocation count" errors have started cropping up in or near go/build invocations in various go commands.
2020-03-15T08:13:55-32dbccd/linux-386-clang
2020-03-14T07:03:15-d774d97/linux-386-sid

2020-03-16T20:59:27-ff1eb42/linux-386-387
2020-03-15T08:13:55-32dbccd/linux-386-clang
2020-03-14T07:03:15-d774d97/linux-386-sid
2020-03-13T20:43:12-e2a9ea0/openbsd-386-62

Since go/build is involved, this may be related to the go/types crash cluster (#37602, #37507, #37690).
CC @griesemer @matloob @aclements @danscales @mknyszek @cherrymui