-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sudden stall at 100% CPU #858
Comments
Suspects:
|
Traces before the issue. View the trace with |
Stackdump: Process with stackparse |
@alanshaw could you post the go version and linux version? |
The go suspect:
and
Clearly there is work to run, but it looks like findrunnable isn't finding anything. We may need to take a crack at this with gdb to confirm that this. This is why the Linux version and go version matter. |
Looking deeper, I'm seeing streams stuck reading 30 minutes in the DHT. They're supposed to be reset after 10 minutes. |
go 1.14 I’ll check the linux version asap |
$ hostnamectl
Static hostname: n/a
Transient hostname: gke-cluster-1-default-pool-b49ccc89-w9n3
Icon name: computer-vm
Chassis: vm
Machine ID: 224bd461662f618bdb8f842af3ce3abf
Boot ID: 9f7917a78a30441e8a2fed11f51bde86
Virtualization: kvm
Operating System: Container-Optimized OS from Google
Kernel: Linux 4.14.138+
Architecture: x86-64 |
@alanshaw could we try upgrading libp2p and upgrading yamux: |
git log 7ba322..76c1f95 --pretty=oneline
76c1f95caa7a7e22a1d971e8e186d9129ba77a74 (tag: v0.6.1) Merge pull request #846 from libp2p/chore/dep-update
7b280dd74dc58f65e312b3bdb8fbde6b857af1a9 chore: update yamux
891aeda705813f93d2d0b8108aa93e7ef141b61f Merge pull request #843 from libp2p/dependabot/go_modules/github.com/libp2p/go-libp2p-yamux-0.2.4
71e2d4ed5ba1f0b8247e668a17995bb4d2b42ddb build(deps): bump github.com/libp2p/go-libp2p-yamux from 0.2.2 to 0.2.4
463fa2a0dcbd5ee139011f36eb4975af63e05b1a Merge pull request #838 from libp2p/dependabot/go_modules/github.com/multiformats/go-multiaddr-net-0.1.3
ca57cf904e462e0977b41c7cb382c75d673778d9 build(deps): bump github.com/multiformats/go-multiaddr-net
9ad477ba8b7a369cad709790eca366e8afe2d3e0 (tag: v0.6.0) Merge pull request #836 from libp2p/feat/reliable-mock-notif
5f75aa2068f276cc5140879cda7fed1f0b131afe feat(mock): reliable notifications
4bbf43e8b2deeeb3f5db32cb94ae0286a95812b5 Merge pull request #835 from libp2p/fix/autorelay-doc
61634630e8114d4883fb1881c39554b26adbf3fa Merge pull request #833 from libp2p/dependabot/go_modules/github.com/libp2p/go-libp2p-mplex-0.2.2
8248da96b0a406501d72a8442a545eff6997e280 doc(options): fix autorelay documentation
241a61a246fab62af3341f022d1788829074fe12 build(deps): bump github.com/libp2p/go-libp2p-mplex from 0.2.1 to 0.2.2
8a52e862ee20881eb7e4c4cf6b33dd617664b873 Merge pull request #832 from libp2p/dependabot/go_modules/github.com/libp2p/go-libp2p-yamux-0.2.2
15fa3033c50088580b692871540401839a58b1f1 Merge pull request #834 from libp2p/dependabot/go_modules/github.com/libp2p/go-libp2p-peerstore-0.2.0
b5853020f816dad92542a0a03d33258ffdc0d0b5 build(deps): bump github.com/libp2p/go-libp2p-peerstore
a3bbeef22e33e1fefefdb52bc73e9c1a03a93618 build(deps): bump github.com/libp2p/go-libp2p-yamux from 0.2.1 to 0.2.2
e641f58681af49b132f38c2b7053eaea2addb0d6 Merge pull request #817 from libp2p/dependabot/go_modules/github.com/multiformats/go-multiaddr-0.2.1
34e69d491c213cd4566ee505b9adab85ff10a8e5 Merge pull request #796 from libp2p/generalize-private-network
cd4c6d3d115a7713ed6fe609c63da197c06788f1 Merge pull request #827 from libp2p/update-readme
4ccf0171045a60476b5c05d0e844fe60d3ebb3e4 Merge pull request #820 from libp2p/fix/remove-goroutine
d6d208197105b2287ff1aba14a23573490c4a62c Merge pull request #810 from libp2p/fix/relays-without-content-routing
68f9fd0e13bce74a8743fc394d54cca860b6b03c change PrivateNetwork to accept a PSK, update constructor magic
ebb9cb8af12faae59ec1cae4c8198cfea26cb422 Merge pull request #815 from Wondertan/fix/reset-error
5a6c36d346dda699f0bd99ff3b32cb9e1d23fbdd Merge pull request #826 from vasco-santos/docs/uniform
969eedc98f44ff9c362e6452dc8a842ed2ea0dc9 docs: remove note to non cancelled dev calls
b55baf18e967d8dd509c3b430b4217fa50a6a1e1 docs: remove deprecated packages
fe2c13c71b1c091293c712d6d0fef4bdfdf985f0 docs: uniform comment sentences
6cdc40418044dc96a9a90aefb0cc24faff647ffe fix: remove an unnecessary goroutine
6b1179649ff1276f10a8a6fe8b384959a281d743 build(deps): bump github.com/multiformats/go-multiaddr
5e30cdeaacdfdb1394048e6aa9668a4bfe78cb36 use of mux.ErrReset
78ca1bf57c421b79ce278075c7a311f7775f72b8 fix: can enable autorelay without content routing as long as there are static relays configured I'm running a Hydra with |
Sorry I missed this - I'll deploy this in a few hours and let you know. |
Tell me how this test fairs first. |
Ok, found a go bug that looks like the issue: golang/go#38070. I have a repro: package main
import (
"fmt"
"time"
)
func main() {
ticker := time.NewTicker(time.Millisecond)
defer ticker.Stop()
exit := make(chan struct{})
running := 0
for {
for i := 0; i < 100; i++ {
running++
go func() {
timer := time.NewTimer(1 * time.Microsecond)
for {
for k := 0; k < 100; k++ {
timer.Reset(100 * time.Microsecond)
}
select {
case <-timer.C:
case exit <- struct{}{}:
return
}
}
}()
if running > 4000 {
<-ticker.C
select {
case <-exit:
running--
default:
}
}
}
fmt.Printf("running: %d\n", running)
}
} This works on go 1.13 but grinds to a halt on go 1.14.1 (even with the preemptive scheduler disabled). |
Issue submitted upstream: golang/go#38119. It looks like the code above "works" on the latest master (which has fixed a timer reset deadlock). However, it still doesn't work well. |
This reverts commit c88a690. Unfortunately, go 1.14's new scheduler and timer changes have a few bugs that need to be ironed out before it's ready for production use. See: * libp2p/go-libp2p#858 * golang/go#38119
Still stable running |
Yeah, that sounds like the write coalescing is causing it. Try the latest go-lib2p release with the patches I've posted. |
might be related or not. There's a complete debug log attached from the start until it freezes plus the stack trace after kill (if this helps?) |
It's not related. |
Resolved by updating to go 1.14.2. |
Knowns:
Likely: golang/go#38070
The text was updated successfully, but these errors were encountered: