Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/k8sattributesprocessor] Fix memory leak #30842

Closed
wants to merge 3 commits into from

Conversation

crobert-1
Copy link
Member

Description:

This PR contains 3 main changes:

  1. Fix a goroutine leak. Previously, a goroutine was started when a new kube client was created. This goroutine was started whether or not creation was successful. If creation of the kube client was unsuccessful, this goroutine was leaked as there was no way to stop it. I've updated the logic to start the goroutine in Start instead of New. This should allow proper behavior. This also means that Start should always be called on the kube client, rather than simply when passthroughMode: false, as the Start method now internally accounts for passthroughMode. I've filed [processor/k8sattributesprocessor] Leaking goroutine caused by invalid kube client init #30841 to discuss this bug and potential solution.
  2. Add a shutdown method to the mutlitest object, and call it in many places. This was the source of many leaks as the log, metric, and trace processors were being started but never shutdown.
  3. Add goleak checks to each package under k8sattributeprocessor.

Link to tracking Issue:
#30438
Fixes #30841

Testing:
Existing tests are all passing, added goleak checks are now passing as well.

@crobert-1
Copy link
Member Author

crobert-1 commented Jan 30, 2024

Investigating e2e test failure - I believe it is an existing failure, unrelated to my changes, but it needs to be resolved before merging.

@crobert-1 crobert-1 marked this pull request as draft January 30, 2024 00:25
@crobert-1
Copy link
Member Author

From debugging locally it looks like this is an issue within the internal/k8stest package, which creates and deletes kubernetes objects. I believe there's some issue going on inside the delete object functionality that somehow isn't fully cleaning up the HTTP requests. Here are the call stacks of the leaks occurring in the e2e test.

 Goroutine 50 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x7f3358458888, 0x72)
	/opt/hostedtoolcache/go/1.20.13/x64/src/runtime/netpoll.go:306 +0x89
internal/poll.(*pollDesc).wait(0xc0002c0100?, 0xc000648000?, 0x0)
	/opt/hostedtoolcache/go/1.20.13/x64/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/opt/hostedtoolcache/go/1.20.13/x64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0002c0100, {0xc000648000, 0x1a80, 0x1a80})
	/opt/hostedtoolcache/go/1.20.13/x64/src/internal/poll/fd_unix.go:167 +0x299
net.(*netFD).Read(0xc0002c0100, {0xc000648000?, 0xc000648005?, 0xef?})
	/opt/hostedtoolcache/go/1.20.13/x64/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000014220, {0xc000648000?, 0x410625?, 0xc000219db0?})
	/opt/hostedtoolcache/go/1.20.13/x64/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc000fe87f8, {0xc000648000?, 0xc000fe87f8?, 0x0?})
	/opt/hostedtoolcache/go/1.20.13/x64/src/crypto/tls/conn.go:788 +0x3d
bytes.(*Buffer).ReadFrom(0xc000219e90, {0x221fe20, 0xc000fe87f8})
	/opt/hostedtoolcache/go/1.20.13/x64/src/bytes/buffer.go:202 +0x98
crypto/tls.(*Conn).readFromUntil(0xc000219c00, {0x2224000?, 0xc000014220}, 0x1a80?)
	/opt/hostedtoolcache/go/1.20.13/x64/src/crypto/tls/conn.go:810 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc000219c00, 0x0)
	/opt/hostedtoolcache/go/1.20.13/x64/src/crypto/tls/conn.go:617 +0x116
crypto/tls.(*Conn).readRecord(...)
	/opt/hostedtoolcache/go/1.20.13/x64/src/crypto/tls/conn.go:583
crypto/tls.(*Conn).Read(0xc000219c00, {0xc0003c5000, 0x1000, 0x0?})
	/opt/hostedtoolcache/go/1.20.13/x64/src/crypto/tls/conn.go:1316 +0x16f
bufio.(*Reader).Read(0xc000617f80, {0xc0001444a0, 0x9, 0xc0004f4388?})
	/opt/hostedtoolcache/go/1.20.13/x64/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0x221fc60, 0xc000617f80}, {0xc0001444a0, 0x9, 0x9}, 0x9)
	/opt/hostedtoolcache/go/1.20.13/x64/src/io/io.go:332 +0x9a
io.ReadFull(...)
	/opt/hostedtoolcache/go/1.20.13/x64/src/io/io.go:351
golang.org/x/net/http2.readFrameHeader({0xc0001444a0?, 0x9?, 0xc000000000?}, {0x221fc60?, 0xc000617f80?})
	/home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000144460)
	/home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x95
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc0006c9f98)
	/home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/transport.go:2275 +0x12e
golang.org/x/net/http2.(*ClientConn).readLoop(0xc000139980)
	/home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/transport.go:2170 +0x6f
created by golang.org/x/net/http2.(*Transport).newClientConn
	/home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/transport.go:821 +0xc1f
]

How do we get to newClientConn?

http2.(*Transport).newClientConn (transport.go:739) golang.org/x/net/http2
http2.(*Transport).NewClientConn (transport.go:735) golang.org/x/net/http2
http2.(*addConnCall).run (client_conn_pool.go:198) golang.org/x/net/http2
http2.(*clientConnPool).addConnIfNeeded.func1 (client_conn_pool.go:179) golang.org/x/net/http2
runtime.goexit (asm_amd64.s:1650) runtime

How do we get to addConnIfNeeded?

http2.(*clientConnPool).addConnIfNeeded (client_conn_pool.go:179) golang.org/x/net/http2
http2.configureTransports.func1 (transport.go:255) golang.org/x/net/http2
http.(*Transport).dialConn (transport.go:1764) net/http
http.(*Transport).dialConnFor (transport.go:1467) net/http
http.(*Transport).queueForDial.func1 (transport.go:1436) net/http
runtime.goexit (asm_amd64.s:1650) runtime
 - Async Stack Trace
http.(*Transport).queueForDial (transport.go:1436) net/http

How do we get to queueForDial?

http.(*Transport).queueForDial (transport.go:1436) net/http
http.(*Transport).getConn (transport.go:1390) net/http
http.(*Transport).roundTrip (transport.go:591) net/http
http.(*Transport).RoundTrip (roundtrip.go:17) net/http
transport.(*userAgentRoundTripper).RoundTrip (round_trippers.go:168) k8s.io/client-go/transport
http.send (client.go:260) net/http
http.(*Client).send (client.go:181) net/http
http.(*Client).do (client.go:724) net/http
http.(*Client).Do (client.go:590) net/http
rest.(*Request).request (request.go:1023) k8s.io/client-go/rest
rest.(*Request).Do (request.go:1063) k8s.io/client-go/rest
dynamic.(*dynamicResourceClient).Create (simple.go:138) k8s.io/client-go/dynamic
k8stest.CreateObject (k8s_objects.go:29) github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8stest
k8stest.CreateCollectorObjects (k8s_collector.go:39) github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8stest
k8sattributesprocessor.TestE2E (e2e_test.go:72) github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor
testing.tRunner (testing.go:1595) testing
testing.(*T).Run.func1 (testing.go:1648) testing
runtime.goexit (asm_amd64.s:1650) runtime
 - Async Stack Trace
testing.(*T).Run (testing.go:1648) testing

@crobert-1
Copy link
Member Author

I'm going to close for now until I have more time to investigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/k8sattributes k8s Attributes processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[processor/k8sattributesprocessor] Leaking goroutine caused by invalid kube client init
2 participants