[Flaky Test] Testintegration getting timed out #26097

Closed
samiura opened this issue Aug 24, 2023 · 9 comments
Labels
bug, flaky test, receiver/flinkmetrics

Comments

samiura commented Aug 24, 2023

Description

Flaky test:
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/5966235030/job/16185463868?pr=26086#step:5:609
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/5966235030/job/16186427803

FAIL	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/flinkmetricsreceiver	360.040s
?   	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/flinkmetricsreceiver/internal/mocks	[no test files]
?   	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/flinkmetricsreceiver/internal/models	[no test files]
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/flinkmetricsreceiver/internal/metadata	0.044s [no tests to run]
FAIL
make[2]: *** [../../Makefile.Common:116: mod-integration-test] Error 1
make[1]: *** [Makefile:174: receiver/flinkmetricsreceiver] Error 2
make[2]: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/flinkmetricsreceiver'
make: *** [Makefile:76: integration-test] Error 2
make[1]: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
Error: Process completed with exit code 2.
2023/08/24 16:57:17 🐳 Creating container for image flink:1.17.0
2023/08/24 16:57:18 🐳 Creating container for image flink:1.17.0
2023/08/24 16:57:19 🐳 Creating container for image flink:1.17.0
2023/08/24 16:57:20 🐳 Creating container for image flink:1.17.0
2023/08/24 16:57:21 🐳 Creating container for image flink:1.17.0
2023/08/24 16:57:22 🐳 Creating container for image flink:1.17.0
panic: test timed out after 6m0s
running tests:
	TestIntegration (6m0s)

goroutine 8485 [running]:
testing.(*M).startAlarm.func1()
	/opt/hostedtoolcache/go/1.20.7/x64/src/testing/testing.go:2241 +0x219
created by time.goFunc
	/opt/hostedtoolcache/go/1.20.7/x64/src/time/sleep.go:176 +0x48
samiura added the bug and needs triage labels on Aug 24, 2023
atoulme added the flaky test and receiver/flinkmetrics labels and removed the needs triage label on Aug 24, 2023
@github-actions
Contributor

Pinging code owners for receiver/flinkmetrics: @JonathanWamsley @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme changed the title from "Flaky Testintegration getting timed out" to "[Flaky Test] Testintegration getting timed out" on Aug 24, 2023
@nslaughter
Contributor

nslaughter commented Aug 25, 2023

This issue came up in workflows for #24774.

In one case, I noted what I thought were a few key points for a summary...

I also want to mention that on a follow-up run, while trying to work through this, I saw the same pattern with kafkametricsreceiver. To me that suggests it might be JVM related or just related to high memory use, and I'm not convinced that it's otherwise related to the flinkmetrics receiver in particular.

Hoping to come up with some more concrete causes and changes from there.

@djaglowski
Member

Thanks @nslaughter.

The kafkametricsreceiver and flinkmetrics tests both rely on scraperinttest. This package was created in response to instability problems in complex tests, primarily those which depend on containers. It helps by adding retry logic around failure points that are not directly the purpose of the test (e.g. a container failing to start is part of test setup, not test validation, so it is retried). Obviously scraperinttest is not perfect, but it's been a major improvement upon what we formerly had.

If you are looking into this, it's likely a good idea to look for a solution that can be built into the package, so that this type of failure can be prevented broadly.

@dmitryax
Member

Ran into this a couple of times in https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6003150441/job/16281731431?pr=26125

I think we should skip the test for now.

@dmitryax
Member

Submitted #26194

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Oct 30, 2023
@hughesjj
Contributor

I'll note that I've had issues with NamedContainer and NetworkRequest and am digging into them.

github-actions bot removed the Stale label on Dec 22, 2023
@djaglowski
Member

I've proposed removing this test in #30218

@djaglowski
Member

Closed by #30218

6 participants