Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/hostmetricsreceiver] Test failure in internal/scraper/processscraper #36468

Closed
jade-guiton-dd opened this issue Nov 20, 2024 · 4 comments · Fixed by #36526
Closed

Comments

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Nov 20, 2024

Component(s)

receiver/hostmetrics

What happened?

Description

When running the tests for receiver/hostmetricsreceiver locally, I consistently get a failure in receiver/hostmetricsreceiver/internal/scraper/processscraper/process_scraper_test.go on lines 179 and 180:

internal.AssertContainsAttribute(t, attr, conventions.AttributeProcessCommand)
internal.AssertContainsAttribute(t, attr, conventions.AttributeProcessCommandLine)

I suspect this may be an OS-dependent issue.

Steps to Reproduce

Run go test in receiver/hostmetricsreceiver.

Expected Result

Tests pass.

Actual Result

Tests fail.

Collector version

Latest main (75ceb55)

Environment information

Environment

OS: macOS Sonoma 14.7.1 (on M3 hardware)
Compiler: go 1.22.9

Log output

The following output is repeated many times:

Error:          Should be true
Test:           TestScrape/Enable_memory_utilization
testutils.go:17: 
Error Trace:    [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/testutils.go:17
                [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper/process_scraper_test.go:179
                [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper/process_scraper_test.go:128
Error:          Should be true
Test:           TestScrape/Enable_memory_utilization
testutils.go:17: 
Error Trace:    [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/testutils.go:17
                [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper/process_scraper_test.go:180
                [...]/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper/process_scraper_test.go:128
Error:          Should be true
Test:           TestScrape/Enable_memory_utilization

=== FAIL: internal/scraper/processscraper TestScrape (re-run 1) (0.25s)
@jade-guiton-dd jade-guiton-dd added bug Something isn't working needs triage New item requiring triage labels Nov 20, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana
Copy link
Contributor

VihasMakwana commented Nov 22, 2024

@jade-guiton-dd can you run the tests as sudo and see if they work?
They're failing if you run with a "normal" user, most probably due to insufficient permissions.

EDIT: Tried the same on my Mac and the tests work with sudo

@VihasMakwana VihasMakwana added waiting for author and removed needs triage New item requiring triage labels Nov 22, 2024
@jade-guiton-dd
Copy link
Contributor Author

jade-guiton-dd commented Nov 22, 2024

@VihasMakwana You're right, running the tests as root works.

Is this expected behavior? If yes, it may be nice to document it (if it isn't already), and/or change test expectations depending on the current user, since I doubt most devs would run local tests as root by default.

@braydonk
Copy link
Contributor

braydonk commented Nov 25, 2024

Just took a look at this. The test does pass for me locally on a Linux machine, but in reading the test code I found an important mistake that makes it a generally broken test.

I don't have a Mac and know very little about darwin in general, but I can at least point out where the permissions issue is. Getting the command/command line of the process happens here:

https://github.com/shirou/gopsutil/blob/62a52f910bd9ad7d48a7416f5c2899386d50cb37/process/process_darwin.go#L364

My guess is that this syscall requires the same permissions as the owner of the process to read this information. (Operating word being "guess" there). If that is the case, then the reason this fails on Mac is that the first process is owned by root (so probably pid 1 or something along those lines) and it can't get the command line for it, causing the test to fail.

However, as I mentioned earlier, this entire assertion is busted. While looking at this myself I caught this:

for i := 0; i < resourceMetrics.Len(); i++ {
attr := resourceMetrics.At(0).Resource().Attributes()

It's just checking the first process over and over again. The fact that it works locally on Linux is actually just because of this issue. When I fix this to look at every resource in the scraped metrics, it fails basically every test on some random attribute being missing. I'm going to try and come up with a new way to do this test that isn't an assertion of every resource attribute being present; we have a few known errors of special processes missing particular attributes in a way that is completely fine and expected, and this test when actually running like it should will fail on those. I'll probably make it check for only a representative subset or something, I have to mess with it a bit.

dmitryax pushed a commit that referenced this issue Jan 6, 2025
…6526)

#### Description
Previously, this test was repeatedly testing the first process resource
in the list. This would usually pass (on non-Mac machines), but is not
testing what it is supposed to. When changing it to test every resource
in the scrape, it came up with issues due to the fact that not all
resource attributes can always be found for every process.

This PR changes the test to look at all process resources in the scrape,
and instead of asserting the presence of every resource attribute, it
checks for one mandatory attribute (the process ID) and ensures that no
unexpected resource attributes are added. This should reduce the
flakiness of this test on other environments, while still asserting this
on every process resource in the list.

#### Link to tracking issue
Fixes #36468

#### Documentation
A comment was added to the assertion helper, and more detailed error
messages are produced on failure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants