
Investigate flaky test - sequential/test-performance-eventloopdelay #41286

Open
mhdawson opened this issue Dec 22, 2021 · 10 comments
Labels
flaky-test: Issues and PRs related to the tests with unstable failures on the CI.

Comments

@mhdawson (Member)

Test

sequential/test-performance-eventloopdelay

Platform

linuxone

Console output

node:assert:399
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(histogram.min > 0)

    at Timeout.spinAWhile [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-linuxone/test/sequential/test-performance-eventloopdelay.js:65:7)
    at listOnTimeout (node:internal/timers:568:17)
    at processTimers (node:internal/timers:510:7) {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Node.js v18.0.0-pre

Build links

https://ci.nodejs.org/job/node-test-commit-linuxone/30015/

Additional information

No response
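
For context, a minimal sketch of the pattern the test exercises, using the public perf_hooks API (a simplified illustration, not the verbatim test, which lives at test/sequential/test-performance-eventloopdelay.js):

const assert = require('assert');
const { monitorEventLoopDelay } = require('perf_hooks');

const histogram = monitorEventLoopDelay({ resolution: 1 });
histogram.enable();

let spins = 0;
function spinAWhile() {
  const stop = Date.now() + 50;
  while (Date.now() < stop);  // simulate blocking work on the event loop
  if (++spins < 10) {
    setTimeout(spinAWhile, 5);
  } else {
    histogram.disable();
    // The intermittently failing assertion: on a sufficiently fast machine
    // the smallest recorded delay can round down to 0.
    assert(histogram.min > 0);
  }
}
setTimeout(spinAWhile, 5);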

mhdawson added the flaky-test label Dec 22, 2021
@richardlau (Member)

According to https://github.com/nodejs/reliability/issues?q=is%3Aissue+test-performance-eventloopdelay this first showed up on 22 December 2021.

@richardlau (Member) commented Jan 4, 2022

Walking backwards from commits that landed on 22 December 2021, I've started two stress runs around 23637e9:

665b404: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/306/nodes=rhel7-s390x/ (✔️ no failures in 1000 runs)
23637e9: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/307/nodes=rhel7-s390x/ (❌ 100 failures in 1000 runs)

@richardlau (Member)

I've added some debug output to the test:

diff --git a/test/sequential/test-performance-eventloopdelay.js b/test/sequential/test-performance-eventloopdelay.js
index f262e9de3f..eddc9348fa 100644
--- a/test/sequential/test-performance-eventloopdelay.js
+++ b/test/sequential/test-performance-eventloopdelay.js
@@ -62,7 +62,7 @@ const { sleep } = require('internal/util');
       histogram.disable();
       // The values are non-deterministic, so we just check that a value is
       // present, as opposed to a specific value.
-      assert(histogram.min > 0);
+      assert(histogram.min > 0, `histogram.min (${histogram.min}) > 0`);
       assert(histogram.max > 0);
       assert(histogram.stddev > 0);
       assert(histogram.mean > 0);

and kicked off another stress run (based on the current master branch): https://ci.nodejs.org/view/Stress/job/node-stress-single-test/309/nodes=rhel7-s390x/console
In the failure case it looks like histogram.min is 0, e.g.:

18:00:25 + /usr/bin/python3.6 tools/test.py -p tap --logfile test20.tap --mode=release sequential/test-performance-eventloopdelay
18:00:25 TAP version 13
18:00:25 1..1
18:00:32 not ok 1 sequential/test-performance-eventloopdelay
18:00:32   ---
18:00:32   duration_ms: 7.169
18:00:32   severity: fail
18:00:32   exitcode: 1
18:00:32   stack: |-
18:00:32     node:assert:399
18:00:32         throw err;
18:00:32         ^
18:00:32     
18:00:32     AssertionError [ERR_ASSERTION]: histogram.min (0) > 0
18:00:32         at Timeout.spinAWhile [as _onTimeout] (/home/iojs/build/workspace/node-stress-single-test/nodes/rhel7-s390x/test/sequential/test-performance-eventloopdelay.js:65:7)
18:00:32         at listOnTimeout (node:internal/timers:559:17)
18:00:32         at processTimers (node:internal/timers:502:7) {
18:00:32       generatedMessage: false,
18:00:32       code: 'ERR_ASSERTION',
18:00:32       actual: false,
18:00:32       expected: true,
18:00:32       operator: '=='
18:00:32     }
18:00:32     
18:00:32     Node.js v18.0.0-pre
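
The same pattern could usefully be applied to the remaining assertions so that any failure reports the offending value (a suggested extension of the patch above, not part of it):

// Hypothetical extension of the debug patch to the other checks.
assert(histogram.max > 0, `histogram.max (${histogram.max}) > 0`);
assert(histogram.stddev > 0, `histogram.stddev (${histogram.stddev}) > 0`);
assert(histogram.mean > 0, `histogram.mean (${histogram.mean}) > 0`);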

mhdawson added a commit to mhdawson/io.js that referenced this issue Jan 5, 2022
Refs: nodejs#41286

This is one of the remaining high incidence failures
in the CI. Mark as flaky while it is investigated.

Signed-off-by: Michael Dawson <[email protected]>
nodejs-github-bot pushed a commit that referenced this issue Jan 6, 2022
Refs: #41286

This is one of the remaining high incidence failures
in the CI. Mark as flaky while it is investigated.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #41409
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
targos pushed a commit that referenced this issue Jan 14, 2022
Linkgoron pushed a commit to Linkgoron/node that referenced this issue Jan 31, 2022
danielleadams pushed a commit that referenced this issue Feb 1, 2022
@Trott (Member) commented Mar 19, 2022

Happened again today (and I'm sure many, many other days, but since it's marked as flaky, people probably don't log it).

https://ci.nodejs.org/job/node-test-commit-linuxone/nodes=rhel8-s390x/31271/console

00:16:18 not ok 3582 sequential/test-performance-eventloopdelay # TODO : Fix flaky test
00:16:25   ---
00:16:25   duration_ms: 7.211
00:16:25   severity: flaky
00:16:25   exitcode: 1
00:16:25   stack: |-
00:16:25     node:assert:399
00:16:25         throw err;
00:16:25         ^
00:16:25     
00:16:25     AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
00:16:25     
00:16:25       assert(histogram.min > 0)
00:16:25     
00:16:25         at Timeout.spinAWhile [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-linuxone/test/sequential/test-performance-eventloopdelay.js:65:7)
00:16:25         at listOnTimeout (node:internal/timers:559:17)
00:16:25         at processTimers (node:internal/timers:502:7) {
00:16:25       generatedMessage: true,
00:16:25       code: 'ERR_ASSERTION',
00:16:25       actual: false,
00:16:25       expected: true,
00:16:25       operator: '=='
00:16:25     }
00:16:25     
00:16:25     Node.js v18.0.0-pre
00:16:25   ...

@Trott (Member) commented Mar 19, 2022

Is this a bug in the test and the docs? Is it possible for the minimum recorded event loop delay to be 0 (or rather, to round to 0) on a sufficiently fast machine? @jasnell
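
For illustration, the rounding hypothesis is easy to probe directly on a mostly idle machine (a hypothetical repro, not taken from the thread):

const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay({ resolution: 10 });
h.enable();
setTimeout(() => {
  h.disable();
  // min and max are reported in nanoseconds; on fast, mostly idle
  // hardware min can come back as 0.
  console.log(`min: ${h.min} ns, max: ${h.max} ns`);
}, 1000);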

@jasnell (Member) commented Mar 19, 2022

Most likely a test bug. I'll be able to look tomorrow.

@mhdawson (Member Author)

@jasnell any chance you are going to be able to take a look at this soon?

mhdawson added a commit that referenced this issue Dec 24, 2024
It was excluded as it was failing intermittently. Likely
s390x was just so fast that times were rounded down to 0.

Increase the spin time on s390x only.

Signed-off-by: Michael Dawson <[email protected]>
PR-URL: #56228
Refs: #41286
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
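
The rough shape of that change is an architecture-conditional spin time; a hypothetical sketch of the idea behind the fix (the actual patch in #56228 may differ):

// Illustrative only: spin longer on s390x so measured delays are large
// enough not to round down to 0. The name and values are hypothetical,
// not the real patch.
const spinDuration = process.arch === 's390x' ? 500 : 50;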
aduh95 pushed a commit that referenced this issue Jan 2, 2025
@jellelicht commented Jan 6, 2025

I'm not sure whether I should open a new issue or keep posting in this old one, but I can confirm that this test is flaky on my build system (32 cores @ 5.7 GHz), with exactly the same test failure as reported here.

It happens relatively regularly (my estimate: 25% of the time) if my system is not doing much else at the time of testing.

This is a bog-standard x86_64 GNU/Linux system, fwiw.

EDIT: This is still the case for v22.12.0.

@mhdawson (Member Author) commented Jan 6, 2025

@jellelicht can you confirm that the build where you see the failure includes commit 96a3804?

@mhdawson (Member Author) commented Jan 6, 2025

@jellelicht never mind, I guess, since 96a3804 only modified the spin time for s390x.

It would be useful if you could check whether a similar change on your system reduces the flakiness. I'm guessing it's a similar issue.
