Flaky TestHaTrackerWithMemberList #10350

Open
dimitarvdimitrov opened this issue Jan 6, 2025 · 4 comments · Fixed by #10364

dimitarvdimitrov commented Jan 6, 2025

Failed on a helm PR:

--- FAIL: TestHaTrackerWithMemberList (1.02s)
    ha_tracker_test.go:310: expected <nil>, got replicas did not match: r1 != r2
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg="Get - not found" key=prefixuser/cluster
level=debug msg=CAS key=prefixuser/cluster modify_index=0 value="\"\\x15P\\n\\x05first\\x10告\\xd7\\xc32 告\\xd7\\xc32\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=2 value="\"\\x15P\\n\\x05first\\x10告\\xd7\\xc32 告\\xd7\\xc32\""
level=debug msg=CAS key=prefixuser/cluster modify_index=2 value="\"\\x18\\\\\\n\\x06second\\x10\\xf5ߊ\\xd7\\xc32 \\xf5ߊ\\xd7\\xc32(\\x01\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=3 value="\"\\x18\\\\\\n\\x06second\\x10\\xf5ߊ\\xd7\\xc32 \\xf5ߊ\\xd7\\xc32(\\x01\""
level=debug msg=CAS key=prefixuser/cluster modify_index=3 value="\"\\x17X\\n\\x05first\\x10\\x9d\\xa6\\x8b\\xd7\\xc32 \\x85\\xae\\x8b\\xd7\\xc32(\\x02\""
2025/01/06 09:11:52 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/06 09:11:52 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/06 09:11:52 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/06 09:11:52 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/06 09:11:52 label __name__ is overwritten. Check if Prometheus reserved labels are used.
level=info msg="server listening on addresses" http=127.0.0.1:38001 grpc=127.0.0.1:43873
level=warn method=/httpgrpc.HTTP/Handle duration=324.143µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{},Body:[104 101 108 108 111],}" msg=gRPC err="rpc error: code = Code(415) desc = unsupported content type: , supported: [application/json, application/x-protobuf]"
level=warn method=/httpgrpc.HTTP/Handle duration=332.009µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObjectCB: expect { or n, but found i, error found in #1 byte of ...|invalid|..., bigger context ...|invalid|..."
level=warn method=/httpgrpc.HTTP/Handle duration=244.975µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[10 246 22 10 211 2 10 29 10 17 99 111 110 116 97 105 110 101 114 46 114 117 110 116 105 109 101 18 8 10 6 100 111 99 107 101 114 10 39 10 18 99 111 110 116 97 105 110 101 114 46 104],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObjectCB: expect { or n, but found \ufffd, error found in #2 byte of ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011co|..., bigger context ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011container.runtime\u0012\u0008\n\u0006docker\n'\n\u0012container.h|..."
level=warn method=/httpgrpc.HTTP/Handle duration=3.83794ms request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[123 34 114 101 115 111 117 114 99 101 77 101 116 114 105 99 115 34 58 32 91 123 34 115 99 111 112 101 77 101 116 114 105 99 115 34 58 32 91 123 34 109 101 116 114 105 99 115 34 58 32 91 123 34 110 97 109 101 34 58 32 34 114 101 112 111 114 116 95 115 101 114 118 101 114 95 101 114 114 111 114 34 44 32 34 103 97 117 103 101 34 58 32 123 34 100 97 116 97 80 111 105 110 116 115 34 58 32 91 123 34 116 105 109 101 85 110 105 120 78 97 110 111 34 58 32 34 49 54 55 57 57 49 50 52 54 51 51 52 48 48 48 48 48 48 48 34 44 32 34 97 115 68 111 117 98 108 101 34 58 32 49 48 46 54 54 125 93 125 125 93 125 93 125 93 125],}" msg=gRPC err="rpc error: code = Code(503) desc = some random push error"
level=info msg="=== Handler.Stop()'d ==="
FAIL

https://github.com/grafana/mimir/actions/runs/12629687043/job/35188085554?pr=10346

@dimitarvdimitrov

FYI @NickAnge since you added that test recently. Can you take a look when you get a minute?

NickAnge commented Jan 7, 2025

Hey @dimitarvdimitrov. Yes, I am going to have a look.

NickAnge commented Jan 7, 2025

Hey @dimitarvdimitrov. I have created PR #10364, in which I try to solve the issue and explain my decision. I think this should be enough to address the flakiness. Let me know what you think.

@NickAnge NickAnge self-assigned this Jan 7, 2025

narqo commented Jan 10, 2025

I'm not sure it's resolved. I bumped into the flake in TestHaTrackerWithMemberList in #10376 (another change in the Helm chart) after I rebased the PR atop the latest main.

The build failed with the following test output:

--- FAIL: TestHaTrackerWithMemberList (2.01s)
    ha_tracker_test.go:311: expected <nil>, got replicas did not match: r1 != r2
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg="Get - not found" key=prefixuser/cluster
level=debug msg=CAS key=prefixuser/cluster modify_index=0 value="\"\\x15P\\n\\x05first\\x10\\xe1\\xf7\\xe9\\x8d\\xc52 \\xe1\\xf7\\xe9\\x8d\\xc52\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=2 value="\"\\x15P\\n\\x05first\\x10\\xe1\\xf7\\xe9\\x8d\\xc52 \\xe1\\xf7\\xe9\\x8d\\xc52\""
level=debug msg=CAS key=prefixuser/cluster modify_index=2 value="\"\\x18\\\\\\n\\x06second\\x10\\xf1\\xc5\\xea\\x8d\\xc52 \\xf1\\xc5\\xea\\x8d\\xc52(\\x01\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=3 value="\"\\x18\\\\\\n\\x06second\\x10\\xf1\\xc5\\xea\\x8d\\xc52 \\xf1\\xc5\\xea\\x8d\\xc52(\\x01\""
level=debug msg=CAS key=prefixuser/cluster modify_index=3 value="\"\\x17X\\n\\x05first\\x10\\x99\\x8c\\xeb\\x8d\\xc52 \\x81\\x94\\xeb\\x8d\\xc52(\\x02\""
2025/01/10 19:39:23 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/10 19:39:23 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/10 19:39:23 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/10 19:39:23 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2025/01/10 19:39:23 label __name__ is overwritten. Check if Prometheus reserved labels are used.
level=info msg="server listening on addresses" http=127.0.0.1:43539 grpc=127.0.0.1:35977
level=warn method=/httpgrpc.HTTP/Handle duration=3.470481ms request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{},Body:[104 101 108 108 111],}" msg=gRPC err="rpc error: code = Code(415) desc = unsupported content type: , supported: [application/json, application/x-protobuf]"
level=warn method=/httpgrpc.HTTP/Handle duration=615.236µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObjectCB: expect { or n, but found i, error found in #1 byte of ...|invalid|..., bigger context ...|invalid|..."
level=warn method=/httpgrpc.HTTP/Handle duration=643.873µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[10 246 22 10 211 2 10 29 10 17 99 111 110 116 97 105 110 101 114 46 114 117 110 116 105 109 101 18 8 10 6 100 111 99 107 101 114 10 39 10 18 99 111 110 116 97 105 110 101 114 46 104],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObjectCB: expect { or n, but found \ufffd, error found in #2 byte of ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011co|..., bigger context ...|\n\ufffd\u0016\n\ufffd\u0002\n\u001d\n\u0011container.runtime\u0012\u0008\n\u0006docker\n'\n\u0012container.h|..."
level=warn method=/httpgrpc.HTTP/Handle duration=891.675µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[123 34 114 101 115 111 117 114 99 101 77 101 116 114 105 99 115 34 58 32 91 123 34 115 99 111 112 101 77 101 116 114 105 99 115 34 58 32 91 123 34 109 101 116 114 105 99 115 34 58 32 91 123 34 110 97 109 101 34 58 32 34 114 101 112 111 114 116 95 115 101 114 118 101 114 95 101 114 114 111 114 34 44 32 34 103 97 117 103 101 34 58 32 123 34 100 97 116 97 80 111 105 110 116 115 34 58 32 91 123 34 116 105 109 101 85 110 105 120 78 97 110 111 34 58 32 34 49 54 55 57 57 49 50 52 54 51 51 52 48 48 48 48 48 48 48 34 44 32 34 97 115 68 111 117 98 108 101 34 58 32 49 48 46 54 54 125 93 125 125 93 125 93 125 93 125],}" msg=gRPC err="rpc error: code = Code(503) desc = some random push error"
level=info msg="=== Handler.Stop()'d ==="
FAIL
FAIL	github.com/grafana/mimir/pkg/distributor	94.033s

@narqo narqo reopened this Jan 10, 2025