Issue with Opamp after operator upgrade on 0.44.2 version #985

Open
flenoir opened this issue Dec 21, 2023 · 6 comments

@flenoir

flenoir commented Dec 21, 2023

Hi,

I'm trying to upgrade the operator to version 0.44.2 of the Helm chart.

I get some errors regarding the OpAMP bridge but couldn't find how to solve them.

The pod logs report:

`{"level":"error","ts":"2023-12-21T13:20:30Z","logger":"controller-runtime.source.EventHandler","msg":"failed to get informer from cache","error":"failed to get API group resources: unable to retrieve the complete list of server APIs: autoscaling/v2: the server could not find the requested resource","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:68\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:73\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:74\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56"}

`
{"level":"error","ts":"2023-12-21T13:21:30Z","msg":"Could not wait for Cache to sync","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","error":"failed to wait for opentelemetrycollector caches to sync: timed out waiting for cache to be synced for Kind *v2.HorizontalPodAutoscaler","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:203\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}
{"level":"error","ts":"2023-12-21T13:21:30Z","logger":"setup","msg":"problem running manager","error":"failed to wait for opentelemetrycollector caches to sync: timed out waiting for cache to be synced for Kind *v2.HorizontalPodAutoscaler","stacktrace":"main.main\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:311\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:267"}

I uninstalled and re-installed everything and patched the CRDs, but I can't make it work. I found this issue
but am still struggling with the CR YAML file. I found this example => #938 (comment)

apiVersion: opentelemetry.io/v1alpha1
kind: OpAMPBridge
metadata:
  name: test
spec:
  image: "ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:v0.88.0"
  endpoint: ws://opamp-server:4320/v1/opamp
  capabilities:
    AcceptsOpAMPConnectionSettings: true
    AcceptsOtherConnectionSettings: true
    AcceptsRemoteConfig: true
    AcceptsRestartCommand: true
    ReportsEffectiveConfig: true
    ReportsHealth: true
    ReportsOwnLogs: true
    ReportsOwnMetrics: true
    ReportsOwnTraces: true
    ReportsRemoteConfig: true
    ReportsStatus: true
  componentsAllowed:
    receivers:
    - otlp
    processors:
    - memory_limiter
    exporters:
    - logging

but I'm still unsure about the endpoint.

Should it be a WebSocket (ws://)? Should the port stay at 4320, given that my service only seems to expose port 80?

An example of an OpAMP CR file would be helpful.
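
For reference, here is a minimal sketch of what I think the endpoint would need to look like if the Service only exposes port 80 (the Service name opamp-server and the port are assumptions on my side, not values from the docs):

spec:
  # ws:// keeps the WebSocket transport; the host and port must match whatever
  # the Service in front of the OpAMP server actually exposes (assumed here: 80).
  endpoint: ws://opamp-server:80/v1/opamp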

@JaredTan95
Member

Please follow open-telemetry/opentelemetry-operator#2314.

@flenoir
Author

flenoir commented Dec 21, 2023

Yes, I uninstalled the operator through Helm. I also updated the 3 CRDs manually but still have the issue. Should the CRDs be applied before or after the Helm update? (I've put a rough sketch of what I mean after the manifest below.)

Then, should a custom resource like the one below be applied?

apiVersion: opentelemetry.io/v1alpha1
kind: OpAMPBridge
metadata:
  name: otelbridge
spec:
  image: "ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:v0.90.0"
  endpoint: ws://opamp-server:4320/v1/opamp
  capabilities:
    AcceptsOpAMPConnectionSettings: true
    AcceptsOtherConnectionSettings: true
    AcceptsRemoteConfig: true
    AcceptsRestartCommand: true
    ReportsEffectiveConfig: true
    ReportsHealth: true
    ReportsOwnLogs: true
    ReportsOwnMetrics: true
    ReportsOwnTraces: true
    ReportsRemoteConfig: true
    ReportsStatus: true
  componentsAllowed:
    receivers:
      - otlp
      - jaeger
      - kafka/traces_fab
      - kafka/traces_prod
      - zipkin
      - kafka/metrics_fab
      - kafka/metrics_prod
      - prometheus/receiver
    processors:
      - memory_limiter
      - span/statuscode
    exporters:
      - debug
      - otlp/tempo
      - otlphttp
      - prometheusremotewrite
      - otlp/vm
      - otlphttp
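
For the ordering question above, this is roughly what I mean (a sketch; the open-telemetry repo alias, release name, and namespace are assumptions):

# helm upgrade does not refresh CRDs that already exist, so re-apply the CRDs
# shipped with the target chart version first, then upgrade the release itself.
helm repo update
helm show crds open-telemetry/opentelemetry-operator --version 0.44.2 | kubectl apply --server-side -f -
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator --version 0.44.2 -n opentelemetry-operator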

@TylerHelmuth
Member

Can you try completely removing the CRDs?
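
Something like this (a sketch; these are the CRD names the operator normally installs, and deleting a CRD also deletes every custom resource of that kind, so back them up first):

# WARNING: deleting a CRD deletes all custom resources of that kind.
kubectl delete crd opentelemetrycollectors.opentelemetry.io
kubectl delete crd opampbridges.opentelemetry.io
kubectl delete crd instrumentations.opentelemetry.io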

@JaredTan95
Member

Then, should a custom resource like the one below be applied?

Not necessarily, and I don't think that's the root cause of the error.

@Duanjax

Duanjax commented Apr 1, 2024

I'm having the same issue: "failed to wait for opampbridge caches to sync: timed out waiting for cache to be synced for Kind *v1alpha". The manager container restarts every couple of minutes.

I tried the solution in open-telemetry/opentelemetry-operator#2314, but it doesn't seem to work.

Any updates here?

@alibahramian

It seems a missing autoscaling/v2 API in the Kubernetes cluster is the issue; in my case I only had autoscaling/v1. You can check it with kubectl api-versions | grep autoscaling.
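
For example (a sketch; the exact output depends on the cluster version, and autoscaling/v2 only exists on Kubernetes 1.23 and newer):

kubectl api-versions | grep autoscaling
# Expected on a cluster this operator version supports (hypothetical output):
#   autoscaling/v1
#   autoscaling/v2
# If only autoscaling/v1 or a v2beta* version is listed, the operator's HPA
# watch cannot sync and the cache errors above are expected.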
