OTel Demo App not deploying properly #447

Closed
avillela opened this issue Oct 19, 2022 · 19 comments
Labels
bug Something isn't working chart:demo Issues related to opentelemetry-demo helm chart required-for-v1 Must be resolved before a 1.0 release

Comments

@avillela
Contributor

The OTel Demo App is deploying with errors when I attempt to use the opentelemetry-demo Helm chart.

Errors include:

  • CrashLoopBackOff on the featureflagservice
  • Error in recommendationservice (see attached screenshot)
  • Error: name resolution failed for target dons:my-otel-demo-product-catlog-service:8080 (see attached OTel Collector log snippet)
  • Error: name resolution failed for target dons:my-otel-demo-cart-service:8080 (see attached OTel Collector log snippet)
  • Traces not rendered to Observability back-end.

values.yaml snippet for the opentelemetry-collector config:
opentelemetry-collector:
  nameOverride: otelcol
  mode: deployment
  extraEnvs:
    - name: LS_TOKEN
      valueFrom:
        secretKeyRef:
          key: LS_TOKEN
          name: otel-collector-secret
  config:
    exporters:
      otlp/ls:
        endpoint: ingest.lightstep.com:443
        timeout: 30s
        tls:
          insecure_skip_verify: true
        headers:
          "lightstep-access-token": "${LS_TOKEN}"

      logging:
        logLevel: debug

    service:
      pipelines:
        metrics:
          exporters:
            - logging
            - otlp/ls
        traces:
          exporters:
            - logging
            - otlp/ls

[Image: recommendationservice-error]

[Image: otelcol-logs-2]

[Image: otelcol-logs-1]

@TylerHelmuth
Member

Thanks for documenting these errors. The error with recommendation service is being tracked here: #436

@TylerHelmuth TylerHelmuth added bug Something isn't working chart:demo Issues related to opentelemetry-demo helm chart labels Oct 19, 2022
@TylerHelmuth
Member

@avillela does exporting traces work with the default jaeger backend?

@avillela
Contributor Author

avillela commented Oct 19, 2022

@TylerHelmuth I haven't tried that yet. It's next on my troubleshooting list.

@avillela
Contributor Author

@TylerHelmuth this was the error in the featureflagservice:

To address the second, you can run "mix ecto.drop" followed by
"mix ecto.create". Alternatively you may configure Ecto to use
another table and/or repository for managing migrations:

    config :featureflagservice, Featureflagservice.Repo,
      migration_source: "some_other_table_for_schema_migrations",
      migration_repo: AnotherRepoForSchemaMigrations

The full error report is shown below.

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2962ms. This means requests are coming in and your co

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

    (ecto_sql 3.8.2) lib/ecto/adapters/sql.ex:932: Ecto.Adapters.SQL.raise_sql_call_error/1
    (elixir 1.13.3) lib/enum.ex:1593: Enum."-map/2-lists^map/1-0-"/2
    (ecto_sql 3.8.2) lib/ecto/adapters/sql.ex:1024: Ecto.Adapters.SQL.execute_ddl/4
    (ecto_sql 3.8.2) lib/ecto/migrator.ex:696: Ecto.Migrator.verbose_schema_migration/3
    (ecto_sql 3.8.2) lib/ecto/migrator.ex:510: Ecto.Migrator.lock_for_migrations/4
    (ecto_sql 3.8.2) lib/ecto/migrator.ex:422: Ecto.Migrator.run/4
    (ecto_sql 3.8.2) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
    (featureflagservice 0.1.0) lib/featureflagservice/release.ex:12: anonymous fn/2 in Featureflagservice.Release.migrate/0

@TylerHelmuth
Member

That error happens if the featureflag service starts before postgres is ready. The featureflag service is supposed to crash and then restart until postgres is available, but that's not working for me locally, despite it working in our CI.

@austinlparker
Member

The service does crash and restart, but it can't find the server at first?

kubectl logs demo-featureflagservice-69656c9875-c9tlb -n otel-demo
18:00:57.743 [error] Postgrex.Protocol (#PID<0.136.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (demo-ffs-postgres:5432): non-existing domain - :nxdomain
18:00:57.743 [error] Postgrex.Protocol (#PID<0.135.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (demo-ffs-postgres:5432): non-existing domain - :nxdomain
18:01:00.408 [error] Postgrex.Protocol (#PID<0.136.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (demo-ffs-postgres:5432): non-existing domain - :nxdomain
18:01:00.656 [error] Postgrex.Protocol (#PID<0.135.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (demo-ffs-postgres:5432): non-existing domain - :nxdomain
18:01:00.720 [error] Could not create schema migrations table. This error usually happens due to the following:

@avillela
Contributor Author

@TylerHelmuth it looks like this was my problem and not a problem with the Helm charts. I had a custom version of values.yaml which I forgot to update after y'all updated the Helm chart. FML.
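
A stale override file like this can be hard to spot by eye. As a minimal sketch (not part of the chart or its tooling), the hypothetical helper `stale_keys` below walks a custom values mapping and reports keys that no longer exist in the chart's default values. The dictionaries stand in for parsed values.yaml files, and the key names in the example data are illustrative only, not the chart's real keys.

```python
from typing import Any


def stale_keys(defaults: dict[str, Any], overrides: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths that appear in `overrides` but not in `defaults`."""
    stale: list[str] = []
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in defaults:
            # The chart no longer (or never did) define this key.
            stale.append(path)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            # Both sides are mappings, so compare their children too.
            stale.extend(stale_keys(defaults[key], value, path))
    return stale


# Illustrative data only; these are NOT the chart's real keys.
chart_defaults = {
    "components": {"cartService": {"enabled": True}},
    "observability": {"jaeger": {"enabled": True}},
}
local_values = {
    "components": {"cartservice": {"enabled": True}},
    "observability": {"jaeger": {"enabled": False}},
}

print(stale_keys(chart_defaults, local_values))
```

Keys under free-form sections, such as a custom exporter added to the collector config, would also be reported, so the result is best read as a list of places to double-check rather than a list of errors.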

@TylerHelmuth
Member

TylerHelmuth commented Oct 19, 2022

@austinlparker crashing and restarting is expected; it'll restart until the postgres service is up and it can connect, at which point it should stay up. Does yours eventually start up? I think I just banged my head against the wall for an hour trying to figure out why mine doesn't work and CI does, and I think it's because I'm on an M1 Pro: open-telemetry/opentelemetry-demo#396
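
The retry-until-the-database-is-reachable behaviour described here can be illustrated in isolation. The sketch below is not how the featureflagservice does it (in the demo, the retry comes from Kubernetes restarting the crashed container); `wait_for_tcp` is a hypothetical helper that polls a TCP endpoint and separates "name does not resolve", comparable to the `:nxdomain` lines in the log above, from "resolves but is not accepting connections".

```python
import socket
import time


def wait_for_tcp(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Try to open a TCP connection until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=3):
                return True
        except socket.gaierror as exc:
            # The host name does not resolve (yet).
            print(f"attempt {attempt}: cannot resolve {host}: {exc}")
        except OSError as exc:
            # The name resolves, but the connection was refused or timed out.
            print(f"attempt {attempt}: cannot connect to {host}:{port}: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For the setup in this thread, a call might look like `wait_for_tcp("demo-ffs-postgres", 5432)`, using the host and port from the log output; the attempt count and delay are arbitrary defaults.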

@TylerHelmuth
Member

Error: name resolution failed for target dons:my-otel-demo-product-catlog-service:8080 (see attached OTel Collector log snippet)
Error: name resolution failed for target dons:my-otel-demo-cart-service:8080 (see attached OTel Collector log snippet)

@avillela are you still experiencing these issues with the latest chart and values.yaml?

@austinlparker
Member

@austinlparker crashing and restarting is expected; it'll restart until the postgres service is up and it can connect, at which point it should stay up. Does yours eventually start up? I think I just banged my head against the wall for an hour trying to figure out why mine doesn't work and CI does, and I think it's because I'm on an M1 Pro: open-telemetry/opentelemetry-demo#396

I haven't tried on k8s locally... I'm on an M1 Pro and it does work in Docker. Let me look at that other issue; I was getting some weird behavior if I didn't build images. I think we have a fix going in to build multi-arch so we stop dropping amd64 images on arm.

@avillela
Contributor Author

avillela commented Oct 19, 2022

Error: name resolution failed for target dons:my-otel-demo-product-catlog-service:8080 (see attached OTel Collector log snippet)

Error: name resolution failed for target dons:my-otel-demo-cart-service:8080 (see attached OTel Collector log snippet)

@avillela are you still experiencing these issues with the latest chart and values.yaml?

@TylerHelmuth it works fine after I updated my local values.yaml to the latest values.yaml.

@TylerHelmuth
Member

@austinlparker @puckpuck @joshleecreates I was able to build the featureflag service locally and then point to the image for the helm chart, but it still doesn't start right, so I don't think my machine is to blame. I've pinpointed that the issue started with the bump to 0.6.0-beta locally, but our testing shows the service is able to restart correctly in a kind cluster.

@puckpuck
Contributor

@TylerHelmuth what does your values.yaml look like in this?

@TylerHelmuth
Member

@puckpuck

default:
  image:
    repository: ghcr.io/open-telemetry/demo

@TylerHelmuth
Member

Thought maybe it was my local kind/k8s version, but if I bump our kind action to use 0.16.0 and latest k8s it still restarts as expected: #448

@TylerHelmuth
Member

Can't get the featureflag service up on minikube either, even when only applying the collector, featureflag, and postgres. The featureflag logs always say:

Segmentation fault (core dumped)
Database setup or migrations failed.

This is using an arm64 image built locally. Anyone else with an M1 having any luck?

@TylerHelmuth TylerHelmuth added documentation Improvements or additions to documentation required-for-v1 Must be resolved before a 1.0 release and removed documentation Improvements or additions to documentation labels Oct 21, 2022
@TylerHelmuth
Member

@avillela can you take another look at the demo chart? With the latest release all the Apple Silicon issues should be fixed, and I believe other releases fix other issues mentioned.

@TylerHelmuth
Member

@avillela I am pretty sure all the issues you've raised have been addressed. I'm going to close this for now; please reopen if you're still having issues.

@avillela
Contributor Author

@avillela I am pretty sure all the issues you've raised have been addressed. I'm going to close this for now; please reopen if you're still having issues.

@TylerHelmuth Apologies for the delay. Yes, I was finally able to look at this today and it appears that the issues are resolved. Thanks so much!

4 participants