forked from open-telemetry/opentelemetry-demo
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Opensearch otel demo observability tutorial (#5)
* add tutorial for using observability and diagnostic of a real use case
* adding otel landing page markdown dashboard
* adding tutorial markdown dashboards and documents
* adding Observability Introduction.md tutorial markdown dashboards and documents
* add dependencies to the docker-compose.yml
* exposing docker ports locally

Signed-off-by: YANGDB <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>
Showing 17 changed files with 470 additions and 9 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
{"attributes":{"description":"OTEL demo landing page","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"OTEL demo landing page","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"OTEL demo landing page\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"\\n![](https://raw.githubusercontent.com/opensearch-project/.github/main/profile/banner.jpg)\\n# OpenSearch Observability OTEL Demo\\n\\nWelcome to the [OpenSearch](https://opensearch.org/docs/latest) OpenTelemetry [Demo](https://opentelemetry.io/docs/demo/) documentation, which covers how to install and run the demo, and some scenarios you can use to view OpenTelemetry in action.\\n\\n## Purpose\\nThe purpose of this demo is to demonstrate the different capabilities of OpenSearch Observability to investigate and reflect your system.\\n\\n![](../../../.github/img/DemoFlow.png)\\n\\n### Services\\n[OTEL DEMO](https://opentelemetry.io/docs/demo/services/) Describes the list of services that are composing the Astronomy Shop.\\n\\nThe main services that are open to user interactions:\\n\\n- [Dashboards](https://observability.playground.opensearch.org/)\\n\\n- [Demo Proxy](https://observability.playground.demo-proxy.opensearch.org/)\\n\\n- [Demo loader](https://observability.playground.demo-loader.opensearch.org/)\\n\\n- [Demo feature-flag](https://observability.playground.demo-feature-flag.opensearch.org/)\\n\\n### Screenshots\\n![](https://opentelemetry.io/docs/demo/screenshots/frontend-1.png)\\n\\n_**The shopping App**_\\n![](https://opentelemetry.io/docs/demo/screenshots/frontend-2.png)\\n\\n_**The feature flag**_\\n![](https://opentelemetry.io/docs/demo/screenshots/feature-flag-ui.png)\\n\\n_**The load generator**_\\n![](https://opentelemetry.io/docs/demo/screenshots/load-generator-ui.png)\\n\\n---\\n### Ingestion\\nThe ingestion capabilities for OpenSearch is to be able to support 
multiple pipelines:\\n- [Data-Prepper](https://github.com/opensearch-project/data-prepper/) is an OpenSearch ingestion project that allows ingestion of OTEL standard signals using Otel-Collector\\n- [Jaeger](https://opensearch.org/docs/latest/observing-your-data/trace/trace-analytics-jaeger/) is an ingestion framework which has a built-in capability for pushing OTEL signals into OpenSearch\\n- [Fluent-Bit](https://docs.fluentbit.io/manual/pipeline/outputs/opensearch) is an ingestion framework which has a built-in capability for pushing OTEL signals into OpenSearch\\n\\n### Integrations\\nThe integration service is a list of pre-canned assets that are loaded together to give users a simple, automatic way to discover and review their service topology.\\n\\nThese (demo-sample) integrations contain the following assets:\\n- components & index template mapping\\n- datasources\\n- data-stream & indices\\n- queries\\n- dashboards\\n \\n\"}}"},"id":"dd4bebe0-f66a-11ed-9518-f5d5eb1d70bf","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-19T17:30:35.804Z","version":"WzM0LDJd"} | ||
{"attributes":{"description":"OTEL demo landing page","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"0e0c418a-81f8-4d85-8ba7-8d8ef6e2b1d7\"},\"panelIndex\":\"0e0c418a-81f8-4d85-8ba7-8d8ef6e2b1d7\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"OTEL demo landing page","version":1},"id":"e66e2da0-f66a-11ed-9518-f5d5eb1d70bf","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"dd4bebe0-f66a-11ed-9518-f5d5eb1d70bf","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-19T17:30:51.130Z","version":"WzM1LDJd"} | ||
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
{"attributes":{"description":"this Pattern present a memory leak diagnostic procedure tutorial","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"mem-leak-diagnostic","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"mem-leak-diagnostic\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# Memory Leak Investigation Tutorial\\n\\n## Tutorial Definition\\n\\nThe following tutorial describes using metrics and traces to diagnose a memory leak.\\nApplication telemetry, such as the kind that OpenTelemetry can provide, is very useful for diagnosing issues in a\\ndistributed system. In this tutorial, we walk through a scenario demonstrating how to move from high-level metrics\\nand traces to the cause of a memory leak.\\n\\n## Diagnosis\\n\\nThe first step in diagnosing a problem is to determine that a problem exists. Often the first stop will be a metrics\\ndashboard provided by a tool such as Metrics Analytics under OpenSearch Observability.\\n\\n## Dashboards\\n\\nThis tutorial contains the OTEL demo dashboards with a number of charts:\\n\\n- Recommendation Service (CPU% and Memory)\\n- Service Latency (from SpanMetrics)\\n- Error Rate\\n\\nRecommendation Service charts are generated from OpenTelemetry Metrics exported to Prometheus, while the Service Latency\\nand Error Rate charts are generated through the OpenTelemetry Collector Span Metrics processor.\\n\\nFrom our dashboard, we can see that there seems to be anomalous behavior in the recommendation service – spiky CPU\\nutilization, as well as long tail latency in our p95, p99, and p99.9 histograms. 
We can also see that there are\\nintermittent spikes in the memory utilization of this service.\\nWe know that we’re emitting trace data from our application as well, so let’s think about another way that we’d be able\\nto determine that a problem exists.\\n\\n### Traces exploration\\n\\nOpenSearch Observability Trace analytics allows us to search for traces and display the end-to-end latency of an entire\\nrequest with visibility into each individual part of the overall request. Perhaps we noticed an increase in tail latency\\non our frontend requests. The Traces dashboard then allows us to search and filter our traces to include only those that\\ninclude requests to the recommendation service.\\n\\nBy sorting by latency, we’re able to quickly find specific traces that took a long time. Clicking on a trace in the\\nright panel, we’re able to view the waterfall view.\\nWe can see that the recommendation service is taking a long time to complete its work, and viewing the details allows us\\nto get a better idea of what’s going on.\\n\\n### Confirming the Diagnosis\\n\\nWe can see in our waterfall view that the `app.cache_hit` attribute is set to `false`, and that the `app.products.count` value\\nis extremely high.\\n\\nReturning to the search UI, filter to `recommendationservice` in the Service dropdown, and search for `app.cache_hit=true`\\nin the Tags box.\\n\\nNotice that requests tend to be faster when the cache is hit. Now search for `app.cache_hit=false` and compare the\\nlatency.\\n\\nYou should notice some changes in the visualization at the top of the trace list.\\n\\nNow, since this is a contrived scenario, we know where to find the underlying bug in our code. 
However, in a real-world\\nscenario, we may need to perform further searching to find out what’s going on in our code, or the interactions between\\nservices that cause it.\\n\\n### SOP flow context aware\\n\\nThe next diagram shows the context aware phases within this SOP.\\n\\nThe user can be shown the summary of the flow for solving his issue and in addition can focus on the actual step he is\\ncurrently performing.\\n\\nThe overall process is mapped into a **state machine** in-which each step has a state with a **transition**.\\n\\nWhen user goes into a different **scope** (`time based` ,`service based`, `log based`) this is defined as a indexed Context (`Ctx[1]`,`Ctx[2]`,...)\\n\\n---\\n\\nThis sequence outlines a process for investigating memory leaks that begins with gathering service data from both Prometheus and OpenSearch. Upon combining and reviewing latency of these services, an anomaly detection leads to a review of service traces, followed by log correlation, log fetching, and eventually an overlay of logs to highlight differences.\\n\\n```mermaid\\n Info[Memory Leak Investigation]\\n |\\n V\\nGet All Services --> Query?[Prometheus]\\n | |\\n | V\\n |--> Query?[OpenSearch]\\n | |\\n V V\\nCombine --> Review[Services Latency]\\n |\\n V\\nIdentify Anomaly --> Query?[Service@traces]\\n | |\\n | V\\n |--> Time Based --> Review[Services traces]\\n | |\\n V V\\nWhats Next? --> Suggest[Correlation with logs]\\n | |\\n | V\\n |--> Fetch Logs --> Review[logs]\\n | |\\n V V\\nWhats Next? --> Suggest[logs overlay]\\n | |\\n | V\\n |--> Fetch Logs --> Review[logs diff]\\n | |\\n V V\\nEnd <------------------ End\\n\\n```\\n\"}}"},"id":"92546710-f751-11ed-b6d0-850581e4a72d","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-20T21:02:03.776Z","version":"WzUxLDVd"} | ||
{"attributes":{"description":"this Pattern present a memory leak diagnostic procedure tutorial","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"a1954dc7-8655-4ea8-9a75-67cbe201b80c\"},\"panelIndex\":\"a1954dc7-8655-4ea8-9a75-67cbe201b80c\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"mem-leak-dignostic","version":1},"id":"9aa66080-f751-11ed-b6d0-850581e4a72d","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"92546710-f751-11ed-b6d0-850581e4a72d","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-20T21:02:17.736Z","version":"WzUyLDVd"} | ||
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]} |
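The `mem-leak-diagnostic` markdown above describes the investigation as a state machine in which each step is a state with a transition, and each change of scope (`time based`, `service based`, `log based`) opens an indexed context (`Ctx[1]`, `Ctx[2]`, ...). The commit itself ships no implementation of that idea, so the following is only a minimal sketch of how it could be modelled. The state names, the `Investigation` class, and the strictly linear transition table are hypothetical, loosely following the left-hand column of the diagram.

```python
from dataclasses import dataclass, field

# Hypothetical state names; the tutorial only draws the steps, it does not name them.
TRANSITIONS = {
    "get_all_services": "combine",
    "combine": "identify_anomaly",
    "identify_anomaly": "correlate_with_logs",
    "correlate_with_logs": "overlay_logs",
    "overlay_logs": "end",
}


@dataclass
class Investigation:
    """Tracks the current step and the scopes the user has entered so far."""

    state: str = "get_all_services"
    contexts: list = field(default_factory=list)  # Ctx[1], Ctx[2], ... in order

    def enter_scope(self, scope: str) -> int:
        """Record a new scope and return its 1-based context index."""
        self.contexts.append(scope)
        return len(self.contexts)

    def advance(self) -> str:
        """Follow the single outgoing transition of the current state."""
        if self.state == "end":
            raise ValueError("investigation already finished")
        self.state = TRANSITIONS[self.state]
        return self.state


inv = Investigation()
ctx_index = inv.enter_scope("service based")  # first scope becomes Ctx[1]
next_state = inv.advance()                     # moves to "combine"
```

A real flow would likely need branching transitions (the diagram forks at each "Whats Next?" step); the single-successor table here is a simplification chosen to keep the sketch short.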
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
{"attributes":{"description":"OTEL Astronomy Demo Application architecture","kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"query\":\"\",\"language\":\"kuery\"},\"filter\":[]}"},"title":"otel-architecture","uiStateJSON":"{}","version":1,"visState":"{\"title\":\"otel-architecture\",\"type\":\"markdown\",\"aggs\":[],\"params\":{\"fontSize\":12,\"openLinksInNewTab\":false,\"markdown\":\"# OTEL Astronomy Demo Application\\n\\nThe following diagram presents the OTEL Astronomy shop services architecture:\\n\\n![](img/DemoServices.png)\\n\\n\\n### Trace Collectors\\nGaining a macro-level perspective on incoming data, such as sample counts and cardinality, is essential for comprehending the collector’s internal dynamics. However, when delving into the details, the interconnections can become complex. The Collector Data Flow Dashboard aims to demonstrate the capabilities of the OpenTelemetry demo application, offering a solid foundation for users to build upon.\\n\\nMonitoring data flow through the OpenTelemetry Collector is crucial for several reasons.\\n - All services are traced, in every development language\\n - Auto instrumented\\n - Manual spans and attributes\\n - Span events\\n - Span links\\n\\nTrace Headers are propagated across all services (**Context propagation**)\\n\\n\\n### Metric Collectors\\nAll KPI information is collected into Prometheus time series storage, including:\\n - runtime metrics\\n - HTTP / gRPC latency distribution\\n\\n### Data Flow Overview\\nThe Collector Data Flow Dashboard provides valuable guidance on which metrics to monitor. Users can tailor their own dashboard variations by adding necessary metrics specific to their use cases, such as the `memory_limiter` processor or other data flow indicators. 
This demo dashboard serves as a starting point, enabling users to explore diverse usage scenarios and adapt the tool to their unique monitoring needs.\\n\\nThe diagram below provides an overview of the system components, showcasing the configuration derived from the OpenTelemetry Collector (otelcol) configuration file utilized by the OpenTelemetry demo application. Additionally, it highlights the observability data (traces and metrics) flow within the system.\\n\\n#### Simple purchase use case\\nThis flow diagram shows the trace evolution from the user selecting a purchased item going through different backend services until reaching the storage database.\\n![](img/DemoFlow.png)\\n\\n## Reference\\n**_OTEL Demo info_**\\n- [architecture](https://opentelemetry.io/docs/demo/architecture/)\\n- [collector-data-flow-dashboard](https://opentelemetry.io/docs/demo/collector-data-flow-dashboard/)\\n- [services](https://opentelemetry.io/docs/demo/services/)\\n\\n**_OTEL Demo youtubes_**\\n - [Cloud Native Live: OpenTelemetry community demo](https://www.youtube.com/watch?v=kD0EAjly9jc)\\n\"}}"},"id":"5c297aa0-f750-11ed-b6d0-850581e4a72d","migrationVersion":{"visualization":"7.10.0"},"references":[],"type":"visualization","updated_at":"2023-05-20T20:53:23.402Z","version":"WzQ3LDVd"} | ||
{"attributes":{"description":"OTEL Astronomy Demo Application architecture","hits":0,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"query\":{\"language\":\"kuery\",\"query\":\"\"},\"filter\":[]}"},"optionsJSON":"{\"hidePanelTitles\":false,\"useMargins\":true}","panelsJSON":"[{\"version\":\"2.7.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"cc8d389f-6ab0-4590-bd7d-140ed04a28b1\"},\"panelIndex\":\"cc8d389f-6ab0-4590-bd7d-140ed04a28b1\",\"embeddableConfig\":{},\"panelRefName\":\"panel_0\"}]","timeRestore":false,"title":"otel-demo-architecture","version":1},"id":"67e37e40-f750-11ed-b6d0-850581e4a72d","migrationVersion":{"dashboard":"7.9.3"},"references":[{"id":"5c297aa0-f750-11ed-b6d0-850581e4a72d","name":"panel_0","type":"visualization"}],"type":"dashboard","updated_at":"2023-05-20T20:53:43.076Z","version":"WzQ4LDVd"} | ||
{"exportedCount":2,"missingRefCount":0,"missingReferences":[]} |