diff --git a/site/bps/BP-44-use-metrics.md b/site/bps/BP-44-use-metrics.md index b4526aa94f2..e09be5af144 100644 --- a/site/bps/BP-44-use-metrics.md +++ b/site/bps/BP-44-use-metrics.md @@ -6,7 +6,7 @@ release: "N/A" --- ### Motivation -Based on our experience (at Splunk) running many BookKeeper clusters in production, from very small to very large deployments (in terms of number of bookies, size of VMs and load) we have identified a number of short-comings with the current BookKeeper metrics that make it harder than it should be to identify bottlenecks in performance. The USE method is an effective strategy for diagnosing where bottlenecks in a system lie but the current metrics do not always expose metrics related to utilization and saturation. Also, even if you have a good mental model for how BookKeeper works internally, there are blindspots in the metrics that make it difficult to know what is happening at times. +Based on our experience (at Splunk) running many BookKeeper clusters in production, from very small to very large deployments (in terms of number of bookies, size of VMs and load) we have identified a number of short-comings with the current BookKeeper metrics that make it harder than it should be to identify bottlenecks in performance. The [USE method](https://www.brendangregg.com/usemethod.html) (Utilization, Saturation, Errors) is an effective strategy for diagnosing where bottlenecks in a system lie but the current metrics do not always expose metrics related to utilization and saturation. Also, even if you have a good mental model for how BookKeeper works internally, there are blindspots in the metrics that make it difficult to know what is happening at times. Finally, many of the metrics are aggregated, such as journal and DbLedgerStorage. When these components are configured with multiple directories, it is currently not possible to inspect the metrics of only a single journal or DbLedgerStorage instance. One bad volume can be hard to identify.