(only shows metrics that are documented. generated with metrics2docs)
api.cluster.speculative.attempts
:
how many peer queries resulted in speculation
api.cluster.speculative.requests
:
how many speculative http requests were made to peers
api.cluster.speculative.wins
:
how many peer queries were improved due to speculation
api.get_target
:
how long it takes to get a target
api.iters_to_points
:
how long it takes to decode points from a chunk iterator
api.request.%s
:
the latency of each request by request path.
api.request.%s.size
:
the size of each response by request path
api.request.%s.status.%d
:
the count of responses for each request path and status code combination, e.g. api.requests.metrics_find.status.200 and api.request.render.status.503
api.request.render.chosen_archive
:
the archive chosen for the request. 0 means original data, 1 means first agg level, 2 means 2nd
api.request.render.points_fetched
:
the number of points that need to be fetched for a /render request.
api.request.render.points_returned
:
the number of points the request will return. best effort: not aware of summarize(), aggregation functions, or runtime normalization, but does account for runtime consolidation
api.request.render.series
:
the number of series a /render request is handling. This is the number of metrics after all of the targets in the request have been expanded by searching the index.
api.request.render.targets
:
the number of targets a /render request is handling.
api.requests_span.mem
:
the timerange of requests hitting only the ringbuffer
api.requests_span.mem_and_cassandra
:
the timerange of requests hitting both in-memory and cassandra
cache.ops.chunk.add
:
how many chunks were added to the cache
cache.ops.chunk.evict
:
how many chunks were evicted from the cache
cache.ops.chunk.hit
:
how many chunks were hit
cache.ops.chunk.push-hot
:
how many chunks have been pushed into the cache because their metric is hot
cache.ops.metric.add
:
how many metrics were added to the cache
cache.ops.metric.evict
:
how many metrics were evicted from the cache
cache.ops.metric.hit-full
:
how many metrics were hit fully (all needed chunks in cache)
cache.ops.metric.hit-partial
:
how many metrics were hit partially (some of the needed chunks in cache, but not all)
cache.ops.metric.miss
:
how many metrics were missed fully (no needed chunks in cache)
cache.overhead.chunk
:
an approximation of the overhead used to store chunks in the cache
cache.overhead.flat
:
an approximation of the overhead used by flat accounting
cache.overhead.lru
:
an approximation of the overhead used by the LRU
cache.size.max
:
the maximum size of the cache (overhead does not count towards this limit)
cache.size.used
:
how much of the cache is used (sum of the chunk data without overhead)
cluster.decode_err.join
:
a counter of json unmarshal errors
cluster.decode_err.update
:
a counter of json unmarshal errors
cluster.events.join
:
how many node join events were received
cluster.events.leave
:
how many node leave events were received
cluster.events.update
:
how many node update events were received
cluster.notifier.all.messages-received
:
a counter of messages received from cluster notifiers
cluster.notifier.kafka.message_size
:
the sizes of messages seen through the kafka cluster notifier
cluster.notifier.kafka.messages-published
:
a counter of messages published to the kafka cluster notifier
cluster.notifier.kafka.partition.%d.lag
:
how many messages (metrics) there are in the kafka partition (%d) that we have not yet consumed.
cluster.notifier.kafka.partition.%d.log_size
:
the size of the kafka partition (%d), aka the newest available offset.
cluster.notifier.kafka.partition.%d.offset
:
the current offset for the partition (%d) that we have consumed
cluster.self.partitions
:
the number of partitions this instance consumes
cluster.self.priority
:
the priority of the node. A lower number gives higher priority
cluster.self.promotion_wait
:
how long a candidate (secondary node) has to wait until it can become a primary. When the timer becomes 0 it means the in-memory buffer has been able to fully populate, so that if you stop a primary and it was able to save its complete chunks, this node will be able to take over without data loss. You can upgrade a candidate to primary while the timer is not 0 yet; it just means it may have missing data in the chunks that it will save.
cluster.self.state.primary
:
whether this instance is a primary
cluster.self.state.ready
:
whether this instance is ready
cluster.total.partitions
:
the number of partitions in the cluster that we know of
cluster.total.state.primary-not-ready
:
the number of nodes we know to be primary but not ready (total should only be in this state very temporarily)
cluster.total.state.primary-ready
:
the number of nodes we know to be primary and ready
cluster.total.state.query-not-ready
:
the number of nodes we know to be query nodes and not ready
cluster.total.state.query-ready
:
the number of nodes we know to be query nodes and ready
cluster.total.state.secondary-not-ready
:
the number of nodes we know to be secondary and not ready
cluster.total.state.secondary-ready
:
the number of nodes we know to be secondary and ready
idx.bigtable.add
:
the duration of an add of one metric to the bigtable idx, including the add to the in-memory index, excluding the insert query
idx.bigtable.control.add
:
the duration of add control messages processed
idx.bigtable.control.delete
:
the duration of delete control messages processed
idx.bigtable.delete
:
the duration of a delete of one or more metrics from the bigtable idx, including the delete from the in-memory index and the delete query
idx.bigtable.prune
:
the duration of a prune of the bigtable idx, including the prune of the in-memory index and all needed delete queries
idx.bigtable.query-delete.exec
:
time spent executing deletes (possibly repeatedly until success)
idx.bigtable.query-delete.fail
:
how many delete queries for a metric failed (triggered by an update or a delete)
idx.bigtable.query-delete.ok
:
how many delete queries for a metric completed successfully (triggered by an update or a delete)
idx.bigtable.query-insert.exec
:
time spent executing inserts (possibly repeatedly until success)
idx.bigtable.query-insert.fail
:
how many insert queries for a metric failed (triggered by an add or an update)
idx.bigtable.query-insert.ok
:
how many insert queries for a metric completed successfully (triggered by an add or an update)
idx.bigtable.query-insert.wait
:
time inserts spent in queue before being executed
idx.bigtable.save.bytes-per-request
:
the number of bytes written to bigtable in each request.
idx.bigtable.save.skipped
:
how many saves have been skipped due to the writeQueue being full
idx.bigtable.update
:
the duration of an update of one metric to the bigtable idx, including the update to the in-memory index, excluding any insert/delete queries
idx.cassadra.query-delete.ok
:
how many delete queries for a metric completed successfully (triggered by an update or a delete)
idx.cassadra.query-insert.ok
:
how many insert queries for a metric completed successfully (triggered by an add or an update)
idx.cassandra.add
:
the duration of an add of one metric to the cassandra idx, including the add to the in-memory index, excluding the insert query
idx.cassandra.control.add
:
the duration of add control messages processed
idx.cassandra.control.delete
:
the duration of delete control messages processed
idx.cassandra.delete
:
the duration of a delete of one or more metrics from the cassandra idx, including the delete from the in-memory index and the delete query
idx.cassandra.error.cannot-achieve-consistency
:
a counter of the cassandra idx not being able to achieve consistency for a given query
idx.cassandra.error.conn-closed
:
a counter of how many times we saw a connection closed to the cassandra idx
idx.cassandra.error.no-connections
:
a counter of how many times we had no connections remaining to the cassandra idx
idx.cassandra.error.other
:
a counter of other errors talking to the cassandra idx
idx.cassandra.error.timeout
:
a counter of timeouts seen to the cassandra idx
idx.cassandra.error.too-many-timeouts
:
a counter of how many times we saw too many timeouts and closed the connection to the cassandra idx
idx.cassandra.error.unavailable
:
a counter of how many times the cassandra idx was unavailable
idx.cassandra.prune
:
the duration of a prune of the cassandra idx, including the prune of the in-memory index and all needed delete queries
idx.cassandra.query-delete.exec
:
time spent executing deletes (possibly repeatedly until success)
idx.cassandra.query-delete.fail
:
how many delete queries for a metric failed (triggered by an update or a delete)
idx.cassandra.query-insert.exec
:
time spent executing inserts (possibly repeatedly until success)
idx.cassandra.query-insert.fail
:
how many insert queries for a metric failed (triggered by an add or an update)
idx.cassandra.query-insert.wait
:
time inserts spent in queue before being executed
idx.cassandra.save.skipped
:
how many saves have been skipped due to the writeQueue being full
idx.cassandra.update
:
the duration of an update of one metric to the cassandra idx, including the update to the in-memory index, excluding any insert/delete queries
idx.memory.add
:
the duration of a (successful) add of a metric to the memory idx
idx.memory.delete
:
the duration of a delete of one or more metrics from the memory idx
idx.memory.filtered
:
number of series that have been excluded from responses due to their lastUpdate property
idx.memory.find
:
the duration of memory idx find
idx.memory.find-cache.backoff
:
the number of find caches in backoff mode
idx.memory.find-cache.entries
:
the number of entries in the cache
idx.memory.find-cache.invalidation.drop
:
the number of dropped invalidation requests
idx.memory.find-cache.invalidation.exec
:
the number of executed invalidation requests
idx.memory.find-cache.invalidation.recv
:
the number of received invalidation requests
idx.memory.find-cache.ops.hit
:
a counter of findCache hits
idx.memory.find-cache.ops.miss
:
a counter of findCache misses
idx.memory.get
:
the duration of a get of one metric in the memory idx
idx.memory.list
:
the duration of memory idx listings
idx.memory.meta-tags.enricher.ops.known-meta-records
:
a counter of meta records known to the enricher
idx.memory.meta-tags.enricher.ops.metrics-with-meta-records
:
a counter of metrics with at least one associated meta record
idx.memory.ops.add
:
the number of additions to the memory idx
idx.memory.ops.update
:
the number of updates to the memory idx
idx.memory.prune
:
the duration of successful memory idx prunes
idx.memory.update
:
the duration of (successful) update of a metric to the memory idx
idx.metrics_active
:
the number of currently known metrics in the index
input.%s.metricdata.discarded.invalid
:
a count of times a metricdata was invalid by input plugin
input.%s.metricdata.discarded.invalid_input
:
a count of times a metricdata was considered invalid due to invalid input data in the metric definition. all rejected metrics counted here are also counted in the above "invalid" counter
input.%s.metricdata.received
:
the count of metricdata datapoints received by input plugin
input.%s.metricpoint.discarded.invalid
:
a count of times a metricpoint was invalid by input plugin
input.%s.metricpoint.discarded.unknown
:
the count of times the ID of a received metricpoint was not in the index, by input plugin
input.%s.metricpoint.received
:
the count of metricpoint datapoints received by input plugin
input.%s.metricpoint_no_org.received
:
the count of metricpoint_no_org datapoints received by input plugin
input.carbon.metrics_decode_err
:
a count of times an input message (MetricData, MetricDataArray or carbon line) failed to parse
input.carbon.metrics_per_message
:
how many metrics per message were seen. in carbon's case this is always 1.
input.kafka-mdm.controlmsg_decode_err
:
a count of times a control message failed to parse
input.kafka-mdm.metrics_decode_err
:
a count of times an input message failed to parse
input.kafka-mdm.metrics_per_message
:
how many metrics per message were seen.
input.kafka-mdm.partition.%d.lag
:
how many messages (metrics) there are in the kafka partition (%d) that we have not yet consumed.
input.kafka-mdm.partition.%d.log_size
:
the current size of the kafka partition (%d), aka the newest available offset.
input.kafka-mdm.partition.%d.offset
:
the current offset for the partition (%d) that we have consumed.
mem.to_iter
:
how long it takes to transform in-memory chunks to iterators
memory.bytes.obtained_from_sys
:
the number of bytes currently obtained from the system by the process. This is what the profiletrigger looks at.
memory.bytes_allocated_on_heap
:
a gauge of currently allocated (within the runtime) memory.
memory.gc.cpu_fraction
:
how much cpu is consumed by the GC across the process lifetime, in per mille
memory.gc.gogc
:
the current GOGC value (derived from the GOGC environment variable)
memory.gc.heap_objects
:
how many objects are allocated on the heap; a key indicator of GC workload
memory.gc.last_duration
:
the duration of the last GC STW pause in nanoseconds
memory.total_bytes_allocated
:
a counter of the total number of bytes allocated during the process lifetime
memory.total_gc_cycles
:
a counter of the number of GC cycles since process start
plan.run
:
the time spent running the plan for a request (function processing of all targets and runtime consolidation)
pointslicepool.ops.get-candidate.hit
:
how many times we could satisfy a get with a pointslice from the pool
pointslicepool.ops.get-candidate.miss
:
how many times there was nothing in the pool to satisfy a get
pointslicepool.ops.get-candidate.unfit
:
how many times a pointslice from the pool was not large enough to satisfy a get
pointslicepool.ops.get-make.default
:
how many times a pointslice is allocated that is equal to the default size
pointslicepool.ops.get-make.large
:
how many times a pointslice is allocated that is larger than the default size
pointslicepool.ops.get-make.small
:
how many times a pointslice is allocated that is smaller than the default size
pointslicepool.ops.put.default
:
how many times a pointslice is added to the pool that is equal to the default
pointslicepool.ops.put.large
:
how many times a pointslice is added to the pool that is larger than the default
pointslicepool.ops.put.small
:
how many times a pointslice is added to the pool that is smaller than the default
process.major_page_faults.counter64
:
the number of major faults the process has made which have required loading a memory page from disk
process.minor_page_faults.counter64
:
the number of minor faults the process has made which have not required loading a memory page from disk
process.resident_memory_bytes.gauge64
:
a gauge of the process RSS from /proc/pid/stat
process.virtual_memory_bytes.gauge64
:
a gauge of the process VSZ from /proc/pid/stat
recovered_errors.aggmetric.getaggregated.bad-aggspan
:
how many times we detected a GetAggregated call with an incorrect aggspan specified
recovered_errors.aggmetric.getaggregated.bad-consolidator
:
how many times we detected a GetAggregated call with an incorrect consolidator specified
recovered_errors.idx.memory.corrupt-index
:
how many times a corruption has been detected in one of the internal index structures. each time this happens, an error is logged with more details.
recovered_errors.idx.memory.invalid-tag
:
how many times an invalid tag for a metric is encountered. each time this happens, an error is logged with more details.
runtime.goroutines.total
:
how many goroutines there are
stats.generate_message
:
how long it takes to generate the stats
store.bigtable.chunk_operations.save_fail
:
counter of failed saves
store.bigtable.chunk_operations.save_ok
:
counter of successful saves
store.bigtable.chunk_size.at_load
:
the sizes of chunks seen when loading them
store.bigtable.chunk_size.at_save
:
the sizes of chunks seen when saving them
store.bigtable.chunks_per_row
:
how many chunks are retrieved per row in get queries
store.bigtable.get.error
:
the count of reads that failed
store.bigtable.get.exec
:
the duration of getting from bigtable store
store.bigtable.get.wait
:
the duration of the get spent in the queue
store.bigtable.put.bytes
:
the number of chunk bytes saved in each bulkApply
store.bigtable.put.exec
:
the duration of putting in bigtable store
store.bigtable.put.wait
:
the duration of a put in the wait queue
store.bigtable.rows_per_response
:
how many rows come per get response
store.cassandra.chunk_operations.save_fail
:
counter of failed saves
store.cassandra.chunk_operations.save_ok
:
counter of successful saves
store.cassandra.chunk_size.at_load
:
the sizes of chunks seen when loading them
store.cassandra.chunk_size.at_save
:
the sizes of chunks seen when saving them
store.cassandra.chunks_per_response
:
how many chunks are retrieved per response in get queries
store.cassandra.error.cannot-achieve-consistency
:
a counter of the cassandra store not being able to achieve consistency for a given query
store.cassandra.error.conn-closed
:
a counter of how many times we saw a connection closed to the cassandra store
store.cassandra.error.no-connections
:
a counter of how many times we had no connections remaining to the cassandra store
store.cassandra.error.other
:
a counter of other errors talking to the cassandra store
store.cassandra.error.timeout
:
a counter of timeouts seen to the cassandra store
store.cassandra.error.too-many-timeouts
:
a counter of how many times we saw too many timeouts and closed the connection to the cassandra store
store.cassandra.error.unavailable
:
a counter of how many times the cassandra store was unavailable
store.cassandra.get.exec
:
the duration of getting from cassandra store
store.cassandra.get.wait
:
the duration of the get spent in the queue
store.cassandra.get_chunks
:
how long it takes to get chunks
store.cassandra.put.exec
:
the duration of putting in cassandra store
store.cassandra.put.wait
:
the duration of a put in the wait queue
store.cassandra.rows_per_response
:
how many rows come per get response
store.cassandra.to_iter
:
the duration of converting chunks to iterators
tank.chunk_operations.clear
:
a counter of how many chunks are cleared (replaced by new chunks)
tank.chunk_operations.create
:
a counter of how many chunks are created
tank.discarded.new-value-for-timestamp
:
points that have timestamps for which we already have data points. these points are discarded. data points can be incorrectly classified as tank.discarded.sample-out-of-order even when the timestamp has already been used. This happens in two cases:
- when the reorder buffer is enabled, if the point is older than the reorder buffer retention window
- when the reorder buffer is disabled, if the point is older than the last data point
(see the classification sketch at the end of this document)
tank.discarded.received-too-late
:
points received for the most recent chunk when that chunk is already being "closed", i.e. the end-of-stream marker has been written to the chunk. this indicates that your GC is actively sealing chunks and saving them before you have the chance to send your (infrequent) updates. Any points received for a chunk that has already been closed are discarded.
tank.discarded.sample-out-of-order
:
points that go back in time beyond the scope of the optional reorder window. these points will end up being dropped and lost.
tank.discarded.sample-too-far-ahead
:
count of points which got discarded because their timestamp is too far in the future, beyond the limitation of the future tolerance window defined via the retention.future-tolerance-ratio parameter.
tank.discarded.unknown
:
points that have been discarded for unknown reasons.
tank.gc_metric
:
the number of times the metrics GC is about to inspect a metric (series)
tank.metrics_active
:
the number of currently known metrics (excl rollup series), measured every second
tank.metrics_reordered
:
the number of points received that are going back in time, but are still within the reorder window. in such a case they will be inserted in the correct order. E.g. if the reorder window is 60 (datapoints) then points may be inserted in random order as long as their ts is not older than the 60th datapoint counting from the newest.
tank.persist
:
how long it takes to persist a chunk (and chunks preceding it). this is subject to backpressure from the store when the store's queue runs full
tank.sample-too-far-ahead
:
count of points with a timestamp which is too far in the future, beyond the limitation of the future tolerance window defined via the retention.future-tolerance-ratio parameter. it also gets increased if the enforcement of the future tolerance is disabled; this is useful for predicting whether data points would get rejected once enforcement gets turned on. (see the future-tolerance sketch at the end of this document)
tank.total_points
:
the number of points currently held in the in-memory ringbuffer
version.%s
:
the version of metrictank running. The metric value is always 1
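
The reorder-related counters above (tank.metrics_reordered, tank.discarded.sample-out-of-order, tank.discarded.new-value-for-timestamp) describe one classification of incoming points. The sketch below illustrates that classification under simplifying assumptions; it is not metrictank's actual ingestion code, and the names used here (classify, reorderWindow, interval, lastTs, seenTs) are made up for this example.

```go
package main

import "fmt"

// classify is a simplified sketch of how an incoming point's timestamp could
// map to the counters documented above. The real ingestion path operates on
// per-series ring buffers rather than a map of seen timestamps.
func classify(ts, lastTs, interval, reorderWindow uint32, seenTs map[uint32]bool) string {
	oldestAccepted := lastTs
	if reorderWindow > 0 {
		// the reorder buffer can absorb points up to reorderWindow datapoints old
		oldestAccepted = lastTs - reorderWindow*interval
	}
	switch {
	case ts < oldestAccepted:
		// too old for the (optional) reorder window to absorb. as noted above,
		// this is counted as out-of-order even if the timestamp was already used.
		return "tank.discarded.sample-out-of-order"
	case seenTs[ts]:
		// a timestamp we already have a data point for
		return "tank.discarded.new-value-for-timestamp"
	case ts < lastTs:
		// going back in time, but still within the reorder window:
		// the point gets inserted in the correct order
		return "tank.metrics_reordered"
	default:
		return "accepted"
	}
}

func main() {
	seen := map[uint32]bool{1000: true, 1010: true, 1020: true}
	// reorder window of 60 datapoints at a 10s interval, newest point at ts=1020
	fmt.Println(classify(1015, 1020, 10, 60, seen)) // tank.metrics_reordered
	fmt.Println(classify(1010, 1020, 10, 60, seen)) // tank.discarded.new-value-for-timestamp
	fmt.Println(classify(100, 1020, 10, 60, seen))  // tank.discarded.sample-out-of-order
	fmt.Println(classify(1030, 1020, 10, 60, seen)) // accepted
}
```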
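
The future-tolerance counters (tank.discarded.sample-too-far-ahead and tank.sample-too-far-ahead) refer to a cutoff derived from the retention.future-tolerance-ratio parameter. The sketch below assumes the tolerance is computed as a percentage of the raw retention's TTL; the formula, the TTL value and the names used here are illustrative assumptions, not a description of the exact implementation.

```go
package main

import (
	"fmt"
	"time"
)

// futureTolerance sketches how a future cutoff could be derived from a
// retention TTL and the retention.future-tolerance-ratio setting.
// Assumption for illustration: tolerance = ttl * ratio / 100.
func futureTolerance(ttl time.Duration, ratioPercent int) time.Duration {
	return ttl * time.Duration(ratioPercent) / 100
}

func main() {
	now := time.Now()
	ttl := 24 * time.Hour // assumed raw retention TTL
	tol := futureTolerance(ttl, 10)

	point := now.Add(3 * time.Hour) // a point stamped 3h in the future
	if point.After(now.Add(tol)) {
		// with enforcement enabled this would increment
		// tank.discarded.sample-too-far-ahead and the point is rejected;
		// tank.sample-too-far-ahead increments regardless of enforcement.
		fmt.Println("too far ahead: would be counted / rejected")
	} else {
		fmt.Println("within the future tolerance window")
	}
}
```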