Compact output metrics #4167

jgaalen · 2025-01-09T06:51:24Z

Currently, k6 stores far too many values per request in InfluxDB (or whatever output you use), causing very high overhead in network bandwidth, CPU, and storage. Per request it stores 4 lines (data sent, received, duration and failed).

It would be far more efficient to store a single metric per request that holds all the data, such as data sent, data received, status (failed true/false), duration, URL, etc.

inancgumus · 2025-01-09T15:00:28Z

Hi @jgaalen, this issue sounds similar to #1321. Can you confirm?

joanlopez · 2025-01-09T15:14:04Z

> It would be incredibly more efficient to be able to store a single metric per requests, which holds all the data such as the data sent, received, status (failed true/false), duration and url, etc..

Could you explain this in more detail, please? @jgaalen

What do you mean by "a single metric per requests, which holds all the data such as the data"?

I'd love to see some concrete examples of what that data would look like in plaintext.

Thanks! 🙇🏻

jgaalen · 2025-01-10T08:41:45Z

OK, I just made a single request to 'www.google.com' to show what is stored:

metric_name	timestamp	metric_value	expected_response	method	name	proto	scenario	status	tls_version	url
http_reqs	1736496755038080000	1	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_duration	1736496755038080000	64,557	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_blocked	1736496755038080000	0	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_connecting	1736496755038080000	0	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_tls_handshaking	1736496755038080000	0	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_sending	1736496755038080000	0,03	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_waiting	1736496755038080000	63,441	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_receiving	1736496755038080000	1,086	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com
http_req_failed	1736496755038080000	0	TRUE	GET	https://www.google.com	HTTP/2.0	sample	200	TLS1.3	https://www.google.com

For k6/http, these are 9 separate writes (to CSV, InfluxDB, and probably other outputs as well), all at the same timestamp for the same request. It would save far more resources if this were just a single line: keep the same tags, but store the individual metrics as values.

If you output it like this, it would be a big saving in resources (especially for InfluxDB, which can be heavily loaded by all the writes):

type	timestamp	group	scenario	service	name	method	url	http_req_duration	http_req_blocked	http_req_connecting	http_req_tls_handshaking	http_req_sending	http_req_waiting	http_req_receiving	http_req_success
http_req	1736496755038080000					GET	https://www.google.com	64,557	0	0	0	0,03	63,44	1,086	TRUE

Perhaps some more columns/values, such as: sent bytes, received bytes (per request)

joanlopez · 2025-01-10T09:15:19Z

Hey @jgaalen,

Thanks for your detailed explanation, that makes more sense now.

However, I think what you're suggesting, which is basically having multiple values (not tags/labels) for a single measurement/sample, is not something that can be generalized, because it's very specific to the InfluxDB metrics model.

Please note the importance of distinguishing between a value and a tag/label here, especially in the context of TSDBs: tags/labels are normally used for filtering data, and values for the actual calculations (percentiles, averages, means, etc.).

In comparison, most of the widely used models, like Prometheus, OpenMetrics or OpenTelemetry only allow a single value for each measurement/sample, and there's no way to apply the pattern you're suggesting. Well, yes, it could be "doable" by storing the other values as tag/label values, but that would make them useless, which makes no sense. Not to mention the explosion of cardinality that would represent.
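To make the cardinality point concrete, here is a minimal sketch in Python. It is not k6 code: the sample data and the `series_key` helper are made up for illustration. It counts how many distinct series a single-value backend would have to track when the duration is kept as the sample value versus when it is pushed into a label.

```python
# Hypothetical request samples: (url, method, duration in ms).
samples = [
    ("https://www.google.com", "GET", 64.557),
    ("https://www.google.com", "GET", 63.441),
    ("https://www.google.com", "GET", 70.102),
    ("https://www.google.com", "GET", 64.557),
]


def series_key(metric, labels):
    """Identify a series by its metric name plus its sorted label set."""
    return (metric, tuple(sorted(labels.items())))


# Duration kept as the sample value: the label set stays small.
as_value = {
    series_key("http_req_duration", {"url": url, "method": method})
    for url, method, _ in samples
}

# Duration pushed into a label: every distinct duration creates a new series.
as_label = {
    series_key("http_req", {"url": url, "method": method, "duration": str(duration)})
    for url, method, duration in samples
}

print(len(as_value))  # 1
print(len(as_label))  # 3
```

With real traffic nearly every request has a different duration, so the label variant would grow by roughly one series per request.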

That said, I think what you're suggesting could be interesting to explore for the InfluxDB output. However, as we mentioned previously, we're no longer maintaining the InfluxDB output and will most likely leave it open for the community to maintain.

Finally, there's possibly a way to use part of your idea to make the model k6 uses to store metrics internally more efficient, but we're still doing some research there. We'll take it into account, but generally speaking our path forward is to be compliant with Prometheus and/or move towards OpenTelemetry.

Anyway, thanks for bringing up your idea, and feel free to pursue that kind of improvement for the InfluxDB extension.

jgaalen · 2025-01-10T09:21:19Z

Thank you for your explanation. I understand that some TSDBs only allow a single value per unique tag combination.

Perhaps the output code could somehow bundle all the values in a single object. It would then be up to the output writer to either emit one value per tag combination or combine them. That way CSV, InfluxDB, and TimescaleDB could leverage combined values to reduce resource usage (imagine how much network, storage, CPU and memory is saved, as well as speeding up queries, if you can combine values).
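As a rough illustration of that idea, the following Python sketch bundles one request's values in a single object and lets each writer decide how to emit it. Everything here is hypothetical (`RequestRecord`, `write_single_value`, `write_influx_line`); it is not k6's output API, only the shape of the proposal. The combined writer uses the InfluxDB line protocol layout (`measurement,tags fields timestamp`).

```python
from dataclasses import dataclass, field


@dataclass
class RequestRecord:
    """Hypothetical bundle: one HTTP request, shared tags, several values."""
    timestamp: int
    tags: dict
    values: dict = field(default_factory=dict)


def write_single_value(record):
    """For single-value models: one (name, tags, value, timestamp) sample per value."""
    return [
        (name, dict(record.tags), value, record.timestamp)
        for name, value in record.values.items()
    ]


def write_influx_line(record, measurement="http_req"):
    """For multi-field models: one line-protocol line carrying all fields.

    Escaping of special characters in tag values is omitted for brevity.
    """
    tags = ",".join(f"{k}={v}" for k, v in sorted(record.tags.items()))
    fields = ",".join(f"{k}={v}" for k, v in record.values.items())
    return f"{measurement},{tags} {fields} {record.timestamp}"


record = RequestRecord(
    timestamp=1736496755038080000,
    tags={"method": "GET", "status": "200"},
    values={"http_req_duration": 64.557, "http_req_waiting": 63.441},
)

print(len(write_single_value(record)))  # 2
print(write_influx_line(record))
```

The single-value writer keeps today's behaviour for backends that need it, while the combined writer produces one write per request.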

joanlopez · 2025-01-10T09:32:11Z

> imagine how much network, storage, cpu and memory is saved, as well as speed up queries if you can combine values

There's always a compromise, and speaking of flexibility, having a simpler model is likely more flexible.

Speaking of resource utilization, most of the metrics are aggregated when stored in memory by k6, except for Trends (i.e. histograms), for which, as I said, we're already exploring more efficient ways to store them, although it has low priority on our backlog. And even in that case, the main problem is the number of values to store rather than the structure.

Finally, if your concern is network, you can do that simple aggregation/transformation on the extension side, which still runs on the k6 side, before metrics are emitted to the backend (e.g. Influx). If all the values come from the same event (e.g. HTTP request), they will likely arrive together through the samples channel, or very close in time, so you would only need to hold them in memory for a very short period of time, and you could flush them very frequently.
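The buffering described above could look roughly like the sketch below. It is written in Python only to show the logic; a real k6 output extension would be written in Go against k6's own interfaces, which are not reproduced here. The `RequestCombiner` class, its method names, and the assumption that samples from one request share the same timestamp and tag set are all illustrative.

```python
from collections import OrderedDict


class RequestCombiner:
    """Groups single-value samples that share a timestamp and tag set."""

    def __init__(self, max_age_ns=1_000_000_000):
        self.max_age_ns = max_age_ns
        # key -> {"first_seen": ns, "values": {metric: value}}
        self.pending = OrderedDict()

    def add(self, metric, value, timestamp, tags, now_ns):
        """Store one sample under the (timestamp, tags) key of its request."""
        key = (timestamp, tuple(sorted(tags.items())))
        entry = self.pending.setdefault(key, {"first_seen": now_ns, "values": {}})
        entry["values"][metric] = value

    def flush(self, now_ns):
        """Return combined rows held for at least max_age_ns and forget them."""
        ready = []
        for key in list(self.pending):
            entry = self.pending[key]
            if now_ns - entry["first_seen"] >= self.max_age_ns:
                timestamp, tags = key
                ready.append({"timestamp": timestamp, **dict(tags), **entry["values"]})
                del self.pending[key]
        return ready


combiner = RequestCombiner(max_age_ns=100)
tags = {"method": "GET", "url": "https://www.google.com"}
combiner.add("http_req_duration", 64.557, 1736496755038080000, tags, now_ns=0)
combiner.add("http_req_waiting", 63.441, 1736496755038080000, tags, now_ns=5)

early = combiner.flush(now_ns=50)   # still inside the holding window
rows = combiner.flush(now_ns=100)   # window elapsed, one combined row
print(early)
print(rows)
```

A short holding window keeps memory use small; the trade-off is that a sample arriving after its group was flushed would end up in a separate row.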

github-actions bot added the triage label Jan 9, 2025

github-actions bot assigned inancgumus Jan 9, 2025

inancgumus removed their assignment Jan 10, 2025
