Compact output metrics #4167
Comments
@jgaalen Could you explain this in more detail, please? What do you mean by "a single metric per requests, which holds all the data such as the data"? I'd love to see some concrete examples of what that data would look like in plaintext. Thanks! 🙇🏻
OK, I just made a single request to 'www.google.com' to show what is stored:
For k6/http, these are 9 separate writes (to CSV, InfluxDB, and probably other outputs as well), all with the same timestamp and for the same request. It would save far more resources if this were just a single line. Keep the same tags, but store some of the data as values. Output like this would be a big saving in resources (especially for InfluxDB, which can be heavily loaded by all the writes):
Perhaps with some more columns/values, such as sent bytes and received bytes (per request).
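To make the proposal concrete, here is a minimal sketch in Python that formats one request both ways in InfluxDB line protocol. The metric names are k6's built-in HTTP metrics; the numbers, the tag set, the combined measurement name `http_request`, and the single field name `value` are assumptions chosen for illustration, not output captured from k6. Escaping of special characters in tags is left out.

```python
# Hypothetical values for one HTTP request (timings in ms).
TAGS = {"method": "GET", "status": "200"}
VALUES = {
    "http_reqs": 1,
    "http_req_duration": 120.5,
    "http_req_blocked": 30.1,
    "http_req_connecting": 10.2,
    "http_req_tls_handshaking": 15.3,
    "http_req_sending": 0.1,
    "http_req_waiting": 110.0,
    "http_req_receiving": 10.4,
    "http_req_failed": 0,
}
TIMESTAMP_NS = 1700000000000000000


def tag_string(tags):
    # Comma-separated key=value pairs, sorted by key.
    return ",".join(f"{k}={v}" for k, v in sorted(tags.items()))


def separate_lines(values, tags, ts):
    # One point per metric, each carrying a single field (assumed name: "value").
    return [f"{name},{tag_string(tags)} value={v} {ts}" for name, v in values.items()]


def combined_line(values, tags, ts, measurement="http_request"):
    # One point per request, with one field per metric.
    fields = ",".join(f"{name}={v}" for name, v in values.items())
    return f"{measurement},{tag_string(tags)} {fields} {ts}"


if __name__ == "__main__":
    lines = separate_lines(VALUES, TAGS, TIMESTAMP_NS)
    one = combined_line(VALUES, TAGS, TIMESTAMP_NS)
    print(len(lines), "lines,", sum(len(l) for l in lines), "characters")
    print("1 line,", len(one), "characters")
```

In the combined form the tag set and timestamp are written once instead of nine times, which is where the saving would come from.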
Hey @jgaalen, thanks for your detailed explanation, that makes more sense now. However, I think what you're suggesting, which is basically having multiple values (not tags/labels) for a single measurement/sample, is not something that can be generalized, because it's very specific to the InfluxDB metrics model. Please note the importance of distinguishing between a value and a tag/label here, especially in the context of TSDBs: tags/labels are normally used for filtering data, and values for the actual calculations (percentiles, averages, means, etc.). In comparison, most of the widely used models, like Prometheus, OpenMetrics or OpenTelemetry, only allow a single value for each measurement/sample, and there's no way to apply the pattern you're suggesting. Well, yes, it could be "doable" by storing the other values as tag/label values, but that would make them useless for calculations, which makes no sense, not to mention the explosion of cardinality it would cause. That said, I think what you're suggesting could be interesting to explore for the InfluxDB output, but as we mentioned previously, we're no longer maintaining the InfluxDB output ourselves and will most likely leave it open for the community to maintain. Finally, there's possibly a way to use part of your idea to make the model k6 uses internally to store metrics more efficient, but we're still doing some research there. We'll take it into account, but generally speaking our path forward is to be compliant with Prometheus and/or move towards OpenTelemetry. Anyway, thanks for bringing up your idea, and feel free to pursue that kind of improvement for the InfluxDB extension.
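The cardinality point can be illustrated with a small Python sketch. It assumes the Prometheus-style rule that a series is identified by its metric name plus its label set; the durations and label names are made up for the example.

```python
def series_key(metric, labels):
    # A series is identified by the metric name plus its full label set.
    return (metric, tuple(sorted(labels.items())))


# Hypothetical durations (ms) for five requests to the same endpoint.
durations = [120.5, 98.2, 101.7, 250.0, 99.9]

# Duration kept as the sample value: every request lands in the same series.
as_value = {
    series_key("http_req_duration", {"method": "GET", "status": "200"})
    for _ in durations
}

# Duration stored as a label value: each distinct duration creates a new series,
# and the number is now a string that cannot be averaged or put in a percentile.
as_label = {
    series_key("http_reqs", {"method": "GET", "status": "200", "duration": str(d)})
    for d in durations
}

if __name__ == "__main__":
    print(len(as_value), "series when duration is a value")
    print(len(as_label), "series when duration is a label")
```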
Thank you for your explanation. I understand that some TSDBs only allow a single value per unique tag combination. Perhaps the output code could somehow bundle all the values in a single object. It would then be up to the output writer to either emit one value per tag combination or combine them. That way CSV, Influx and TimescaleDB could take advantage of combined values to reduce resource usage (imagine how much network, storage, CPU and memory would be saved, and how much faster queries would be, if values could be combined).
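One way to picture this suggestion is the Python sketch below. `RequestBundle` and both helper functions are hypothetical names invented for the example and are not part of k6; the sketch only shows how one bundled object could be expanded differently depending on the output.

```python
from dataclasses import dataclass


@dataclass
class RequestBundle:
    # Hypothetical container for everything measured for one request.
    timestamp: int
    tags: dict
    values: dict


def to_single_value_samples(bundle):
    # For backends that accept one value per sample: one tuple per metric.
    return [
        (name, bundle.tags, value, bundle.timestamp)
        for name, value in bundle.values.items()
    ]


def to_csv_row(bundle, columns):
    # For outputs that can hold several values per row: tags and values side by side.
    row = [str(bundle.timestamp)]
    row += [str(bundle.tags.get(c, bundle.values.get(c, ""))) for c in columns]
    return ",".join(row)


if __name__ == "__main__":
    b = RequestBundle(
        timestamp=1700000000,
        tags={"method": "GET", "status": "200"},
        values={"http_req_duration": 120.5, "http_req_failed": 0},
    )
    print(to_single_value_samples(b))
    print(to_csv_row(b, ["method", "status", "http_req_duration", "http_req_failed"]))
```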
There's always a compromise, and speaking of flexibility, a simpler model is likely more flexible. Speaking of resource utilization, most of the metrics are aggregated when stored in memory, by k6, except for Finally, if your concern is network usage, you can do that simple aggregation/transformation on the extension side, which still runs on the k6 side, before metrics are emitted to the backend (e.g. Influx). If all the values come from the same event (e.g. an HTTP request), they will likely arrive together through the samples channel, or very close in time, so you would only need to hold them in memory for a very short period of time, and you could flush them very frequently.
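The extension-side aggregation described here might look roughly like the following Python sketch. It is not k6's output API (real extensions are written against k6's own interfaces); `SampleCombiner` is a hypothetical class, and it assumes that samples from the same request share an identical timestamp and tag set, which is stricter than "very close in time".

```python
from collections import defaultdict


class SampleCombiner:
    """Buffers single-value samples and merges those sharing a timestamp and tag set."""

    def __init__(self):
        # (timestamp, sorted tag pairs) -> {metric name: value}
        self._pending = defaultdict(dict)

    def add(self, metric, value, tags, timestamp):
        key = (timestamp, tuple(sorted(tags.items())))
        self._pending[key][metric] = value

    def flush(self):
        # Return one combined record per group, then empty the buffer.
        records = [
            {"timestamp": ts, "tags": dict(tags), "values": values}
            for (ts, tags), values in self._pending.items()
        ]
        self._pending.clear()
        return records


if __name__ == "__main__":
    combiner = SampleCombiner()
    combiner.add("http_req_duration", 120.5, {"method": "GET"}, 1)
    combiner.add("http_req_failed", 0, {"method": "GET"}, 1)
    combiner.add("http_req_duration", 98.2, {"method": "GET"}, 2)
    for record in combiner.flush():
        print(record)
```

Calling `flush()` on a short timer keeps the buffer small, since each group only has to be held until the samples of one request have arrived.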
Currently, k6 stores far too many values per request in InfluxDB (or whatever output you use), causing very high overhead in terms of network bandwidth, CPU and storage. Per request it stores 4 lines (data sent, received, duration and failed).
It would be far more efficient to store a single metric per request that holds all the data, such as data sent, data received, status (failed true/false), duration, URL, etc.