forked from tektoncd/community
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
TEP-0127: Larger Results via Sidecar Logs
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
- Loading branch information
Showing
2 changed files
with
221 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
--- | ||
status: proposed | ||
title: Larger Results via Sidecar Logs | ||
creation-date: '2022-11-30' | ||
last-updated: '2022-11-30' | ||
authors: | ||
- '@chitrangpatel' | ||
- '@jerop' | ||
--- | ||
|
||
# TEP-0127: Larger Results via Sidecar Logs | ||
|
||
<!-- toc --> | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Use Cases](#use-cases) | ||
- [Proposal](#proposal) | ||
- [Feature Gate](#feature-gate) | ||
- [Logs Access](#logs-access) | ||
- [Size Limit](#size-limit) | ||
- [Script in `Sidecar`](#script-in-sidecar) | ||
- [`PipelineRun` Status](#pipelinerun-status) | ||
- [Design Evaluation](#design-evaluation) | ||
- [Reusability](#reusability) | ||
- [Simplicity](#simplicity) | ||
- [Performance](#performance) | ||
- [Security](#security) | ||
- [References](#references) | ||
<!-- /toc --> | ||
|
||
This TEP builds on the hard work of many people who have been tackling the problem over the past couple years, | ||
including but not limited to: | ||
- '@tlawrie' | ||
- '@imjasonh' | ||
- '@bobcatfish' | ||
- '@pritidesai' | ||
- '@tomcli' | ||
- '@ScrapCodes' | ||
- '@wlynch' | ||
|
||
## Summary | ||
|
||
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. | ||
|
||
The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many | ||
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. | ||
This allows us to support larger `Results` that are stored within `TaskRun` CRDs. | ||
|
||
## Motivation | ||
|
||
`Results` are too small - see [issue][4012]. The current implementation of `Results` involves parsing from disk and | ||
storing as part of the `Termination Message` which has a limit of 4KB per `Container` and 12KB per `Pod`. As such, | ||
the size limit of `Results` is 12KB per `TaskRun` and 4KB per `Step` at best. | ||
|
||
To make matters worse, the limit is divided equally among all `Containers` in a `Pod` - see [issue][4808]. Therefore, | ||
the more the `Steps` in a `Task` the less the size limit for `Results`. For example, if there are 12 `Steps` then each | ||
has 1KB in `Termination Message` storage to produce `Results`. | ||
|
||
[TEP-0086][tep-0086] aims to support larger `Results`. [TEP-0086][tep-0086] has many [alternatives][tep-0086-alts] with | ||
no proposal because there's no obvious "best" solution that would meet all the requirements. | ||
|
||
This TEP proposes experimenting with `Sidecar` logs to support larger `Results` that are stored with `TaskRun` CRDs. | ||
This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other | ||
alternatives. | ||
|
||
### Goals | ||
|
||
The main goal is to support larger `Results` that are limited by the size of a `TaskRun` CRD (1.5MB) via `Sidecar` logs. | ||
|
||
### Non-Goals | ||
|
||
The following are out of scope for this TEP: | ||
|
||
1. Solving use cases that requires really large `Results` beyond the size limit of a CRD (1.5MB). | ||
|
||
2. Addressing other [alternatives][tep-0086-alts] for larger `Results` that are listed in [TEP-0086][tep-0086]. | ||
|
||
### Use Cases | ||
|
||
1. Store larger `Results`, such as JSON payloads from an HTTP call or a Cloud Event, that can be inspected later or | ||
passed to other `Tasks` without the need to understand storage mechanisms. | ||
|
||
3. The [documented guidance][docs] is that `Results` are used for outputs less than 1KB while `Workspaces` are used | ||
for larger data. | ||
|
||
> As a general rule-of-thumb, if a `Result` needs to be larger than a kilobyte, you should likely use a `Workspace` | ||
to store and pass it between `Tasks` within a `Pipeline`. | ||
|
||
Supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without having to | ||
change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`. | ||
|
||
## Proposal | ||
|
||
We propose using stdout logs from a dedicated `Sidecar` to return a json `Result` object to support larger `Results`. | ||
The `Pipeline` controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and | ||
append `Results` to the `TaskRun`. | ||
|
||
We propose injecting the dedicated `Sidecar` alongside other `Steps`. The `Sidecar` will watch the `Results` paths of | ||
the `Steps`. This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The | ||
`TaskRun` controller will access the stdout logs of the `Sidecar` then extract the `Results` and its contents. | ||
|
||
After the `Steps` have terminated, the `TaskRun` controller will write the `Results` from the logs of the `Sidecar` | ||
instead of using the `Termination Message`, hence bypassing the 4KB limit. This method keeps the rest of the current | ||
functionality identical and does not require any external storage mechanism. | ||
|
||
For further details, see the [demonstration][demo] and [proof of concept][poc]. | ||
|
||
This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we continue to | ||
explore other alternatives, including those that involve external storage. | ||
|
||
### Feature Gate | ||
|
||
This feature will be gated using a `result-extraction-method` feature flag. | ||
|
||
Users have to set `result-extraction-method` to `"sidecar-logs"` to enable the feature: | ||
```shell | ||
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"result-extraction-method":"sidecar-logs"}}' | ||
``` | ||
|
||
This feature is disabled by default. When disabled, `Results` continue to pass through `Termination Message`. | ||
|
||
### Logs Access | ||
|
||
This feature requires that the `Pipeline` controller has access to `Pod` logs. | ||
|
||
Users have to grant `get` access to all `pods/log` to the `Pipeline` controller: | ||
```shell | ||
kubectl apply -f config/enable-log-access-to-controller/ | ||
``` | ||
|
||
### Size Limit | ||
|
||
The size limit per `Result` can be `max-result-size` feature flag, which takes the integer value of the bytes. | ||
|
||
The `max-result-size` feature flag defaults to 4096 bytes. This ensures that we support existing `Tasks` with only one | ||
`Result` that uses up the whole size limit of the `Termination Message`. | ||
|
||
If users want to set the size limit per `Result` to be something other than 4096 bytes, they can set `max-result-size` | ||
by setting `max-result-size: <VALUE-IN-BYTES>`. The value set here cannot exceed the CRD size limit of 1.5MB. | ||
|
||
```shell | ||
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}' | ||
``` | ||
|
||
Even though the size limit per `Result` is configurable, the size of `Results` are limited by CRD size limit of 1.5MB. | ||
If the size of `Results` exceeds this limit, then the `TaskRun` will fail with a message indicating the size limit has | ||
been exceeded. | ||
|
||
### Script in `Sidecar` | ||
|
||
The `Sidecar` will run a script that: | ||
- is auto-generated based on the required `Results` - identified from `taskSpec.results` field. | ||
- periodically checks for files in the directory `/tekton/results`. | ||
- when a `Result` is found, it prints it to stdout in a parsable pattern. | ||
- when all the expected `Results` are found, it breaks out of the periodic loop and exits. | ||
|
||
For further details, see the [demonstration][demo] of the proof of concept. | ||
|
||
### `PipelineRun` Status | ||
|
||
In [TEP-0100][tep-0100], we proposed changes to `PipelineRun` status to reduce the amount of information stored about | ||
the status of `TaskRuns` and `Runs` to improve performance and reduce memory bloat. The `PipelineRun` status is set up | ||
to handle larger `Results` without exacerbating the performance and storage issues that were there before. | ||
|
||
For `ChildReferences` to be populated, the `embedded-status` must be set to `"minimal"`. We recommend that the minimal | ||
embedded status - `ChildReferences` - is enabled while migration is ongoing until it becomes the only supported status. | ||
|
||
## Design Evaluation | ||
|
||
### Reusability | ||
|
||
This proposal does not introduce any API changes to specification `Results`, the changes are in implementation details | ||
of `Results`. The existing `Tasks` will continue to function as they are, only that they can support larger `Results`. | ||
|
||
Even more, this supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without | ||
having to change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`. | ||
This allows users to control execution as needed by their context without having to modify `Tasks` and `Pipelines`. | ||
|
||
### Simplicity | ||
|
||
This proposal provides a simple solution that solves most use cases: | ||
- Users don't need additional infrastructure, such as server or object storage, to support larger `Results`. | ||
- Existing `Tasks` will continue to function as they are, while supporting larger `Results`, without any API changes. | ||
|
||
### Performance | ||
|
||
Performance benchmarking with 20-30 `PipelineRuns`, each with 3 `TaskRuns` each with two `Steps`: | ||
- Average `Pipeline` controller's CPU difference during `PipelineRun` execution: 1% | ||
- Average `Pipeline` controller's Memory usage difference during `PipelineRun` execution: 0.2% | ||
- Average `Pod` startup time (time to get to running state) difference: 3s per `TaskRun` | ||
|
||
In the experiment, we will continue to measure the startup overhead and explore ways to improve it. | ||
|
||
For further details, see the [performance metrics][performance]. | ||
|
||
### Security | ||
|
||
This approach requires that the `Pipeline` controller has access to `Pod` logs. The `Pipeline` controller already has | ||
extensive permissions in the cluster, such as read access to `Secrets`. Expanding the access even further is a concern | ||
for some users, but is also acceptable for some users given the advantages. We will document the extended permissions | ||
so that users can make the right choice for their own use cases and requirements. | ||
|
||
## References | ||
|
||
- [TEP-0086: Larger Results][tep-0086] | ||
- [TEP-0086: Larger Results via Sidecar Logs][745] | ||
|
||
[docs]: https://tekton.dev/docs/pipelines/tasks/#emitting-results | ||
[4012]: https://github.com/tektoncd/pipeline/issues/4012 | ||
[4808]: https://github.com/tektoncd/pipeline/issues/4808 | ||
[tep-0086]: ./0086-changing-the-way-result-parameters-are-stored.md | ||
[tep-0086-alts]: ./0086-changing-the-way-result-parameters-are-stored.md#alternatives | ||
[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292 | ||
[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md | ||
[demo]: https://drive.google.com/file/d/1NrWudE_XBqweomiY24DP2Txnl1yN0yD9/view | ||
[poc]: https://github.com/tektoncd/pipeline/pull/5695 | ||
[performance]: https://github.com/tektoncd/community/pull/745#issuecomment-1206668381 | ||
[745]: https://github.com/tektoncd/community/pull/745 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters