Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results, TerminationMessage and Containers #4808

Open
1 of 4 tasks
vdemeester opened this issue Apr 28, 2022 · 6 comments
Open
1 of 4 tasks

Results, TerminationMessage and Containers #4808

vdemeester opened this issue Apr 28, 2022 · 6 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@vdemeester
Copy link
Member

vdemeester commented Apr 28, 2022

While looking into #4529, I stumbled on a relatively big shortcomings of our current implementation of Result.

The initial problem is highlighted in one of our customer's Task that write results. From one version to the other, the same exact task would fail to write the result. Digging a bit into the issue, the culprit commit was e6399ce, which naively made no sense at all. The only hint there is that we are adding a new initContainer.

In some doc around the TerminationMessage behaviour, we can read the following.

You can customize the terminationMessagePath field of a container for Kubernetes to use the content of the specified custom file to fulfill the termination message of the container when the container running process completes or fails. The maximum size of a termination message is 4 KB.

This is obviously what we document in Emitting Results. But there is more.

The total size of termination messages of all containers in a pod cannot exceed 12 KB. If the total size exceeds 12 KB, the state manager of Kubernetes sets a limit on the termination message sizes. For example, if a pod contains four InitContainers and eight application containers, the state manager limits the termination message of each container to 1 KB. This indicates that only the first 1 KB of the termination message of each container is intercepted.

Highlighted in bold is the reason of the failure. In a gist, the more container we have in our pod, the smaller the size of one container's result is. This is an issue because it means the maximum size of results depend on the number of container, meaning it also depend on the number of internal container (place-tools, …).

This might have an impact on some TEPs:

This also highlight why tektoncd/community#521 or at least some thinking around those are.

Next step for this particular issue could be:

  • Enhance the documentation around the limitation of Results (depending on containers, …)
  • Reduce the number of init container to the strict minimum
  • Error out more properly when the termination message is truncated. Today we error out by saying we didn't find the result, whereas we could probably detect that the message in json is invalid (because truncated)
  • Split results over multiple steps in case of multiple results — maybe by having results per step and TaskResults submiting all ?
@vdemeester vdemeester added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 28, 2022
@vdemeester vdemeester self-assigned this Apr 28, 2022
@vdemeester
Copy link
Member Author

/cc @tektoncd/core-maintainers @abayer @imjasonh

@vdemeester vdemeester added this to the Pipelines v0.36 milestone Apr 29, 2022
@imjasonh
Copy link
Member

That phrasing of the limit is surprising to me. The vanilla k8s docs say:

The termination message is intended to be brief final status, such as an assertion failure message. The kubelet truncates messages that are longer than 4096 bytes. The total message length across all containers will be limited to 12KiB. The default termination message path is /dev/termination-log. You cannot set the termination message path after a Pod is launched

...with no mention of how each container's message is limited.

My read of the k8s docs is that there's a 12KB total limit, but not that it's divided among containers in any way. It might be limited that way, but that's not documented.

That's not to say this isn't still a big issue that we should tackle soon (and plan to, AFAIK), just that maybe AlibabaCloud's docs are overly/incorrectly prescriptive about how that limitation is applied. Or maybe their platform does apply it that way, but vanilla k8s doesn't.

If someone tests this and we can determine that each container's message is limited to 12KB/${numContainers} on all k8s platforms, that certainly bumps the priority for fixing this.

@vdemeester
Copy link
Member Author

vdemeester commented Apr 29, 2022

If someone tests this and we can determine that each container's message is limited to 12KB/${numContainers} on all k8s platforms, that certainly bumps the priority for fixing this.

This is what is happening today and why I opened the bug. Adding an extra initContainer (even without any termination message setup, …) did reduce the size of the message for one container (and this is what happened in 0.32.0).

As a reproducer, see the following resources:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: generate-result
spec:
  params:
    - name: STRING_LENGTH
      description: Length of the string to create
    - name: STRING_CHAR
      description: Char to use when creating string
      type: string
      default: '.'
  results:
    - name: RESULT_STRING
      description: A result string
  steps:
    - name: gen-result
      image: bash:latest
      env:
        - name: PARAM_STRING_LENGTH
          value: $(params.STRING_LENGTH)
        - name: PARAM_STRING_CHAR
          value: $(params.STRING_CHAR)
      script: |
        #! /usr/bin/bash
        set -e
        len=$PARAM_STRING_LENGTH
        ch=$PARAM_STRING_CHAR
        printf '%*s' "$len" | tr ' ' "$ch" >>  $(results.RESULT_STRING.path)
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: print-result
spec:
  params:
    - name: TO_PRINT
      type: string
  steps:
    - name: print-result
      image: bash:latest
      env:
        - name: PARAM_TO_PRINT
          value: $(params.TO_PRINT)
      script: |
        #! /usr/bin/bash
        set -e
        echo $PARAM_TO_PRINT
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  labels:
    app: sample
  name: result-test
spec:
  params:
  - name: RESULT_STRING_LENGTH
    description: Length of string to generate for generate-result task
  - name: RESULT_STRING_CHAR
    description: Char to repeat in result string
    default: '.'
  tasks:
  - name: generate-result
    params:
    - name: STRING_LENGTH
      value: $(params.RESULT_STRING_LENGTH)
    - name: STRING_CHAR
      value: $(params.RESULT_STRING_CHAR)
    taskRef:
      kind: Task
      name: generate-result
  - name: print-result
    params:
    - name: TO_PRINT
      value: $(tasks.generate-result.results.RESULT_STRING)
    taskRef:
      kind: Task
      name: print-result

Running the following pipeline with a RESULT_STRING_LENGTH of 3000 (~3K) does fail on >= 0.32, but suceeds before.

tkn pipeline start result-test -p RESULT_STRING_LENGTH=3000 --use-param-defaults

@imjasonh
Copy link
Member

Oh I see, thanks for clarifying. In that case, it sounds like we should update the k8s docs so folks don't trip over this later. I'm sure I'll forget and need a reminder 😅

This also definitely bumps up the priority for larger results and getting off terminationMessages.

@dibyom dibyom removed this from the Pipelines v0.43 milestone Nov 29, 2022
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Nov 30, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Dec 1, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Dec 1, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
jerop added a commit to jerop/community that referenced this issue Dec 1, 2022
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].

The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.

[4012]: tektoncd/pipeline#4012
[4808]: tektoncd/pipeline#4808
[tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2023
@vdemeester
Copy link
Member Author

/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Todo
Development

No branches or pull requests

7 participants