k8s.pod.phase not providing correct info when my pod status is CrashLoopBackOff #33797

Open
abhishekmahajan0709222 opened this issue Jun 27, 2024 · 9 comments
Labels
bug Something isn't working receiver/k8scluster

Comments

@abhishekmahajan0709222

Component(s)

receiver/k8scluster

What happened?

Description

My pods are in CrashLoopBackOff, but they are still shown with the Running status.

Steps to Reproduce

Expected Result

It should report a phase that reflects that my pods are in CrashLoopBackOff.

Actual Result

It reports the Running phase, which is incorrect.

Collector version

Latest (v0.103.0)

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

abhishekmahajan0709222 added the "bug" (Something isn't working) and "needs triage" (New item requiring triage) labels Jun 27, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@denmanveer

This is affecting us as well. @dmitryax @TylerHelmuth @povilasv, can anyone help, please?
Why is a pod in CrashLoopBackOff shown as running in the metrics?
Thank you.

@abhishekmahajan0709222
Author

@dmitryax @TylerHelmuth @povilasv

Is there any update on this issue?

@povilasv
Contributor

Hey, we map the K8s pod.Status.Phase field to the k8s.pod.phase metric with the following mapping:

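```go
import corev1 "k8s.io/api/core/v1"

// phaseToInt converts a pod's status.phase into the integer value
// reported by the k8s.pod.phase metric.
func phaseToInt(phase corev1.PodPhase) int32 {
	switch phase {
	case corev1.PodPending:
		return 1
	case corev1.PodRunning:
		return 2
	case corev1.PodSucceeded:
		return 3
	case corev1.PodFailed:
		return 4
	case corev1.PodUnknown:
		return 5
	default:
		// Any unrecognized phase is reported the same way as PodUnknown.
		return 5
	}
}
```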

If it was showing Running, then the pod's phase at that time was Running.
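Anyone reading the metric back has to invert this mapping. Below is a minimal sketch that assumes exactly the values in phaseToInt above. phaseName is a hypothetical helper and is not part of the receiver.

```go
package main

import "fmt"

// phaseName decodes a k8s.pod.phase metric value into a pod phase name,
// assuming the phaseToInt mapping quoted above
// (1=Pending, 2=Running, 3=Succeeded, 4=Failed, 5=Unknown).
// Hypothetical helper, not exported by the k8scluster receiver.
func phaseName(v int32) string {
	switch v {
	case 1:
		return "Pending"
	case 2:
		return "Running"
	case 3:
		return "Succeeded"
	case 4:
		return "Failed"
	default:
		// 5 (and any value outside 1-4) decodes as Unknown.
		return "Unknown"
	}
}

func main() {
	// A crash-looping pod normally still has phase Running, so its metric value is 2.
	fmt.Println(phaseName(2))
}
```

The metric therefore carries only the phase. A value of 2 says nothing about whether a container in the pod is repeatedly crashing.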

There is no pod phase for CrashLoopBackOff. For that, see issue #32457.
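CrashLoopBackOff is not a phase. It is the reason on a container's Waiting state, which in k8s.io/api/core/v1 is pod.Status.ContainerStatuses[i].State.Waiting.Reason. The sketch below checks for that reason. It uses hypothetical stand-in types that mirror only the corev1 fields it needs, so it runs without the Kubernetes modules. It does not describe anything the receiver currently emits.

```go
package main

import "fmt"

// Hypothetical stand-ins for the corev1 types, trimmed to the fields used here.
// Real code would use corev1.Pod from k8s.io/api/core/v1 instead.
type ContainerStateWaiting struct {
	Reason string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting // nil unless the container is waiting
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	Phase             string
	ContainerStatuses []ContainerStatus
}

// crashLoopingContainers returns the names of containers whose current state
// is Waiting with reason "CrashLoopBackOff". It ignores the pod phase, which
// usually stays "Running" in this situation.
func crashLoopingContainers(s PodStatus) []string {
	var names []string
	for _, cs := range s.ContainerStatuses {
		if w := cs.State.Waiting; w != nil && w.Reason == "CrashLoopBackOff" {
			names = append(names, cs.Name)
		}
	}
	return names
}

func main() {
	status := PodStatus{
		Phase: "Running",
		ContainerStatuses: []ContainerStatus{
			{Name: "app", State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}}},
			{Name: "sidecar"}, // no Waiting state
		},
	}
	fmt.Println(status.Phase, crashLoopingContainers(status))
}
```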

@abhishekmahajan0709222
Author

@povilasv, that is effectively reporting wrong information.

In the image below, you can see the status shown as CrashLoopBackOff:
[Screenshot: pod status showing CrashLoopBackOff]

But the metric shows Running:
[Screenshot: k8s.pod.phase metric showing Running]

@povilasv
Contributor

Could you paste the output of kubectl get pod x -o yaml?

It should have a "phase" field:

```yaml
  hostIP: 172.18.0.2
  hostIPs:
  - ip: 172.18.0.2
  phase: Running
  podIP: 172.18.0.2
  podIPs:
  - ip: 172.18.0.2
  qosClass: Burstable
  startTime: "2024-08-20T05:22:28Z"
```
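In the pod object, status.phase and status.containerStatuses[].state.waiting.reason are separate fields. The sketch below pulls both out of kubectl get pod x -o json output. It uses JSON rather than YAML so that only the Go standard library is needed. podJSON and summarize are hypothetical names, and the sample input is illustrative rather than the reporter's data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podJSON is a hypothetical, trimmed view of a Pod as printed by
// `kubectl get pod <name> -o json`. The JSON keys match the Pod API.
type podJSON struct {
	Status struct {
		Phase             string `json:"phase"`
		ContainerStatuses []struct {
			Name  string `json:"name"`
			State struct {
				Waiting *struct {
					Reason string `json:"reason"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// summarize returns the pod phase and, for each waiting container, its waiting reason.
func summarize(raw []byte) (string, map[string]string, error) {
	var p podJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", nil, err
	}
	reasons := make(map[string]string)
	for _, cs := range p.Status.ContainerStatuses {
		if cs.State.Waiting != nil {
			reasons[cs.Name] = cs.State.Waiting.Reason
		}
	}
	return p.Status.Phase, reasons, nil
}

func main() {
	// Illustrative input only.
	raw := []byte(`{"status":{"phase":"Running","containerStatuses":[
		{"name":"app","state":{"waiting":{"reason":"CrashLoopBackOff"}}}]}}`)
	phase, reasons, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(phase, reasons)
}
```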

@povilasv
Contributor

povilasv commented Aug 20, 2024

I think I found the issue. The K8s API docs state this:

```go
// PodStatus represents information about the status of a pod. Status may trail the actual
// state of a system, especially if the node that hosts the pod cannot contact the control
// plane.
type PodStatus struct {
	// The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle.
	// The conditions array, the reason and message fields, and the individual container status
	// arrays contain more detail about the pod's status.
	// There are five possible phase values:
	//
	// Pending: The pod has been accepted by the Kubernetes system, but one or more of the
	// container images has not been created. This includes time before being scheduled as
	// well as time spent downloading images over the network, which could take a while.
	// Running: The pod has been bound to a node, and all of the containers have been created.
	// At least one container is still running, or is in the process of starting or restarting.
	// Succeeded: All containers in the pod have terminated in success, and will not be restarted.
	// Failed: All containers in the pod have terminated, and at least one container has
	// terminated in failure. The container either exited with non-zero status or was terminated
	// by the system.
	// Unknown: For some reason the state of the pod could not be obtained, typically due to an
	// error in communicating with the host of the pod.
	//
```

I think the CrashLoopBackOff status fits into the K8s "Running" category:

Running: The pod has been bound to a node, and all of the containers have been created.
At least one container is still running, or is in the process of starting or restarting.
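Applying the quoted definitions to a pod's containers shows why a crash-looping pod lands in Running. The sketch below is deliberately simplified and follows only the doc-comment rules. It is not the kubelet's actual phase computation, which also accounts for restartPolicy, init containers, and node state. containerState and simplifiedPhase are hypothetical names.

```go
package main

import "fmt"

// containerState is a hypothetical, simplified view of one container.
type containerState int

const (
	notCreated containerState = iota // container not created yet (e.g. image still pulling)
	restarting                       // created, waiting to (re)start, e.g. CrashLoopBackOff
	running                          // currently running
	succeeded                        // terminated with exit code 0, will not restart
	failed                           // terminated in failure, will not restart
)

// simplifiedPhase follows the doc-comment definitions of Pending, Running,
// Succeeded and Failed. It never returns Unknown.
func simplifiedPhase(containers []containerState) string {
	if len(containers) == 0 {
		return "Pending" // no container statuses reported yet
	}
	allTerminated := true
	anyFailed := false
	for _, c := range containers {
		switch c {
		case notCreated:
			return "Pending" // one or more containers not created yet
		case restarting, running:
			allTerminated = false
		case failed:
			anyFailed = true
		}
	}
	switch {
	case !allTerminated:
		return "Running" // at least one container running, starting or restarting
	case anyFailed:
		return "Failed"
	default:
		return "Succeeded"
	}
}

func main() {
	// One crash-looping container next to a healthy sidecar.
	fmt.Println(simplifiedPhase([]containerState{restarting, running}))
	// Only the crash-looping container.
	fmt.Println(simplifiedPhase([]containerState{restarting}))
}
```

Both calls in main print Running. Under these definitions, a container waiting to restart counts as "in the process of starting or restarting". The pod is only Failed once every container has terminated and none will be restarted.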

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@genadipost

Any update or workaround?

github-actions bot removed the Stale label Dec 27, 2024
Projects
None yet
Development

No branches or pull requests

5 participants