Docker exec always returns 0 - ignores exit code of process #5692
@caglar10ur believes this may be fixed. Will re-try tomorrow, but leave the bug open.
Not just a belief; the following is what I get with current master :)
@caglar10ur if I knew of a specific commit this was tied to, I'd have more confidence :) Seeing is believing.
Build 12207 (https://ci.vcna.io/vmware/vic/12207) is at addc8b1.
@caglar10ur I built the latest master myself and got the same result. Tested on ESX and it works. Tested on vSphere and it doesn't. That's why we were seeing different results.
The belief is that this is tied to vSphere host sync delay.
An attempt at a release note entry:
@corrieb @anchal-agrawal @mdubya66 is this OK? Thanks!
@caglar10ur can you also please take a look at the release note above?
@stuclem, some additional info: if we are documenting this as a known issue, we should probably add something about the command-based health configuration as well. If you are not familiar with the feature, some information is available in the GitHub wiki. cc @sergiosagu
Thanks @shadjiiski and @sergiosagu. I updated the Release Note as follows:
Is this OK? Do we need to include this in the Admiral 1.3.0 RNs too?
@stuclem, thanks for the update, looks good. Yes, please update the Admiral release notes as well.
This is now blocking a product go-live.
It seems like a user could wrap whatever command they want to run with logic that always prints the return code via standard out. Then, whatever logic wants to check the command could look at the last line of the output. This is inelegant, but it seems like it would unblock things in the short term.
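A minimal sketch of that wrapper pattern, assuming a POSIX shell is available in the container. The container name `mycont` and the script `healthcheck.sh` are hypothetical, not from this issue:

```shell
# Run the real command inside the container, then echo its exit code as the
# final line of stdout, since the exit code reported by docker exec is unreliable.
# "mycont" and "healthcheck.sh" are illustrative names.
out=$(docker exec mycont sh -c '/usr/local/bin/healthcheck.sh; echo $?')
rc=$(printf '%s\n' "$out" | tail -n 1)

if [ "$rc" -eq 0 ]; then
    echo "healthy"
else
    echo "unhealthy (rc=$rc)"
fi
```

The caller parses the last line of stdout instead of trusting the return value of `docker exec` itself.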
I just want to clarify that this is not going to resolve the healthcheck issue on the Admiral side without additional effort from the Admiral team (that was not planned for the upcoming release). Admiral checks the exit code of the command and makes no use of its standard output. cc @lazarin, @martin-borisov
Also, I am not sure if VCH now has an equivalent to the native Docker healthcheck, but if it does, I think you might still hit the exact same issue there. According to the Docker docs, the healthcheck also executes a command and checks its exit code.
Running healthcheck via an exec is a terrible pattern for a cVM and comes with various overheads. It also means that healthcheck will not continue to run while the endpointVM is down. If integrating with vSphere HA, it makes much more sense for the healthcheck process to be dispatched from within the cVM and tied in to application heartbeat support. In that case, the healthcheck on the docker API side can watch for health alerts instead of performing heavyweight polling.
I would highly recommend some longevity/performance testing on the impact of dispatching an exec into a container every few minutes if Admiral is using this mechanism for health checking. IIRC we are not garbage collecting the exec configurations in any aggressive manner, which means that list will grow significantly over time.
Other than that testing, I would not conflate this issue with healthcheck at all.
I have a proof of concept fix for this bug stored locally. I'm going to push that to a branch today and @mavery will be taking the lead on turning that stopgap fix into a maintainable redesign of the exec flow. |
To move forward on this ticket, the first step is to create a design doc for the life cycle of a process in the container. We must design a path forward for handling all the transitions a process takes in the tether/cVM. That is the first step I am taking to address this ticket; I will verify it (with @hickeng) and then link that design here. From there we can look at the potential patch that should work, and design it in such a way as to avoid creating more tech debt in a place where we really need to impose order to create stability. cc @mdubya66 @hickeng @gigawhitlocks
Increasing estimate to 5 in order to account for design. |
User Statement:
As a container developer, I need to know the exit code of a process so that I can determine success or failure of the command I've run via docker exec.
Details:
Docker exec is the most obvious and cleanest way to test whether a container has come up correctly. This is particularly important in multi-container deployments, as you may need to block one container from starting until another has come up.
Eg.
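The original example is not preserved here; as an illustration only, a startup gate that blocks one container until another is ready depends directly on the exec exit code. All names below are hypothetical:

```shell
# Illustrative only: "db" and "pg_isready" are example names. The pattern is
# what matters: loop until the exit code of docker exec signals success.
until docker exec db pg_isready -q; do
    echo "waiting for db..."
    sleep 2
done
echo "db is up"
```

With this bug, the loop exits immediately even when the check fails, because the exec always reports 0.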
As it stands (with master build 12207), docker exec always returns 0. This is not consistent with the behavior of Docker Engine. It makes no difference whether -it is used or not.
Obviously in the case of -d, we wouldn't expect this to work because docker exec exits before the process completes.
There is a simple testcase for this:
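A sketch of what that testcase could look like, assuming a build under test and a busybox image (the image and container names are illustrative, not from the issue):

```shell
# Start a long-running container to exec into.
docker run -d --name exit-test busybox sleep 300

# A successful command should yield exit code 0.
docker exec exit-test true
echo "true -> $?"        # expected: 0

# A failing command should propagate its exit code; buggy builds print 0 here.
docker exec exit-test sh -c "exit 3"
echo "exit 3 -> $?"      # expected: 3

docker rm -f exit-test
```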
Acceptance Criteria:
It should be relatively easy to test for this. We should add a regression test based on the above.
Making this high priority. Given that we haven't implemented health check, the ability to determine health by running docker exec is an important capability.