Docker exec always returns 0 - ignores exit code of process #5692

corrieb · 2017-07-11T20:08:46Z

User Statement:

As a container developer, I need to know the exit code of a process so that I can determine success or failure of the command I've run via docker exec.

Details:

Docker exec is the most obvious and cleanest way to test if a container has come up correctly. This is particularly important in multi-container deployments as you may need to block one container starting until another has come up.

Eg.

while true; do
   docker exec -it db mysqladmin --user="$DB_USER" --password="$DB_PASSWORD" version
   if [ $? -eq 0 ]; then
      break
   fi
   sleep 5
done
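
A bounded variant of the loop above may be safer in deployment scripts, since `while true` never gives up if the database stays down. This is only a minimal sketch and assumes exit codes are propagated correctly; `wait_for` and its attempt-count and delay arguments are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success is decided purely by the command's exit code.
        if "$@"; then
            return 0
        fi
        # Give up once the last allowed attempt has failed.
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `wait_for 12 5 docker exec db mysqladmin --user="$DB_USER" --password="$DB_PASSWORD" version` would try for roughly a minute before reporting failure.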

As it stands, with master build 12207, docker exec always returns 0. This is not consistent with the behavior of Docker Engine. It makes no difference whether -it is used or not.

Obviously in the case of -d, we wouldn't expect this to work because docker exec exits before the process completes.

There is a simple testcase for this:

$ more Dockerfile 
FROM debian
COPY exit.sh /

$ more exit.sh 
#!/bin/bash
echo returning $1
exit $1

$ docker run --name test -d bensdoings/exit sleep 1000
b7b22cfb2271ee4d0e8b398b7df52869b7e791eddaf54913fc7367d5b1312722
$ docker exec test /exit.sh 5
returning 5
$ echo $?
0

Acceptance Criteria:

It should be relatively easy to test for this. We should add a regression test based on the testcase above.
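
One possible shape for such a regression check is sketched below. It is not the project's actual test: `check_exec_exit_code` is a hypothetical name, it uses `sh -c "exit N"` rather than the `/exit.sh` script from the testcase so that any image with a shell works, and it assumes the named container is already running.

```shell
# Hypothetical regression check: docker exec should report the exit code
# of the process it ran. $1 is a running container, $2 the code to request.
check_exec_exit_code() {
    container=$1
    expected=$2
    actual=0
    # Capture the status docker exec reports for a process exiting with $expected.
    docker exec "$container" sh -c "exit $expected" || actual=$?
    if [ "$actual" -ne "$expected" ]; then
        echo "FAIL: expected exit code $expected, got $actual" >&2
        return 1
    fi
    echo "PASS: exit code $expected propagated"
}
```

Against the container from the testcase, `check_exec_exit_code test 5` would be expected to fail while this bug is present and pass once exit codes are propagated.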

Making this high priority. Given that we haven't implemented health check, the ability to determine health by running docker exec is an important capability.

corrieb · 2017-07-11T20:55:03Z

@caglar10ur believes this may be fixed. Will re-try tomorrow, but leave the bug open

caglar10ur · 2017-07-11T21:41:54Z

Not just a belief; the following is what I get with current master :)

[vagrant@devbox:/opt/go/src/github.com/vmware/vic(master)] docker exec -it a sh -c 'exit 4'
[vagrant@devbox:/opt/go/src/github.com/vmware/vic(master)] echo $?
4
[vagrant@devbox:/opt/go/src/github.com/vmware/vic(master)]

corrieb · 2017-07-11T22:08:15Z

@caglar10ur if I knew of a specific commit this was tied to, I'd have more confidence :) Seeing is believing

anchal-agrawal · 2017-07-11T22:14:15Z

Build 12207 (https://ci.vcna.io/vmware/vic/12207) is at addc8b1.

corrieb · 2017-07-12T18:33:33Z

@caglar10ur I built the latest master myself and got the same result. Tested on ESX and it works. Tested on vSphere and it doesn't. That's why we were seeing different results.

mdubya66 · 2017-08-02T18:31:47Z

The belief is that this is tied to a vSphere host sync delay.

stuclem · 2017-09-12T11:46:54Z

An attempt at a release note entry:

docker exec always returns 0 and ignores the exit code of processes. #5692
docker exec always returns 0, even if you specify -it. This is potentially due to a delay in vSphere host synchronization.

@corrieb @anchal-agrawal @mdubya66 is this OK? Thanks!

stuclem · 2017-09-18T15:25:52Z

@caglar10ur can you also please take a look at the release note above?

shadjiiski · 2018-01-22T13:28:20Z

@stuclem, some additional info: the docker exec functionality is exposed in Admiral in the form of a health configuration for a command-based healthcheck. More technically, a user-defined command is executed in the container and the healthcheck action is successful if the exit code of that command is 0. Because of this issue, command-based healthcheck is always successful for containers provisioned on affected VCH hosts, even if the user-specified command does not exist in the scope of the container.
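
To illustrate why the check always succeeds, here is a minimal sketch of an exit-code-based health check. It is not Admiral's implementation; `health_status` is a hypothetical name, and the only assumption carried over from the description above is that health is decided solely by the exit code of the command.

```shell
# Hypothetical sketch of a command-based health check: the container is
# reported healthy exactly when the command run via docker exec exits 0.
health_status() {
    container=$1
    shift
    # Output is discarded; only the exit status is consulted.
    if docker exec "$container" "$@" >/dev/null 2>&1; then
        echo healthy
    else
        echo unhealthy
    fi
}
```

If docker exec reports 0 regardless of what happened inside the container, a check of this shape can never print `unhealthy`, which matches the behavior described for affected VCH hosts.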

If we are documenting this as a known issue, we should probably add something about the command-based health configuration as well. If you are not familiar with the feature, some information is available in the GitHub wiki. cc @sergiosagu

stuclem · 2018-01-23T09:36:28Z

Thanks @shadjiiski and @sergiosagu. I updated the Release Note as follows:

docker exec always returns 0 and ignores the exit code of processes. #5692
docker exec always returns 0, even if you specify -it. This issue is potentially due to a delay in vSphere host synchronization. If you configure command-based health checks in vSphere Integrated Containers Management Portal, the health checks are always successful for containers that are provisioned on affected VCHs, even if the user-specified command does not exist in the scope of the container. This is because command-based health checks are considered to be successful if the exit code of that command is 0.

Is this OK? Do we need to include this in the Admiral 1.3.0 RNs too?

shadjiiski · 2018-01-23T09:48:59Z

@stuclem, thanks for the update, looks good. Yes, please update the Admiral release notes as well.

stuclem · 2018-01-23T10:00:31Z

@shadjiiski, done: https://github.com/vmware/admiral/releases/tag/vic_v1.3.0

Thanks!

hickeng · 2018-04-26T09:15:15Z

This is now blocking a product go-live

zjs · 2018-04-27T21:14:23Z

> This is now blocking a product go-live

It seems like a user could wrap whatever command they want to run with logic to always print the return code via standard out. Then, whatever logic wants to check the command could look at the last line of exec's standard out instead of exec's RC.

This is inelegant, but it seems like it would unblock things in the short term.
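
A minimal sketch of that wrapper follows, under some assumptions: `exec_rc` and the `rc=` marker are hypothetical names, the container image is assumed to provide `sh`, the command's output is assumed to end with a newline, and `-t` is deliberately not used so that line endings stay plain.

```shell
# Hypothetical helper: run a command in a container and recover its exit
# code from the last line of stdout instead of from docker exec's own status.
exec_rc() {
    container=$1
    shift
    # The inner sh runs the command, then prints its status on a final line.
    # "$@" inside the inner sh expands to the arguments passed after "_".
    output=$(docker exec "$container" sh -c '"$@"; echo "rc=$?"' _ "$@")
    last=$(printf '%s\n' "$output" | tail -n 1)
    case $last in
        rc=*) rc=${last#rc=} ;;
        # No marker line: the wrapper itself did not run to completion.
        *) return 125 ;;
    esac
    # Re-emit the command's own output without the marker line.
    printf '%s\n' "$output" | sed '$d'
    return "$rc"
}
```

With the testcase above, `exec_rc test /exit.sh 5` would print `returning 5` and return 5 even if docker exec itself reports 0. The value 125 for a missing marker is an arbitrary choice for this sketch.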

shadjiiski · 2018-05-02T08:36:11Z

> It seems like a user could wrap whatever command they want to run with logic to always print the return code via standard out

I just want to clarify that this is not going to resolve the healthcheck issue on the Admiral side without additional effort from the Admiral team (which was not planned for the upcoming release). Admiral checks the exit code of the command and makes no use of its standard output. cc @lazarin, @martin-borisov

Also, I am not sure if VCH now has an equivalent to the native Docker healthcheck, but if it does, I think you might still hit the exact same issue there. According to the Docker docs, the healthcheck also executes a command and checks its exit code.

hickeng · 2018-05-03T08:30:33Z

Running healthcheck via an exec is a terrible pattern for a cVM and comes with various overheads. It also means that healthcheck will not continue to run while the endpointVM is down.

If integrating with vSphere HA it makes much more sense for the healthcheck process to be dispatched from within the cVM and tied in to application heartbeat support. In this case the healthcheck on the docker API side can then watch for health alerts instead of performing heavyweight polling.

I would highly recommend some longevity/performance testing on the impact of dispatching an exec into a container every few minutes if Admiral is using this mechanism for health checking. IIRC we are not garbage collecting the exec configurations in any aggressive manner, which means that list will grow significantly over time. Other than that testing, I would not conflate this issue with healthcheck at all.

gigawhitlocks · 2018-05-09T18:11:44Z

I have a proof of concept fix for this bug stored locally. I'm going to push that to a branch today and @mavery will be taking the lead on turning that stopgap fix into a maintainable redesign of the exec flow.

matthewavery · 2018-05-09T18:21:02Z

To move forward on this ticket, the first step is to create a design doc for the life cycle of a process in the container. We must design a path forward for handling all the transitions a process goes through in the tether/cVM. That is the first step I am taking to address this ticket; I will verify the design (with @hickeng) and then link it here. From there we can look at the potential patch that should work, and design it in such a way as to avoid creating more tech debt in a place where we really need to implement order to create stability. cc @mdubya66 @hickeng @gigawhitlocks

sgairo · 2018-05-09T20:38:25Z

Increasing estimate to 5 in order to account for design.

corrieb added the component/portlayer/execution, component/tether, kind/defect, and priority/p0 labels Jul 11, 2017

corrieb changed the title ~~Docker exec always returns 0 - ignores exit code of process~~ Docker exec always returns 0 - ignores exit code of process (works on ESX, not vSphere) Jul 12, 2017

corrieb changed the title ~~Docker exec always returns 0 - ignores exit code of process (works on ESX, not vSphere)~~ Docker exec always returns 0 - ignores exit code of process Jul 12, 2017

mdubya66 added the impact/doc/note and priority/p2 labels and removed the priority/p0 label Aug 2, 2017

stuclem closed this as completed Sep 13, 2017

stuclem reopened this Sep 13, 2017

stuclem removed the impact/doc/note label Oct 3, 2017

hickeng added the source/customer and priority/p0 labels and removed the priority/p2 label Apr 26, 2018

gigawhitlocks self-assigned this Apr 26, 2018

anchal-agrawal added this to the Sprint 31 Container Ops milestone Apr 26, 2018

sgairo modified the milestones: Sprint 31 Container Ops, Sprint 32 Container Ops May 8, 2018

sgairo added the team/foundation label May 9, 2018

sgairo modified the milestones: Sprint 32 Container Ops, Sprint 32 Foundation May 9, 2018

sgairo unassigned gigawhitlocks May 9, 2018

matthewavery self-assigned this May 9, 2018

This was referenced May 18, 2018:

WIP Fix incorrect exit codes from Exec #7965 (Closed)

Propagate exit code to client for docker exec and vCenter #7969 (Merged)

sgairo modified the milestones: Sprint 32 Foundation, Sprint 33 Foundation May 23, 2018

matthewavery closed this as completed in #7969 Jun 2, 2018
