Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker does not unmount on mount failure if -w option is passed with docker run. #22564

Closed
kerneltime opened this issue May 6, 2016 · 12 comments

Comments

@kerneltime
Copy link
Contributor

Output of docker version:

On VM1

root@photon-machine [ ~ ]# docker version
Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 19:36:04 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 19:36:04 2016
 OS/Arch:      linux/amd64

On VM2

docker version
Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64

Output of docker info:
On VM1

docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.0
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local vmdk
 Network: null host bridge
Kernel Version: 4.2.0-esx
Operating System: VMware Photon/Linux
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.958 GiB
Name: photon-machine
ID: YODH:WTQL:A5ZT:446C:PGNT:FLPE:G3W5:WIE4:VM5W:HLNZ:ALJM:RKW5
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 15
 Goroutines: 32
 System Time: 2016-05-06T22:48:32.647615591Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support

On VM2

docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.10.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 4
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local vmdk
 Network: host bridge null
Kernel Version: 4.2.0-27-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.789 GiB
Name: promc-2n-dhcp210
ID: B2D6:WWS6:M3LR:BZJL:FBCO:K7S4:PYVW:VN2K:WFFN:BXL2:T3WZ:LWQB
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
On VM1

root@photon-machine [ ~ ]# uname -a
Linux photon-machine 4.2.0-esx #1-photon SMP Fri Apr 22 22:28:32 UTC 2016 x86_64 GNU/Linux
root@photon-machine [ ~ ]# cat /etc/os-release 
NAME="VMware Photon"
VERSION="1.0"
ID=photon
VERSION_ID=1.0
PRETTY_NAME="VMware Photon/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"

VM2

root@210:~# cat /etc/os-release 
NAME="Ubuntu"
VERSION="14.04.4 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.4 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
root@210:~# uname -a
Linux promc-2n-dhcp210 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@210:~# 

Steps to reproduce the issue:

  1. Create a volume and consume it using
    docker volume create --driver= --name=datavol -o size=10gb
    docker run -it --rm -v datavol:/data -w /data busybox
  2. When docker tries to mount via the plugin return an error
  3. Unmount request is expected but never received.

Describe the results you received:
Command:

docker run  --rm -v radio2016:/data -w /data busybox cat /data/file
docker: Error response from daemon: VolumeDriver.Mount: Failed to connect to ESX over vsocket.
See 'docker run --help'.

Logs:

May 06 20:07:22 photon-machine docker[2095]: time="2016-05-06T20:07:22.226977123Z" level=debug msg="Calling POST /v1.23/containers/create"
May 06 20:07:22 photon-machine docker[2095]: time="2016-05-06T20:07:22.227550840Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"cat\",\"/data/file\"],\"Domainname\":\"\",\"Entrypoint\":null,\"Env\":[],\"HostConfig\":{\"AutoRemove\":false,\"Binds\":[\"radio2016:/data\"],\"BlkioBps\":0,\"BlkioDeviceReadBps\":null,\"BlkioDeviceReadIOps\":null,\"BlkioDeviceWriteBps\":null,\"BlkioDeviceWriteIOps\":null,\"BlkioIOps\":0,\"BlkioWeight\":0,\"BlkioWeightDevice\":null,\"CapAdd\":null,\"CapDrop\":null,\"Cgroup\":\"\",\"CgroupParent\":\"\",\"ConsoleSize\":[0,0],\"ContainerIDFile\":\"\",\"CpuCount\":0,\"CpuPercent\":0,\"CpuPeriod\":0,\"CpuQuota\":0,\"CpuShares\":0,\"CpusetCpus\":\"\",\"CpusetMems\":\"\",\"Devices\":[],\"DiskQuota\":0,\"Dns\":[],\"DnsOptions\":[],\"DnsSearch\":[],\"ExtraHosts\":null,\"GroupAdd\":null,\"IpcMode\":\"\",\"Isolation\":\"\",\"KernelMemory\":0,\"Links\":null,\"LogConfig\":{\"Config\":{},\"Type\":\"\"},\"Memory\":0,\"MemoryReservation\":0,\"MemorySwap\":0,\"MemorySwappiness\":-1,\"NetworkMode\":\"default\",\"OomKillDisable\":false,\"OomScoreAdj\":0,\"PidMode\":\"\",\"PidsLimit\":0,\"PortBindings\":{},\"Privileged\":false,\"PublishAllPorts\":false,\"ReadonlyRootfs\":false,\"RestartPolicy\":{\"MaximumRetryCount\":0,\"Name\":\"no\"},\"SandboxSize\":0,\"SecurityOpt\":null,\"ShmSize\":0,\"StorageOpt\":null,\"UTSMode\":\"\",\"Ulimits\":null,\"UsernsMode\":\"\",\"VolumeDriver\":\"\",\"VolumesFrom\":null},\"Hostname\":\"\",\"Image\":\"busybox\",\"Labels\":{},\"NetworkingConfig\":{\"EndpointsConfig\":{}},\"OnBuild\":null,\"OpenStdin\":false,\"StdinOnce\":false,\"Tty\":false,\"User\":\"\",\"Volumes\":{},\"WorkingDir\":\"/data\"}"
May 06 20:07:22 photon-machine docker[2095]: time="2016-05-06T20:07:22.244314184Z" level=debug msg="container mounted via layerStore: /var/lib/docker/overlay/752bb2f7a90c487baf52ffa85d81dfd5b93a9cb9e1faee323a939c2481416600/merged"
May 06 20:07:22 photon-machine docker[2095]: time="2016-05-06T20:07:22.245067104Z" level=debug msg="copying image data from a04dc67c3a068a1a3022fe5d8dc23c10c6389265e1b61b1a74c8ed30865dc741:/data, to radio2016"
May 06 20:07:22 photon-machine docker[2095]: time="2016-05-06T20:07:22.251241798Z" level=error msg="Handler for POST /v1.23/containers/create returned error: VolumeDriver.Mount: Failed to connect to ESX over vsocket"

Describe the results you expected:
Command:

docker run  --rm -v radio2016:/data busybox cat /data/file

Logs

May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.879817062Z" level=debug msg="Calling POST /v1.23/containers/create"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.880548779Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":false,\"AttachStdout\":true,\"Cmd\":[\"cat\",\"/data/file\"],\"Domainname\":\"\",\"Entrypoint\":null,\"Env\":[],\"HostConfig\":{\"AutoRemove\":false,\"Binds\":[\"radio2016:/data\"],\"BlkioBps\":0,\"BlkioDeviceReadBps\":null,\"BlkioDeviceReadIOps\":null,\"BlkioDeviceWriteBps\":null,\"BlkioDeviceWriteIOps\":null,\"BlkioIOps\":0,\"BlkioWeight\":0,\"BlkioWeightDevice\":null,\"CapAdd\":null,\"CapDrop\":null,\"Cgroup\":\"\",\"CgroupParent\":\"\",\"ConsoleSize\":[0,0],\"ContainerIDFile\":\"\",\"CpuCount\":0,\"CpuPercent\":0,\"CpuPeriod\":0,\"CpuQuota\":0,\"CpuShares\":0,\"CpusetCpus\":\"\",\"CpusetMems\":\"\",\"Devices\":[],\"DiskQuota\":0,\"Dns\":[],\"DnsOptions\":[],\"DnsSearch\":[],\"ExtraHosts\":null,\"GroupAdd\":null,\"IpcMode\":\"\",\"Isolation\":\"\",\"KernelMemory\":0,\"Links\":null,\"LogConfig\":{\"Config\":{},\"Type\":\"\"},\"Memory\":0,\"MemoryReservation\":0,\"MemorySwap\":0,\"MemorySwappiness\":-1,\"NetworkMode\":\"default\",\"OomKillDisable\":false,\"OomScoreAdj\":0,\"PidMode\":\"\",\"PidsLimit\":0,\"PortBindings\":{},\"Privileged\":false,\"PublishAllPorts\":false,\"ReadonlyRootfs\":false,\"RestartPolicy\":{\"MaximumRetryCount\":0,\"Name\":\"no\"},\"SandboxSize\":0,\"SecurityOpt\":null,\"ShmSize\":0,\"StorageOpt\":null,\"UTSMode\":\"\",\"Ulimits\":null,\"UsernsMode\":\"\",\"VolumeDriver\":\"\",\"VolumesFrom\":null},\"Hostname\":\"\",\"Image\":\"busybox\",\"Labels\":{},\"NetworkingConfig\":{\"EndpointsConfig\":{}},\"OnBuild\":null,\"OpenStdin\":false,\"StdinOnce\":false,\"Tty\":false,\"User\":\"\",\"Volumes\":{},\"WorkingDir\":\"\"}"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.899373490Z" level=debug msg="container mounted via layerStore: /var/lib/docker/overlay/7a018b4d33979d6aedfdf638cc7175a754dd66149e05a561ed66535add263e6d/merged"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.899957317Z" level=debug msg="copying image data from 1b1dc474a6c8a81e5ca978545414dc7ae699895f00bc1776324e219f0d47730d:/data, to radio2016"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.904220227Z" level=debug msg="Calling POST /v1.23/containers/1b1dc474a6c8a81e5ca978545414dc7ae699895f00bc1776324e219f0d47730d/attach?stderr=1&stdout=1&stream=1"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.905298577Z" level=debug msg="attach: stdout: begin"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.905978497Z" level=debug msg="attach: stderr: begin"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.906917754Z" level=debug msg="Calling POST /v1.23/containers/1b1dc474a6c8a81e5ca978545414dc7ae699895f00bc1776324e219f0d47730d/start"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.910935261Z" level=debug msg="container mounted via layerStore: /var/lib/docker/overlay/7a018b4d33979d6aedfdf638cc7175a754dd66149e05a561ed66535add263e6d/merged"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.911741477Z" level=debug msg="Assigning addresses for endpoint cranky_wilson's interface on network bridge"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.912290781Z" level=debug msg="RequestAddress(LocalDefault/172.17.0.0/16, <nil>, map[])"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.919231621Z" level=debug msg="Assigning addresses for endpoint cranky_wilson's interface on network bridge"
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.928945451Z" level=debug msg="Programming external connectivity on endpoint cranky_wilson (d2bc58f7bb466dd11ab29368e474175816561b0daa7dc5658e33d1d23318602c)"
2016-05-06 20:07:07.935254635 +0000 UTC [INFO] Mounting volume name=radio2016
2016-05-06 20:07:07.935335565 +0000 UTC [DEBUG] vmdkOps.Attach name=radio2016
2016-05-06 20:07:07.935731702 +0000 UTC [WARNING] Failed to connect to ESX over vsocket
2016-05-06 20:07:07.935816918 +0000 UTC [ERROR] Failed to mount error="Failed to connect to ESX over vsocket" name=radio2016
May 06 20:07:07 photon-machine docker[2095]: time="2016-05-06T20:07:07.942648069Z" level=debug msg="Revoking external connectivity on endpoint cranky_wilson (d2bc58f7bb466dd11ab29368e474175816561b0daa7dc5658e33d1d23318602c)"
May 06 20:07:08 photon-machine docker[2095]: time="2016-05-06T20:07:08.092797949Z" level=debug msg="Releasing addresses for endpoint cranky_wilson's interface on network bridge"
May 06 20:07:08 photon-machine docker[2095]: time="2016-05-06T20:07:08.093481286Z" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.2)"
2016-05-06 20:07:08.100071566 +0000 UTC [INFO] Unmounting Volume name=radio2016
2016-05-06 20:07:08.100140506 +0000 UTC [ERROR] Failed to unmount volume. Now trying to detach... mountpoint="/mnt/vmdk/radio2016" error="invalid argument"
2016-05-06 20:07:08.100182593 +0000 UTC [DEBUG] vmdkOps.Detach name=radio2016
2016-05-06 20:07:08.100557026 +0000 UTC [WARNING] Failed to connect to ESX over vsocket
2016-05-06 20:07:08.10065241 +0000 UTC [ERROR] Failed to unmount name=radio2016 error="Failed to connect to ESX over vsocket"

Additional information you deem important (e.g. issue happens only occasionally):
This happens on both 1.10 and 1.11. We are writing a plugin for volume and expect to get an umount request on failed mounts. This is needed to maintain refcount within the plugin. The expectation is that docker will always send an unmount independent of the status of mount command and this is what we see when we do not pass in the -w option on run. Discussions part of https://github.com/docker/docker/pull/21015/files also indicate that umount will always be sent.
Either this is a bug in the clean up handling when -w is sent or the behavior of mount and umount needs to be specified.

@thaJeztah
Copy link
Member

/cc @cpuguy83

@govint
Copy link

govint commented Aug 16, 2016

Ping, is this issue going to be fixed in the upcoming RCs for Docker 1.12? Our plugin does reference counting on its own and when mount fails we have the volume ref. counted and unless an unmount gets sent (which happens always except when "-w" is passed) the volume is unusable till at least the plugin is restarted.

/cc @cpuguy83

@cpuguy83
Copy link
Member

@govint specifying -w has no bearing on volumes.
If mount returned an error, why would docker try to unmount?

@govint
Copy link

govint commented Aug 16, 2016

@cpuguy83, correct, but that's the bug though, the behavior is inconsistent with when docker sends the unmount to the volume driver.

If -w

is in the command line to run the container and the mount fails, then an unmount isn't issued to the volume driver.

But if -w isn't provided and the mount fails then an unmount is issued to the volume driver.

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

I think I see the problem.

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

I lied, I don't see it, but I can reproduce it. It only happens when -w is pointing at a volume dir.

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

Ok, I think I've found it...

It seems to be related to whether or not the path in the container exists or not.
When -w is used, the container path is automatically created during the create phase of the container. When -w is set to the same thing as a volume path, this will make sure that the path always exists.

Later when we get to CopyImagePathContent(), still in the create phase, it does:

    if _, err = ioutil.ReadDir(rootfs); err != nil {
        if os.IsNotExist(err) {
            return nil
        }
        return err
    }

    id := stringid.GenerateNonCryptoID()
    path, err := v.Mount(id)
    if err != nil {
        return err
    }

    defer func() {
        if err := v.Unmount(id); err != nil {
            logrus.Warnf("error while unmounting volume %s: %v", v.Name(), err)
        }
    }()

Basically, if the volume path in the container does not exist, it will return immediately here and the error you get from docker run is from the start phase, and it will give us the unmount when the container start fails.

When the volume path does exist, the error is returned during the create phase.

I think we should not unmount if mount gave an error. WDYT?

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

Actually, it would be difficult to track what is mounted and what is not. We can just send the unmount and this can effectively be a no-op for plugins if they don't recognize the ID.

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

Here we go: #27116
Got an easy way to not unmount unless neccessary, so there would be no unmounts if there was a mount error.

@msterin
Copy link

msterin commented Oct 3, 2016

that would change the current behavior though. Before the fix, Docker sends unmount on mount failures (sans -w existing_path case) , so the volume driver increments refcounts on mount and decrements on refcount, no matter if mount succeeds of fails. With the suggested behavior, the driver(s) would need to be changed to skip refcount increment on failed mount, and this would have to be dependent on Docker version.

Is there any way just to send an umount on all mount failures (including "-w existing_path") ?

@cpuguy83
Copy link
Member

cpuguy83 commented Oct 3, 2016

@msterin Right now the behavior is inconsistent.
If the path exists then you get multiple unmounts, if the path doesn't exist then you get no unmount.

A change as you are suggesting is the same problem, just in the other direction (re: changing behavior).
I would rather change to the correct behavior, where it doesn't make sense to send an unmount for something that's not mounted.

@msterin
Copy link

msterin commented Oct 3, 2016

@cpuguy83 I don't think you get multiple unmounts on existing path when mount fails, I think you get none.
At any rate - I agree there is no reason to send unmount on failed mounts - it was just a painful inconsistency we've already adjusted to :-)

Now we'll adjust to the correct one.

cpuguy83 added a commit to cpuguy83/docker that referenced this issue Nov 10, 2016
Fixes moby#22564

When an error occurs on mount, there should not be any call later to
unmount. This can throw off refcounting in the underlying driver
unexpectedly.

Consider these two cases:

```
$ docker run -v foo:/bar busybox true
```

```
$ docker run -v foo:/bar -w /foo busybox true
```

In the first case, if mounting `foo` fails, the volume driver will not
get a call to unmount (this is the incorrect behavior).

In the second case, the volume driver will not get a call to unmount
(correct behavior).

This occurs because in the first case, `/bar` does not exist in the
container, and as such there is no call to `volume.Mount()` during the
`create` phase. It will error out during the `start` phase.

In the second case `/bar` is created before dealing with the volume
because of the `-w`. Because of this, when the volume is being setup
docker will try to copy the image path contents in the volume, in which
case it will attempt to mount the volume and fail. This happens during
the `create` phase. This makes it so the container will not be created
(or at least fully created) and the user gets the error on `create`
instead of `start`. The error handling is different in these two phases.

Changed to only send `unmount` if the volume is mounted.

While investigating the cause of the reported issue I found some odd
behavior in unmount calls so I've cleaned those up a bit here as well.

Signed-off-by: Brian Goff <[email protected]>
@thaJeztah thaJeztah added this to the 1.13.0 milestone Nov 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants