vmwarefusion: failed to start after stop: Error configuring auth on host: Too many retries waiting for SSH to be available #1382

Closed
ay0o opened this issue Apr 19, 2017 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ay0o

ay0o commented Apr 19, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Minikube version (use minikube version): 0.18.0

Environment:

  • OS (e.g. from /etc/os-release): MacOS 10.12.4
  • VM Driver (e.g. cat ~/.minikube/machines/minikube/config.json | grep DriverName): vmwarefusion
  • ISO version (e.g. cat ~/.minikube/machines/minikube/config.json | grep -i ISO or minikube ssh cat /etc/VERSION): boot2docker.iso
  • Install tools:
  • Others:

What happened:
Using VMware Fusion on macOS, minikube works flawlessly the first time it is started. However, after minikube stop, running minikube start --vm-driver=vmwarefusion again fails and the cluster never comes up.

Starting local Kubernetes cluster...
Starting VM...
Waiting for SSH to be available...
E0419 23:27:50.099029    1781 start.go:116] Error starting host: Temporary Error: Error configuring auth on host: Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded.

What you expected to happen:
Be able to start the cluster after stopping it.

How to reproduce it (as minimally and precisely as possible):

minikube start --vm-driver=vmwarefusion
minikube stop
minikube start --vm-driver=vmwarefusion

Anything else we need to know:
The only solution I've found so far is to minikube delete and start over.

@r2d4 r2d4 added co/vmwarefusion-driver Issues with legacy VMware Fusion Driver kind/bug Categorizes issue or PR as related to a bug. labels Apr 20, 2017
@erichmond

+1. Also seeing this behavior on three machines. Exact same environment.

@atiwari

atiwari commented May 9, 2017

Same for me too. Does not start once stopped.

Looks similar to #1107

Getting to WaitForSSH function...
(minikube) Calling .GetSSHHostname
(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun list
(minikube) DBG | MAC address in VMX: 00:0c:29:e0:b3:62
(minikube) DBG | Trying to find IP address in configuration file: /Library/Preferences/VMware Fusion/vmnet1/dhcpd.conf
(minikube) DBG | Following IPs found map[00:50:56:c0:00:01:172.16.86.1]
(minikube) DBG | Trying to find IP address in configuration file: /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf
(minikube) DBG | Following IPs found map[00:50:56:c0:00:08:172.16.0.1 00:0c:29:59:7d:eb:172.16.0.106]
(minikube) DBG | Trying to find IP address in leases file: /var/db/vmware/vmnet-dhcpd-vmnet1.leases
(minikube) DBG | Trying to find IP address in leases file: /var/db/vmware/vmnet-dhcpd-vmnet8.leases
(minikube) DBG | IP found in DHCP lease table: 172.16.0.184
(minikube) Calling .GetSSHPort
(minikube) Calling .GetSSHKeyPath
(minikube) Calling .GetSSHKeyPath
(minikube) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/arvtiwar/.minikube/machines/minikube/id_rsa (-rw-------)
&{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none [email protected] -o IdentitiesOnly=yes -i /Users/arvtiwar/.minikube/machines/minikube/id_rsa -p 22] /usr/bin/ssh }
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255:
Error getting ssh command 'exit 0' : Something went wrong running an SSH command!
command : exit 0
err : exit status 255
output :
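
For anyone who wants to reproduce that probe by hand, here is a minimal sketch of a bounded retry loop around an arbitrary command. It is not minikube's implementation: retry_probe is a hypothetical helper, and the delay between attempts is an assumption, since the log above only shows the retry limit (60) and the probe command (exit 0).

```shell
# Sketch only: retry a probe command up to MAX times, sleeping DELAY seconds
# between attempts. retry_probe is a hypothetical name, not a minikube function.
retry_probe() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        else
            status=$?
        fi
        echo "attempt $attempt/$max failed (exit status $status)" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Example invocation (not run here); replace <guest-ip> with the address
# reported in the DHCP lease lookup:
#   retry_probe 60 1 ssh -F /dev/null -o PasswordAuthentication=no \
#       -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
#       -o ConnectTimeout=10 -o IdentitiesOnly=yes \
#       -i ~/.minikube/machines/minikube/id_rsa -p 22 docker@<guest-ip> exit 0
```

With PasswordAuthentication=no, a guest that has lost its authorized_keys will fail every attempt with exit status 255, which matches the output above.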

@flying-binh

Same issue here. I am seeing the errors below, and minikube keeps retrying.

Starting VM...
E0512 14:29:35.839657 62651 start.go:119] Error starting host: Temporary Error: Error configuring auth on host: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded.

@b333z

b333z commented May 14, 2017

Experiencing same issue here.

Did some digging with vmrun and found that guest /home/docker/.ssh dir is missing.

As a workaround I found I could get the cluster running again by:

minikube start -v 10 (let it start the VM for you, then press [ctrl]+[c] once you start to see the 255 errors)

Then running this script on host to restore missing ssh keys in guest:

#!/bin/bash
# Restore the docker user's authorized_keys in the guest via vmrun.

MINIKUBE="${HOME}/.minikube/machines/minikube"
VMX="$MINIKUBE/minikube.vmx"
DOCKER_PUB_KEY="$MINIKUBE/id_rsa.pub"

# Run a vmrun guest command against the minikube VM as the docker user.
function vmrun {
	local guestcmd=$1; shift
	"/Applications/VMware Fusion.app/Contents/Library/vmrun" -gu docker -gp tcuser "$guestcmd" "$VMX" "$@"
}

# Recreate the missing .ssh directory, install the public key, then fix ownership and permissions.
vmrun runScriptInGuest /bin/bash "mkdir -p /home/docker/.ssh"
vmrun CopyFileFromHostToGuest "$DOCKER_PUB_KEY" /home/docker/.ssh/authorized_keys
vmrun runScriptInGuest /bin/bash "chown -R docker /home/docker/.ssh"
vmrun runScriptInGuest /bin/bash "chmod -R 700 /home/docker/.ssh"

Then running start again now that ssh access is restored to bring it up: minikube start -v 10

Did some quick digging for a cause and found this in the minikube-automount logs. minikube-automount restores userdata.tar to populate the /home/docker/.ssh dir, so without the archive we get the 255 error from the ssh client:

May 14 11:50:05 minikube minikube-automount[4977]: + tar xf /var/lib/boot2docker/userdata.tar -C /home/docker/
May 14 11:50:05 minikube minikube-automount[4977]: tar: can't open '/var/lib/boot2docker/userdata.tar': No such file or directory
May 14 11:50:05 minikube minikube-automount[4977]: + chown -R docker:docker /home/docker/.ssh
May 14 11:50:05 minikube minikube-automount[4977]: chown: /home/docker/.ssh: No such file or directory

/var/lib/boot2docker points to persistent storage, so that is good:

$ ls -la /var/lib                             
total 0
drwxr-xr-x    7 root     root             0 May 14 11:50 .
drwxr-xr-x    4 root     root             0 May 14 11:50 ..
drwxr-xr-x    2 root     root             0 Feb  8 19:46 arpd
lrwxrwxrwx    1 root     root            29 May 14 11:50 boot2docker -> /mnt/sda1/var/lib/boot2docker
lrwxrwxrwx    1 root     root            21 May 14 11:50 cni -> /mnt/sda1/var/lib/cni
drwxr-xr-x    2 root     root             0 Feb  8 19:43 dbus
lrwxrwxrwx    1 root     root            24 May 14 11:50 docker -> /mnt/sda1/var/lib/docker
lrwxrwxrwx    1 root     root            25 May 14 11:50 kubelet -> /mnt/sda1/var/lib/kubelet
lrwxrwxrwx    1 root     root            27 May 14 11:50 localkube -> /mnt/sda1/var/lib/localkube
drwx------    2 root     root             0 May 14 11:50 machines
lrwxrwxrwx    1 root     root             9 Feb  8 19:23 misc -> ../../tmp
lrwxrwxrwx    1 root     root            21 May 14 11:50 rkt -> /mnt/sda1/var/lib/rkt
drwx--x--x    3 root     root             0 Feb  8 19:52 sudo
drwxr-xr-x    4 root     root             0 May 14 11:50 systemd

But there is no userdata.tar contained within.

$ find /mnt/sda1/var/lib/boot2docker -ls
  1835011      4 drwxr-xr-x   3  root     root         4096 May 12 21:46 /mnt/sda1/var/lib/boot2docker
  1835040      4 drwxr-xr-x   2  root     root         4096 May 12 21:46 /mnt/sda1/var/lib/boot2docker/etc
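
The same check can be scripted inside the guest. A minimal sketch: check_userdata is a hypothetical helper name, and the default directory is simply the persistent path shown in the listing above.

```shell
# Sketch only: report whether userdata.tar is present in the given directory.
# check_userdata is a hypothetical helper; the default path is the persistent
# boot2docker directory from the listing above.
check_userdata() {
    dir="${1:-/mnt/sda1/var/lib/boot2docker}"
    if [ -f "$dir/userdata.tar" ]; then
        echo "present: $dir/userdata.tar"
    else
        echo "missing: $dir/userdata.tar"
        return 1
    fi
}
```

A non-zero return here would mean the next boot cannot repopulate /home/docker/.ssh from the archive.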

Yet to find out why userdata.tar is missing... But looks to be handled here: https://github.com/kubernetes/minikube/blob/k8s-v1.7/deploy/iso/minikube-iso/package/automount/minikube-automount

So I'm thinking the logs from the guest on first boot (journalctl -t minikube-automount) might show us the problem... will try to grab when I can.

@b333z

b333z commented May 15, 2017

Created a cluster from scratch:

userdata.tar gets uploaded to the guest via vmrun early in cluster creation:

(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun -gu docker -gp tcuser CopyFileFromHostToGuest /Users/b/.minikube/machines/minikube/minikube.vmx /Users/b/.minikube/machines/minikube/userdata.tar /home/docker/userdata.tar
(minikube) DBG | executing: /Applications/VMware Fusion.app/Contents/Library/vmrun -gu docker -gp tcuser runScriptInGuest /Users/b/.minikube/machines/minikube/minikube.vmx /bin/sh sudo /bin/mv /home/docker/userdata.tar /var/lib/boot2docker/userdata.tar && sudo tar xf /var/lib/boot2docker/userdata.tar -C /home/docker/ > /var/log/userdata.log 2>&1 && sudo chown -R docker:staff /home/docker

So now it is here on the guest: /var/lib/boot2docker/userdata.tar

Later on, when minikube-automount is enabled and started, the file gets wiped by rm -rf /var/lib/docker /var/lib/boot2docker before the data partition is symlinked into place:

May 15 15:11:31 minikube minikube-automount[4936]: + '[' -n /dev/sda1 ']'
May 15 15:11:31 minikube minikube-automount[4936]: ++ echo /dev/sda1
May 15 15:11:31 minikube minikube-automount[4936]: ++ sed 's/.*\///'
May 15 15:11:31 minikube minikube-automount[4936]: + PARTNAME=sda1
May 15 15:11:31 minikube minikube-automount[4936]: + echo 'mount p:sda1 ...'
May 15 15:11:31 minikube minikube-automount[4936]: mount p:sda1 ...
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1
May 15 15:11:31 minikube minikube-automount[4936]: + mount /dev/sda1 /mnt/sda1
May 15 15:11:31 minikube minikube-automount[4936]: + umount -f /var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: umount: /var/lib/docker: mountpoint not found
May 15 15:11:31 minikube minikube-automount[4936]: + true
May 15 15:11:31 minikube minikube-automount[4936]: + rm -rf /var/lib/docker /var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /var/lib
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/boot2docker /var/lib/boot2docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/docker /var/lib/docker
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/log
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/log /var/log
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/kubelet
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/kubelet /var/lib/kubelet
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/var/lib/cni
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/var/lib/cni /var/lib/cni
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/data
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/data /data
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/hostpath_pv
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/hostpath_pv /tmp/hostpath_pv
May 15 15:11:31 minikube minikube-automount[4936]: + mkdir -p /mnt/sda1/hostpath-provisioner
May 15 15:11:31 minikube minikube-automount[4936]: + ln -s /mnt/sda1/hostpath-provisioner /tmp/hostpath-provisioner
May 15 15:11:31 minikube minikube-automount[4936]: + rm -rf /var/lib/rkt

Without knowledge of the other drivers, a possible fix might be to change minikube-automount to cp /var/lib/boot2docker/userdata.tar /mnt/sda1/var/lib/boot2docker/userdata.tar before the rm wipes it?
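
As a rough sketch of that idea (untested against the real ISO; preserve_userdata is a hypothetical helper, not something that exists in minikube-automount), the copy step could look like this. It assumes it would be called after /dev/sda1 is mounted and before the rm -rf runs.

```shell
# Sketch only: copy userdata.tar to the persistent partition before the wipe.
# preserve_userdata is a hypothetical helper, not part of minikube-automount.
preserve_userdata() {
    src_dir="$1"    # e.g. /var/lib/boot2docker (a plain directory before the wipe)
    dst_dir="$2"    # e.g. /mnt/sda1/var/lib/boot2docker (persistent storage)

    # If the source is already a symlink into persistent storage, do nothing.
    if [ -L "$src_dir" ]; then
        return 0
    fi
    # If there is no archive to save, do nothing.
    if [ ! -f "$src_dir/userdata.tar" ]; then
        return 0
    fi
    mkdir -p "$dst_dir"
    cp -p "$src_dir/userdata.tar" "$dst_dir/userdata.tar"
}

# Intended call site (not run here), just before the rm -rf:
#   preserve_userdata /var/lib/boot2docker /mnt/sda1/var/lib/boot2docker
```

Whether this is safe for the other drivers is an open question, as noted above.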

@lateagain

Thanks. After making a fresh cluster I put the tar file in by hand.

  1. minikube ssh
  2. sudo cp /Users/[mylogin]/.minikube/machines/minikube/userdata.tar /var/lib/boot2docker/

It now starts after a stop.

@rvowles

rvowles commented Jul 27, 2017

I used the last piece of advice from @b333z and did the

  1. minikube ssh
  2. sudo cp /Users/[mylogin]/.minikube/machines/minikube/userdata.tar /mnt/sda1/var/lib/boot2docker/

as I wasn't able to get the /var/lib/boot2docker copy to work. I'm using 0.21. But now it works - so thanks ever so much for that investigation!

@joshk0

joshk0 commented Aug 28, 2017

This commit seems to be a fix for the issue (minikube itself has no code dictating when userdata is copied). Can we pull it into minikube?

@joshk0

joshk0 commented Oct 16, 2017

ping? I can try just blindly replacing the commit SHA1 in Godeps.json and seeing if tests pass...

@r2d4
Contributor

r2d4 commented Oct 17, 2017

@joshk0 should be fixed by #2060

@urbaniak

using latest v0.23.0 and still getting the same issue, is the fix included in that version?

is there any nightly build to test it?

the easiest way of fixing it is to run ssh-copy-id -i ~/.minikube/machines/minikube/id_rsa.pub docker@$(minikube ip) while minikube is starting. The password can be found with cat ~/.minikube/machines/minikube/config.json|grep -i pass
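
A small sketch of the password lookup, for convenience. machine_password is a hypothetical helper, and it assumes the config stores the password on a single line as a quoted "key": "value" pair whose key contains "pass"; the exact key name is not shown in this thread.

```shell
# Sketch only: print the value of the first config.json entry whose key
# contains "pass" (case-insensitive). machine_password is a hypothetical
# helper; the one-pair-per-line layout is an assumption.
machine_password() {
    config="${1:-$HOME/.minikube/machines/minikube/config.json}"
    grep -i '"[^"]*pass[^"]*"[[:space:]]*:' "$config" \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*"\([^"]*\)".*/\1/'
}

# Usage (not run here), while minikube start is still waiting for SSH:
#   machine_password    # prints the password to type at the prompt
#   ssh-copy-id -i ~/.minikube/machines/minikube/id_rsa.pub "docker@$(minikube ip)"
```

ssh-copy-id will still prompt for the password interactively; the helper only saves reading through config.json by eye.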

@rvowles

rvowles commented Nov 15, 2017

I couldn't get 0.23.0 to work on MacOS at all, so thanks for the fix @urbaniak !

@GKTheOne

Thanks @urbaniak. I used it to fix #2126.

@madeofstars0

I used the script from urbaniak to get minikube to come up in VMware Fusion 10.0.1 as well. I had the same error as #2126

@rvowles

rvowles commented Dec 5, 2017

oddly, i have to use it every single time i start minikube.

@mikeroySoft

This issue seems resolved with minikube 0.25.0

@ay0o
Author

ay0o commented Jan 31, 2018

I no longer have vmware running on my MBP, so I cannot verify it. If more people confirm it's working, I'll close it.

@bsedat

bsedat commented Jan 31, 2018

It is fixed for me on v0.25.0.

@ay0o ay0o closed this as completed Jan 31, 2018
@tstromberg tstromberg changed the title Fail to start the cluster after minikube stop in MacOS using vmwarefusion vmwarefusion: failed to start after stop: Error configuring auth on host: Too many retries waiting for SSH to be available Sep 19, 2018
@tstromberg tstromberg removed the co/vmwarefusion-driver Issues with legacy VMware Fusion Driver label Sep 19, 2018