Documentation of VM pod lifecycle #746
Conversation
Reviewable status: 0 of 2 approvals obtained (waiting on @jellonek)
docs/vmpod-lifecycle.md, line 10 at r1 (raw file):
straming
streaming
docs/vmpod-lifecycle.md, line 22 at r1 (raw file):
node
the node
docs/vmpod-lifecycle.md, line 23 at r1 (raw file):
lables
labels
docs/vmpod-lifecycle.md, line 24 at r1 (raw file):
particular
a particular or the particular
docs/vmpod-lifecycle.md, line 31 at r1 (raw file):
configuration
the configuration
docs/vmpod-lifecycle.md, line 38 at r1 (raw file):
reconfiguring i to
what does `i` stand for? :)
docs/vmpod-lifecycle.md, line 39 at r1 (raw file):
unsuable
unusable?
Reviewable status: 1 change requests, 0 of 2 approvals obtained (waiting on @pigmej and @jellonek)
docs/vmpod-lifecycle.md, line 10 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
straming
streaming
Done.
docs/vmpod-lifecycle.md, line 22 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
node
the node
Done.
docs/vmpod-lifecycle.md, line 23 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
lables
labels
Done.
docs/vmpod-lifecycle.md, line 24 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
particular
a particular or the particular
Done.
docs/vmpod-lifecycle.md, line 31 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
configuration
the configuration
Done.
docs/vmpod-lifecycle.md, line 38 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
reconfiguring i to
what does `i` stand for? :)
it. Done.
docs/vmpod-lifecycle.md, line 39 at r1 (raw file):
Previously, pigmej (Jędrzej Nowak) wrote…
unsuable
unusable?
Done.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: 1 change requests, 0 of 2 approvals obtained (waiting on @pigmej)
Some general mistakes:
- tapmanager doesn't use gRPC because it needs to send fds over unix domain sockets
- apiserver calls are not forwarded to kubelet but rather kubelet watches the apiserver
- CRI uses blocking calls instead of repeatedly checking status while containers/pods are created, images are pulled etc.
- images are not pulled during pod sandbox startup (that is, while RunPodSandbox hasn't returned yet), but rather right before creating the containers, see SyncPod in kubernetes srcs
Reviewable status: 1 of 2 approvals obtained (waiting on @pigmej and @jellonek)
docs/vmpod-lifecycle.md, line 23 at r1 (raw file):
Previously, jellonek (Piotr Skamruk) wrote…
Done.
Scheduler places the pod on a node based on the requested resources (CPU, memory, etc.) as well as the pod's `nodeSelector` and pod/node affinity constraints, taints/tolerations and so on.
docs/vmpod-lifecycle.md, line 24 at r1 (raw file):
Previously, jellonek (Piotr Skamruk) wrote…
Done.
kubelet running on the target node accepts the pod
docs/vmpod-lifecycle.md, line 1 at r2 (raw file):
# Lifecycle of VM pod
of a VM pod
docs/vmpod-lifecycle.md, line 4 at r2 (raw file):
This document describes life cycle of VM pod according to how it's handled on level of Virtlet.
describes the lifecycle of a VM pod managed by Virtlet
docs/vmpod-lifecycle.md, line 10 at r2 (raw file):
access to logs/console (done by another part of Virtlet process - [streaming server](https://github.com/Mirantis/virtlet/tree/master/pkg/stream)), or port forwarding (also done by streaming server).
This description omits the details of volume setup (using flexvolumes), handling of logs, the VM console and port forwarding (done by the streaming server).
docs/vmpod-lifecycle.md, line 15 at r2 (raw file):
Communication between kubelet and Virtlet goes through [criproxy](https://github.com/Mirantis/criproxy) which directs requests to Virtlet only if they match specific pod labels/annotations.
only if the requests concern a pod that has a Virtlet-specific annotation or an image that has a Virtlet-specific prefix
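That routing rule can be sketched as follows (a hedged illustration, not the actual criproxy code; the annotation value comes from this review, while the image prefix and all names are assumptions):

```python
# Hypothetical sketch of criproxy's routing decision; the real logic
# lives in the criproxy repo, and VIRTLET_IMAGE_PREFIX is an assumption.
VIRTLET_ANNOTATION = "kubernetes.io/target-runtime"
VIRTLET_RUNTIME = "virtlet.cloud"
VIRTLET_IMAGE_PREFIX = "virtlet.cloud/"

def routes_to_virtlet(annotations=None, image=None):
    """True if a CRI request should go to Virtlet instead of the default runtime."""
    if annotations and annotations.get(VIRTLET_ANNOTATION) == VIRTLET_RUNTIME:
        return True
    if image and image.startswith(VIRTLET_IMAGE_PREFIX):
        return True
    return False
```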
docs/vmpod-lifecycle.md, line 19 at r2 (raw file):
## Lifecycle ### Pod starting procedure
VM Pod Startup
docs/vmpod-lifecycle.md, line 21 at r2 (raw file):
### Pod starting procedure * User (or some mechanism like autoscaller, deamonset controller) creates pod object.
A pod is created in the Kubernetes cluster, either directly by the user or via some other mechanism such as a higher-level Kubernetes object managed by kube-controller-manager (ReplicaSet, DaemonSet etc.)
docs/vmpod-lifecycle.md, line 27 at r2 (raw file):
* `kubelet` calls through [CRI](https://contributor.kubernetes.io/contributors/devel/container-runtime-interface/) the runtime service to create "sandbox" (enclosing container for all containers in pod definition), passing information about sandbox constraints/annotations (without actual info about containers).
- kubelet invokes the `RunPodSandbox` CRI call to create the pod sandbox which will enclose all the containers in the pod definition. Note that at this point no information about the containers within the pod is passed to the call. kubelet can later request the information about the pod by means of `PodSandboxStatus` calls.
- If there's a Virtlet-specific annotation `kubernetes.io/target-runtime: virtlet.cloud`, CRI proxy passes the call to Virtlet.
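For reference, a minimal VM pod spec carrying that annotation might look like this (a sketch: the annotation key/value are from this review, while the pod name, image name, and prefix are illustrative assumptions, not taken from the repo's examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cirros-vm
  annotations:
    # This is what makes CRI proxy route the sandbox calls to Virtlet
    kubernetes.io/target-runtime: virtlet.cloud
spec:
  containers:
  - name: cirros-vm
    # the "virtlet.cloud/" image prefix is an assumption for illustration
    image: virtlet.cloud/cirros
```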
docs/vmpod-lifecycle.md, line 32 at r2 (raw file):
virtlet and pod network namespaces) [add this sandbox to your network](https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters) command on cni plugin according to the configuration in `/etc/cni/net.d` on the node (note: configuration from first in lexical order file in this directory, rest are ignored).
Virtlet saves the sandbox metadata in its internal database, sets up the network namespace and then uses its internal 'tapmanager' mechanism to invoke the ADD operation via the CNI plugin, as specified by the CNI configuration on the node.
docs/vmpod-lifecycle.md, line 35 at r2 (raw file):
* Plugin does it job configuring interfaces/addressation/routes/iptables/et.c. then retuns to runtime [info](https://github.com/containernetworking/cni/blob/master/SPEC.md#result) about interfaces configured and their ip configuration.
The CNI plugin configures the network namespace by setting up network interfaces, IP addresses, routes, iptables rules and so on, and returns the network configuration information to the caller as described in the CNI spec.
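The node-level CNI configuration mentioned here (per the original doc text, the first file in lexical order under `/etc/cni/net.d` wins) could look like this minimal `bridge`-plugin example; it is generic CNI, not Virtlet-specific, and the subnet is an arbitrary illustration:

```json
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
```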
docs/vmpod-lifecycle.md, line 42 at r2 (raw file):
calls in memory, then returns it to main part of Virtlet. It stores this data in own metadata store - at this point pod sandbox has ip configuration which can be queried by `kubelet` calls for `PodSandboxStatus`.
Virtlet's tapmanager mechanism adjusts the configuration of the network namespace to make it work with the VM.
(note not to be included: I've mentioned PodSandboxStatus above, no need to repeat it here)
docs/vmpod-lifecycle.md, line 45 at r2 (raw file):
* In parallel to call for sandbox creation, `kubelet` asks Virtlet (its image service) to download "container image" (in this case qcow2 image) as defined in container part of pod description.
- After creating the sandbox, kubelet starts the containers defined in the pod sandbox. Currently, Virtlet supports just one container per VM pod, so the VM pod startup steps after this one describe the startup of this single container.
- Depending on the image pull policy of the container, kubelet checks whether the image needs to be pulled by means of the `ImageStatus` call and then uses the `PullImage` CRI call to pull the image if it doesn't exist or if `imagePullPolicy: Always` is used.
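The pull decision described above can be sketched like this (a simplification of kubelet's behavior, not its actual code; `image_present` stands for the result of the `ImageStatus` call):

```python
# Simplified model of kubelet's image-pull decision; the real logic
# lives in kubelet's SyncPod / image manager code.
def need_pull(image_pull_policy, image_present):
    if image_pull_policy == "Always":
        return True
    if image_pull_policy == "Never":
        return False
    # "IfNotPresent": PullImage is invoked only when ImageStatus
    # reports the image as missing
    return not image_present
```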
docs/vmpod-lifecycle.md, line 47 at r2 (raw file):
part of pod description. * Virtlet resolves image location considering configuration of [image name translation](https://github.com/Mirantis/virtlet/blob/master/docs/image-name-translation.md), downloads file for this location and stores it in libvirt images store.
- If `PullImage` is invoked, Virtlet resolves the image location based on the image name translation configuration, then downloads the file and stores it in the image store.
Note not to be included: Virtlet presently doesn't use the libvirt image store, using its own one instead.
docs/vmpod-lifecycle.md, line 51 at r2 (raw file):
(what is verified by `kubelet` by subsequent calls to Virtlet image service), `kubelet` asks Virtlet to create in particular sandbox container from downloaded image (note: Virtlet alows only for single "container"/vm in single pod sandbox).
- After the image is ready (no pull was needed or the `PullImage` call completed successfully), kubelet uses the `CreateContainer` CRI call to create the container in the pod sandbox using the specified image.
Note not to be included: single container per pod is mentioned earlier
docs/vmpod-lifecycle.md, line 56 at r2 (raw file):
them in the same time) - without any networking and with emulator path set to [`vmwrapper`](https://github.com/Mirantis/virtlet/tree/master/cmd/vmwrapper), instead of default `qemu`.
Virtlet uses the sandbox and container metadata to generate a libvirt domain definition, using the vmwrapper binary as the emulator and without specifying any network configuration in the domain.
docs/vmpod-lifecycle.md, line 57 at r2 (raw file):
set to [`vmwrapper`](https://github.com/Mirantis/virtlet/tree/master/cmd/vmwrapper), instead of default `qemu`. * When foregoing call finishes `kubelet` calls runtime to start previously created container.
After the `CreateContainer` call completes, kubelet invokes the `StartContainer` call on the newly created container.
docs/vmpod-lifecycle.md, line 64 at r2 (raw file):
command line parameters, after switching it's network namespace to PodSandbox namespace it execs to `qemu` with new set of command line parameters. At this time they also include info about network devices.
Virtlet starts the libvirt domain. libvirt invokes `vmwrapper` as the emulator, passing it the necessary command line arguments as well as environment variables set by Virtlet. `vmwrapper` uses those environment variable values to communicate with `tapmanager` over a Unix domain socket, retrieving a file descriptor for a tap device set up by `tapmanager`. `tapmanager` uses its own simple protocol to communicate with `vmwrapper` because it needs to send file descriptors over the socket; this is not usually supported by RPC libraries, see e.g. grpc/grpc#11417. `vmwrapper` then updates the command line arguments to include the network interface information and execs the actual emulator (qemu).
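The fd-passing trick that rules out gRPC here can be demonstrated in a few lines; this is just a generic `SCM_RIGHTS` example over a socketpair, not Virtlet's actual tapmanager protocol:

```python
import array
import os
import socket

# Send one file descriptor over a Unix domain socket using SCM_RIGHTS
# ancillary data -- the mechanism that makes tapmanager use its own
# protocol instead of an RPC library.

def send_fd(sock, fd, payload=b"fd"):
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                              array.array("i", [fd]))])

def recv_fd(sock):
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(16, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
            return msg, fds[0]
    raise RuntimeError("no file descriptor received")

# Demo: a pipe stands in for the tap device, a socketpair for the
# tapmanager <-> vmwrapper connection.
tap_r, tap_w = os.pipe()
manager, wrapper = socket.socketpair()
send_fd(manager, tap_r)              # "tapmanager" side
msg, received_fd = recv_fd(wrapper)  # "vmwrapper" gets a *new* fd number
os.write(tap_w, b"hello")
data = os.read(received_fd, 5)       # the received fd refers to the same pipe
```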
docs/vmpod-lifecycle.md, line 66 at r2 (raw file):
At this time they also include info about network devices. At this point - VM is running and visible, pod is operating in running state, same with "container"/VM.
At this point the VM is running and accessible via the network, and the pod is in the `Running` state, as is its only container.
docs/vmpod-lifecycle.md, line 68 at r2 (raw file):
At this point - VM is running and visible, pod is operating in running state, same with "container"/VM. ### Pod removal procedure
Deleting the pod
docs/vmpod-lifecycle.md, line 70 at r2 (raw file):
### Pod removal procedure This part is ignited by user/machinery call to delete pod (`kubectl delete` pod or e.g. autoscaller force for scale down replica set):
This sequence is initiated when the pod is deleted, either by means of `kubectl delete` or by a controller manager action due to deletion or downscaling of a higher-level object.
docs/vmpod-lifecycle.md, line 72 at r2 (raw file):
This part is ignited by user/machinery call to delete pod (`kubectl delete` pod or e.g. autoscaller force for scale down replica set): * Call to `apiserver` is forwarded to particular node/`kubelet` controlling particular pod
kubelet notices the pod being deleted
docs/vmpod-lifecycle.md, line 73 at r2 (raw file):
* Call to `apiserver` is forwarded to particular node/`kubelet` controlling particular pod * `kubelet` calls runtime to stop container
kubelet invokes the `StopContainer` CRI call, which is forwarded to Virtlet based on the containing pod sandbox annotations.
docs/vmpod-lifecycle.md, line 77 at r2 (raw file):
`qemu` is finishing it's job in some time (if it does not do that in first place in reasonable time - there is forcible kill called by Virtlet through libvirt)
Virtlet stops the libvirt domain. libvirt sends a signal to qemu, which starts the shutdown. If it doesn't quit within a reasonable time determined by the pod's termination grace period, Virtlet will forcibly terminate the domain, thus killing the qemu process.
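A sketch of that stop sequence (the `domain` interface here is a hypothetical stand-in, not the libvirt API; names and the polling approach are illustrative):

```python
import time

# Graceful shutdown with a deadline, then forcible termination --
# mirroring the stop flow described above. 'domain' is any object with
# shutdown()/is_active()/destroy(); the names are illustrative.
def stop_domain(domain, grace_period_s, poll_interval_s=0.01):
    domain.shutdown()  # ask qemu to shut down (e.g. ACPI signal)
    deadline = time.monotonic() + grace_period_s
    while time.monotonic() < deadline:
        if not domain.is_active():
            return "shut down"
        time.sleep(poll_interval_s)
    domain.destroy()   # forcible kill of the qemu process
    return "destroyed"
```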
docs/vmpod-lifecycle.md, line 79 at r2 (raw file):
libvirt) * `kubelet` checks status of "container"/VM subsequetially and when all containers are down - it calls runtime to `StopPodSandbox`.
After all the containers in the pod (the single container in the case of a Virtlet VM pod) are stopped, kubelet invokes the `StopPodSandbox` CRI call.
docs/vmpod-lifecycle.md, line 81 at r2 (raw file):
containers are down - it calls runtime to `StopPodSandbox`. * During this call Virtlet calls (using `tapmanager`) cni plugin to remove pod from network.
Virtlet asks its `tapmanager` to remove the pod from the network by means of a CNI `DEL` command.
docs/vmpod-lifecycle.md, line 84 at r2 (raw file):
* `kubelet` checks status of pod sandbox subsequetially and when it notices that it's in stopped state, after some time (which is not constant) it calls Virtlet to garbage collect `PodSandbox`.
- After `StopPodSandbox` returns, the pod sandbox will eventually be GC'd by kubelet by means of the `RemovePodSandbox` CRI call.
docs/vmpod-lifecycle.md, line 85 at r2 (raw file):
that it's in stopped state, after some time (which is not constant) it calls Virtlet to garbage collect `PodSandbox`. * During that call Virtlet cleanups it's metadata about `PodSandbox`.
- Upon `RemovePodSandbox`, Virtlet removes the pod metadata from its internal database.
[ci skip] 5b630b0 to 3ed00cf
Reviewable status: 1 change requests, 1 of 2 approvals obtained (waiting on @pigmej, @ivan4th, and @jellonek)
docs/vmpod-lifecycle.md, line 1 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
of a VM pod
Done.
docs/vmpod-lifecycle.md, line 4 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
describes the lifecycle of a VM pod managed by Virtlet
Done.
docs/vmpod-lifecycle.md, line 10 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
This description omits the details of volume setup (using flexvolumes), handling of logs, the VM console and port forwarding (done by streaming server
Done.
docs/vmpod-lifecycle.md, line 15 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
only if the requests concern a pod that has Virtlet-specific annotation or an image that has Virtlet-specific prefix
Done.
docs/vmpod-lifecycle.md, line 19 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
VM Pod Startup
Done.
docs/vmpod-lifecycle.md, line 21 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
A pod is created in Kubernetes cluster, either directly by the user or via some other mechanism such as a higher-level Kubernetes object managed by kube-controller-manager (ReplicaSet, DaemonSet etc.)
Done.
docs/vmpod-lifecycle.md, line 27 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
kubelet invokes the `RunPodSandbox` CRI call to create the pod sandbox which will enclose all the containers in the pod definition. Note that at this point no information about the containers within the pod is passed to the call. kubelet can later request the information about the pod by means of `PodSandboxStatus` calls. If there's a Virtlet-specific annotation `kubernetes.io/target-runtime: virtlet.cloud`, CRI proxy passes the call to Virtlet
Done.
docs/vmpod-lifecycle.md, line 32 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet saves sandbox metadata in its internal database, sets up the network namespace and then uses internal 'tapmanager' mechanism to invoke ADD operation via the CNI plugin as specified by the CNI configuration on the node.
Done.
docs/vmpod-lifecycle.md, line 35 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
The CNI plugin configures the network namespace by setting up network interfaces, IP addresses, routes, iptables rules and so on, and returns the network configuration information to the caller as described in the CNI spec.
Done.
docs/vmpod-lifecycle.md, line 42 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet's tapmanager mechanism adjusts the configuration of the network namespace to make it work with the VM.
(note not to be included: I've mentioned PodSandboxStatus above, no need to repeat it here)
Done.
docs/vmpod-lifecycle.md, line 51 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
- After the image is ready (no pull was needed or the `PullImage` call completed successfully), kubelet uses the `CreateContainer` CRI call to create the container in the pod sandbox using the specified image. Note not to be included: single container per pod is mentioned earlier
Done.
docs/vmpod-lifecycle.md, line 56 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet uses the sandbox and container metadata to generate libvirt domain definition, using vmwrapper binary as the emulator and without specifying any network configuration in the domain.
Done.
docs/vmpod-lifecycle.md, line 57 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
After the `CreateContainer` call completes, kubelet invokes the `StartContainer` call on the newly created container.
Done.
docs/vmpod-lifecycle.md, line 64 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet starts the libvirt domain. libvirt invokes `vmwrapper` as the emulator, passing it the necessary command line arguments as well as environment variables set by Virtlet. `vmwrapper` uses those environment variable values to communicate with `tapmanager` over a Unix domain socket, retrieving a file descriptor for a tap device set up by `tapmanager`. `tapmanager` uses its own simple protocol to communicate with `vmwrapper` because it needs to send file descriptors over the socket; this is not usually supported by RPC libraries, see e.g. grpc/grpc#11417. `vmwrapper` then updates the command line arguments to include the network interface information and execs the actual emulator (qemu).
Done.
With additional info about SR-IOV.
Still, this description refers to a single interface, while it should talk about one or more interfaces.
docs/vmpod-lifecycle.md, line 66 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
At this point the VM is running and accessible via the network, and the pod is in the `Running` state, as is its only container.
Done.
docs/vmpod-lifecycle.md, line 70 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
This sequence is initiated when the pod is deleted, either by means of `kubectl delete` or by a controller manager action due to deletion or downscaling of a higher-level object.
Done.
docs/vmpod-lifecycle.md, line 72 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
kubelet notices the pod being deleted
Done.
docs/vmpod-lifecycle.md, line 73 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
kubelet invokes the `StopContainer` CRI call, which is forwarded to Virtlet based on the containing pod sandbox annotations.
Done.
docs/vmpod-lifecycle.md, line 77 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet stops the libvirt domain. libvirt sends a signal to qemu, which starts the shutdown. If it doesn't quit within a reasonable time determined by the pod's termination grace period, Virtlet will forcibly terminate the domain, thus killing the qemu process.
Done.
docs/vmpod-lifecycle.md, line 81 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
Virtlet asks its `tapmanager` to remove the pod from the network by means of a CNI `DEL` command.
Done.
docs/vmpod-lifecycle.md, line 84 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
- After `StopPodSandbox` returns, the pod sandbox will eventually be GC'd by kubelet by means of the `RemovePodSandbox` CRI call
Done.
docs/vmpod-lifecycle.md, line 85 at r2 (raw file):
Previously, ivan4th (Ivan Shvedunov) wrote…
- Upon `RemovePodSandbox`, Virtlet removes the pod metadata from its internal database.
Done.
Reviewable status: 1 change requests, 1 of 2 approvals obtained (waiting on @pigmej and @jellonek)