
Documentation of VM pod lifecycle #746

Merged
merged 1 commit into master from jell/vmpodlc
Sep 11, 2018

Conversation

jellonek
Contributor

@jellonek jellonek commented Aug 24, 2018

[ci skip]


This change is Reviewable

Contributor

@pigmej pigmej left a comment


Reviewable status: 0 of 2 approvals obtained (waiting on @jellonek)


docs/vmpod-lifecycle.md, line 10 at r1 (raw file):

straming

streaming


docs/vmpod-lifecycle.md, line 22 at r1 (raw file):

node

the node


docs/vmpod-lifecycle.md, line 23 at r1 (raw file):

lables

labels


docs/vmpod-lifecycle.md, line 24 at r1 (raw file):

particular

a particular or the particular


docs/vmpod-lifecycle.md, line 31 at r1 (raw file):

configuration

the configuration


docs/vmpod-lifecycle.md, line 38 at r1 (raw file):

 reconfiguring i to 

what does i stand for? :)


docs/vmpod-lifecycle.md, line 39 at r1 (raw file):

 unsuable

unusable?

Contributor Author

@jellonek jellonek left a comment


Reviewable status: 1 change requests, 0 of 2 approvals obtained (waiting on @pigmej and @jellonek)


docs/vmpod-lifecycle.md, line 10 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
straming

streaming

Done.


docs/vmpod-lifecycle.md, line 22 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
node

the node

Done.


docs/vmpod-lifecycle.md, line 23 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
lables

labels

Done.


docs/vmpod-lifecycle.md, line 24 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
particular

a particular or the particular

Done.


docs/vmpod-lifecycle.md, line 31 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
configuration

the configuration

Done.


docs/vmpod-lifecycle.md, line 38 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
 reconfiguring i to 

what does i stand for? :)

it. Done.


docs/vmpod-lifecycle.md, line 39 at r1 (raw file):

Previously, pigmej (Jędrzej Nowak) wrote…
 unsuable

unusable?

Done.

Contributor

@pigmej pigmej left a comment


Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: 1 change requests, 0 of 2 approvals obtained (waiting on @pigmej)

Contributor

@pigmej pigmej left a comment


:lgtm:

Reviewable status: 1 change requests, 0 of 2 approvals obtained (waiting on @pigmej)

Contributor

@ivan4th ivan4th left a comment


Some general mistakes:

  • tapmanager doesn't use gRPC because it needs to send fds over unix domain sockets
  • apiserver calls are not forwarded to kubelet; rather, kubelet watches the apiserver
  • CRI uses blocking calls instead of repeatedly checking status while containers/pods are created, images are pulled, etc.
  • images are not pulled during pod sandbox startup (that is, while RunPodSandbox hasn't returned yet), but rather right before creating the containers, see SyncPod in the kubernetes sources

Reviewable status: 1 of 2 approvals obtained (waiting on @pigmej and @jellonek)


docs/vmpod-lifecycle.md, line 23 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Done.

Scheduler places the pod on a node based on the requested resources (CPU, memory, etc.) as well as the pod's nodeSelector and pod/node affinity constraints, taints/tolerations and so on.


docs/vmpod-lifecycle.md, line 24 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Done.

kubelet running on the target node accepts the pod


docs/vmpod-lifecycle.md, line 1 at r2 (raw file):

# Lifecycle of VM pod

of a VM pod


docs/vmpod-lifecycle.md, line 4 at r2 (raw file):

This document describes life cycle of VM pod according to how it's handled
on level of Virtlet.

describes the lifecycle of a VM pod managed by Virtlet


docs/vmpod-lifecycle.md, line 10 at r2 (raw file):

access to logs/console (done by another part of Virtlet process - 
[streaming server](https://github.com/Mirantis/virtlet/tree/master/pkg/stream)),
 or port forwarding (also done by streaming server).

This description omits the details of volume setup (using flexvolumes), handling of logs, the VM console and port forwarding (done by the streaming server).


docs/vmpod-lifecycle.md, line 15 at r2 (raw file):

Communication between kubelet and Virtlet goes through [criproxy](https://github.com/Mirantis/criproxy)
which directs requests to Virtlet only if they match specific pod labels/annotations.

only if the requests concern a pod that has a Virtlet-specific annotation or an image that has a Virtlet-specific prefix


docs/vmpod-lifecycle.md, line 19 at r2 (raw file):

## Lifecycle

### Pod starting procedure

VM Pod Startup


docs/vmpod-lifecycle.md, line 21 at r2 (raw file):

### Pod starting procedure

 * User (or some mechanism like autoscaller, deamonset controller) creates pod object.

A pod is created in the Kubernetes cluster, either directly by the user or via some other mechanism such as a higher-level Kubernetes object managed by kube-controller-manager (ReplicaSet, DaemonSet etc.)


docs/vmpod-lifecycle.md, line 27 at r2 (raw file):

 * `kubelet` calls through [CRI](https://contributor.kubernetes.io/contributors/devel/container-runtime-interface/)
   the runtime service to create "sandbox" (enclosing container for all containers in pod definition),
   passing information about sandbox constraints/annotations (without actual info about containers).
  • kubelet invokes a CRI call RunPodSandbox to create the pod sandbox which will enclose all the containers in the pod definition. Note that at this point no information about the containers within the pod is passed to the call. kubelet can later request the information about the pod by means of PodSandboxStatus calls.

  • If there's a Virtlet-specific annotation kubernetes.io/target-runtime: virtlet.cloud, CRI proxy passes the call to Virtlet


docs/vmpod-lifecycle.md, line 32 at r2 (raw file):

   virtlet and pod network namespaces) [add this sandbox to your network](https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters)
   command on cni plugin according to the configuration in `/etc/cni/net.d` on the node
   (note: configuration from first in lexical order file in this directory, rest are ignored).

Virtlet saves sandbox metadata in its internal database, sets up the network namespace and then uses internal 'tapmanager' mechanism to invoke ADD operation via the CNI plugin as specified by the CNI configuration on the node.


docs/vmpod-lifecycle.md, line 35 at r2 (raw file):

 * Plugin does it job configuring interfaces/addressation/routes/iptables/et.c.
   then retuns to runtime [info](https://github.com/containernetworking/cni/blob/master/SPEC.md#result)
   about interfaces configured and their ip configuration.

The CNI plugin configures the network namespace by setting up network interfaces, IP addresses, routes, iptables rules and so on, and returns the network configuration information to the caller as described in the CNI spec.


docs/vmpod-lifecycle.md, line 42 at r2 (raw file):

   calls in memory, then returns it to main part of Virtlet. It stores this
   data in own metadata store - at this point pod sandbox has ip configuration
   which can be queried by `kubelet` calls for `PodSandboxStatus`.

Virtlet's tapmanager mechanism adjusts the configuration of the network namespace to make it work with the VM.

(note not to be included: I've mentioned PodSandboxStatus above, no need to repeat it here)


docs/vmpod-lifecycle.md, line 45 at r2 (raw file):

 * In parallel to call for sandbox creation, `kubelet` asks Virtlet (its image service)
   to download "container image" (in this case qcow2 image) as defined in container
   part of pod description.
  • after creating the sandbox, kubelet starts the containers defined in the pod sandbox. Currently, Virtlet supports just one container per VM pod. So, the VM pod startup steps after this one describe the startup of this single container.

  • Depending on the image pull policy of the container, kubelet checks if the image needs to be pulled by means of an ImageStatus call and then uses the PullImage CRI call to pull the image if it doesn't exist or if imagePullPolicy: Always is used.


docs/vmpod-lifecycle.md, line 47 at r2 (raw file):

   part of pod description.
 * Virtlet resolves image location considering configuration of [image name translation](https://github.com/Mirantis/virtlet/blob/master/docs/image-name-translation.md),
   downloads file for this location and stores it in libvirt images store.

Note not to be included: Virtlet presently doesn't use the libvirt image store, using its own one instead.


docs/vmpod-lifecycle.md, line 51 at r2 (raw file):

   (what is verified by `kubelet` by subsequent calls to Virtlet image service),
   `kubelet` asks Virtlet to create in particular sandbox container from downloaded image
   (note: Virtlet alows only for single "container"/vm in single pod sandbox).
  • After the image is ready (no pull was needed or the PullImage call completed successfully), kubelet uses CreateContainer CRI call to create the container in the pod sandbox using the specified image.

Note not to be included: single container per pod is mentioned earlier


docs/vmpod-lifecycle.md, line 56 at r2 (raw file):

   them in the same time) - without any networking and with emulator path
   set to [`vmwrapper`](https://github.com/Mirantis/virtlet/tree/master/cmd/vmwrapper),
   instead of default `qemu`.

Virtlet uses the sandbox and container metadata to generate a libvirt domain definition, using the vmwrapper binary as the emulator and without specifying any network configuration in the domain.


docs/vmpod-lifecycle.md, line 57 at r2 (raw file):

   set to [`vmwrapper`](https://github.com/Mirantis/virtlet/tree/master/cmd/vmwrapper),
   instead of default `qemu`.
 * When foregoing call finishes `kubelet` calls runtime to start previously created container.

After CreateContainer call completes, kubelet invokes StartContainer call on the newly created container.


docs/vmpod-lifecycle.md, line 64 at r2 (raw file):

   command line parameters, after switching it's network namespace to PodSandbox
   namespace it execs to `qemu` with new set of command line parameters.
   At this time they also include info about network devices.

Virtlet starts the libvirt domain. libvirt invokes vmwrapper as the emulator, passing it the necessary command line arguments as well as environment variables set by Virtlet. vmwrapper uses those environment variable values to communicate with tapmanager over a Unix domain socket, retrieving a file descriptor for a tap device set up by tapmanager. tapmanager uses its own simple protocol to communicate with vmwrapper because it needs to send file descriptors over the socket; this is not usually supported by RPC libraries, see e.g. grpc/grpc#11417. vmwrapper then updates the command line arguments to include the network interface information and execs the actual emulator (qemu).


docs/vmpod-lifecycle.md, line 66 at r2 (raw file):

   At this time they also include info about network devices.

At this point - VM is running and visible, pod is operating in running state, same with "container"/VM.

At this point the VM is running and accessible via the network, and the pod is in the Running state, as is its only container.


docs/vmpod-lifecycle.md, line 68 at r2 (raw file):

At this point - VM is running and visible, pod is operating in running state, same with "container"/VM.

### Pod removal procedure

Deleting the pod


docs/vmpod-lifecycle.md, line 70 at r2 (raw file):

### Pod removal procedure

This part is ignited by user/machinery call to delete pod (`kubectl delete` pod or e.g. autoscaller force for scale down replica set):

This sequence is initiated when the pod is deleted, either by means of kubectl delete or a controller manager action due to deletion or downscaling of a higher-level object.


docs/vmpod-lifecycle.md, line 72 at r2 (raw file):

This part is ignited by user/machinery call to delete pod (`kubectl delete` pod or e.g. autoscaller force for scale down replica set):

 * Call to `apiserver` is forwarded to particular node/`kubelet` controlling particular pod

kubelet notices the pod being deleted


docs/vmpod-lifecycle.md, line 73 at r2 (raw file):

 * Call to `apiserver` is forwarded to particular node/`kubelet` controlling particular pod
 * `kubelet` calls runtime to stop container

kubelet invokes the StopContainer CRI call, which is forwarded to Virtlet based on the containing pod sandbox annotations.


docs/vmpod-lifecycle.md, line 77 at r2 (raw file):

   `qemu` is finishing it's job in some time (if it does not do that in first
   place in reasonable time - there is forcible kill called by Virtlet through
   libvirt)

Virtlet stops the libvirt domain. libvirt sends a signal to qemu, which starts the shutdown. If it doesn't quit within a reasonable time determined by the pod's termination grace period, Virtlet forcibly terminates the domain, thus killing the qemu process.


docs/vmpod-lifecycle.md, line 79 at r2 (raw file):

   libvirt)
 * `kubelet` checks status of "container"/VM subsequetially and when all
   containers are down - it calls runtime to `StopPodSandbox`.

After all the containers in the pod (the single container, in the case of a Virtlet VM pod) are stopped, kubelet invokes the StopPodSandbox CRI call.


docs/vmpod-lifecycle.md, line 81 at r2 (raw file):

   containers are down - it calls runtime to `StopPodSandbox`.
 * During this call Virtlet calls (using `tapmanager`) cni plugin to remove pod
   from network.

Virtlet asks its tapmanager to remove the pod from the network by means of a CNI DEL command.


docs/vmpod-lifecycle.md, line 84 at r2 (raw file):

 * `kubelet` checks status of pod sandbox subsequetially and when it notices
   that it's in stopped state, after some time (which is not constant)
   it calls Virtlet to garbage collect `PodSandbox`.
  • after StopPodSandbox returns, the pod sandbox will eventually be GC'd by kubelet by means of the RemovePodSandbox CRI call

docs/vmpod-lifecycle.md, line 85 at r2 (raw file):

   that it's in stopped state, after some time (which is not constant)
   it calls Virtlet to garbage collect `PodSandbox`.
 * During that call Virtlet cleanups it's metadata about `PodSandbox`.
  • Upon RemovePodSandbox, Virtlet removes the pod metadata from its internal database.

Contributor Author

@jellonek jellonek left a comment


Reviewable status: 1 change requests, 1 of 2 approvals obtained (waiting on @pigmej, @ivan4th, and @jellonek)


docs/vmpod-lifecycle.md, line 1 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

of a VM pod

Done.


docs/vmpod-lifecycle.md, line 4 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

describes the lifecycle of a VM pod managed by Virtlet

Done.


docs/vmpod-lifecycle.md, line 10 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

This description omits the details of volume setup (using flexvolumes), handling of logs, the VM console and port forwarding (done by streaming server

Done.


docs/vmpod-lifecycle.md, line 15 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

only if the requests concern a pod that has Virtlet-specific annotation or an image that has Virtlet-specific prefix

Done.


docs/vmpod-lifecycle.md, line 19 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

VM Pod Startup

Done.


docs/vmpod-lifecycle.md, line 21 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

A pod is created in Kubernetes cluster, either directly by the user or via some other mechanism such as a higher-level Kubernetes object managed by kube-controller-manager (ReplicaSet, DaemonSet etc.)

Done.


docs/vmpod-lifecycle.md, line 27 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…
  • kubelet invokes a CRI call RunPodSandbox to create the pod sandbox which will enclose all the containers in the pod definition. Note that at this point no information about the containers within the pod is passed to the call. kubelet can later request the information about the pod by means of PodSandboxStatus calls.

  • If there's a Virtlet-specific annotation kubernetes.io/target-runtime: virtlet.cloud, CRI proxy passes the call to Virtlet

Done.


docs/vmpod-lifecycle.md, line 32 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet saves sandbox metadata in its internal database, sets up the network namespace and then uses internal 'tapmanager' mechanism to invoke ADD operation via the CNI plugin as specified by the CNI configuration on the node.

Done.


docs/vmpod-lifecycle.md, line 35 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

The CNI plugin configures the network namespace by setting up network interfaces, IP addresses, routes, iptables rules and so on, and returns the network configuration information to the caller as described in the CNI spec.

Done.


docs/vmpod-lifecycle.md, line 42 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet's tapmanager mechanism adjusts the configuration of the network namespace to make it work with the VM.

(note not to be included: I've mentioned PodSandboxStatus above, no need to repeat it here)

Done.


docs/vmpod-lifecycle.md, line 51 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…
  • After the image is ready (no pull was needed or the PullImage call completed successfully), kubelet uses CreateContainer CRI call to create the container in the pod sandbox using the specified image.

Note not to be included: single container per pod is mentioned earlier

Done.


docs/vmpod-lifecycle.md, line 56 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet uses the sandbox and container metadata to generate libvirt domain definition, using vmwrapper binary as the emulator and without specifying any network configuration in the domain.

Done.


docs/vmpod-lifecycle.md, line 57 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

After CreateContainer call completes, kubelet invokes StartContainer call on the newly created container.

Done.


docs/vmpod-lifecycle.md, line 64 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet starts the libvirt domain. libvirt invokes vmwrapper as the emulator, passing it the necessary command line arguments as well as environment variables set by Virtlet. vmwrapper uses the environment variable values passed to Virtlet to communicate with tapmanager over an Unix domain socket, retrieving a file descriptor for a tap device set up by tapmanager. tapmanager uses its own simple protocol to communicate with vmwrapper because it needs to send file descriptors over the socket. This is not usually supported by RPC libraries, see e.g. grpc/grpc#11417 vmwrapper then updated the command line arguments to include the network interface information and execs the actual emulator (qemu).

Done.
With additional info about SR-IOV.
Still, this description is written in terms of a single interface, while it should cover one or more interfaces.


docs/vmpod-lifecycle.md, line 66 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

At this point the VM is running and accessible via the network, and the pod is in Running state as well as it's only container.

Done.


docs/vmpod-lifecycle.md, line 70 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

This sequence is initiated when the pod is deleted, either by means of kubectl delete or a controller manager action due to deletion or downscaling of a higher-level object.

Done.


docs/vmpod-lifecycle.md, line 72 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

kubelet notices the pod being deleted

Done.


docs/vmpod-lifecycle.md, line 73 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

kubelet invokes StopContainer CRI calls which is getting forwared to Virtlet based on the containing pod sandbox annotations.

Done.


docs/vmpod-lifecycle.md, line 77 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet stops the libvirt domain. libvirt sends a signal to qemu, which starts the shutdown. If it doesn't quit in a reasonable time determined by pod's termination grace period, Virtlet will forcibly terminate the domain, thus killing the qemu process.

Done.


docs/vmpod-lifecycle.md, line 81 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…

Virtlet asks its tapmanager to remove pod from the network by means of CNI DEL command.

Done.


docs/vmpod-lifecycle.md, line 84 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…
  • aftter StopPodSandbox returns, the pod sandbox will be eventually GC'd by kubelet by means of RemovePodSandbox CRI call

Done.


docs/vmpod-lifecycle.md, line 85 at r2 (raw file):

Previously, ivan4th (Ivan Shvedunov) wrote…
  • Upon RemovePodSandbox, Virtlet removes the pod metadata from its internal database.

Done.

Contributor

@ivan4th ivan4th left a comment


:lgtm:

Reviewable status: 1 change requests, 1 of 2 approvals obtained (waiting on @pigmej and @jellonek)

@ivan4th ivan4th merged commit 5b53a91 into master Sep 11, 2018
@ivan4th ivan4th deleted the jell/vmpodlc branch September 11, 2018 09:05