feat: add support for traffic router plugins (#2573)
* feat: add support for traffic router plugins

Signed-off-by: zachaller <[email protected]>

* finish up refactor

Signed-off-by: zachaller <[email protected]>

* codegen

Signed-off-by: zachaller <[email protected]>

* update docs

Signed-off-by: zachaller <[email protected]>

* rename config field

Signed-off-by: zachaller <[email protected]>

* refactor tests

Signed-off-by: zachaller <[email protected]>

* add godocs

Signed-off-by: zachaller <[email protected]>

* add docs on creating plugins

Signed-off-by: zachaller <[email protected]>

* rename config fields

Signed-off-by: zachaller <[email protected]>

* Change New function to Init for tr

Signed-off-by: zachaller <[email protected]>

* Change New function to Init for metrics

Signed-off-by: zachaller <[email protected]>

* docs update

Signed-off-by: zachaller <[email protected]>

* docs update

Signed-off-by: zachaller <[email protected]>

* codegen

Signed-off-by: zachaller <[email protected]>

* change repo name

Signed-off-by: zachaller <[email protected]>

* small docs changes

Signed-off-by: zachaller <[email protected]>

* fix bad merge comments

Signed-off-by: zachaller <[email protected]>

* remove metric passing from metrics plugin on Init method

Signed-off-by: zachaller <[email protected]>

* fix mutex

Signed-off-by: zachaller <[email protected]>

* wrap errors

Signed-off-by: zachaller <[email protected]>

* rename

Signed-off-by: zachaller <[email protected]>

* docs change

Signed-off-by: zachaller <[email protected]>

* codegen

Signed-off-by: zachaller <[email protected]>

* some updates to docs

Signed-off-by: zachaller <[email protected]>

* change plugin to plugins for tr

Signed-off-by: zachaller <[email protected]>

* change plugin to plugins for tr

Signed-off-by: zachaller <[email protected]>

* refactor naming for metric plugins

Signed-off-by: zachaller <[email protected]>

* lint

Signed-off-by: zachaller <[email protected]>

* change handshake

Signed-off-by: zachaller <[email protected]>

* more renames

Signed-off-by: zachaller <[email protected]>

* change handshake

Signed-off-by: zachaller <[email protected]>

* add err context

Signed-off-by: zachaller <[email protected]>

* lint

Signed-off-by: zachaller <[email protected]>

* small docs change

Signed-off-by: zachaller <[email protected]>

* docs update from pr review

Signed-off-by: zachaller <[email protected]>

* updates from review

Signed-off-by: zachaller <[email protected]>

* change config map format

Signed-off-by: zachaller <[email protected]>

* update docs

Signed-off-by: zachaller <[email protected]>

* add context to error

Signed-off-by: zachaller <[email protected]>

* add context to error

Signed-off-by: zachaller <[email protected]>

* add context to errors as well as wrap the *bool returned by verify weight

Signed-off-by: zachaller <[email protected]>

* update docs for new interface type

Signed-off-by: zachaller <[email protected]>

* change error wrapping for init

Signed-off-by: zachaller <[email protected]>

---------

Signed-off-by: zachaller <[email protected]>
zachaller authored Mar 2, 2023
1 parent 535f244 commit b787cc5
Showing 38 changed files with 2,673 additions and 893 deletions.
4 changes: 4 additions & 0 deletions Makefile
@@ -286,3 +286,7 @@ checksums:
build-sample-metric-plugin-debug:
	go build -gcflags="all=-N -l" -o metric-plugin test/cmd/sample-metrics-plugin/main.go

.PHONY: build-sample-traffic-plugin-debug
build-sample-traffic-plugin-debug:
	go build -gcflags="all=-N -l" -o traffic-plugin test/cmd/sample-trafficrouter-plugin/main.go

31 changes: 13 additions & 18 deletions docs/analysis/plugins.md
@@ -1,11 +1,11 @@
# Metric Plugins

!!! important Available since v1.5
!!! important Available since v1.5 - Status: Alpha

Argo Rollouts supports getting analysis metrics via a 3rd-party plugin system. This allows users to extend the capabilities of Rollouts
to support metric providers that are not natively supported. Rollouts uses a plugin library called
[go-plugin](https://github.com/hashicorp/go-plugin) to do this. You can find a sample plugin
here: [sample-rollouts-metric-plugin](https://github.com/argoproj-labs/sample-rollouts-metric-plugin)
here: [rollouts-sample_prometheus-metric-plugin](https://github.com/argoproj-labs/rollouts-sample_prometheus-metric-plugin)

## Using a Metric Plugin

@@ -14,28 +14,24 @@ into the rollouts controller container. The second method is to use a HTTP(S) se

### Mounting the plugin executable into the rollouts controller container

To use this method, you will need to build or download the plugin executable and then mount it into the rollouts controller container.
The plugin executable must be mounted into the rollouts controller container at the path specified by the `--metric-plugin-location` flag.

There are a few ways to mount the plugin executable into the rollouts controller container. Some of these will depend on your
particular infrastructure. Here are a few methods:

* Using an init container to download the plugin executable
* Using a Kubernetes volume mount with a shared volume such as NFS, EBS, etc.
* Building the plugin into the rollouts controller container

Then you can use the configmap to point to the plugin executable. Example:
Then you can use the configmap to point to the plugin executable file location. Example:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-config
data:
  plugins: |-
    metrics:
    - name: "prometheus" # name the plugin uses to find this configuration, it must match the name required by the plugin
      pluginLocation: "file://./my-custom-plugin" # supports http(s):// urls and file://
  metricProviderPlugins: |-
    - name: "argoproj-labs/sample-prometheus" # name of the plugin; it must match the name required by the plugin so it can find its configuration
      location: "file://./my-custom-plugin" # supports http(s):// urls and file://
```
### Using a HTTP(S) server to host the plugin executable
@@ -49,11 +45,10 @@ kind: ConfigMap
metadata:
  name: argo-rollouts-config
data:
  plugins: |-
    metrics:
    - name: "prometheus" # name the plugin uses to find this configuration, it must match the name required by the plugin
      pluginLocation: "https://github.com/argoproj-labs/sample-rollouts-metric-plugin/releases/download/v0.0.3/metric-plugin-linux-amd64" # supports http(s):// urls and file://
      pluginSha256: "08f588b1c799a37bbe8d0fc74cc1b1492dd70b2c" #optional sha256 checksum of the plugin executable
  metricProviderPlugins: |-
    - name: "argoproj-labs/sample-prometheus" # name of the plugin; it must match the name required by the plugin so it can find its configuration
      location: "https://github.com/argoproj-labs/rollouts-sample_prometheus-metric-plugin/releases/download/v0.0.4/metric-plugin-linux-amd64" # supports http(s):// urls and file://
      sha256: "dac10cbf57633c9832a17f8c27d2ca34aa97dd3d" # optional sha256 checksum of the plugin executable
```

## Some words of caution
@@ -66,13 +61,13 @@ the server hosting the plugin is available again.

Argo Rollouts will download the plugin at startup only once but if the pod is deleted it will need to download the plugin again on next startup. Running
Argo Rollouts in HA mode can help a little with this situation because each pod will download the plugin at startup. So if a single pod gets
deleted during a server outage, the other pods will still be able to take over because there will already be a plugin executable available to it. However,
it is up to you to define your risk for and decide how you want to install the plugin executable.
deleted during a server outage, the other pods will still be able to take over because they will already have a plugin executable available. It is the
responsibility of the Argo Rollouts administrator to choose the plugin installation method, considering the risks of each approach.

## List of Available Plugins (alphabetical order)

#### Add Your Plugin Here
* If you have created a plugin, please submit a PR to add it to this list.
#### [sample-rollouts-metric-plugin](https://github.com/argoproj-labs/sample-rollouts-metric-plugin)
#### [rollouts-sample_prometheus-metric-plugin](https://github.com/argoproj-labs/rollouts-sample_prometheus-metric-plugin)
* This is just a sample plugin that can be used as a starting point for creating your own plugin.
It is not meant to be used in production. It is based on the built-in prometheus provider.
73 changes: 73 additions & 0 deletions docs/features/traffic-management/plugins.md
@@ -0,0 +1,73 @@
# Traffic Router Plugins

!!! important Available since v1.5 - Status: Alpha

Argo Rollouts supports managing traffic routing via a 3rd-party plugin system. This allows users to extend the capabilities of Rollouts
to support traffic routers that are not natively supported. Rollouts uses a plugin library called
[go-plugin](https://github.com/hashicorp/go-plugin) to do this. You can find a sample plugin
here: [rollouts-sample_nginx-trafficrouter-plugin](https://github.com/argoproj-labs/rollouts-sample_nginx-trafficrouter-plugin)

## Using a Traffic Router Plugin

There are two methods of installing and using an Argo Rollouts plugin. The first method is to mount the plugin executable
into the rollouts controller container. The second method is to use a HTTP(S) server to host the plugin executable.

### Mounting the plugin executable into the rollouts controller container

There are a few ways to mount the plugin executable into the rollouts controller container. Some of these will depend on your
particular infrastructure. Here are a few methods:

* Using an init container to download the plugin executable
* Using a Kubernetes volume mount with a shared volume such as NFS, EBS, etc.
* Building the plugin into the rollouts controller container

Then you can use the configmap to point to the plugin executable file location. Example:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-config
data:
  trafficRouterPlugins: |-
    - name: "argoproj-labs/sample-nginx" # name of the plugin; it must match the name required by the plugin so it can find its configuration
      location: "file://./my-custom-plugin" # supports http(s):// urls and file://
```
### Using a HTTP(S) server to host the plugin executable
Argo Rollouts supports downloading the plugin executable from a HTTP(S) server. To use this method, you will need to
configure the controller via the `argo-rollouts-config` configmap and set `location` to a http(s) url. Example:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-config
data:
  trafficRouterPlugins: |-
    - name: "argoproj-labs/sample-nginx" # name of the plugin; it must match the name required by the plugin so it can find its configuration
      location: "https://github.com/argoproj-labs/rollouts-sample_nginx-trafficrouter-plugin/releases/download/v0.0.1/metric-plugin-linux-amd64" # supports http(s):// urls and file://
      sha256: "08f588b1c799a37bbe8d0fc74cc1b1492dd70b2c" # optional sha256 checksum of the plugin executable
```
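
The `sha256` value has to be computed from the exact executable that is served at `location`. The following is a minimal sketch in Go of how a plugin publisher might produce that value, and how a downloaded file could be compared against it. The helper names `hexDigest`, `fileSHA256`, and `checksumMatches` are illustrative assumptions and are not part of Argo Rollouts; how the controller itself performs the check is not shown here. A SHA-256 digest is 64 hexadecimal characters long.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// hexDigest returns the lowercase hex SHA-256 digest of data.
func hexDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// fileSHA256 streams the file at path through SHA-256 and returns the hex digest.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", fmt.Errorf("open plugin executable: %w", err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", fmt.Errorf("hash plugin executable: %w", err)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// checksumMatches compares two hex digests, ignoring case and surrounding whitespace.
func checksumMatches(got, want string) bool {
	return strings.EqualFold(strings.TrimSpace(got), strings.TrimSpace(want))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checksum <plugin-file> [expected-sha256]")
		return
	}
	sum, err := fileSHA256(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sum) // value to place in the sha256 field
	if len(os.Args) > 2 && !checksumMatches(sum, os.Args[2]) {
		fmt.Fprintln(os.Stderr, "checksum mismatch")
		os.Exit(1)
	}
}
```

Run it against the released binary and paste the printed digest into the configmap; passing the expected digest as a second argument turns it into a quick local verification step.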

## Some words of caution

Depending on which method you use to install the plugin, there are some things to be aware of.
The rollouts controller will not start if it cannot download or find the plugin executable. This means that if you use
an installation method that requires downloading the plugin, and the server hosting the plugin is unavailable while the rollouts
controller pod is restarting after being deleted or is starting for the first time, the controller will not be able to start until
the server hosting the plugin is available again.

Argo Rollouts downloads the plugin only once, at startup, but if the pod is deleted it will need to download the plugin again on the next startup. Running
Argo Rollouts in HA mode can help a little with this situation because each pod downloads the plugin at startup. So if a single pod gets
deleted during a server outage, the other pods will still be able to take over because they will already have a plugin executable available. It is the
responsibility of the Argo Rollouts administrator to choose the plugin installation method, considering the risks of each approach.

## List of Available Plugins (alphabetical order)

#### Add Your Plugin Here
* If you have created a plugin, please submit a PR to add it to this list.
#### [rollouts-sample_nginx-trafficrouter-plugin](https://github.com/argoproj-labs/rollouts-sample_nginx-trafficrouter-plugin)
* This is just a sample plugin that can be used as a starting point for creating your own plugin.
It is not meant to be used in production. It is based on the built-in nginx traffic router.
155 changes: 155 additions & 0 deletions docs/plugins.md
@@ -0,0 +1,155 @@
# Creating an Argo Rollouts Plugin

## High Level Overview

Argo Rollouts plugins depend on HashiCorp's [go-plugin](https://github.com/hashicorp/go-plugin) library. This library
provides a way for a plugin to be compiled as a standalone executable and then loaded by the rollouts controller at runtime.
This works by having the plugin executable act as an RPC server and the rollouts controller act as a client. The plugin executable
is started by the rollouts controller and runs as a long-lived process that the rollouts controller connects to over a Unix socket.
The communication protocol uses Go's built-in net/rpc library, so plugins have to be written in Go.
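
To make the client/server split concrete, here is a minimal sketch that uses only the standard library: a `net/rpc` server listening on a Unix socket (the plugin side) and a client dialing it (the controller side). It is not how go-plugin is wired internally; go-plugin also handles starting the child process, the handshake, and connection management, all of which are left out. `EchoPlugin`, `servePlugin`, `callPlugin`, and `roundTrip` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"os"
	"path/filepath"
)

// EchoPlugin is a hypothetical plugin service. net/rpc exposes exported
// methods of the form: func (t *T) Method(args A, reply *R) error.
type EchoPlugin struct{}

// Type answers a call by writing into reply.
func (p *EchoPlugin) Type(args string, reply *string) error {
	*reply = "echo:" + args
	return nil
}

// servePlugin plays the plugin role: listen on a Unix socket and serve RPC calls.
func servePlugin(socket string) (net.Listener, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&EchoPlugin{}); err != nil {
		return nil, err
	}
	l, err := net.Listen("unix", socket)
	if err != nil {
		return nil, err
	}
	go srv.Accept(l) // returns once the listener is closed
	return l, nil
}

// callPlugin plays the controller role: dial the socket and invoke one method.
func callPlugin(socket, arg string) (string, error) {
	client, err := rpc.Dial("unix", socket)
	if err != nil {
		return "", err
	}
	defer client.Close()

	var reply string
	if err := client.Call("EchoPlugin.Type", arg, &reply); err != nil {
		return "", err
	}
	return reply, nil
}

// roundTrip starts the server on a temporary socket and performs one call.
func roundTrip(arg string) (string, error) {
	dir, err := os.MkdirTemp("", "plugin")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	socket := filepath.Join(dir, "plugin.sock")
	l, err := servePlugin(socket)
	if err != nil {
		return "", err
	}
	defer l.Close()

	return callPlugin(socket, arg)
}

func main() {
	out, err := roundTrip("nginx")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Because arguments and replies are encoded by `net/rpc`, both sides need matching Go types, which is the practical reason plugins are written in Go.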

## Plugin Repository

In order to get plugins listed in the main Argo Rollouts documentation, we ask that the plugin repository be created under
the [argoproj-labs](https://github.com/argoproj-labs) organization. Please open an issue under argo-rollouts requesting a
repo, on which you would be granted admin access.

There is also a standard naming convention for plugin names. The name is used for configmap registration and is also what the plugin
uses to locate its specific configuration on rollout or analysis resources. The name needs to be in the form of
`<namespace>/<name>`, and both `<namespace>` and `<name>` are checked against regular expressions that match GitHub's requirements
for `username/org` and `repository name`. This requirement is in place so that multiple creators of the same plugin
type can coexist, such as `<org1>/nginx` and `<org2>/nginx`. These names could be based on the repo name, such
as `argoproj-labs/rollouts-sample_prometheus-metric-plugin`, but that is not a requirement.

There will also be a standard for naming repositories under argoproj-labs in the form of `rollouts-<tool>-<type>-plugin`,
where `<type>` is, for example, `metric` or `trafficrouter`, and `<tool>` is the software the plugin is for, for example nginx.
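
As a rough illustration of the `<namespace>/<name>` check, the sketch below validates a plugin name with two regular expressions. The patterns are assumptions that approximate GitHub's rules (a user or organization name of up to 39 alphanumeric characters or hyphens that does not start or end with a hyphen; a repository name made of alphanumerics, `.`, `_`, and `-`). They are not copied from the controller, and `validatePluginName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Assumed approximation of GitHub's username/org rules.
	namespaceRe = regexp.MustCompile(`^[a-zA-Z0-9]([a-zA-Z0-9-]{0,37}[a-zA-Z0-9])?$`)
	// Assumed approximation of GitHub's repository name rules.
	nameRe = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)
)

// validatePluginName checks that full has the form <namespace>/<name>.
func validatePluginName(full string) error {
	namespace, name, ok := strings.Cut(full, "/")
	if !ok {
		return fmt.Errorf("plugin name %q must be in the form <namespace>/<name>", full)
	}
	if !namespaceRe.MatchString(namespace) {
		return fmt.Errorf("plugin namespace %q is not valid", namespace)
	}
	if !nameRe.MatchString(name) {
		return fmt.Errorf("plugin name %q is not valid", name)
	}
	return nil
}

func main() {
	for _, n := range []string{"argoproj-labs/nginx", "nginx", "-org/nginx"} {
		if err := validatePluginName(n); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("valid:", n)
	}
}
```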

## Plugin Name

Now that we have an idea of the plugin naming and repository standards, let's pick a name to use for the rest of this
documentation and call our plugin `argoproj-labs/nginx`.

This name will be used in a few different places. The first is the config map that your plugin users will need to configure.
It looks like this:

```yaml
kind: ConfigMap
metadata:
  name: argo-rollouts-config
data:
  metricProviderPlugins: |-
    - name: "argoproj-labs/metrics"
      location: "file:///tmp/argo-rollouts/metric-plugin"
  trafficRouterPlugins: |-
    - name: "argoproj-labs/nginx"
      location: "file:///tmp/argo-rollouts/traffic-plugin"
```
As you can see, there is a field called `name:` under both `metricProviderPlugins` and `trafficRouterPlugins`. This is the first place where your
end users will need to configure the name of the plugin. The second location is either in the rollout object or the analysis
template, as shown in the examples below.

#### AnalysisTemplate Example
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      ...
      provider:
        plugin:
          argoproj-labs/metrics:
            address: http://prometheus.local
```

#### Traffic Router Example
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-plugin-ro
spec:
  strategy:
    canary:
      canaryService: example-plugin-ro-canary-analysis
      stableService: example-plugin-ro-stable-analysis
      trafficRouting:
        plugins:
          argoproj-labs/nginx:
            stableIngress: canary-demo
```

You can see that we use the plugin name under `spec.metrics[].provider.plugin` for analysis templates and `spec.strategy.canary.trafficRouting.plugins`
for traffic routers. As a plugin author, you can then put any configuration you need under `argoproj-labs/nginx`, and you will be able to
look up that config in your plugin via the plugin name key. You will also want to document what configuration options your plugin supports.
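
Looking up configuration by the plugin name key could be sketched as follows. This assumes the `plugins` field reaches the plugin as a map from plugin name to raw JSON; the `nginxConfig` struct, the `configFor` helper, and the `examplePlugins` fixture are hypothetical and only mirror the `stableIngress` option from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// pluginName is the key users put in the configmap and on the Rollout.
const pluginName = "argoproj-labs/nginx"

// nginxConfig is a hypothetical shape for this plugin's own options.
type nginxConfig struct {
	StableIngress string `json:"stableIngress"`
}

// configFor finds the raw config stored under name and decodes it.
func configFor(plugins map[string]json.RawMessage, name string) (*nginxConfig, error) {
	raw, ok := plugins[name]
	if !ok {
		return nil, fmt.Errorf("no configuration found for plugin %q", name)
	}
	var cfg nginxConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("decode configuration for plugin %q: %w", name, err)
	}
	return &cfg, nil
}

// examplePlugins stands in for the plugins section of a Rollout (assumed shape).
func examplePlugins() map[string]json.RawMessage {
	return map[string]json.RawMessage{
		pluginName: json.RawMessage(`{"stableIngress":"canary-demo"}`),
	}
}

func main() {
	cfg, err := configFor(examplePlugins(), pluginName)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(cfg.StableIngress)
}
```

Keeping the options in a small typed struct also gives you one place to document what the plugin accepts.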

## Plugin Interfaces

Argo Rollouts currently supports two plugin systems. As a plugin author, your end goal is to implement one of these interfaces as
a HashiCorp go-plugin. The two interfaces are `MetricProviderPlugin` and `TrafficRouterPlugin`, one for each of the respective plugin types:

```go
type MetricProviderPlugin interface {
	// InitPlugin initializes the metric provider plugin. It gets called once when the plugin is loaded.
	InitPlugin() RpcError
	// Run starts a new external system call for a measurement.
	// Should be idempotent and do nothing if a call has already been started.
	Run(*v1alpha1.AnalysisRun, v1alpha1.Metric) v1alpha1.Measurement
	// Resume checks if the external system call is finished and returns the current measurement.
	Resume(*v1alpha1.AnalysisRun, v1alpha1.Metric, v1alpha1.Measurement) v1alpha1.Measurement
	// Terminate will terminate an in-progress measurement.
	Terminate(*v1alpha1.AnalysisRun, v1alpha1.Metric, v1alpha1.Measurement) v1alpha1.Measurement
	// GarbageCollect is used to garbage collect completed measurements to the specified limit.
	GarbageCollect(*v1alpha1.AnalysisRun, v1alpha1.Metric, int) RpcError
	// Type gets the provider type.
	Type() string
	// GetMetadata returns any additional metadata which providers need to store/display as part
	// of the metric result. For example, Prometheus uses it to store the final resolved queries.
	GetMetadata(metric v1alpha1.Metric) map[string]string
}

type TrafficRouterPlugin interface {
	// InitPlugin initializes the traffic router plugin. It gets called once when the plugin is loaded.
	InitPlugin() RpcError
	// UpdateHash informs a traffic routing reconciler about new canary, stable, and additionalDestination(s) pod hashes.
	UpdateHash(rollout *v1alpha1.Rollout, canaryHash, stableHash string, additionalDestinations []v1alpha1.WeightDestination) RpcError
	// SetWeight sets the canary weight to the desired weight.
	SetWeight(rollout *v1alpha1.Rollout, desiredWeight int32, additionalDestinations []v1alpha1.WeightDestination) RpcError
	// SetHeaderRoute sets the header routing step.
	SetHeaderRoute(rollout *v1alpha1.Rollout, setHeaderRoute *v1alpha1.SetHeaderRoute) RpcError
	// SetMirrorRoute sets up the traffic router to mirror traffic to a service.
	SetMirrorRoute(rollout *v1alpha1.Rollout, setMirrorRoute *v1alpha1.SetMirrorRoute) RpcError
	// VerifyWeight returns true if the canary is at the desired weight and additionalDestinations are at the weights specified.
	// Returns nil if weight verification is not supported or not applicable.
	VerifyWeight(rollout *v1alpha1.Rollout, desiredWeight int32, additionalDestinations []v1alpha1.WeightDestination) (RpcVerified, RpcError)
	// RemoveManagedRoutes removes all routes that are managed by rollouts by looking at spec.strategy.canary.trafficRouting.managedRoutes.
	RemoveManagedRoutes(ro *v1alpha1.Rollout) RpcError
	// Type returns the type of the traffic routing reconciler.
	Type() string
}
```

## Plugin Init Function

Each plugin interface has an `InitPlugin` function. This function is called when the plugin is first started up and is only called
once per startup. The `InitPlugin` function is used to initialize the plugin: it gives you, the plugin author, the ability
to either set up a client for a specific metrics provider or, in the case of a traffic router, construct a client or informer
for the Kubernetes API. One thing to note is that because these calls happen over RPC, the plugin author should
not depend on state being stored in the plugin struct, as it will not be persisted between calls.

## Kubernetes RBAC

The plugin runs as a child process of the rollouts controller and as such it will inherit the same RBAC permissions as the
controller. This means that the service account for the rollouts controller will need the correct permissions for the plugin
to function. This might mean instructing users to create a role and role binding to the standard rollouts service account
for the plugin to use. This will probably affect traffic router plugins more than metrics plugins.

## Sample Plugins

There are two sample plugins within the argo-rollouts repo that you can use as a reference for creating your own plugin.

* [Sample Metrics Plugin](https://github.com/argoproj/argo-rollouts/tree/master/test/cmd/sample-metrics-plugin)
* [Sample Traffic Router Plugin](https://github.com/argoproj/argo-rollouts/tree/master/test/cmd/sample-trafficrouter-plugin)
3 changes: 3 additions & 0 deletions manifests/crds/rollout-crd.yaml
@@ -862,6 +862,9 @@ spec:
required:
- stableIngress
type: object
plugins:
type: object
x-kubernetes-preserve-unknown-fields: true
smi:
properties:
rootService:
3 changes: 3 additions & 0 deletions manifests/install.yaml
@@ -11908,6 +11908,9 @@ spec:
required:
- stableIngress
type: object
plugins:
type: object
x-kubernetes-preserve-unknown-fields: true
smi:
properties:
rootService:
5 changes: 4 additions & 1 deletion metricproviders/metricproviders.go
@@ -94,7 +94,10 @@ func (f *ProviderFactory) NewProvider(logCtx log.Entry, metric v1alpha1.Metric)
		return skywalking.NewSkyWalkingProvider(client, logCtx), nil
	case plugin.ProviderType:
		plugin, err := plugin.NewRpcPlugin(metric)
		return plugin, err
		if err != nil {
			return nil, fmt.Errorf("failed to create plugin: %v", err)
		}
		return plugin, nil
	default:
		return nil, fmt.Errorf("no valid provider in metric '%s'", metric.Name)
	}
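
The new branch adds context with `%v`, which keeps the underlying error's text but not its identity. The sketch below shows how that differs from `%w` when a caller later inspects the error with `errors.Is`. The sentinel `errPluginNotFound` and the helpers `wrapV`, `wrapW`, and `matches` are hypothetical and do not appear in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errPluginNotFound is a hypothetical sentinel error used only for this illustration.
var errPluginNotFound = errors.New("plugin not found")

// wrapV adds context the way the diff above does: the message is kept, the chain is not.
func wrapV(err error) error {
	return fmt.Errorf("failed to create plugin: %v", err)
}

// wrapW adds the same context but keeps err reachable through errors.Is and errors.As.
func wrapW(err error) error {
	return fmt.Errorf("failed to create plugin: %w", err)
}

// matches reports whether err is, or wraps, errPluginNotFound.
func matches(err error) bool {
	return errors.Is(err, errPluginNotFound)
}

func main() {
	fmt.Println(wrapV(errPluginNotFound)) // same text as the %w version
	fmt.Println(wrapW(errPluginNotFound))
	fmt.Println(matches(wrapV(errPluginNotFound))) // false: %v flattens the error to a string
	fmt.Println(matches(wrapW(errPluginNotFound))) // true: %w preserves the wrapped error
}
```

Either verb produces the same message; the choice only matters if callers need to match on the underlying error.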