-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MIMO] Move cluster certificate functionality to ClientHelper #3736
Conversation
1895b7b
to
4c797bc
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
ClientHelper.New
is never used. I think we can kill it without trouble.- Can we collapse the
New
stuff in this so we have private constructors and a publicGetClient
that builds or creates a client as needed. Basically, should this be a singleton? - General ask to add some more descriptive naming. Can we say what kind of client it is? It think
k8sClient
but I'm not sure. Maybe a better name forclientHelper
isAROK8sClient
i.e. our wrapped version of the k8s client.
pkg/cluster/install.go
Outdated
return err | ||
} | ||
|
||
// initializeKubernetesClients initializes clients which are used | ||
// once the cluster is up later on in the install process. | ||
func (m *manager) initializeOperatorDeployer(ctx context.Context) (err error) { | ||
m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.client, m.extensionscli, m.kubernetescli, m.operatorcli) | ||
m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.ch.Client(), m.extensionscli, m.kubernetescli, m.operatorcli) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping the client into the helper only to ask the helper to give the client right back is awkward. Either the client helper closes over the client, or hold on to both values, IMO. The in-between seems like an incorrect abstraction that closes over nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. My question: is there a good reason to do so here that we are not aware of ?
I would simply make the client public instead of providing a getter as this indirect access doesn't bring much value (specially when we have multiple stuff that is called "Client", methods, struct fields, etc) IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it a bit, and I've made it close over the whole client instead, since that was pretty easy. This means that the places that we currently use a controller-runtime client, we can upgrade it to the clienthelper fairly easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @hawkowl for this, I like that! 🙂
I just left some comments in case any make sense to you, but not a blocker.
pkg/cluster/apply.go
Outdated
utilpem "github.com/Azure/ARO-RP/pkg/util/pem" | ||
) | ||
|
||
func EnsureTLSSecretFromKeyvault(ctx context.Context, env env.Interface, ch clienthelper.Interface, target types.NamespacedName, certificateName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same direction as my previous comment and talking about the ch clienthelper.Interface parameter... there is something that I have seen multiple times in the ARO-RP that I will consider a bad pattern. We tend to pass big interfaces where just one of the methods of the interface is used, something pretty common in our beloved friend Java but that can be easily avoided in Go.
In those cases, I think it would be a good idea to define just an interface with the method needed in this function and just update the header of the _ EnsureTLSSecretFromKeyvault_ to use that new one-method-interface, that is one of the beauties of implicit interfaces in Go: we can define an Interface IN THE CONSUMER package, not the implementer so we fulfill the requirement for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've altered the code to have Reader and Writer interfaces, similar to the controller-runtime client. This should let us define things a bit nicer if we want to restrict down the interface. I'm not too eager to narrow down the interfaces more than this, though.
I think that making single-method interfaces (especially defining them in consumers), is overly restricting and can cause a lot of work if any of those interfaces need to change in the future. Every time I've gone to use it, I've wished I hadn't later on. More granular interfaces from the provider can mitigate some of this duplication, but I'm also not sure how often we can get away with having super-granular interfaces. A Reader/Writer should hopefully do for now.
I also don't think it makes code easier to test when we have verified fakes (e.g. the controller-runtime client). It does make it easier to deal with when we're writing mocks for interfaces, since the mocks (and what the code might call) are therefore smaller, but I firmly believe that mocks are an anti-pattern and that even big interfaces like env.Interface
would be a lot more tolerable if designed with a proper fake in parallel.
|
||
var cb []byte | ||
for _, cert := range certs { | ||
cb = append(cb, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw})...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super opinionated and optional: could we please move the line
pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw})...
to an aux variable for readability.
Type: corev1.SecretTypeTLS, | ||
} | ||
|
||
return ch.Ensure(ctx, secret) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned, as we just use the Ensure method of ch, maybe we could define here an interface with just this method to stick to the Interface Segregation Principle.
pkg/cluster/install.go
Outdated
return err | ||
} | ||
|
||
// initializeKubernetesClients initializes clients which are used | ||
// once the cluster is up later on in the install process. | ||
func (m *manager) initializeOperatorDeployer(ctx context.Context) (err error) { | ||
m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.client, m.extensionscli, m.kubernetescli, m.operatorcli) | ||
m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.ch.Client(), m.extensionscli, m.kubernetescli, m.operatorcli) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. My question: is there a good reason to do so here that we are not aware of ?
I would simply make the client public instead of providing a getter as this indirect access doesn't bring much value (specially when we have multiple stuff that is called "Client", methods, struct fields, etc) IMHO.
4c797bc
to
5aefdaf
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
EnsureDeleted(ctx context.Context, gvk schema.GroupVersionKind, key types.NamespacedName) error | ||
type Writer interface { | ||
client.Writer | ||
// Ensure applies self-contained objects to a Kubernetes API, merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you're not changing this in the PR, so this is not a blocker here, but we should use server-side apply instead of whatever client-side logic exists, as it's guaranteed to be a better implementation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to use server-side apply, but the fake implementation of controller-runtime doesn't yet support it :(
Refs:
- Support server-side apply (client.Apply) in fake Client kubernetes-sigs/controller-runtime#2341
- Add field management support to fake client-go typed client kubernetes/kubernetes#125560 -- the groundwork does look like it's beginning to be there :)
- ⚠ Client Server side apply kubernetes-sigs/controller-runtime#2702 hi @troy0820 fancy seeing you there :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No comment, just waving at @hawkowl 👋🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the controller-runtime fakes are horrendous and IMO moving away from them as soon as possible is also a great boon to the project :)
SSA is very complex and faking it is likely not going to happen - not even just in some soon time-frame, but perhaps ever. Client fakes have fallen out of favor and everyone has a horror story of working around the mess that is envtest without gaining any confidence in their application functioning in the real world against a real API server. Even past that, the concerns Alvaro has in Troy's PR are a mountain of work that the upstream SIGs are not interested in taking up or sponsoring, from my understanding.
This is neither here nor there, but you might want to consider not putting SSA support in the controller-runtime fakes as a precondition to using this (amazingly useful and beneficial) technology in your production deployments.
See an example of how to test controllers or similar using static data in, static data out here: https://github.com/openshift/hypershift/blob/main/control-plane-pki-operator/certificaterevocationcontroller/certificaterevocationcontroller_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue beyond the requisite work are the apply types that are not scaffold with crd/operator-sdk/kubebuilder. That work is a side effect of the client (controller-runtime) not implementing that whereas client-go has that necessary logic.
The interface client being used here is the mechanism where that can't be utilized. I have no context around what this is trying to accomplish but if it is indeed using corev1 types where a kubernetes.Interface
can be used, those applyConfigurations are available to fulfil the server side apply.
The faking of the clients are tracked with a "tracker" and trying to track a server side apply in any client would be trying to hold state for things it wouldn't worry about because of the client's responsibility. Having a test suite where that can happen seems outside of the scope of what the client/fakes could do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have worked with the controller-runtime
ecosystem and upstream Kubernetes client-sets for many years, so I'm pretty familiar with the structure. I don't think I communicated the "static data in, static data out" link well - my suggestion is not that some client helper hold on to apply configurations, nor is it to somehow teach the controller-runtime
client how to apply.
In case we get too far in the weeds - I just want to re-iterate that the comments here are not review comments for this PR, just a general comment. I'm not suggesting any changes are made right now.
What I am suggesting, though, is that operator reconciliation loops are written as pure functions - either take some set of cluster state as input, or allow dependency injection to take producers of cluster state as input, and produce state as output. Apply configurations have a wonderful property in that they encode your intent in static data - so, testing your lower-case r
reconcile()
is straightforward, and no mocks of any persuasion are necessary.
The top-level Reconcile()
might look like:
func (r *reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
// assemble the data necessary by using listers, etc
input := r.getInput()
// transform input state into intended state
intent, err := r.reconcile(input)
// realize intended state onto the cluster
return r.execute(intent)
}
Where r.execute(intent)
takes apply configurations and uses a kubernetes.Clientset
to call Apply
.
In my experience, relying heavily on mocks leads many teams to stumble into deeply frustrating issues, often many years down the line. Examples I've seen:
- implementing highly-performant caching using metadata-only watches requires 100% of tests to be rewritten, as the fakes don't know how to propagate data between the metadata fake and the "real" fake
- simplifying controller code to reduce maintenance burden and eliminate incorrect client-side patching by using server-side apply means all unit tests fail
- fragile assumptions about the number of times your reconciliation is called and when, in what order, etc, lead integration tests to be a poor substitute for end-to-end testing against actual API servers, leading to test duplication
- fakes make many highly surprising decisions, taking valuable mental overhead and productivty away from the team, like:
- faking out field selectors (usually handled in the server) with indices, which requires test code to be written to create indices
- objects with deletion timestamps but no finalizers cause some internal panic even though it's totally valid for your controller to see one in production
- fake clients need subresources registered manually
Teams that spend time writing unit tests for unit-level concerns and do not reach to mocks, in my experience, move faster and get a higher RoI on their tests. End-to-end tests and chaos tests validate production modalities that integration tests attempt to, but cannot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevekuznetsov I completely agree with that. 🥇
Even though I am by far the one less experienced in this conversation about this domain, I have seen similar problems in other domains that made me resonate a lot with your comment while reading it.
Sometimes we focus too much on trying to reproduce the state of the world where our app will run, and we forget that we should primarily test what our app does, and what our business logic is. It is almost always much simpler and easier to understand to just have our business logic in pure functions that expect input data and produce output data. We then forget about how this data will be gathered and what will be done with the result, and we can simply ensure that domainLogicFn(x) == expectedOutput.
This is a common concern that applies to a lot of different domains, but it is pretty common in k8s because most of the time the business logic gets mixed/loosed in the effort to understand/simulate the "state" where all that business logic will happen (due to the stateful nature of k8s controllers). So I just want to reaffirm that avoiding complex mocks when possible with pure functions is a great idea, and not just in this context 🙂
9906f6d
to
4126dbc
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
Please rebase pull request. |
4126dbc
to
b32fd60
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
b32fd60
to
241fe18
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
update makfile to use go 1.23 update docs bump gotestsum to 1.12.0 bump golangci-lint to 1.59.1 use the fips compliant golang image generate a secret for the operator from workload identity Update pkg/operator/deploy/deploy.go Co-authored-by: Ayato Tokubi <[email protected]> get subscription info from the subscription doc rather than env test the operator identity secret generation code properly Fixed to correctly reference the local image, preventing unauthorized Docker Hub pulls. Align docs hierarchy Indent bullet points Copy fluentbit image from arointsvc ACR to your ACR It is needed since it is compared against a default image (and digest) from const file ARO-9570: Add a controller to the ARO operator to lay down etchosts machine config ARO-9570: Update controller to watch MCP and ARO Cluster object ARO-9750: Add a controller to create the etchosts machineconfigs if they dont exist Fix linting Add licenses bump golangci-lint to v1.60.3 and exclude printf, SA1006 and S1009 from lint update golangci-lint version use non fips golang image use go 1.22 bump go in ci add git to dockerignore set buildvcs to false upgrade to go 1.22.6 update docs fix go lint address comments remove commented code from onebranch pipelines file change var to const fix unit-tests and api cloudError This is the new CI-RP stage for the pipline (#3768) * This is the new CI-RP stage for the pipline (#3753) * Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically removed the tag Add Podman service start and remote args setup for seamless operation Add sudo to start Podman service for elevated permissions and fix permission errors Add sudo to start Podman service for elevated permissions and fix permission errors Refactor Makefile: Update Podman service handling with sudo and remove default PODMAN_REMOTE_ARGS to improve flexibility and ensure proper permissions. Add sudo to start Podman service for elevated permissions and fix permission errors * Added Podman service target and set PODMAN_REMOTE_ARGS for seamless builds. * fix the makefile * added the port to fix the Makefile Add smoke test for alerts from Alertmanager (#3801) Move ARM swagger to subfolder (#3805) To add new HCP RP, the ARO RP is moved into the subfolder openshiftclusters. There are no additional changes, no impact on the SDK and clients. Add the old make runlocal-rp as an alternative to containerization (#3789) Add smoke test documents (#3813) Adding Ayato to CODEOWNERS Fix make ci-clean and runlocal-rp (#3806) * Fix make ci-clean error for running work containers by buildah that prevents prune from working * Fix make runlocal-rp image syntax Upgrade to Podman 5 to fix the vuln Install required binary for Podman 5 in ci Switch back to OneBranch build image Install crun Install more OCI packages Change home dir to /tmp for podman see containers/podman#23818 for more details. Use sudo for tdnf bump golangci-lint version in dockerfile ci-rp add go flags update go ver in ci.yml update test Correct testing/time issues in pkg/deploy (#3808) - Percolate up the time to wait for LB healthcheck probes, test @ 0 sec - Correct a context timeout test case, test @ 0 sec timeout Fix slow tests in /pkg/backend (#3809) Fix slow tests in /pkg/frontend (#3810) * Clarifying etcd cert renew test - Updated the test to make it clear it is passing because timeout is being reached - Updated the timeout from 10s -> 0s to pass faster * Fix slow changefeed tests Generate smaller OIDC keys for unit tests (#3811) - significantly increases unit test performance by moving from 4096 -> 256 bit keys - preserves 4096 bit keys for all non-testing scenarios Make CI-RP Improvements (#3791) - Remove linting from ci-rp - Remove generate from ci-rp Set CGO_ENABLED update test command in ci-rp dockerfile Separate Makefile targets for local vs containers (#3816) - reverts changes to runlocal-rp - updates old run-portal to runlocal-portal since it uses local bins - adds new targets for containerized run of RP and Portal; opt-in - fixes docs and pipelines to use updated targets Drop some unneccessary dependencies by moving to `bingo` for tooling (#3719) * Move to using bingo for tools * go mod vendor [MIMO] Move cluster certificate functionality to ClientHelper (#3736) * move over TLS applying, as well as some clienthelper work bump go in bingo merge makefile changes from Master more Makefile updates add GO var in toplevel Makefile
Which issue this PR addresses:
Splitout of MIMO M1 work
What this PR does / why we need it:
Retrofits clienthelper support into some functions that MIMO tasks require
Test plan for issue:
CI, E2E
Is there any documentation that needs to be updated for this PR?
N/A
How do you know this will function as expected in production?
E2E should cover it, since it applies during install as well