Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting environment fails with: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on #1050

Open
goldmann opened this issue May 9, 2018 · 7 comments

Comments

@goldmann
Copy link
Contributor

goldmann commented May 9, 2018

Issue Overview

I see some (random) errors like this:

09:30:34  [ERROR] org.jboss.test.arquillian.ce.sso.SsoEap64SecureDeploymentsTest  Time elapsed: 1.996 s  <<< ERROR!
09:30:34  io.fabric8.kubernetes.clnt.v3_1.KubernetesClientException: Failure executing: PATCH at: https://MASTER:8443/api/v1/namespaces/test-sso-1dddeeef/replicationcontrollers/eap-app-1. Message: Forbidden! User arq-sso/MASTER:8443 doesn't have permission. replicationcontrollers "eap-app-1" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: User "arq-sso" cannot update deploymentconfigs/finalizers.apps.openshift.io in project "test-sso-1dddeeef", <nil>.
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:407)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.OperationSupport.handlePatch(OperationSupport.java:279)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.handlePatch(BaseOperation.java:783)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.HasMetadataOperation$3.apply(HasMetadataOperation.java:156)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.HasMetadataOperation$3.apply(HasMetadataOperation.java:152)
09:30:34  	at io.fabric8.kubernetes.api.model.v3_1.DoneableReplicationController.done(DoneableReplicationController.java:26)
09:30:34  	at io.fabric8.kubernetes.api.model.v3_1.DoneableReplicationController.done(DoneableReplicationController.java:5)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.HasMetadataOperation.patch(HasMetadataOperation.java:163)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.RollableScalableResourceOperation.patch(RollableScalableResourceOperation.java:175)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.HasMetadataOperation$1.apply(HasMetadataOperation.java:60)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.HasMetadataOperation$1.apply(HasMetadataOperation.java:50)
09:30:34  	at io.fabric8.kubernetes.api.model.v3_1.DoneableReplicationController.done(DoneableReplicationController.java:26)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.ReplicationControllerOperationsImpl.withReplicas(ReplicationControllerOperationsImpl.java:85)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.ReplicationControllerOperationsImpl.withReplicas(ReplicationControllerOperationsImpl.java:42)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.RollableScalableResourceOperation.scale(RollableScalableResourceOperation.java:77)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.RollableScalableResourceOperation.scale(RollableScalableResourceOperation.java:45)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.ScalableReaper.reap(ScalableReaper.java:30)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:621)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:70)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:662)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.deleteList(BaseOperation.java:696)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:636)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:70)
09:30:34  	at io.fabric8.openshift.clnt.v3_1.dsl.internal.DeploymentConfigOperationsImpl$DeploymentConfigReaper.reap(DeploymentConfigOperationsImpl.java:133)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:621)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:70)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:662)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.base.BaseOperation.delete(BaseOperation.java:650)
09:30:34  	at io.fabric8.openshift.clnt.v3_1.handlers.DeploymentConfigHandler.delete(DeploymentConfigHandler.java:63)
09:30:34  	at io.fabric8.openshift.clnt.v3_1.handlers.DeploymentConfigHandler.delete(DeploymentConfigHandler.java:33)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.KubernetesListOperationsImpl.delete(KubernetesListOperationsImpl.java:157)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.KubernetesListOperationsImpl.delete(KubernetesListOperationsImpl.java:149)
09:30:34  	at io.fabric8.kubernetes.clnt.v3_1.dsl.internal.KubernetesListOperationsImpl.delete(KubernetesListOperationsImpl.java:47)
09:30:34  	at org.arquillian.cube.openshift.impl.fabric8.F8OpenShiftAdapter.deleteTemplate(F8OpenShiftAdapter.java:438)
09:30:34  	at org.arquillian.cube.openshift.impl.resources.OpenShiftResourceFactory.deleteTemplates(OpenShiftResourceFactory.java:230)
09:30:34  	at org.arquillian.cube.openshift.impl.resources.OpenShiftResourceFactory.deleteEnvironment(OpenShiftResourceFactory.java:248)
09:30:34  	at org.arquillian.cube.openshift.impl.CEEnvironmentProcessor.deleteEnvironment(CEEnvironmentProcessor.java:133)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
09:30:34  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:30:34  	at java.lang.reflect.Method.invoke(Method.java:498)
09:30:34  	at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:86)
09:30:34  	at org.jboss.arquillian.core.impl.EventContextImpl.invokeObservers(EventContextImpl.java:103)
09:30:34  	at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:90)
09:30:34  	at org.jboss.arquillian.test.impl.TestContextHandler.createClassContext(TestContextHandler.java:83)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
09:30:34  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:30:34  	at java.lang.reflect.Method.invoke(Method.java:498)
09:30:34  	at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:86)
09:30:34  	at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:95)
09:30:34  	at org.jboss.arquillian.test.impl.TestContextHandler.createSuiteContext(TestContextHandler.java:69)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
09:30:34  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
09:30:34  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
09:30:34  	at java.lang.reflect.Method.invoke(Method.java:498)
09:30:34  	at org.jboss.arquillian.core.impl.ObserverImpl.invoke(ObserverImpl.java:86)
09:30:34  	at org.jboss.arquillian.core.impl.EventContextImpl.proceed(EventContextImpl.java:95)
09:30:34  	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:133)
09:30:34  	at org.jboss.arquillian.core.impl.ManagerImpl.fire(ManagerImpl.java:105)
09:30:34  	at org.jboss.arquillian.test.impl.EventTestRunnerAdaptor.afterClass(EventTestRunnerAdaptor.java:95)
09:30:34  	at org.jboss.arquillian.junit.Arquillian$3$1.evaluate(Arquillian.java:183)
09:30:34  	at org.jboss.arquillian.junit.Arquillian.multiExecute(Arquillian.java:350)
09:30:34  	at org.jboss.arquillian.junit.Arquillian.access$200(Arquillian.java:54)
09:30:34  	at org.jboss.arquillian.junit.Arquillian$3.evaluate(Arquillian.java:177)
09:30:34  	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
09:30:34  	at org.jboss.arquillian.junit.Arquillian.run(Arquillian.java:115)
09:30:34  	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
09:30:34  	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
09:30:34  	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
09:30:34  	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
09:30:34  	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
09:30:34  	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
09:30:34  	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
09:30:34  	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)

This execution is part of CI run in Jenkins. This is not happening on my laptop. I have the exact same oc version available here and there. The OCP (3.9) environment I use is the same. The user used for the OCP env is the same and the commands executed are the same. Difference is in the host OS (Fedora locally, RHEL on CI) and in the location of these two machines.

My assumption is that tests pass, the error happens only when Cube tries to clean up the environment.

Any clue?

@goldmann
Copy link
Contributor Author

goldmann commented May 9, 2018

The only thing that comes to my mind is that maybe the deletion of resources is somewhat too fast? When the failure occurs, both - OCP and test host is located in the same datacenter minimizing the network delays. When I execute it from my local machine, it always passes, but the test execution happens on the other end of the globe.

Additional note is that test failures are pretty much random - sometimes 2, 4 or 6 tests fail of the whol 50+ tests testsuite.

/cc @rcernich

@goldmann
Copy link
Contributor Author

goldmann commented May 9, 2018

I played a bit with parameters and I made it work by setting the namespace.cleanup.enabled parameter to false and namespace.destroy.enabled to true. This removes the project created by test suite but does not try to remove any resources in that project upfront. These resource are removed as part of the project removal, so this configuration works for me well.

I'm not sure what to do with this ticket, should I close it or you want to investigate why it actually failed to remove resources? For me it looks like a bug and above can be seen as workaround.

@rcernich
Copy link
Contributor

rcernich commented May 9, 2018

Is the issue related to the order in which the resources are deleted? It almost seems like the user is getting nuked before it removes everything else.

Regarding random failures, I think there are some things in OpenShift/k8s that occur asynchronously. If we could add some blocking logic so we wait until things are complete, that might help eliminate these random types of failures. (For example, routes are updated asynchronously and we often have issues where tests fail because the route hasn't been updated in haproxy.)

@mhofstetter
Copy link

i'm facing the same problem after using arquillian cube openshift with an openshift cluster in version 3.7. (didn't had the problem in 3.6)

after some debugging i think the problem is related to a race condition issue in the fabric8 kubernetes client.

After deleting (non-cascaded) the DeploymentConfig the DeploymentConfigReaper tries to delete all managed ReplicationControllers of the deleted DeploymentConfig.

The newly created ReplicationControllerOperationsImpl then performs a deletion with an implicit scaleDownToZero. This by loading the current state of the ReplicationController at the beginning of the operation (T1), performing the operation on the local stored object and than evaluates the patch. To evaluate the patch - it loads the current state of the ReplicationController again (T2).

This gets problematically when kubernetes itself modifies the related ReplicationController after the deletion of the DeploymentConfig too, by removing the 'metadata/ownerReferences' to the deleted DeploymentConfig.

If this internal modification falls between T1 and T2 the fabric8 client tries to re-add the ownerReference during the patching of the 'scaleDown' - as is detects the diff.

[
{"op":"add",
"path":"/metadata/ownerReferences",
"value":[
    {
        "apiVersion":"apps.openshift.io/v1",
        "kind":"DeploymentConfig",
        "blockOwnerDeletion":true,
        "controller":true,
        "name":"test",
        "uid":"dfa8774a-5375-11e8-8b90-0050569bf613"
    }
    ]
}
]

in conjunction with the fact, that kubernetes since 1.7 checks the ownerReference with the admission controller, this results in the error.

so maybe it's better to exclude the ownerReferences from the patch-set as it is also recommended in the kubernetes documentation.

If an object’s ownerReferences field is set by a controller (such as Deployment or ReplicaSet), blockOwnerDeletion is set automatically and you do not need to manually modify this field.

@lordofthejars
Copy link
Member

So this is a f8 kubernetes client issue that should be controlled there? Maybe we could open an issue there so they are aware.

@lburgazzoli
Copy link

I've opened this an issue upstream as I'm also experiencing the same behaviour

@gtully
Copy link

gtully commented Sep 13, 2018

looks like there is a v4.0.4 with a fix.
fabric8io/kubernetes-client@05f5c4f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants