Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Hive Directory #3765

Merged
merged 4 commits into from
Sep 11, 2024
Merged

Refactor Hive Directory #3765

merged 4 commits into from
Sep 11, 2024

Conversation

razo7
Copy link
Collaborator

@razo7 razo7 commented Aug 8, 2024

Which issue this PR addresses:

  1. Group the Hive files from hack directory to hack/hive directory.
  2. Add main function, and use hack/utils.sh functions
  3. Hive installation was timed out using the hack/hive-dev-install.sh script, so we remove the redundant timeout waiting.

.
.
.
clusterrolebinding.rbac.authorization.k8s.io/hive-operator-rolebinding created
deployment.apps/hive-operator created
deployment.apps/hive-operator condition met
Error from server (NotFound): deployments.apps "hive-controllers" not found

What this PR does / why we need it:

  • Simplify Hive scripts and creation and also use common functions.

  • Aligning with production Hive installation by not waiting for the hive-controllers deployment to be Available and the pod to be Ready. It's the responsibility of the operator to manage its resources, and it is an overkill that even causes the Hive installation to fail on waiting. Rerun the script was succeeded

  • Fixing the Hive installation timeout issue would help Containerized Full RP Dev Automation #3764 automation.

Test plan for issue:

Not needed

Is there any documentation that needs to be updated for this PR?

Yes, due to the change of files location.

How do you know this will function as expected in production?

@razo7
Copy link
Collaborator Author

razo7 commented Aug 8, 2024

/azp run ci, e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@razo7 razo7 changed the title Remove timeout for Hive installation Remove Timeout for Hive Installation Aug 11, 2024
Copy link
Collaborator

@hawkowl hawkowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to remove it entirely, but increase it to some amount that Hive will surely be operational by. If we do want to just trust in the operator, then we should instead put in some output of the script that indicates how to tell when Hive is operational and you can use it (e.g. "check readiness via kubectl x y z to monitor rollout").

@tsatam
Copy link
Collaborator

tsatam commented Aug 12, 2024

These waits/timeouts were intentionally added in order to solve for issues we were facing when deploying Hive into ARO production clusters, where the deployment itself was "healthy" but the Hive operator was degraded due to various issues (namely missing OpenShift-specific CRDs in the AKS clusters we run Hive in).

Without these checks, the script would exit "successfully" and thus our deployment would be marked as successful, despite Hive being degraded.

That being said, I don't necessarily think we need to retain this specific solution to that problem, and there's probably a better long-term solution (e.g. release pipeline takes Hive health metrics into account).

For now I think we could make this change if truly desired, since this script isn't itself used for production Hive deployments. I just backported the changes we made to the production deployment script here.

@razo7 razo7 force-pushed the time-out-hive-installation branch from 5d0538a to ffbf745 Compare August 27, 2024 10:05
@razo7 razo7 changed the title Remove Timeout for Hive Installation Refactor Hive Directory Aug 27, 2024
@razo7 razo7 force-pushed the time-out-hive-installation branch 2 times, most recently from df61f07 to 4045606 Compare August 27, 2024 10:36
@razo7 razo7 force-pushed the time-out-hive-installation branch from 4045606 to a51641c Compare August 27, 2024 10:38
Comment on lines 41 to 44
echo "$PULL_SECRET" > /tmp/.tmp-secret
# Using dry-run allows updates to work seamlessly
$KUBECTL create secret generic hive-global-pull-secret --from-file=.dockerconfigjson=/tmp/.tmp-secret --type=kubernetes.io/dockerconfigjson --namespace $HIVE_OPERATOR_NS -o yaml --dry-run=client | $KUBECTL apply -f - 2>/dev/null
rm -f /tmp/.tmp-secret
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used in production, so it might be possible to remove it from here as well.

@razo7
Copy link
Collaborator Author

razo7 commented Aug 27, 2024

/azp run ci, e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@razo7
Copy link
Collaborator Author

razo7 commented Aug 28, 2024

/azp run e2e

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Group the Hive files under hack directory to hack/hive
Group the Hive files under hack directory to hack/hive, and refactor Hive instllation using main function and utils.sh
@razo7 razo7 force-pushed the time-out-hive-installation branch from a51641c to e5739dd Compare September 8, 2024 06:54
@razo7 razo7 requested a review from bitoku as a code owner September 8, 2024 06:54
@razo7
Copy link
Collaborator Author

razo7 commented Sep 10, 2024

/azp run ci, e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Trust in the operator installation and print two options to monitor Hive deployment rollout
Use double quote to prevent word splitting, break long line into multiple, use '-n' over '! -z', simpler if check, use consistent function declaration syntax, trap outside main and after clenup is declared
@razo7
Copy link
Collaborator Author

razo7 commented Sep 11, 2024

/azp run ci, e2e

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@tiguelu
Copy link
Contributor

tiguelu commented Sep 11, 2024

I don't think we want to remove it entirely, but increase it to some amount that Hive will surely be operational by. If we do want to just trust in the operator, then we should instead put in some output of the script that indicates how to tell when Hive is operational and you can use it (e.g. "check readiness via kubectl x y z to monitor rollout").

Thank you Amber. A log line has been added to print commands to check readiness.

Copy link
Contributor

@tiguelu tiguelu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All requested changes have been addressed. LGTM.

@tiguelu tiguelu merged commit abf4167 into master Sep 11, 2024
24 checks passed
@tiguelu tiguelu deleted the time-out-hive-installation branch September 11, 2024 12:31
edisonLcardenas pushed a commit that referenced this pull request Sep 16, 2024
* Move Hive hack files under one directory
Group the Hive files under hack directory to hack/hive

* Refactor Hive installation and hack files location
Group the Hive files under hack directory to hack/hive, and refactor Hive installation using main function and utils.sh

* Print troubleshooting for Hive deployment rollout
Trust in the operator installation and print two options to monitor Hive deployment rollout

* Small fixes for hive installation script
Use double quote to prevent word splitting, break long line into multiple, use '-n' over '! -z', simpler if check, use consistent function declaration syntax, trap outside main and after cleanup is declared
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants