-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Hive Directory #3765
Refactor Hive Directory #3765
Conversation
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to remove it entirely, but increase it to some amount that Hive will surely be operational by. If we do want to just trust in the operator, then we should instead put in some output of the script that indicates how to tell when Hive is operational and you can use it (e.g. "check readiness via kubectl x y z to monitor rollout").
These waits/timeouts were intentionally added in order to solve for issues we were facing when deploying Hive into ARO production clusters, where the deployment itself was "healthy" but the Hive operator was degraded due to various issues (namely missing OpenShift-specific CRDs in the AKS clusters we run Hive in). Without these checks, the script would exit "successfully" and thus our deployment would be marked as successful, despite Hive being degraded. That being said, I don't necessarily think we need to retain this specific solution to that problem, and there's probably a better long-term solution (e.g. release pipeline takes Hive health metrics into account). For now I think we could make this change if truly desired, since this script isn't itself used for production Hive deployments. I just backported the changes we made to the production deployment script here. |
5d0538a
to
ffbf745
Compare
df61f07
to
4045606
Compare
4045606
to
a51641c
Compare
hack/hive/hive-dev-install.sh
Outdated
echo "$PULL_SECRET" > /tmp/.tmp-secret | ||
# Using dry-run allows updates to work seamlessly | ||
$KUBECTL create secret generic hive-global-pull-secret --from-file=.dockerconfigjson=/tmp/.tmp-secret --type=kubernetes.io/dockerconfigjson --namespace $HIVE_OPERATOR_NS -o yaml --dry-run=client | $KUBECTL apply -f - 2>/dev/null | ||
rm -f /tmp/.tmp-secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used in production, so it might be possible to remove it from here as well.
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
Group the Hive files under hack directory to hack/hive
Group the Hive files under hack directory to hack/hive, and refactor Hive instllation using main function and utils.sh
a51641c
to
e5739dd
Compare
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
Trust in the operator installation and print two options to monitor Hive deployment rollout
Use double quote to prevent word splitting, break long line into multiple, use '-n' over '! -z', simpler if check, use consistent function declaration syntax, trap outside main and after clenup is declared
/azp run ci, e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
Thank you Amber. A log line has been added to print commands to check readiness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All requested changes have been addressed. LGTM.
* Move Hive hack files under one directory Group the Hive files under hack directory to hack/hive * Refactor Hive installation and hack files location Group the Hive files under hack directory to hack/hive, and refactor Hive installation using main function and utils.sh * Print troubleshooting for Hive deployment rollout Trust in the operator installation and print two options to monitor Hive deployment rollout * Small fixes for hive installation script Use double quote to prevent word splitting, break long line into multiple, use '-n' over '! -z', simpler if check, use consistent function declaration syntax, trap outside main and after cleanup is declared
Which issue this PR addresses:
hack
directory tohack/hive
directory.hack/utils.sh
functionshack/hive-dev-install.sh
script, so we remove the redundant timeout waiting.What this PR does / why we need it:
Simplify Hive scripts and creation and also use common functions.
Aligning with production Hive installation by not waiting for the hive-controllers deployment to be Available and the pod to be Ready. It's the responsibility of the operator to manage its resources, and it is an overkill that even causes the Hive installation to fail on waiting. Rerun the script was succeeded
Fixing the Hive installation timeout issue would help Containerized Full RP Dev Automation #3764 automation.
Test plan for issue:
Not needed
Is there any documentation that needs to be updated for this PR?
Yes, due to the change of files location.
How do you know this will function as expected in production?