Orka Upgrade Thurs Jan 19th 2023 6pm - 9pm GMT #3112
I believe this issue can be closed due to #3116 |
I made some notes in Slack while I was working on it, but I will consolidate them in the following lines. TL;DR 😁
Extra information 📖
I deployed the testing machines in the nodes expecting them to follow the NAT config by default, but it was not clear to me how that magic works in detail. On the VMs appears to be
So I distributed the VMs according to the expected external IPs and the proper port bindings (8822, 8823...). Here are some logs checking ssh and OS version per machine:
test-orka-macos10.14-x64-1
test-orka-macos10.14-x64-2
test-orka-macos10.14-x64-3
test-orka-macos10.15-x64-1
test-orka-macos10.15-x64-2
test-orka-macos11-x64-1
test-orka-macos11-x64-2
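The per-machine ssh and OS version check mentioned above could be scripted roughly as follows. This is only a sketch: the external IPs, the `admin` user, and the port assigned to each VM are placeholders, since the thread does not state them. It builds the `ssh` command lines rather than running them, so the plan can be reviewed first.

```python
import shlex

# Hypothetical values: the thread does not give the external IPs, the SSH
# user, or which port (8822, 8823...) belongs to which VM.
MACHINES = {
    "test-orka-macos10.14-x64-1": ("203.0.113.10", 8822),
    "test-orka-macos10.14-x64-2": ("203.0.113.10", 8823),
    "test-orka-macos11-x64-1": ("203.0.113.11", 8822),
}


def ssh_check_command(host, port, user="admin"):
    """Build an ssh command that prints the macOS version of one VM."""
    return ["ssh", "-p", str(port), f"{user}@{host}", "sw_vers", "-productVersion"]


def check_plan(machines):
    """Return one printable shell command line per machine name."""
    return {
        name: shlex.join(ssh_check_command(host, port))
        for name, (host, port) in sorted(machines.items())
    }


if __name__ == "__main__":
    for name, command in check_plan(MACHINES).items():
        print(f"{name}: {command}")
```

Each command list can be passed to `subprocess.run` once the real addresses and ports are filled in.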
Opportunity 🦾 Checking the inventory I discovered that |
@UlisesGascon As @AshCripps mentioned in the issue description,
This is the likely reason that you're seeing inconsistencies with what the machine is and its hostname -- the hostname will be whatever is in the base image until we rerun the Ansible |
Oh, I knew I was forgetting something: when you deploy a machine it takes the next available IP, so you have to either change the inventory file or deploy in IP order from the file |
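A minimal sketch of the "deploy in IP order" option, assuming the inventory can be reduced to a mapping of machine name to IP. The names are taken from this thread, but the addresses are made-up documentation addresses and `deploy_order` is a hypothetical helper, not something in the repository. Sorting has to be numeric, because a plain string sort would put `.11` before `.3`.

```python
import ipaddress

# Hypothetical inventory excerpt; the real IPs live in the Ansible inventory.
INVENTORY = {
    "test-orka-macos11-x64-1": "192.0.2.20",
    "test-orka-macos10.14-x64-1": "192.0.2.3",
    "test-orka-macos10.15-x64-1": "192.0.2.11",
}


def deploy_order(inventory):
    """Return machine names sorted by the numeric value of their IP."""
    return sorted(inventory, key=lambda name: ipaddress.ip_address(inventory[name]))


if __name__ == "__main__":
    for position, name in enumerate(deploy_order(INVENTORY), start=1):
        print(position, name, INVENTORY[name])
```

Deploying the VMs in this order would let each one pick up the next available IP that the inventory already expects for it.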
Current status: All the test machines are now available in Jenkins. I did some manual patching inside the 10.x machines (Jenkins tokens, known_hosts, etc.) as I faced some Ansible challenges ( #3119 ). Regarding the backups, I suggest waiting until we are sure that the machines are working fine and the pipelines are passing before doing a final backup 👍 |
Quick update since 2022: I think we are good with the current situation in the CI for the Also we need to provision (I can do it) and re-Ansible the release machine |
The macOS 10.14 and 10.15 VMs still have an issue building Node.js 14 (#3131). Should be fixed by running through the manual steps in https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#install-command-line-tools-for-xcode.
I could try re-ansibling. If anyone else wants to try, the macOS release machines will also require the manual steps in https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#macos-release-machines to be run to set up full Xcode and the signing certificates. |
@UlisesGascon We should now be good w.r.t Xcode on the My offer to re-ansible the release machine stands if you re-provision it 🙂. |
@richardlau I re-provisioned the machine a few minutes ago. I will do the snapshots once the release one is ready too 👍 |
I've reansibled release-orka-10.15-x64-1 and ran through the manual steps in https://github.com/nodejs/build/blob/main/ansible/MANUAL_STEPS.md#macos-release-machines. For now I've disabled release-nearform-macos10.15-x64-1 to force the next iojs+release job onto the orka machine so we can verify (via the next nightly or canary build) that it is working as expected. |
The orka macOS 10.15 release machine looks good. I've reenabled the nearform one. |
I will start on the backups. Based on the documentation, I will use the "create a new image based on a deployed machine" strategy for all the VMs. |
I finished the backup images process. If in the future we need more space for new images, we can potentially remove the ones from 2020. @richardlau, I am confident in the backups, but if you want I can delete and restore one VM just to check that they are working. I believe this was the last step in order to close this issue 🤔 |
I've scheduled our Orka upgrade for 6pm - 9pm GMT on 19th Jan.
The reason I booked so far out is that we need to do some things beforehand:
Notes from MacStadium:
We require a minimum 3-hour maintenance window to perform the upgrade. We have time slots (EST) to choose from Monday-Thursday.
The API will be unreachable during the maintenance window. Please pause all CI/CD functions during the maintenance window.
Prior to starting the maintenance window, if you want to preserve the changes made to a VM, make sure to save them to an image (save or commit).
Prior to the start of the maintenance window, please save/shut down any VMs running on M1/Intel nodes. Cached images on M1 nodes may be removed as part of the upgrade
If you have any Kubernetes data in your sandbox namespace, you will need to back that data up prior to the start of the window. If you have any non-ephemeral VMs, please let us know in advance.
For upgrades on environments moving to version 2.3.0, all tokens will be purged as part of the new database performance optimization. If you are already on 2.1.0 and above this action has been completed. You may need to generate new tokens for your Orka API connections.
Orka 2.3.0 and above now offer better optimization for logs. These involve a new configuration to be set regarding retention rates. The default values are: Expiration policy is set to two weeks (336 hours). This is fully customizable. Please plan your log retention strategy to best meet your needs and use cases. Be sure to inform us of necessary configuration changes prior to the upgrade. The old log system will be deprecated in future releases.
There is no impact on your images, VM configs, or ISOs performing this upgrade.
Please upgrade your Jenkins plugin to the latest version: https://plugins.jenkins.io/macstadium-orka/
Doesn't apply to us, we don't use the ephemeral version (should we?)
We will notify you as soon as the upgrade is completed!
I have an email notification, but it should also have gone to all members of the build infra email chain
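As a quick sanity check on the numbers in the notes above, the sketch below confirms that the booked 6pm - 9pm GMT slot meets the 3-hour minimum, converts it to EST (the timezone MacStadium quotes its slots in), and shows that a two-week log expiration equals 336 hours. EST is modelled as a fixed UTC-5 offset, which holds in January; the helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
EST = timezone(timedelta(hours=-5), "EST")  # fixed offset, valid in January

# Booked window from the issue title: Thurs Jan 19th 2023, 6pm - 9pm GMT.
WINDOW_START = datetime(2023, 1, 19, 18, 0, tzinfo=GMT)
WINDOW_END = datetime(2023, 1, 19, 21, 0, tzinfo=GMT)
MINIMUM_WINDOW = timedelta(hours=3)
LOG_RETENTION = timedelta(weeks=2)


def hours(delta):
    """Express a timedelta as a number of hours."""
    return delta / timedelta(hours=1)


def window_is_long_enough(start, end, minimum=MINIMUM_WINDOW):
    """True when the window is at least the required minimum length."""
    return end - start >= minimum


if __name__ == "__main__":
    print("Window (EST):", WINDOW_START.astimezone(EST), "to", WINDOW_END.astimezone(EST))
    print("Long enough:", window_is_long_enough(WINDOW_START, WINDOW_END))
    print("Log retention in hours:", hours(LOG_RETENTION))
```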
cc @nodejs/build