Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix management interface and startup configs for IOL #2347

Merged
merged 15 commits into from
Dec 18, 2024

Conversation

kaelemc
Copy link
Contributor

@kaelemc kaelemc commented Dec 13, 2024

Relevant to IOL and the discussion in Discord.

@hellt hellt marked this pull request as draft December 13, 2024 08:21
- Allow for user startup configs to override default cfg

- Allow for user to supply partial startup config with '.partial' file ext

- Modify cfg template to allow for partial cfg

- Rename startup config ('startup.cfg') to 'boot_config.txt' to better reflect nature of the file.
- Updates IOL mgmt interface IP addressing on boots that are NOT the first boot.
@kaelemc kaelemc changed the title Add OpenStdin to container config Fix management interface and startup configs for IOL Dec 14, 2024
@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

I've added a few things:

  • Startup and partial configurations are now supported for IOL. Like SROS partial configs are supplied with the .partial file extension. If a startup config is supplied without .partial extension, then the config will override the default configuration we have, however like XRd and Cisco 8000 Emulator (and probably a few other nodes) this config can still take advantage of the template fields.

  • The UpdateMgmtIntf function has been added, it runs in PostDeploy() with a 7 second delay. This takes advantage of a new WriteToStdinNoWait func that i've implemented only for the Docker runtime.

As per the name, WriteToStdinNoWait just blindly sends data to the containers stdin. In this case for IOL we send configuration to correct the management interface IP addressing. This works very very reliably for IOL, at this stage I don't think we need to scrape and send like in vrnetlab.

To refresh your memory, the reason we must do this is because in IOL the startup configuration file is ignored as soon as the nvram file exists (where user configuration is stored once a user does write memory or copy run start).

So the issue would be that:

  1. A user starts a lab
  2. A user SSHes into IOL, does some config and saves it (it gets saved to NVRAM)
  3. On next deployment of the lab, IOL will load configuration from the NVRAM file.. which has the OLD management IP
  4. User can't SSH into IOL properly since the management interface has the wrong IP address configured.

@kaelemc kaelemc marked this pull request as ready for review December 14, 2024 05:50
@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

@hellt Currently my change only alters the management IP on the mgmt interface.

Is there a possible case at all where the management gateway or subnet could change without having to do a deployment with --reconfigure/-c flag?

If yes, I'll add configuration to update the static default route for the mgmt VRF.

@hellt
Copy link
Member

hellt commented Dec 14, 2024

Is there a possible case at all where the management gateway or subnet could change without having to do a deployment with --reconfigure/-c flag?

the eth0 interface of a container can change when you do a regular deploy operation. In other words, it doesn't matter if reconfigure is used or not, the IP might change if it becomes allocated to another node between two deploys

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

@hellt Yes, understood. That is what has already been fixed in this PR.

What I mean to ask is if the entire subnet can change, hence constituting a change of the next hop IP for the static default mgmt route.

For example:

On initial deployment the subnet is 192.168.10.0/24 and the next hop for the static route is 192.168.10.1
Is there any case where on redeployment (without the -c flag) the subnet could change.. ie. to 192.168.20.0/24, thus meaning the next hop is now 192.168.20.1 (in this case the route in IOL must be updated).

Hope that explained it well enough :)

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

Apologies, fixing the build errors as well, thought I had everything covered.

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

@hellt Kindly requesting your help with the MockRuntime stuff. On this front I am out of my depth so I'll be a bit slow.

I didn't have access to my PC earlier, hence why I asked re: the mgmt subnet. But yes I have confirmed that the static route must be added as well since this case is possible.

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 14, 2024

Ok, should be good to go (bar the tests).

Not proud of the 7 second delay. But this is an artifact of IOL having a 5 second boot delay. We can probably remove the 5sec boot delay (this is on the vrnetlab side).. but it means everyone has to rebuild their containers.

Also the mgmt_str is awfully long as the multi-line raw string didn't seem to play nice when writing to the stdin.

@hellt
Copy link
Member

hellt commented Dec 14, 2024

Thanks @kaelemc

I think to sleep well at nights we might want to add a short e2e test for iol startup config.
The testing orchestration is done via Robot Framework, and for iol it is defined here https://github.com/srl-labs/containerlab/tree/main/tests/10-basic-cisco_iol

hit me up on discord if you want to have some introduction to the testing infra if you hit roadblocks

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 15, 2024

I see you generated the mocks via the Makefile.. didn't know that it was that simple 😅.

Anyways i've added two fairly basic tests to confirm partial and full startup configurations are loading correctly. Full config also tests the template variable replacement. This is not supported in the partial config use case.

Tested it all locally on my machine, Everything seemed to work fine.

Let me know if I got anything wrong.

@hellt
Copy link
Member

hellt commented Dec 16, 2024

thanks @kaelemc
tests look good. The last piece before the merge is to add the doc section that explains how the startup config is handled. You can check the sros docs for reference

@hellt
Copy link
Member

hellt commented Dec 16, 2024

it seems that the updateMgmtIf function in the iol code base is not tested. What are the conditions for this code branch to run?

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 16, 2024

It runs if it's not the first boot of IOL.

If the NVRAM file already exists, we know this is not the first boot -- so update the management interface addressing.

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 17, 2024

Documentation has been added regarding the startup configuration. Let me know if you see any improvements, clarifications or further info I should add/change.

I am not so familiar with the testing aspect, I am happy to take a crack at it, but will need some info & assistance. Is the codecov report based on the robotframework tests, or is there something else in play there?

How do you think we should handle this since for the mgmt intf IP addressing to be updated (the code branch in question, updateMgmtIntf), we deploy IOL lab -> destroy IOL lab -> deploy it again.

@kaelemc
Copy link
Contributor Author

kaelemc commented Dec 17, 2024

Ok did a few things:

  • Bumped the delay before we change mgmt IP address to 10s. this is probably good for slower systems.
  • Make sure we remove the old IPv6 address before applying the new one
  • Add tests for mgmt interface IP change on redeployment

I've probably written the testing very poorly, This is my first time touching Robot Framework, so I just did a bit of googling for the syntax/stuff I couldn't infer from the existing tests. If you have any changes/cleanup suggestions or any other stuff I'm all ears.

@hellt
Copy link
Member

hellt commented Dec 18, 2024

thanks @kaelemc
stellar work

@hellt hellt merged commit 3581550 into srl-labs:main Dec 18, 2024
67 checks passed
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 54.36893% with 47 lines in your changes missing coverage. Please review.

Project coverage is 51.80%. Comparing base (59ad3bc) to head (079185d).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
mocks/mocknodes/default_node.go 0.00% 18 Missing ⚠️
mocks/mockruntime/runtime.go 0.00% 8 Missing ⚠️
nodes/iol/iol.go 88.00% 4 Missing and 2 partials ⚠️
runtime/docker/docker.go 66.66% 4 Missing and 2 partials ⚠️
mocks/mocknodes/node.go 0.00% 3 Missing ⚠️
runtime/ignite/ignite.go 0.00% 3 Missing ⚠️
runtime/podman/podman.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2347      +/-   ##
==========================================
- Coverage   51.82%   51.80%   -0.03%     
==========================================
  Files         172      172              
  Lines       17000    17083      +83     
==========================================
+ Hits         8811     8850      +39     
- Misses       7267     7307      +40     
- Partials      922      926       +4     
Files with missing lines Coverage Δ
runtime/runtime.go 90.90% <ø> (ø)
mocks/mocknodes/node.go 5.12% <0.00%> (ø)
runtime/ignite/ignite.go 0.30% <0.00%> (-0.01%) ⬇️
runtime/podman/podman.go 42.76% <0.00%> (-0.43%) ⬇️
nodes/iol/iol.go 82.37% <88.00%> (-0.10%) ⬇️
runtime/docker/docker.go 68.93% <66.66%> (-0.06%) ⬇️
mocks/mockruntime/runtime.go 10.22% <0.00%> (-0.38%) ⬇️
mocks/mocknodes/default_node.go 0.00% <0.00%> (ø)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants