Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure NGINX reload actually happens #664

Closed
kate-osborn opened this issue May 23, 2023 · 5 comments · Fixed by #1033
Closed

Ensure NGINX reload actually happens #664

kate-osborn opened this issue May 23, 2023 · 5 comments · Fixed by #1033
Assignees
Labels
bug Something isn't working refined Requirements are refined and the issue is ready to be implemented. size/large Estimated to be completed within two weeks
Milestone

Comments

@kate-osborn
Copy link
Contributor

kate-osborn commented May 23, 2023

When we reload NGINX, we do not check whether the reload succeeds. We just sleep for a second to wait for the new workers to spin up. We need to verify that a reload was successful and return an error if it fails. Additionally, we should retry on failure.

Acceptance:

  • When a NGINX reload occurs, NKG will wait for the reload to finish, and if an error occurs, the error is reported to the user.
  • When NGINX fails to reload, NKG will retry reloading NGINX.

This work should resolve the following FIXME: https://github.com/nginxinc/nginx-kubernetes-gateway/blob/b00216eeac0f8edfb054d45ad073c9c004788762/internal/nginx/runtime/manager.go#L52

@kate-osborn kate-osborn self-assigned this May 23, 2023
@kate-osborn kate-osborn changed the title Placeholder FIXME: "ensure the reload actually happens" Ensure NGINX reload actually happens May 23, 2023
@kate-osborn kate-osborn added the bug Something isn't working label May 23, 2023
@mpstefan mpstefan added this to the v1.0.0 milestone Jun 2, 2023
@mpstefan mpstefan modified the milestones: v1.0.0, v1.0.1 Aug 11, 2023
@mpstefan mpstefan modified the milestones: v1.0.1, v1.0.0 Aug 14, 2023
@mpstefan mpstefan added refined Requirements are refined and the issue is ready to be implemented. size/large Estimated to be completed within two weeks labels Aug 21, 2023
@mpstefan
Copy link
Collaborator

Logs could be the solution to this problem.

@brianehlert
Copy link

Logs should not be written to disk. Log rotate cron jobs do not run in containers. So tailing a log should be avoided.
Watching log output stream is possible but could be problematic.

@sjberman
Copy link
Collaborator

sjberman commented Sep 1, 2023

Our k8s Golang client has the ability to get a Pod's logs at any point (we would need to instantiate a client-go client vs the current controller-runtime client, though, since the latter doesn't support this yet). We'd likely want to look for the string signal 1 (SIGHUP) received which tells us a reload happened. We have to figure out how to tie it to the last reload call though, because there are going to be many instances of that string. Use a timestamp somehow.

@brianehlert
Copy link

brianehlert commented Sep 1, 2023

What if, instead of looking to logs, you noticed the NGINX workers changed?
Which indicates a successful reload (the config was determined "good" by NGINX).

It does not solve the problem of applying configurations through the N+ API. Which would not have either of these signals.

@sjberman
Copy link
Collaborator

sjberman commented Sep 1, 2023

If the PIDs change, then maybe. I'm not sure if they do though. (I guess if the reload creates new workers, then they probably do)

EDIT: verified that at least the single worker process PID in my deployment did change on a reload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working refined Requirements are refined and the issue is ready to be implemented. size/large Estimated to be completed within two weeks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants