Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importing Go bindings calls XDG runtime checks, causing application to exit unexpectedly #23818

Open
ventifus opened this issue Aug 30, 2024 · 7 comments · May be fixed by containers/common#2147
Labels
kind/bug Categorizes issue or PR as related to a bug. stale-issue

Comments

@ventifus
Copy link
Contributor

Issue Description

XDG runtime checks should not be performed when only the go bindings are used.

After updating our podman go bindings from 4.9.4 to 5.0.3, our application began failing with the error

time="2024-08-29T22:12:26Z" level=error msg="stat /.config: no such file or directory"

We tracked this down to https://github.com/containers/storage/blob/main/pkg/homedir/homedir_unix.go#L120-L123, where the XDG runtime checks are verifying that $HOME/.config exists. Our application runs in a container with $HOME set to /, and there is no .config, and the runtime non-root user doesn't have permission to create it.

Inserting a debug.PrintStack() in GetConfigHome() shows this stack trace:

goroutine 1 [running, locked to thread]:
runtime/debug.Stack()
/usr/lib/golang/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
/usr/lib/golang/src/runtime/debug/stack.go:16 +0x13
github.com/containers/storage/pkg/homedir.GetConfigHome.func1()
/app/vendor/github.com/containers/storage/pkg/homedir/homedir_unix.go:122 +0x176
sync.(*Once).doSlow(0x3e9?, 0x0?)
/usr/lib/golang/src/sync/once.go:74 +0xbf
sync.(*Once).Do(...)
/usr/lib/golang/src/sync/once.go:65
github.com/containers/storage/pkg/homedir.GetConfigHome()
/app/vendor/github.com/containers/storage/pkg/homedir/homedir_unix.go:111 +0x2c
github.com/containers/common/pkg/config.defaultConfig()
/app/vendor/github.com/containers/common/pkg/config/default.go:201 +0x7a
github.com/containers/common/pkg/config.newLocked(0xc00035fd78)
/app/vendor/github.com/containers/common/pkg/config/new.go:71 +0x2a
github.com/containers/common/pkg/config.Default()
/app/vendor/github.com/containers/common/pkg/config/new.go:63 +0xfe
github.com/containers/podman/v5/pkg/util.init.0()
/app/vendor/github.com/containers/podman/v5/pkg/util/utils.go:48 +0x13

Additional context: https://redhat-internal.slack.com/archives/CBBJY9GSX/p1725040560356209

Steps to reproduce the issue

Steps to reproduce the issue

  1. Create a go program that imports the podman bindings
  2. Run the go program as an unprivileged user with $HOME=/ or other un-writable directory.
  3. Program exits unexpectedly

Describe the results you received

Program exits with the error

time="2024-08-29T22:12:26Z" level=error msg="stat /.config: no such file or directory"

Describe the results you expected

Program runs as expected

podman info output

n/a, we are not actually running podman

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting

@ventifus ventifus added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2024
nwnt added a commit to Azure/ARO-RP that referenced this issue Aug 30, 2024
nwnt added a commit to Azure/ARO-RP that referenced this issue Aug 30, 2024
@Luap99
Copy link
Member

Luap99 commented Sep 2, 2024

ENOENT should not be an error to begin with IMO as this is only a config dir and AFAIK all other tools work fine without a config file in the home dir.

Second the bindings should not really import any of this code I think, at least not github.com/containers/common/pkg/config

rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
@mtrmac
Copy link
Collaborator

mtrmac commented Sep 3, 2024

Second the bindings should not really import any of this code I think, at least not github.com/containers/common/pkg/config

This part is the Podman bug, and should be fixed. (IIRC it’s c/podman/pkg/util importing this — util packages are always an anti pattern.)


ENOENT should not be an error to begin with IMO as this is only a config dir and AFAIK all other tools work fine without a config file in the home dir.

At least podman machine is creating files in $XDG_CONFIG_HOME (and/or the fallback). So for podman machine, we do need the ability to have, or create, a reasonable …ConfigHome directory.

The situation is a non-root user with a home directory, but not allowed to write into such a home directory. I generally suggest that this is an unreasonable setup to run (the full feature set of) Podman in, and not really worth adding special cases for.

rhatdan added a commit to rhatdan/common that referenced this issue Sep 3, 2024
@Luap99
Copy link
Member

Luap99 commented Sep 4, 2024

ENOENT should not be an error to begin with IMO as this is only a config dir and AFAIK all other tools work fine without a config file in the home dir.

At least podman machine is creating files in $XDG_CONFIG_HOME (and/or the fallback). So for podman machine, we do need the ability to have, or create, a reasonable …ConfigHome directory.

There are two types of callers readers and writers, readers just use this to read a config file so they just join the result to its path and try to open the file which then either fails or not so they have to handle ENOENT errors if these files are optional so stating the dir before is just a waste of a syscall. Writers on the other side have to create a further directories, AFAICT none of other tools would write into .config but rather .config/containers so they must mkdir containers anyway. Now that can fail if they just call Mkdir vs MkdirAll but looking at it it seems they are already use MkdirAll

A function name like GetConfigHome() should just return the path regadless if it does exists or not, it should especially not create the path IMO because this means a command like podman ps creates the dir for no reason whatsoever.

rhatdan added a commit to rhatdan/common that referenced this issue Sep 4, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 4, 2024
rhatdan added a commit to rhatdan/common that referenced this issue Sep 4, 2024
@rhatdan
Copy link
Member

rhatdan commented Sep 4, 2024

How about we deprecate GetConfigHome() and add ConfigHome() to do what you want, then we move all callers to ConfigHome()?

@mtrmac
Copy link
Collaborator

mtrmac commented Sep 4, 2024

There are two types of callers readers and writers …

I think that’s all a fair analysis.

The one thing I do care about is that the permission special case remains: if we inherit $XDG_CONFIG_HOME belonging to a different user account (not owned by us), we must fail, at the very least for writers (because root writing to other users’ directory can do damage), and I’d prefer that for for readers as well.


How about we deprecate GetConfigHome() and add ConfigHome() to do what you want, then we move all callers to ConfigHome()?

👍 Sounds like a good way to make this change and ensure all aspects are reviewed.

@Luap99
Copy link
Member

Luap99 commented Sep 4, 2024

How about we deprecate GetConfigHome() and add ConfigHome() to do what you want, then we move all callers to ConfigHome()?

I would be fine with that as well.

gouthamMN pushed a commit to Azure/ARO-RP that referenced this issue Sep 5, 2024
gouthamMN pushed a commit to Azure/ARO-RP that referenced this issue Sep 10, 2024
update makfile to use go 1.23

update docs

bump gotestsum to 1.12.0

bump golangci-lint to 1.59.1

use the fips compliant golang image

generate a secret for the operator from workload identity

Update pkg/operator/deploy/deploy.go

Co-authored-by: Ayato Tokubi <[email protected]>

get subscription info from the subscription doc rather than env

test the operator identity secret generation code properly

Fixed  to correctly reference the local  image, preventing unauthorized Docker Hub pulls.

Align docs hierarchy

Indent bullet points

Copy fluentbit image from arointsvc ACR to your ACR

It is needed since it is compared against a default image (and digest) from const file

ARO-9570: Add a controller to the ARO operator to lay down etchosts machine config

ARO-9570: Update controller to watch MCP and ARO Cluster object

ARO-9750: Add a controller to create the etchosts machineconfigs if they dont exist

Fix linting

Add licenses

bump golangci-lint to v1.60.3 and exclude printf, SA1006 and S1009 from lint

update golangci-lint version

use non fips golang image

use go 1.22

bump go in ci

add git to dockerignore

set buildvcs to false

upgrade to go 1.22.6

update docs

fix go lint

address comments

remove commented code from onebranch pipelines file

change var to const

fix unit-tests and api cloudError

This is the new CI-RP stage for the pipline (#3768)

* This is the new CI-RP stage for the pipline (#3753)

* Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

removed the tag

Add Podman service start and remote args setup for seamless operation

Add sudo to start Podman service for elevated permissions and fix permission errors

Add sudo to start Podman service for elevated permissions and fix permission errors

Refactor Makefile: Update Podman service handling with sudo and remove default PODMAN_REMOTE_ARGS to improve flexibility and ensure proper permissions.

Add sudo to start Podman service for elevated permissions and fix permission errors

* Added Podman service target and set PODMAN_REMOTE_ARGS for seamless builds.

* fix the makefile

* added the port to fix the Makefile

Add smoke test for alerts from Alertmanager (#3801)

Move ARM swagger to subfolder (#3805)

To add new HCP RP, the ARO RP is moved into the subfolder openshiftclusters.

There are no additional changes, no impact on the SDK and clients.

Add the old make runlocal-rp as an alternative to containerization (#3789)

Add smoke test documents (#3813)

Adding Ayato to CODEOWNERS

Fix make ci-clean and runlocal-rp (#3806)

* Fix make ci-clean error for running work containers by buildah that prevents prune from working
* Fix make runlocal-rp image syntax

Upgrade to Podman 5 to fix the vuln

Install required binary for Podman 5 in ci

Switch back to OneBranch build image

Install crun

Install more OCI packages

Change home dir to /tmp for podman

see containers/podman#23818
for more details.

Use sudo for tdnf

bump golangci-lint version in dockerfile ci-rp

add go flags

update go ver in ci.yml

update test

Correct testing/time issues in pkg/deploy (#3808)

- Percolate up the time to wait for LB healthcheck probes, test @ 0 sec
- Correct a context timeout test case, test @ 0 sec timeout

Fix slow tests in /pkg/backend (#3809)

Fix slow tests in /pkg/frontend (#3810)

* Clarifying etcd cert renew test

- Updated the test to make it clear it is passing because timeout is being reached
- Updated the timeout from 10s -> 0s to pass faster

* Fix slow changefeed tests

Generate smaller OIDC keys for unit tests (#3811)

- significantly increases unit test performance by moving from 4096 -> 256 bit keys
- preserves 4096 bit keys for all non-testing scenarios

Make CI-RP Improvements (#3791)

- Remove linting from ci-rp
- Remove generate from ci-rp

Set CGO_ENABLED

update test command in ci-rp dockerfile

Separate Makefile targets for local vs containers (#3816)

- reverts changes to runlocal-rp
- updates old run-portal to runlocal-portal since it uses local bins
- adds new targets for containerized run of RP and Portal; opt-in
- fixes docs and pipelines to use updated targets

Drop some unneccessary dependencies by moving to `bingo` for tooling (#3719)

* Move to using bingo for tools
* go mod vendor

[MIMO] Move cluster certificate functionality to ClientHelper (#3736)

* move over TLS applying, as well as some clienthelper work

bump go in bingo

merge makefile changes from Master

more Makefile updates

add GO var in toplevel Makefile
edisonLcardenas pushed a commit to Azure/ARO-RP that referenced this issue Sep 16, 2024
Copy link

github-actions bot commented Oct 5, 2024

A friendly reminder that this issue had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. stale-issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants