Support (m)TLS API Socket #24601
Ephemeral COPR build failed. @containers/packit-build please check. |
I don’t think we ever looked into this. Largely because without certificate
authentication - which is hard to do - TLS doesn’t add much. Other people
not being able to read the traffic didn’t matter if we are exposing
passwordless root over the network.
…On Tue, Nov 19, 2024 at 10:14 Paul Holzinger ***@***.***> wrote:
@jwhonce <https://github.com/jwhonce> @baude <https://github.com/baude>
@mheon <https://github.com/mheon> FYI. Had you ever looked at TLS support
for the podman service and remote client?
With the amount of code here this looks easy enough to accept to me but I
was wondering if there is/was any reason why we do not support it.
Just skimmed, but LGTM on approach. A lot less code than I was expecting for certificate auth. Now, if we have to start providing docs for how to do certificate auth right, I think the amount of changes grows substantially, but I don't think it's really worth it at this point. |
Sending the directory location of the private key out over the wire doesn't give me the warm fuzzies. Is it something that other software does? |
That certainly wasn't my intent, but I'm not sure what you're referring to. Is that a consequence of adding it to the conf file that I'm unaware of? |
It is not being sent anywhere; both the client and server read the files locally and then use the certificates to perform a normal TLS handshake via the Go std lib AFAICT.
@jwhonce I am not sure what you are referring to. The question for me was whether there was a specific reason why TLS support was not done before. If not, then I think we should accept this PR. |
Also this will need some e2e or system tests where we spawn a server with TLS and then connect with the remote client.
pkg/util/tls.go
Outdated
	"os"
)

func ReadCertBundle(path string) (*x509.CertPool, error) {
Can you move this into a new, separate package, maybe called cert, and then name the function ReadBundle()?
We should really move away from dumping everything into util packages as this causes a lot of unwanted side effects via other imports that really should not have to be imported on the remote client, i.e. #23818. Fixing that will be quite some work but in the meantime we should not add new things there that are needed by the remote client.
Done
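As a rough illustration of what such a helper could look like, here is a minimal sketch using only the Go standard library. It is not the code from this PR: the reviewer asked for a package named cert, but the sketch uses package main to stay self-contained, and parseBundle is a hypothetical helper introduced only to separate file I/O from PEM parsing.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// parseBundle builds a certificate pool from PEM-encoded data.
// (Hypothetical helper, not part of the PR.)
func parseBundle(pemData []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	// AppendCertsFromPEM reports false when no certificate could be parsed.
	if !pool.AppendCertsFromPEM(pemData) {
		return nil, errors.New("no PEM certificates found in bundle")
	}
	return pool, nil
}

// ReadBundle reads a PEM file from disk and returns its certificates as a pool.
// The name follows the reviewer's suggestion; the body is an assumption.
func ReadBundle(path string) (*x509.CertPool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading cert bundle %q: %w", path, err)
	}
	return parseBundle(data)
}
```

Keeping this in a small package with only standard-library imports is what avoids the unwanted transitive imports the review comment is concerned about.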
pkg/bindings/connection.go
Outdated
if len(tlsCAFile) != 0 || len(tlsCertFile) != 0 || len(tlsKeyFile) != 0 {
	logrus.Debugf("using TLS cert=%s key=%s ca=%s", tlsCertFile, tlsKeyFile, tlsCAFile)
	transport.TLSClientConfig = &tls.Config{}
	connection.tls = true
}
if len(tlsCAFile) != 0 {
	pool, err := util.ReadCertBundle(tlsCAFile)
	if err != nil {
		return connection, fmt.Errorf("unable to read CA bundle: %w", err)
	}
	transport.TLSClientConfig.RootCAs = pool
}
if len(tlsCertFile) != 0 && len(tlsKeyFile) != 0 {
The conditions here do not seem to line up perfectly.
If only tlsCertFile is set, then the first condition matches and says TLS is used, but then we never add the key in the last one. That case should return a hard error, I would assume.
My intent was to have that check performed in the cmd/podman/... packages, but on reflection it makes sense to also perform the same check here, as that wouldn't account for the env vars. Done.
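The check discussed here can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: validateTLSFiles and clientTLSConfig are hypothetical names, and the sketch returns a nil config when no TLS material is given, which may differ from how the real connection code signals plaintext.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// validateTLSFiles rejects a client certificate without a key, or vice-versa.
func validateTLSFiles(certFile, keyFile string) error {
	if (certFile == "") != (keyFile == "") {
		return errors.New("TLS client certificate and key must be provided together")
	}
	return nil
}

// clientTLSConfig builds a client-side TLS config from optional file paths.
// It returns (nil, nil) when no TLS material is configured at all.
func clientTLSConfig(caFile, certFile, keyFile string) (*tls.Config, error) {
	// Fail hard before touching any files if only half of the pair is set.
	if err := validateTLSFiles(certFile, keyFile); err != nil {
		return nil, err
	}
	if caFile == "" && certFile == "" {
		return nil, nil
	}
	cfg := &tls.Config{}
	if caFile != "" {
		pemData, err := os.ReadFile(caFile)
		if err != nil {
			return nil, fmt.Errorf("unable to read CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pemData) {
			return nil, fmt.Errorf("no certificates found in CA bundle %q", caFile)
		}
		cfg.RootCAs = pool
	}
	if certFile != "" {
		// Both paths are non-empty here thanks to validateTLSFiles.
		pair, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, fmt.Errorf("unable to load client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	return cfg, nil
}
```

Doing the pairing check inside the bindings layer, rather than only in the command-line packages, means values arriving through environment variables are validated as well.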
cmd/podman/system/connection/add.go
Outdated
	if cmd.Flags().Changed("tls-cert") {
		return errors.New("--tls-cert option not supported for ssh scheme")
	}
	if cmd.Flags().Changed("tls-key") {
		return errors.New("--tls-key option not supported for ssh scheme")
	}
	if cmd.Flags().Changed("tls-ca") {
		return errors.New("--tls-ca option not supported for ssh scheme")
	}
	return ssh.Create(entities, sshMode)
case "unix":
	if cmd.Flags().Changed("identity") {
		return errors.New("--identity option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-cert") {
		return errors.New("--tls-cert option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-key") {
		return errors.New("--tls-key option not supported for unix scheme")
	}
	if cmd.Flags().Changed("tls-ca") {
		return errors.New("--tls-ca option not supported for unix scheme")
	}
This duplicates the same conditions. Move them outside the switch and check scheme != tcp once, so that you do not duplicate them.
Done
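One way the deduplicated check could look is sketched below. It is an assumption-laden illustration, not the code that was pushed: checkSchemeFlags and tlsFlags are hypothetical names, and the changed callback stands in for cmd.Flags().Changed so that the sketch does not depend on cobra.

```go
package main

import "fmt"

// tlsFlags lists the TLS-related options that only make sense for tcp.
var tlsFlags = []string{"tls-cert", "tls-key", "tls-ca"}

// checkSchemeFlags returns an error if any TLS flag was set for a non-tcp
// scheme. changed reports whether the named flag was given by the user.
func checkSchemeFlags(scheme string, changed func(name string) bool) error {
	if scheme == "tcp" {
		return nil
	}
	for _, name := range tlsFlags {
		if changed(name) {
			return fmt.Errorf("--%s option not supported for %s scheme", name, scheme)
		}
	}
	return nil
}
```

Called once before the switch, this produces the same messages as the per-scheme copies while keeping the list of TLS flags in one place.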
…terial for TCP remotes Signed-off-by: Andrew Melnick <[email protected]>
* Moved cert bundle read to separate package
* Removed duplication for non-tcp tls flag check
* Added tls info to `system connection list`
* Removed TCP warning if TLS is enabled
* Fixed not using TLS when using ABI instead of remote
* Added central check for cert without key or vice-versa
Signed-off-by: Andrew Melnick <[email protected]>
cceab7f
to
0748cfe
Compare
Of course. I was hoping to have my environment capable of passing tests on main before I went on vacation last week, but that didn't end up happening. After throwing up my hands and just making a fresh CentOS VM, I have that working now. I'd like to use any existing tests for SSH/TCP remotes as a basis for my tests, as well as ensure my tests have parity with them, but for the life of me, I cannot seem to find any such tests. Would you mind pointing me to where they are? |
@Luap99 sorry, just back to this. This line: https://github.com/containers/podman/pull/24601/files#diff-cf5cc76e70a369c5221f08f29ec0452710af1f4ecd423c162d5b669688416caeR525 seemed to indicate that you were looking for the directory where the private key resided. Unless I misinterpreted, that doesn't feel right. |
Yes, the purpose of this line is to create a command line argument so that the user can provide the path to their TLS client private key, which is necessary to perform authentication. Unless I too have misinterpreted something else in the codebase, neither that path, nor especially not the data in that file, ever goes over the wire. This is the analog of the |
* Fixed attach endpoint client using unencrypted socket when TLS is enabled
* Duplicated libpod remote unix socket e2e suite for plaintext TCP, TLS, and mTLS
Signed-off-by: Andrew Melnick <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: meln5674 The full list of commands accepted by this bot can be found here.
Apologies for the delay on this; a number of other items were pushed onto my priority stack, but I've been able to resume work on it.
As mentioned, there weren't any substantial e2e tests for the TCP API, and the libpod remote test suite was entirely using the unix socket, so I have refactored it to allow running it for unix, plain TCP, TLS, and mTLS. This might seem excessive, but it was only by running this complete suite that I found additional issues I had to fix. Suggestions are welcome for how to achieve similar coverage with fewer tests.
Speaking of said issues, it would seem that the container attach client has a subtle race condition in it, as well as a memory leak. By overwriting the http transport and capturing the net.Conn as a local variable, if the same Connection is used concurrently, there is a possibility that the first net.Conn will be overwritten on the first attach call's local variable, resulting in the two calls proxying each other's streams, or just outright corrupting them. As well, because the original transport is never restored, if the same client is used to attach to multiple containers, the closure is never freed, meaning the dial function will call an ever-increasing stack of closures. Fixing this issue felt outside the scope of this PR, but I have added a note warning of it to whoever feels like addressing it in the future. I would suggest adding some sort of
Finally, there were a number of other e2e tests that were failing in my fresh CentOS 9 environment, even on main, so I have simply ignored them for the time being. |
just a few comments, I need more time to look at the test changes. But we definitely cannot just explode the matrix and run 4 times the amount.
$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_unix_testing"
$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_tcp_testing"
$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_tls_testing"
$(MAKE) ginkgo-run TAGS="$(REMOTETAGS) remote_testing remote_mtls_testing"
This is not acceptable to me. It is certainly great to force coverage, but we run this many times on each PR; a 4x time increase is not acceptable.
What we can consider is some split testing: we run the tests on Fedora rawhide, 41, 40 and Debian sid, so technically we could wire this up in CI to run each case on a different distro, adding no new overhead while still getting full coverage. The transport layer should certainly not care about the distro (except underlying kernel bugs of course), so I think that may be the best option.
I agree, this was a "nuclear option" in order to get tests to run with the least amount of change to the tests themselves, and to make sure that I wasn't missing anything.
One thought I had in the interim was that ginkgo allows tagging tests and sets of tests, and selecting a subset of them on the command line. Using this, it would be possible to run the full remote test suite via unix sockets, and then a subset of the tests against tcp, tls, and mtls. It would then also be possible to add a separate target that runs all tests over all remotes; it wouldn't be run in CI, only on demand in development environments.
The main thing I would need for this is guidance from the podman core team on which tests they believed were critical to be tested over every possible remote, and which are "good enough" to only be tested over unix. For example, the attach endpoints would definitely need to be tested on all different remotes, as that's how I discovered the additional fixes I had to make.
@@ -118,19 +118,22 @@ func inspect(cmd *cobra.Command, args []string) error {
	rpt, err = rpt.Parse(report.OriginUser, format)
} else {
	rpt, err = rpt.Parse(report.OriginPodman,
		"{{range .}}{{.Name}}\t{{.URI}}\t{{.Identity}}\t{{.Default}}\t{{.ReadWrite}}\n{{end -}}")
		"{{range .}}{{.Name}}\t{{.URI}}\t{{.Identity}}\t{{.TLSCAFile}}\t{{.TLSCertFile}}\t{{.TLSKeyFile}}\t{{.Default}}\t{{.ReadWrite}}\n{{end -}}")
IMO this is a breaking change. While nobody should rely on the order of the output, and anyone using it in scripts should use --format, we can never know.
I guess I could be convinced to add them as last keys after readwrite.
I would agree, and while I'm not sure if podman or the umbrella containers project has strict guidelines on the topic, I've always considered "human readable" data to be excluded from breaking changes. I would no sooner worry about breaking scripts that scrape this output than scripts that break if a new log message was added.
BeforeEach(setupConnectionsConf)

Context("without running API service", func() {
	It("add ssh://", func() {
		cmd := []string{"system", "connection", "add",
I guess these changes are done by your formatter? Can you revert them, they bloat the diff unnecessarily and make reviewing harder
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Add flags --tls-cert, --tls-key, --tls-ca/--tls-client-ca to the commands podman remote, podman system service, and podman system connection add to support serving the API socket using TLS and mTLS, as well as connecting to such a socket.
This relies on containers/common#2249 and will fail CI until merged.
Fixes #24583
Does this PR introduce a user-facing change?