Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proto-build: optionally unbundle protoc #576

Closed
wants to merge 2 commits into from

Conversation

benesch
Copy link

@benesch benesch commented Dec 22, 2021

This is a proposed solution to #575. I expect there will be lots of opinions here, but I at least wanted to whip something up as a show of good faith! With something like this in place we'd be able to take the dependency on prost in MaterializeInc/materialize.


This commit makes it possible to opt out of downloading the bundled
pre-compiled protoc binaries. This is useful for downstream users with
strict supply chain security requirements (see #575 for details).

The approach taken is to move the pre-compiled binaries to a new crate
called protoc-bin. prost-build depends on this crate only if the
protoc-vendored-bin feature is enabled (which it is by default).
While it was always possible to avoid the vendored binaries by
specifying the PROTOC environment variable, the feature is a bit more
foolproof; security-concious users can audit their Cargo.lock to ensure
that the protc-bin crate is never even downloaded.

This commit also adds a new feature called protoc-vendored-src, which
uses the new protobuf-src crate to compile a copy of protoc from
source on the fly. This feature makes cargo build seamless on all
platforms that Protobuf supports, not just those that protoc-bin
distributed pre-compiled binaries for, but has the downside of requiring
a C++ toolchain and increasing compile times.

If neither of the new vendored features are enabled, prost-build fails
unless PROTOC and PROTOC_INCLUDE are explicitly specified.

This commit makes it possible to opt out of downloading the bundled
pre-compiled protoc binaries. This is useful for downstream users with
strict supply chain security requirements (see tokio-rs#575 for details).

The approach taken is to move the pre-compiled binaries to a new crate
called `protoc-bin`. prost-build depends on this crate only if the
`protoc-vendored-bin` feature is enabled (which it is by default).
While it was always possible to avoid the vendored binaries by
specifying the `PROTOC` environment variable, the feature is a bit more
foolproof; security-concious users can audit their Cargo.lock to ensure
that the protc-bin crate is never even downloaded.

This commit also adds a new feature called `protoc-vendored-src`, which
uses the new `protobuf-src` crate to compile a copy of protoc from
source on the fly. This feature makes `cargo build` seamless on all
platforms that Protobuf supports, not just those that protoc-bin
distributed pre-compiled binaries for, but has the downside of requiring
a C++ toolchain and increasing compile times.

If neither of the new vendored features are enabled, `prost-build` fails
unless `PROTOC` and `PROTOC_INCLUDE` are explicitly specified.
@tmfink
Copy link

tmfink commented Dec 24, 2021

This PR is a good first step to solving the supply chain security concerns in #562 and #575!

I have some comments below.

Environment variables instead of cargo features

Only the "root" of the dependency of graph (AKA the executable being built) should have control over how protoc is acquired. As such, controlling via environment variables is preferable to cargo features.

Each crate A that depends on prost (even transitively) would need to "forward" cargo features. This would allow any crate B that depends on A to itself expose (and forward to A) the necessary "prost" features (and so on). This adds more work for each crate that includes prost in the dependency graph.

Also, there is no good way to resolve conflicting features. Due to how cargo resolves feature unification, the prost crate may be compiled with with the features to use source and binary protoc.

The end-user is the one who can make the best security decisions as to how protoc is acquired. There is no reason to allow the transitive dependencies to have influence on the decision. The best way I know to allow that is by controlling the functionality via environment variables.

Fail closed

This would be a breaking change, but I vote that this feature "fails closed" (be secure-by-default).

For example:

  1. If PROST_BUILD_PROTOC_FROM_SRC is set, then build from source
  2. If PROTOC is set, use the binary.
  3. If protoc is found on the system (maybe in PATH?) then use that
  4. If PROST_USE_BUNDLED_BINARIES is set, then use the supplies binaries.
  5. Otherwise, panic with a descriptive error message.

The openssl-sys crate does a good job of emitting a descriptive error message when OpenSSL cannot be found:
https://github.com/sfackler/rust-openssl/blob/b3dde76ea28ca5e54ce6341927c96679b9921153/openssl-sys/build/find_normal.rs#L98-L112

On a side note, I think it's reasonable to expect that a user has a C++ compiler available to compile protobuf from source. If the user is already compiling Rust, then they probably have a C++ compiler. A C compiler is essentially required to link Rust binaries (which is probably accompanied by a C++ compiler).

@benesch
Copy link
Author

benesch commented Dec 24, 2021

Only the "root" of the dependency of graph (AKA the executable being built) should have control over how protoc is acquired.

I agree wholeheartedly with this.

Unfortunately it's a nonstarter for us to move to a model where environment variables are required for a successful compile. We maintain the invariant for Materialize that a cargo build is all you need to compile (assuming you've got CMake, a C++ toolchain, and a Rust toolchain installed). We don't want our users and developers to have to type PROST_BUILD_PROTOC_FROM_SRC=1 cargo build. It's a shame that Cargo doesn't allow root crates to specify env vars to set before invoking any build scripts, because that would neatly solve the problem.

The end-user is the one who can make the best security decisions as to how protoc is acquired. There is no reason to allow the transitive dependencies to have influence on the decision. The best way I know to allow that is by controlling the functionality via environment variables.

I think using features is unavoidable, though, because otherwise you'll be stuck downloading both the binary blobs and compiling protobuf from source, even if you know you only want one or the other (or neither). I.e., you can't depend on the protobuf-src crate only if an environment variable is set; you're always going to compile the thing from source so that it's available in case the env var says you need it. And then you end up with a dependency on a C++ compiler even if you were planning to use the binaries.

I think it makes sense to expose environment variables as additional layers of control, though, on top of the features that are selected. That could allow you to enable both the protoc-vendored-bin and protoc-vendored-src features (sidestepping Cargo's mutually-exclusive feature morass) and toggle which one you want via an environment variable.

I do wonder if the Prost maintainers and users would be happy enough with the default always being to compile protobuf from source. I agree that a C++ compiler is not terribly much to ask; indeed you've probably already got one if you've got a C compiler.

Somewhat relatedly, I'm working on C++ bindings for libprotobuf that will make it possible to "invoke" protoc by just calling into the appropriate C++ APIs. That could prove useful here.

@tmfink
Copy link

tmfink commented Dec 24, 2021

It's a shame that Cargo doesn't allow root crates to specify env vars to set before invoking any build scripts, because that would neatly solve the problem.

I think you can set both cfg flags and env vars from build scripts.

I think using features is unavoidable, though, because otherwise you'll be stuck downloading both the binary blobs and compiling protobuf from source, even if you know you only want one or the other (or neither). I.e., you can't depend on the protobuf-src crate only if an environment variable is set; you're always going to compile the thing from source so that it's available in case the env var says you need it.

By setting cfg flags in a build script, you can at least avoid compiling protobuf-src (but not downloading the dependency).

I do wonder if the Prost maintainers and users would be happy enough with the default always being to compile protobuf from source. I agree that a C++ compiler is not terribly much to ask; indeed you've probably already got one if you've got a C compiler.

I agree. If you are already in a mode of "compiling code" (AKA prost), then I think you should be okay compiling dependencies.

Could you update your PR to use protobuf-src by default? That way, there does not need to be a PROST_BUILD_PROTOC_FROM_SRC variable/feature. The only feature would be the non-default "protbuf-bin" feature.

@benesch
Copy link
Author

benesch commented Feb 28, 2022

By setting cfg flags in a build script, you can at least avoid compiling protobuf-src (but not downloading the dependency).

I don’t think it works like that. The cfg flags and env vars you set in a build script only apply to the one crate that is being built.

Could you update your PR to use protobuf-src by default? That way, there does not need to be a PROST_BUILD_PROTOC_FROM_SRC variable/feature. The only feature would be the non-default "protbuf-bin" feature.

I’m happy to do that, but I’d like @LucioFranco or another maintainer could weigh in on their desired approach before I take the time!

Copy link
Member

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on reviewing this! I'd like to get something like this in the next breaking release. I left a few questions nothing blocking, I think this approach looks pretty good.


Could you update your PR to use protobuf-src by default? That way, there does not need to be a PROST_BUILD_PROTOC_FROM_SRC variable/feature. The only feature would be the non-default "protbuf-bin" feature.

I’m happy to do that, but I’d like @LucioFranco or another maintainer could weigh in on their desired approach before I take the time!

Honestly, on a pure user experience end I would not want to default to compiling c++ code since this is usually quite a hit to the build experience. That said, in practice I am not sure if it would be that big of a deal.

I'd love to hear more from people more integrated in the build process to chime in.

cc @jonhoo

Comment on lines +1 to +16
[![Documentation](https://docs.rs/prost-build/badge.svg)](https://docs.rs/prost-build/)
[![Crate](https://img.shields.io/crates/v/prost-build.svg)](https://crates.io/crates/prost-build)

# `prost-build`

`prost-build` makes it easy to generate Rust code from `.proto` files as part of
a Cargo build. See the crate [documentation](https://docs.rs/prost-build/) for examples
of how to integrate `prost-build` into a Cargo project.

## License

`prost-build` is distributed under the terms of the Apache License (Version 2.0).

See [LICENSE](../LICENSE) for details.

Copyright 2017 Dan Burkert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole file needs to be updated for the protoc-bin crate?

@@ -0,0 +1,59 @@
//! `protoc-bin` vendors `protoc` binaries for common architectures.
//!
//! It is intended for use as a build dependency for libraries like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! It is intended for use as a build dependency for libraries like
//! It is intended for use as a build dependency for libraries like

//! * Linux aarch64
//! * macOS x86_64
//! * macOS aarch64
//! * Windows 32-bit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No windows 64-bit?

@LucioFranco
Copy link
Member

I wrote an update here #575 (comment) I think for now we can close this PR and start with a new one. Thanks for taking the start on this @benesch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants