Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for a workspace that uses Cargo.toml files as the source of truth #5

Open
joneshf opened this issue Sep 14, 2023 · 13 comments
Open

Comments

@joneshf
Copy link

joneshf commented Sep 14, 2023

I'm having some troubles with a Cargo workspace that uses Cargo.toml files as the source of truth. Gazelle will generate third-party dependencies fine. But, I can't seem to get Gazelle to automatically generate the deps attributes properly for other workspace members.

Let's say I have a some Cargo.tomls like this:

# Cargo.toml
[workspace]
members = ["foo", "bar"]
resolver = "2"
# foo/Cargo.toml
[package]
name = "foo"
version = "0.1.0"
edition = "2021"

[dependencies]
bar = { version = "0.1.0", path = "../bar" }
# bar/Cargo.toml
[package]
name = "bar"
version = "0.1.0"
edition = "2021"

[dependencies]

And I import bar in foo:

// foo/src/lib.rs

use bar::*;

Whenever I run Gazelle, it does not find the dependency for bar:

$ bazel run //:gazelle
<snip>
gazelle: @test_workspace//foo/src:lib: no match for bar

I'm not 100% sure I've got the workspace setup properly. As mentioned, Gazelle can find third-party dependencies alright (I use things like clap and tracing in the actual workspace). But, that might be handled differently from first-party dependencies.

Looking at the examples/tests, nothing seems to use Cargo.toml files as the source of truth with gazelle_rust. Is it that this use case hasn't been fleshed out yet? If not, any thoughts on what the issue could be?

@Calsign
Copy link
Owner

Calsign commented Sep 20, 2023

This plugin does not currently look at any Cargo.toml files. Most likely it has chosen to name both rust_library targets "lib" since that's the name of the source files. You can rename the one in bar/src/BUILD.bazel to bar and then it should resolve correctly.

@joneshf
Copy link
Author

joneshf commented Sep 20, 2023

Right on, that worked! Thanks for the workaround!

@joneshf
Copy link
Author

joneshf commented Sep 20, 2023

As far as the original issue, any thoughts on having it respect Cargo.tomls as the source of truth? From what I can tell, gazelle_rust has access to either the Cargo.lock or rules_rust's cargo lock file. And one of those seems to have enough information to make it happen. I'm relatively new to Rust, so forgive me if I'm not seeing the forest for the trees.

As I understand it–and as explained in this SO answerthe Rust hierarchy consists of:

The Bazel hierarchy consists of:

It seems that the mapping between the two is:

  • Rust workspaces map to Bazel workspaces.
  • Rust packages map to Bazel packages.
  • Rust crates map to Bazel targets.
  • Rust modules map (loosely) to files.

gazelle_rust is currently looking at files and determining where to place a Bazel package and which Bazel targets to generate. E.g. a foo/bar/baz.rs file generates either a rust_library or rust_binary Bazel target, and either puts those targets in the Bazel package //foo/bar if it exists, or creates a new Bazel package at //foo/bar. It seems like gazelle_rust could instead look at the Rust crate–by way of the Cargo.toml files–to determine where to place the Bazel package and which Bazel targets the package should have.

More visually, gazelle_rust is looking at something like this:

$ tree foo
foo
├── Cargo.toml
└── src
    └── lib.rs

1 directory, 2 files

And generating something like this:

tree foo
foo
├── Cargo.toml
└── src
    ├── BUILD.bazel
    └── lib.rs

1 directory, 3 files

Where foo/src/BUILD.bazel looks like:

load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "lib",
    srcs = ["lib.rs"],
)

But gazelle_rust could instead generate something like this:

$ tree foo
foo
├── BUILD.bazel
├── Cargo.toml
└── src
    └── lib.rs

1 directory, 3 files

Where foo/BUILD.bazel looks like:

load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "lib",
    crate_name = "foo",
    srcs = ["src/lib.rs"],
)

That seems to align better with conventions in both models, at least in my head. Does that seem like a thing you'd want to support in gazelle_rust? If so, would you want it to do it by default, or opt-in with a Gazelle directive (or something like that)?

@Calsign
Copy link
Owner

Calsign commented Sep 22, 2023

Agree with everything you've said. I would like to support cargo-compatible workspaces in some capacity, but there's multiple potential ways to do this and I haven't yet figured out precisely how it should work.

It seems that the mapping between the two is:

  • Rust workspaces map to Bazel workspaces.
  • Rust packages map to Bazel packages.
  • Rust crates map to Bazel targets.
  • Rust modules map (loosely) to files.

There's some more nuance I'd like to add here:

  • You can place multiple cargo workspaces into one bazel workspace. gazelle_rust supports this by allowing multiple gazelle:rust_cargo_lockfile directives in different subdirectories. But for most people it's probably 1:1.
  • Rust packages can have multiple binary crates but only one library crate. Bazel packages can have any number of binary or library crates (as well as tests, shared libraries, etc.).

Now there are multiple ways that one could imagine wanting their project's build to be set up with gazelle_rust:

  1. Pure-cargo: gazelle_rust updates Cargo.toml files based on the sources
  2. Generating from cargo: BUILD.bazel files are generated from Cargo.toml files, which they live next to; gazelle_rust doesn't read sources at all
  3. Partially generating from cargo: BUILD.bazel files are generated from sources, but use Cargo.toml files to determine crate structure
  4. Cargo-compatible generation: BUILD.bazel files are generated from sources without looking at Cargo.toml, but in a way that naturally mirrors the way that cargo projects are laid out (e.g. putting BUILD.bazel in the parent directory, not in src)
  5. Pure-bazel: targets are generated in a way that is most natural in bazel, and may not match the structure that cargo uses

Currently gazelle_rust does option 5. This is what I use for my personal projects that work with bazel, and also what I am prototyping for use at my company. It might be possible to maintain Cargo.toml files in parallel, but that isn't the goal. Using pure-bazel allows you to do fancy things like codegen and multi-language interop, which are much harder to do with cargo.

But what I find most interesting is that using bazel and gazelle, it is easy to make lots of small targets rather than large monolithic ones. In the projects that I have used gazelle for, I typically have targets with a small number of files (like <5), and multiple targets per package. This is what we use for C++ and Python at work, and I think this is definitely the right decision if you have a gazelle plugin that can manage all those targets for you. I haven't yet scaled a rust project to the point where this has felt necessary, but that's why I have a strong preference for making small targets over large ones. For rust, this means largely not making submodules, unless necessary due to dependency cycles. This ends up being entirely incompatible with cargo's project structure, which is why I gave up on cargo compatibility for the rust projects that I build with bazel.

That being said, maintaining Cargo.toml files is also valid and important for some people, so it's worth it for gazelle_rust to support that. Option 1 is a bit fanciful, and won't be supported anytime soon, but is theoretically possible. The question in my mind is which of options 2, 3, and 4 should be supported.

I think option 2 is the easiest, but it does mean you have to update Cargo.toml before you can get a working build, so gazelle_rust is basically just a way to maintain a bazel build in a workspace that is otherwise an unmanaged, pure-cargo one.

In practice gazelle_rust would operate in different modes, and we would probably control this with a directive like gazelle:rust_mode generate_from_cargo. (This might be doable on a per-directory basis, allowing a mix of pure-bazel and cargo-generated targets in the same workspace. I'm not sure how useful that is.) I think the pure-bazel mode should be default since it is strictly more powerful than the generate-from-cargo mode.

There's also other things that gazelle_rust should support, like automatically grouping child modules into the parent target. I guess I haven't added support for this because I mostly create separate crates and not modules, but this is required for the pure-bazel to be correct as well.

Is there a reason we would want options 3 or 4, or perhaps something else?

@joneshf
Copy link
Author

joneshf commented Oct 7, 2023

Thank you for the very thorough reply!

Without multiple years of experience writing Rust, I would actually lean toward the next option to add support for being option 3. As mentioned, I'm still fairly new at writing Rust. I don't know if my thoughts are colored by my previous experiences, or if they're in line with the bigger picture. So, take them for what they're worth. That said, I think having both the manifest and the sources as inputs for gazelle_rust can provide the right balance.

For option 2, the thing that pops to mind is unused_crate_dependencies. If gazelle_rust reads source files, it can collect information about the actually used dependencies from the source files. Dependencies in a Cargo.toml are package-wide. If gazelle_rust only reads Cargo.toml files, it has no way of knowing which crates actually need which dependencies. So the best it can do with the deps attribute on a rust_binary or rust_library is guess that it needs all of them. That's effectively what the rules_rust all_crate_deps macro does. And it means that unused_crate_dependencies will most likely be a violation if a rust_binary and rust_library are in the same Bazel package (because those two targets generally won't have the same set of dependencies). Similar if multiple rust_binarys share a Bazel package: the dependencies might not be the same.

I also don't know enough about Rust yet to know if there are other pieces of information that can only be found within source files. If so, that's another point against option 2. But if not, maybe trying to find a resolution to unused_crate_dependencies would be fruitful?

For option 4, I get the sense that gazelle_rust would have to put in a good amount of effort explaining the opinions it's making about Cargo.toml. There are bound to be features of Cargo that aren't going to be implemented in gazelle_rust that would need to be documented, and that seems like chasing the tail. Or, there will be something in a Cargo.toml file that isn't possible to infer from the sources. Either way, option 4 feels like a path to making an opinionated build tool that is neither Bazel nor Cargo. I don't have enough familiarity with Cargo to say whether that's a good idea or not.

For option 3, it seems to be the option that provides the least amount of friction. If I'm understanding it properly, someone would be able to take most any Cargo project, run through the setup instruction in the README, and have bazel build work without having to learn a different model for organizing code. I think that's the persona I've been thinking about without saying it explicitly. What I really want is to introduce gazelle_rust at work, but I know that people aren't going to want to restructure their Rust code to make gazelle_rust work. I also know they aren't going to want to have to edit the generated targets to make things work.

Option 3 feels like it's in line with how Gazelle works with Go: you write Go code like normal, and don't have to think about what's generated in the vast majority of the cases. Granted, Go has a simpler manifest and package structure convention. But, I hope the point is coming across.

@joneshf
Copy link
Author

joneshf commented Oct 7, 2023

But what I find most interesting is that using bazel and gazelle, it is easy to make lots of small targets rather than large monolithic ones. In the projects that I have used gazelle for, I typically have targets with a small number of files (like <5), and multiple targets per package. This is what we use for C++ and Python at work, and I think this is definitely the right decision if you have a gazelle plugin that can manage all those targets for you. I haven't yet scaled a rust project to the point where this has felt necessary, but that's why I have a strong preference for making small targets over large ones. For rust, this means largely not making submodules, unless necessary due to dependency cycles. This ends up being entirely incompatible with cargo's project structure, which is why I gave up on cargo compatibility for the rust projects that I build with bazel.

This sounds very similar to what I started doing for a PureScript integration with Bazel a while back. Started rolling it out at a previous job, and it showed potential. Being able to incrementally compile individual modules, and cut-off the build when possible was a huge increase in build efficiency. Having the ability to break down a PureScript package into its constituent modules was great for really narrowing down the actual dependencies you had to think about (and Bazel had to compile). We could put the code in a structure that made sense, instead of being forced to follow the convention of putting all the source files in a src directory, and all the test files in a test directory because someone long ago made a decision based on their biases.

Something that I realized at that job (and that is reinforced at my current job) is that most people see Bazel as a means to an end. It would be nice to have everything work through the bazel CLI so you wouldn't have to worry about differences between languages. But, the vast majority of people aren't going to learn a bunch of bazel commands while they still need to do things like go mod tidy, or cargo add … in order to have other tooling work that doesn't understand Bazel well enough (like IDEs). I hope for a world where there is one actual tool we can all use.

I think option 5 (the current behavior) is the right way to go. But I also recognize that until Bazel (or something else) eats the world, the other tools are going to need to exist, people are going to interact with them, and I don't have the energy to change minds.

@joneshf
Copy link
Author

joneshf commented Oct 7, 2023

As far as what to do next, I know I said that option 3 seems like the best, but I'd also be fine with option 2 for now and work around the unused_crate_dependencies. Not ideal, but I think having gazelle_rust do stuff is an incremental improvement. Is there more you'd like to think about or should I start sending PRs?

@Calsign
Copy link
Owner

Calsign commented Oct 18, 2023

Thanks for the thoughts. There's a lot here and I'll think about this for a bit.

I am convinced now that option 3 is the way to go. I think the approach is basically make a BUILD.bazel next to every Cargo.toml, use cargo_toml to parse Cargo.toml, and then read all the sources from Cargo.toml and use that information to populate the generated targets. This should work fine. I'll start working on it.

But I also recognize that until Bazel (or something else) eats the world, the other tools are going to need to exist, people are going to interact with them, and I don't have the energy to change minds.

At my job we have tried our dardnest to make this a reality for C++ and Python, to large success. I am experimenting with the same for Rust and I think it's pretty close.

@Calsign
Copy link
Owner

Calsign commented Oct 21, 2023

I've added experimental support for generating from Cargo.toml as described above in 04e5450. It can be enabled with # gazelle:rust_mode generate_from_cargo.

I also added a bootstrap_from_cargo.py script which attempts to setup and build an existing cargo project with bazel. I've tested with this on some of my own projects and several open source projects. It works OK, but there are still some pretty big limitations - there's a bunch of issues with crate_universe, there's no support for conditional dependencies, and there are a couple of bugs I haven't figured out yet. But when it does work it's pretty magical.

I think the real test is whether it's usable at scale in a hybrid cargo/bazel repo, but that requires time and investment.

@joneshf
Copy link
Author

joneshf commented Oct 30, 2023

That all sounds great! Thanks for implementing it. I'll give it a whirl when I get a minute.

@obmarg
Copy link

obmarg commented Dec 16, 2023

I've been attempting to build an existing rust project of mine with bazel. I had hoped to use gazelle_rust to generate the BUILD files for this project, but the existing modes don't work very well for my use case.

Most of the crates in my project are open source crates so they need to be buildable by cargo, and ideally I'd like bazel to be building the same thing as cargo. In order for that to be the case I want a single source of truth for both of them.

It doesn't seem like that's possible with the current modes? Of the options mentioned towards the start of the issue I think Option 2 would work best for me - I'm already maintaining Cargo.toml files and if they could be the source of truth for everything that would work well.

Option 1 could potentially also work, though it looks like the parser in gazelle_rust would need to understand a lot more rust than it currently does for that to be effective. A lot of my projects use feature flags, platform specific cfg attributes, renamed modules etc. which I don't see in the current parser.rs file.

@Calsign
Copy link
Owner

Calsign commented Dec 25, 2023

@obmarg I believe the current approach as implemented could still work for you, although there are gaps.

In practice the detection of which crates are used in a given file should work quite well, and if it doesn't this would also be an issue when used in a pure-bazel context. I'm happy to fix such issues. But you've listed other gaps that are also issues with the pure-bazel version of gazelle_rust:

Platforms-specific cfg attributes - parsing this isn't too difficult (it's similar to the existing handling of test-only dependencies), but gazelle does not currently support merging expressions that contain arbitrary selects. (Limited support is available, but only for golang. My colleague has a proposal that would improve that situation: bazel-contrib/bazel-gazelle#1072). For this reason the best option I have at the moment is tricks like this: https://github.com/Calsign/gazelle_rust#ignoring-dependencies.

Feature flags - it's not clear to me how this should work. The current behavior is to effectively assume that all features are enabled for tracking dependencies, but not do anything to enable those features. With rules_rust, the features that are enabled for a crate are determined at the site of the rust_library, which is different from (and less ergonomic than) how cargo determines the enabled features for a crate by combining all enabled features across the entire workspace. One could imagine gazelle_rust handling features in various ways - adding dependencies based on the features currently enabled for the create; updating the crate_features attribute to contain all possible features defined in the source; or doing something fancy with selects (although that would run into the same issue as platform-specific cfg attributes). I'm not sure which would make the most sense here, but I'm open to suggestions.

Renamed modules - I assume you are referring to rust-lang/cargo#5653. rules_rust also supports this with the aliases attribute, but gazelle_rust does not currently support that. I think all that's needed here is adding support for aliases and, when in generate-from-cargo mode, extracting the aliases from Cargo.toml. Today you could achieve something similar by using gazelle:resolve.

Each of the above items potentially warrant their own issues.

I haven't tried maintaining a large rust project that builds with both cargo and bazel using gazelle_rust, so there may be a lot that I'm missing. But my experience with bazel-built projects has been that while these missing features hurt, in practice it is possible to work around them, and once those workarounds are implemented, they do not need to be changed frequently. So it's a matter of one's appetite for managing two parallel build systems.

@obmarg
Copy link

obmarg commented Jan 10, 2024

@obmarg I believe the current approach as implemented could still work for you, although there are gaps.

I don't think it can. I wasn't trying to suggest that the gazelle_rust approach couldn't be made to build my project. I was saying that I do not want two sources of truth for my builds, and that seems unavoidable with the current approach.

No worries if this isn't a use case you want to support - I ended up hacking together something closer to my preferred approach anyway. Just thought I might offer the feedback.

In practice the detection of which crates are used in a given file should work quite well, and if it doesn't this would also be an issue when used in a pure-bazel context. I'm happy to fix such issues. But you've listed other gaps that are also issues with the pure-bazel version of gazelle_rust:

Agreed - some of these are problems in a pure bazel world too.

Wasn't trying to suggest you couldn't fix the problems - I just think they could be sidestepped entirely with a different approach. Just my 2c though.

Feature flags - it's not clear to me how this should work. The current behavior is to effectively assume that all features are enabled for tracking dependencies, but not do anything to enable those features. With rules_rust, the features that are enabled for a crate are determined at the site of the rust_library, which is different from (and less ergonomic than) how cargo determines the enabled features for a crate by combining all enabled features across the entire workspace. One could imagine gazelle_rust handling features in various ways - adding dependencies based on the features currently enabled for the create; updating the crate_features attribute to contain all possible features defined in the source; or doing something fancy with selects (although that would run into the same issue as platform-specific cfg attributes). I'm not sure which would make the most sense here, but I'm open to suggestions.

Can't say which would be best here, but I think the active set of features needs to be user controlled somehow. If you're just enabling all the features that exist then why even have the features.

Renamed modules - I assume you are referring to rust-lang/cargo#5653. rules_rust also supports this with the aliases attribute, but gazelle_rust does not currently support that. I think all that's needed here is adding support for aliases and, when in generate-from-cargo mode, extracting the aliases from Cargo.toml. Today you could achieve something similar by using gazelle:resolve.

I was actually referring to using the path attribute to rename modules. Probably fairly easy to support that one, but there's other variations of the same problem:

  • Tonic has an include_proto function that pulls in a rust file from OUT_DIR. Going to be quite hard for your parser to detect that.
  • I've definitely done things similar to include!(concat!(env!("OUT_DIR"), "/helloworld.rs")); directly before.

But anyway, agree these are potentially separate issues, don't want to derail this thread too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants