-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Official API for parsing Cargo.toml
s schema
#12801
Comments
My hope is that we can switch to a I've updated string-rosetta-rs to include equality checks and everything but The main alternative to all of this is to make our types generic over the string type which I'd rather avoid because that gets messy. Next step, benchmarking cargo with the change from |
Edit: there is one more big advantage to
The thing that makes the equality check so fast is that it does not need to dereference the pointer. Before
A specific benchmark I would care about is building a large lock file. Currently all the benchmarks we have measure the time to build the project with a locked file. Furthermore, they all involve git dependencies so rebuilding a lock file is dominated by git. Which means that "just" adding a "generate lock file" call does not tell us about regressions to the performance of the resolver. |
Note that arcstr's performance as currently shown in the linked repo is a bit misleading: it only has that cheap clone when comparing strings cloned from a shared arcstr. As soon as you compare any strings that aren't that special case, you'll get the usual O(n) comparison of the string data. If you want to avoid string comparisons when comparing strings, you need some form of interning. It can be pointer based (store strings, compare pointers) instead of purely symbolic (store indices, ask cache for real string), but you do need to deduplicate the strings somehow (i.e. intern them) to be able to exclusively test pointer equality. |
This depends on how much we are doing comparisons of the data from the same source or not.
We have string interning today. The problem is that I doubt people want that as part of the |
Just did a quick benchmark with $ ../dump/cargo-12801-bench.rs run
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run`
warning: `package.edition` is unspecified, defaulting to `2021`
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run`
$ hyperfine "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 119.6 ms ± 2.7 ms [User: 99.6 ms, System: 19.6 ms]
Range (min … max): 115.6 ms … 124.5 ms 25 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 118.9 ms ± 2.8 ms [User: 98.3 ms, System: 20.2 ms]
Range (min … max): 115.6 ms … 125.3 ms 25 runs
Summary
../cargo-new check ran
1.01 ± 0.03 times faster than ../cargo-old check
$ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 19.626 s ± 0.198 s [User: 151.846 s, System: 22.022 s]
Range (min … max): 19.348 s … 19.966 s 10 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 19.807 s ± 0.403 s [User: 152.382 s, System: 22.086 s]
Range (min … max): 19.469 s … 20.861 s 10 runs
Summary
../cargo-old check ran
1.01 ± 0.02 times faster than ../cargo-new check
$ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 20.864 s ± 0.217 s [User: 152.427 s, System: 22.366 s]
Range (min … max): 20.632 s … 21.227 s 10 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 20.934 s ± 0.509 s [User: 151.831 s, System: 22.348 s]
Range (min … max): 20.410 s … 21.871 s 10 runs
Summary
../cargo-old check ran
1.00 ± 0.03 times faster than ../cargo-new check Code:
#!/usr/bin/env nargo
```cargo
[dependencies]
lexopt = "0.3"
xshell = "0.2"
```
fn main() -> Result<(), Box<dyn std::error::Error>> {
let args = Args::parse_args()?;
for command in args.commands {
match command {
Command::Setup(args) => setup(args)?,
Command::Run => run()?,
}
}
Ok(())
}
fn setup(args: SetupArgs) -> Result<(), Box<dyn std::error::Error>> {
let sh = xshell::Shell::new()?;
let tmp = tmp();
let old_path = tmp.join("cargo-old");
let new_path = tmp.join("cargo-new");
let built_path = std::path::Path::new("target/release/cargo");
if args.hard && tmp.exists() {
std::fs::remove_dir_all(&tmp)?;
}
std::fs::create_dir_all(&tmp)?;
if !tmp.join("cargo").exists() {
let _dir = sh.push_dir(&tmp);
xshell::cmd!(sh, "git clone [email protected]:rust-lang/cargo.git").run()?;
}
xshell::cmd!(sh, "git checkout master").run()?;
xshell::cmd!(sh, "cargo build --release").run()?;
dbg!(built_path);
dbg!(&old_path);
std::fs::copy(built_path, &old_path)?;
xshell::cmd!(sh, "git checkout intern").run()?;
xshell::cmd!(sh, "cargo build --release").run()?;
std::fs::copy(built_path, &new_path)?;
xshell::cmd!(sh, "git checkout master").run()?;
Ok(())
}
fn run() -> Result<(), Box<dyn std::error::Error>> {
let sh = xshell::Shell::new()?;
let tmp = tmp();
let _dir = sh.push_dir(&tmp.join("cargo"));
xshell::cmd!(sh, "../cargo-old check").run()?;
xshell::cmd!(sh, "hyperfine '../cargo-old check' '../cargo-new check'").run()?;
xshell::cmd!(sh, "hyperfine --prepare 'cargo clean' '../cargo-old check' '../cargo-new check'").run()?;
xshell::cmd!(sh, "hyperfine --prepare 'cargo clean && rm -f Cargo.lock' '../cargo-old check' '../cargo-new check'").run()?;
Ok(())
}
fn tmp() -> std::path::PathBuf {
"target/tmp/bench".into()
}
enum Command {
Setup(SetupArgs),
Run,
}
struct SetupArgs {
hard: bool
}
struct Args {
commands: Vec<Command>,
}
impl Args {
fn parse_args() -> Result<Self, lexopt::Error> {
use lexopt::prelude::*;
let mut args = Self {
commands: Default::default(),
};
let mut parser = lexopt::Parser::from_env();
while let Some(arg) = parser.next()? {
match arg {
Value(command) => {
let command = command.parse_with(|command| match command {
"setup" => Ok(Command::Setup(SetupArgs{
hard: false
})),
"run" => Ok(Command::Run),
_ => Err("invalid command, needed `setup`, `run`"),
})?;
args.commands.push(command);
}
Long("hard") => {
if let Some(Command::Setup(args)) = args.commands.last_mut() {
args.hard = true;
} else {
return Err(arg.unexpected());
}
}
Long("soft") => {
if let Some(Command::Setup(args)) = args.commands.last_mut() {
args.hard = false;
} else {
return Err(arg.unexpected());
}
}
_ => {
return Err(arg.unexpected());
}
}
}
Ok(args)
}
} |
To double check my assumptions, I went ahead and ran the benchmark with $ hyperfine "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms]
Range (min … max): 115.6 ms … 124.3 ms 24 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms]
Range (min … max): 115.7 ms … 123.6 ms 24 runs
Summary
../cargo-old check ran
1.00 ± 0.03 times faster than ../cargo-new check
$ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s]
Range (min … max): 19.605 s … 21.123 s 10 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s]
Range (min … max): 19.764 s … 20.420 s 10 runs
Summary
../cargo-new check ran
1.01 ± 0.02 times faster than ../cargo-old check
$ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s]
Range (min … max): 20.156 s … 22.010 s 10 runs
Benchmark 2: ../cargo-new check
Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s]
Range (min … max): 20.703 s … 22.462 s 10 runs
Summary
../cargo-old check ran
1.01 ± 0.03 times faster than ../cargo-new check |
To have a separate manifest API (rust-lang#12801), we can't rely on interning because it might be used in longer-lifetime applications. I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that. I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ```
To have a separate manifest API (rust-lang#12801), we can't rely on interning because it might be used in longer-lifetime applications. I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that. I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ```
To have a separate manifest API (rust-lang#12801), we can't rely on interning because it might be used in longer-lifetime applications. I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that. I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ```
To have a separate manifest API (rust-lang#12801), we can't rely on interning because it might be used in longer-lifetime applications. I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that. I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ```
refactor(toml): Decouple parsing from interning system ### What does this PR try to resolve? To have a separate manifest API (#12801), we can't rely on interning because it might be used in longer-lifetime applications. To keep this limited in scope, this only removes `InternedString` from manifest parsing. Everything else still uses `InternedString`. ### How should we test and review this PR? I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. See [my comment](#12801 (comment)) for the script that managed the benchmarks. Benchmarks: <details> ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ``` </details> ### Additional information I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that.
Update cargo 7 commits in 708383d620e183a9ece69b8fe930c411d83dee27..b4d18d4bd3db6d872892f6c87c51a02999b80802 2023-10-27 21:09:26 +0000 to 2023-10-31 18:19:10 +0000 - refactor(toml): Cleanup noticed on the way to rust-lang/cargo#12801 (rust-lang/cargo#12902) - feat(trim-paths): set env `CARGO_TRIM_PATHS` for build scripts (rust-lang/cargo#12900) - feat: implement RFC 3127 `-Ztrim-paths` (rust-lang/cargo#12625) - docs: clarify config to use vendored source is printed to stdout (rust-lang/cargo#12893) - Improve the margin calculation for the search command's UI (rust-lang/cargo#12890) - Add new packages to [workspace.members] automatically (rust-lang/cargo#12779) - refactor(toml): Decouple parsing from interning system (rust-lang/cargo#12881) r? ghost
refactor(toml): Pull out the schema ### What does this PR try to resolve? On its own, this PR is a net negative for readability / complexity. It moves all of the serde types related to manifest from `toml/mod.rs` to `toml/schema.rs`, leaving a lot of the functions and some trait implementations back in `toml/mod.rs` (some basic functions that made sense for the type on their own were also moved). So why do this? This is an ooch towards having the schema broken out into a separate package (#12801). To do this, we need to - Identify what all types need to be put in the package. This refactor highlights the dependence on `RustVersion` and `PackageIdSpec` - Highlights what functionality we need to find a new home for Follow up PRs would - Find better homes for the logic in `toml/mod.rs`, moving us away from having `impl schema::Type` blocks. - Pull out a `src/cargo/util_semver` package to own `PartialVersion` (at least) as prep for making a `cargo-util-semver` package - Move `RustVersion` to `manifest-toml/schema.rs`, deciding what functionality needs to move with the type - Move or copy `PackageIdSpec` into `manfest-toml/schema.rs`, deciding what functionality remain in `core/` and what moves over - Move `toml/schema.rs` to `src/cargo/util_schema` - Actually make `cargo-util-semver` and `cargo-util-manifest-schema` packages So why do this now? This is a big change! By being incremental - Reduce churn for me and others - Be easier to review - Collect feedback as we go on the whole plan to avoid more painful changes later We *can* back this out if needed but the further we go, the more painful it will be. ### How should we test and review this PR? ### Additional information
refactor(toml): Pull out the schema ### What does this PR try to resolve? On its own, this PR is a net negative for readability / complexity. It moves all of the serde types related to manifest from `toml/mod.rs` to `toml/schema.rs`, leaving a lot of the functions and some trait implementations back in `toml/mod.rs` (some basic functions that made sense for the type on their own were also moved). So why do this? This is an ooch towards having the schema broken out into a separate package (#12801). To do this, we need to - Identify what all types need to be put in the package. This refactor highlights the dependence on `RustVersion` and `PackageIdSpec` - Highlights what functionality we need to find a new home for Follow up PRs would - Find better homes for the logic in `toml/mod.rs`, moving us away from having `impl schema::Type` blocks. - Pull out a `src/cargo/util_semver` package to own `PartialVersion` (at least) as prep for making a `cargo-util-semver` package - Move `RustVersion` to `manifest-toml/schema.rs`, deciding what functionality needs to move with the type - Move or copy `PackageIdSpec` into `manfest-toml/schema.rs`, deciding what functionality remain in `core/` and what moves over - Move `toml/schema.rs` to `src/cargo/util_schema` - Actually make `cargo-util-semver` and `cargo-util-manifest-schema` packages So why do this now? This is a big change! By being incremental - Reduce churn for me and others - Be easier to review - Collect feedback as we go on the whole plan to avoid more painful changes later We *can* back this out if needed but the further we go, the more painful it will be. ### How should we test and review this PR? ### Additional information
For `cargo install` we'll now show a more specific parse error for semver, much like other parts of cargo. This came out of my work on rust-lang#12801. I was looking at what might be appropriate to put in a `cargo-util-semver` crate and realized we have the `ToSemver` trait that exists but doesn't do much, so I dropped it.
This came out of my work on rust-lang#12801. I was looking at what might be appropriate to put in a `cargo-util-semver` crate and realized we have the `VersionExt` trait that exists but doesn't do much, so I dropped it.
For `cargo install` we'll now show a more specific parse error for semver, much like other parts of cargo. This came out of my work on rust-lang#12801. I was looking at what might be appropriate to put in a `cargo-util-semver` crate and realized we have the `ToSemver` trait that exists but doesn't do much, so I dropped it.
For `cargo install` we'll now show a more specific parse error for semver, much like other parts of cargo. This came out of my work on rust-lang#12801. I was looking at what might be appropriate to put in a `cargo-util-semver` crate and realized we have the `ToSemver` trait that exists but doesn't do much, so I dropped it.
Originally for rust-lang#12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
Originally for rust-lang#12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
Originally for rust-lang#12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
Originally for rust-lang#12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
Originally for rust-lang#12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
refactor(schemas): Pull out mod for proposed schemas package Originally for #12801 we talked about a `cargo-util-manifest-schema` package - `util` in the name to not clash with plugins - manifest specific to keep the scope down The problem is we have types that aren't manifest specific, like - `PartialVersion` (currently slated for `cargo-util-semverext`) - `RustVersion` - `PackageIdSpec` - `SourceKind` (soon) Things get messy if we try to break things down into common packages. Instead, I think it'd be useful to have a schemas package that has mods for each type of schema, re-exporting what is needed. Normally, componentizing your package by the layer in the stack is a recipe for pain. I don't think that'll apply here because these are meant to be so low level. The other big concern could be compile times. My hope is it won't be too bad. So this moves the `util/toml` types into the module and we can add more in the future.
This is a step towards moving `PackageIdSpec` into `schema` as manifests need it as part of rust-lang#12801. While I didn't take this approach in rust-lang#13080, that was largely how core these functions are / how pervasive their use is.
This is a step towards moving `PackageIdSpec` into `schema` as manifests need it as part of rust-lang#12801. While I didn't take this approach in rust-lang#13080, that was largely how core these functions are / how pervasive their use is.
refactor: Pull PackageIdSpec into schema ### What does this PR try to resolve? This removes one of the remaining ties `util_schemas` has on the rest of cargo as part of #12801. ### How should we test and review this PR? This is broken up into small commits ### Additional information
refactor(schema): Remove reliance on cargo types ### What does this PR try to resolve? This is another step towards #12801. The only thing left is `validate_package_name` which I left out because I want to explore ways of handing that and don't want this commit blocked on it ### How should we test and review this PR? ### Additional information
fix: Fill in more empty name holes ### What does this PR try to resolve? This is a follow up to #13152 and expands on work done in #12928. This is also part of #12801 as I am refactoring how we do validation so that it parts of it can move into the schema, removing the last dependency the schema has on the rest of cargo. ### How should we test and review this PR? This prevents empty registry names which should be fine for the same reason as preventing empty feature names in #12928, that this was a regression due to changes in toml parsers ### Additional information
refactor: Pull name validation into `util_schemas` ### What does this PR try to resolve? In preparation for #12801, this moves the last dependency on the rest of the `cargo` crate into the future `util_schemas` crate. It does this by refocusing the code from being validation functions to being newtypes. I did not try to thread the newtypes everywhere, that can be an exercise for the future. There are places I didn't put newtypes because it didn't seem worth it (e.g. places needing `StringOrVec`, `ProfileName` not being used in CLI because of the `check` psuedo-profile, etc) The main risk with this is when validation changes according to a nightly feature, like packages-as-namespaces. My thought is that the validation code would be updated for the nightly behavior and then the caller would check if it isn't nightly and fail in that case. This has a side effect of improving the error messages for manifest parsing because we will show more context ### How should we test and review this PR? #13164 should be reviewed first ### Additional information
refactor(schemas): Pull out as `cargo-util-schemas` ### What does this PR try to resolve? Fixes #12801 ### How should we test and review this PR? See the individual commits for further justifications on the changes ### Additional information
Thank you for working on this! 💚 💙 💜 💛 ❤️ |
Problem
Something is needed by crates.io for creating index entries, rather than reading the json blob which might be out of sync. They switched to community developed crates but this creates hidden relationships when making manifest changes (rust-lang/crates.io#6914).
Proposed Solution
Bigger requests, like #4614, are blocked on the fact that the code ties into everything within cargo. So instead, let's get an incremental step by just putting out the schemas, with no business logic.
Notes
Steps
mod util_semver
#12940util_schemas
#13166cargo-util-schemas
#13178Related work
See also #2259
The text was updated successfully, but these errors were encountered: