Consider recalibrating how bits are divided in Span #413
Comments
Rustc enforces a file size limit of 4 GB, so a token cannot be bigger than that.

    use std::fs::File;
    use std::io::Write as _;

    fn main() {
        let buf = vec![b' '; 1024 * 1024];
        let mut file = File::create("spanoverflow.rs").unwrap();
        file.write_all(b"fn main() {\n").unwrap();
        for _ in 0..4100 {
            file.write_all(&buf).unwrap();
        }
        file.write_all(b"}\n").unwrap();
    }

    $ ls -lh spanoverflow.rs
    -rw-r--r-- 1 dtolnay users 4.1G Oct 9 18:41 spanoverflow.rs
    $ rustc spanoverflow.rs
    fatal error: rustc does not support files larger than 4GB

There appears to be no limit on the total amount of text parsed by rustc, even though its internal representation for BytePos is 32 bits.

https://github.com/rust-lang/rust/blob/1.73.0/compiler/rustc_span/src/lib.rs#L2010-L2014

If you parse more than 2^32 bytes, it overflows and you get bogus spans referring to the wrong files.

    use std::fs::File;
    use std::io::Write as _;

    fn main() {
        let buf = vec![b' '; 1024 * 1024];
        let mut file = File::create("spanoverflow.rs").unwrap();
        file.write_all(b"mod module;\n").unwrap();
        for _ in 0..2050 {
            file.write_all(&buf).unwrap();
        }
        file.write_all(b"fn main() {}\n").unwrap();
        let mut file = File::create("module.rs").unwrap();
        for _ in 0..2050 {
            file.write_all(&buf).unwrap();
        }
        file.write_all(b"pub fn f() {}\n").unwrap();
    }

According to rustc, this is the location of f:

    Item {
        attrs: [],
        id: NodeId(10),
        span: spanoverflow.rs:2:4194319: 2:4194332 (#0),
        ident: f#0,
        kind: Fn(

and this is the location of main:

    Item {
        attrs: [],
        id: NodeId(12),
        span: /home/dtolnay/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:2952:2144473580: 2952:2144473592 (#0),
        ident: main#0,
        kind: Fn(

The correct locations would be module.rs and spanoverflow.rs respectively, which you get if the files do not overflow 2^32 bytes total size.

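To make the wraparound concrete, here is a minimal sketch of the failure mode. It assumes a simplified model in which every loaded file occupies a consecutive range in one shared position space and positions are truncated to 32 bits. The names `global_pos_32` and `owning_file` are hypothetical and are not rustc's actual implementation; the file sizes are approximations of the two generated files.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical model: a position is the file's start in the shared
/// position space plus the offset inside the file, truncated to 32 bits.
fn global_pos_32(file_start: u64, local_offset: u64) -> u32 {
    // `as u32` keeps only the low 32 bits, i.e. wraps modulo 2^32.
    (file_start + local_offset) as u32
}

/// Returns the index of the first file whose range contains `pos`,
/// given file sizes in load order.
fn owning_file(file_sizes: &[u64], pos: u32) -> Option<usize> {
    let mut start = 0u64;
    for (index, size) in file_sizes.iter().enumerate() {
        let end = start + size;
        if (start..end).contains(&u64::from(pos)) {
            return Some(index);
        }
        start = end;
    }
    None
}

fn main() {
    let names = ["spanoverflow.rs", "module.rs"];
    // Approximate sizes: 2050 MiB of spaces plus a few bytes of code each.
    let sizes = [2050 * MIB + 25, 2050 * MIB + 14];

    // A token near the end of module.rs, which starts right after file 0.
    let module_start = sizes[0];
    let pos = global_pos_32(module_start, 2050 * MIB);

    let owner = owning_file(&sizes, pos).map(|i| names[i]);
    println!("truncated position {pos} is attributed to {owner:?}");
}
```

In this model the token's true position is a little over 4100 MiB, which wraps to a little over 4 MiB and so falls inside the first file's range. That is consistent with the column number of roughly 4194319 reported against spanoverflow.rs above.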
For scale, currently there is 200 GB of Rust code published on crates.io. Looking at just the newest version of every crate, it is 16 GB of code. So a workload that involves parsing this, even on multiple threads, would currently hit overflow.

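As a rough check on those numbers, the sketch below computes how many bits a byte offset needs in order to address that much input. It assumes decimal gigabytes (10^9 bytes) and a single shared position space; `bits_needed` is a hypothetical helper written for this comment.

```rust
/// Number of bits needed to represent every byte offset from 0 through
/// `total_bytes` inclusive (the bit length of `total_bytes`).
fn bits_needed(total_bytes: u64) -> u32 {
    u64::BITS - total_bytes.leading_zeros()
}

fn main() {
    // Assumption: the sizes quoted above are decimal gigabytes.
    const GB: u64 = 1_000_000_000;

    for (label, bytes) in [
        ("newest version of every crate", 16 * GB),
        ("everything on crates.io", 200 * GB),
    ] {
        println!("{label}: {} bits", bits_needed(bytes));
    }
}
```

Under these assumptions both workloads need more than 32 bits per offset if they are parsed into one position space.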
Currently fallback spans store a pair of 32-bit low and high character indices (proc-macro2/src/fallback.rs, lines 491 to 496 in fecb02d).

A span in which lo > hi is malformed, so right off the bat, approximately half of possible Span bit patterns are wasted.

Separately, tokens are usually small compared to the total amount of input parsed by a thread. If we switch to storing lo and hi - lo instead of lo and hi, then an even split of 32 bits each may not be the wisest allocation. For example, we could decide to give 36 bits to lo (supporting 64 GB input size) and 28 bits to hi - lo (limiting token size to 256 MB). Or some other uneven split.
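
A minimal sketch of what the 36/28 split could look like, packed into a single u64. This is an illustration of the idea only, not proc-macro2's code; `PackedSpan`, its methods, and the choice to return `None` on out-of-range input are all assumptions.

```rust
const LO_BITS: u32 = 36; // start offset: up to 2^36 bytes (64 GB) of input
const WIDTH_BITS: u32 = 28; // hi - lo: tokens up to 2^28 bytes (256 MB)
const MAX_LO: u64 = (1 << LO_BITS) - 1;
const MAX_WIDTH: u64 = (1 << WIDTH_BITS) - 1;

/// Hypothetical span layout: `lo` in the high 36 bits, `hi - lo` in the
/// low 28 bits. Storing a width means `lo > hi` cannot be represented.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedSpan(u64);

impl PackedSpan {
    /// Returns `None` if `lo > hi` or if either field would not fit.
    fn new(lo: u64, hi: u64) -> Option<Self> {
        let width = hi.checked_sub(lo)?;
        if lo > MAX_LO || width > MAX_WIDTH {
            return None;
        }
        Some(PackedSpan((lo << WIDTH_BITS) | width))
    }

    fn lo(self) -> u64 {
        self.0 >> WIDTH_BITS
    }

    fn width(self) -> u64 {
        self.0 & MAX_WIDTH
    }

    fn hi(self) -> u64 {
        self.lo() + self.width()
    }
}

fn main() {
    // A 13-byte token located 20 GiB into the input: beyond a 32-bit lo.
    let lo = 20u64 << 30;
    let span = PackedSpan::new(lo, lo + 13).expect("fits in 36 + 28 bits");
    println!("lo = {}, hi = {}, width = {}", span.lo(), span.hi(), span.width());

    // A malformed span and an oversized token are both rejected.
    println!("{:?}", PackedSpan::new(10, 5));
    println!("{:?}", PackedSpan::new(0, 1 << WIDTH_BITS));
}
```

With this layout every bit pattern decodes to a well-formed span, so none of the 64 bits are spent on the malformed lo > hi half.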