-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible #51050
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Triage ping, @kennytm, this PR is waiting for your review. |
Thanks :) @bors r+ (I've checked that |
📌 Commit 96ce56d has been approved by |
🔒 Merge conflict |
std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible When reading a directory with `read_dir`, querying metadata for a resulting `DirEntry` is done by building the whole path and then `lstat`ing it, which requires the kernel to resolve the whole path. Instead, one can use the file descriptor to the enumerated directory and use `fstatat`. This make the resolving step unnecessary. This PR implements using `fstatat` on linux, android and emscripten. ## Compatibility across targets `fstatat` is POSIX. * Linux >= 2.6.19 according to https://linux.die.net/man/2/fstatat * android according to https://android.googlesource.com/platform/bionic/+/master/libc/libc.map.txt#392 * emscripten according to https://github.com/kripken/emscripten/blob/7f89560101843198787530731f40a65288f6f15f/system/include/libc/sys/stat.h#L76 The man page says "A similar system call exists on Solaris." but I haven't found it. ## Compatibility with old platforms This was introduced with glibc 2.4 according to the man page. The only information I could find about the minimal version of glibc rust must support is this discussion https://internals.rust-lang.org/t/bumping-glibc-requirements-for-the-rust-toolchain/5111/10 The conclusion, if I understand correctly, is that currently rust supports glibc >= 2.3.4 but the "real" requirement is Centos 5 with glibc 2.5. This PR would make the minimal version 2.4, so this should be fine. ## Benefit I did the following silly benchmark: ```rust use std::io; use std::fs; use std::os::linux::fs::MetadataExt; use std::time::Instant; fn main() -> Result<(), io::Error> { let mut n = 0; let mut size = 0; let start = Instant::now(); for entry in fs::read_dir("/nix/store/.links")? { let entry = entry?; let stat = entry.metadata()?; size += stat.st_size(); n+=1; } println!("{} files, size {}, time {:?}", n, size, Instant::now().duration_since(start)); Ok(()) } ``` On warm cache, with current rust nightly: ``` 1014099 files, size 76895290022, time Duration { secs: 2, nanos: 65832118 } ``` (between 2.1 and 2.9 seconds usually) With this PR: ``` 1014099 files, size 76895290022, time Duration { secs: 1, nanos: 581662953 } ``` (1.5 to 1.6 seconds usually). approximately 40% faster :) On cold cache there is not much to gain because path lookup (which we spare) would have been a cache hit: Before ``` 1014099 files, size 76895290022, time Duration { secs: 391, nanos: 739874992 } ``` After ``` 1014099 files, size 76895290022, time Duration { secs: 388, nanos: 431567396 } ``` ## Testing The tests were run on linux `x86_64` ``` python x.py test src/tools/tidy ./x.py test src/libstd ``` and the above benchmark. I did not test any other target.
I rebased on today's master, but there was no merge conflict... |
Rollup of 12 pull requests Successful merges: - #51050 (std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible) - #51123 (Update build instructions) - #51127 (Add doc link from discriminant struct to function.) - #51146 (typeck: Do not pass the field check on field error) - #51147 (Stabilize SliceIndex trait.) - #51151 (Move slice::exact_chunks directly above exact_chunks_mut for more con…) - #51152 (Replace `if` with `if and only if` in the definition dox of `Sync`) - #51153 (Link panic and compile_error docs) - #51158 (Mention spec and indented blocks in doctest docs) - #51186 (Remove two redundant .nll.stderr files) - #51203 (Two minor `obligation_forest` tweaks.) - #51213 (fs: copy: Use File::set_permissions instead of fs::set_permissions) Failed merges:
@symphorien Unfortunately the new code failed to compile when
|
This should fix the failure. |
Thanks again, now let's retry... @bors r+ |
📌 Commit 8dec03b has been approved by |
std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible When reading a directory with `read_dir`, querying metadata for a resulting `DirEntry` is done by building the whole path and then `lstat`ing it, which requires the kernel to resolve the whole path. Instead, one can use the file descriptor to the enumerated directory and use `fstatat`. This make the resolving step unnecessary. This PR implements using `fstatat` on linux, android and emscripten. ## Compatibility across targets `fstatat` is POSIX. * Linux >= 2.6.19 according to https://linux.die.net/man/2/fstatat * android according to https://android.googlesource.com/platform/bionic/+/master/libc/libc.map.txt#392 * emscripten according to https://github.com/kripken/emscripten/blob/7f89560101843198787530731f40a65288f6f15f/system/include/libc/sys/stat.h#L76 The man page says "A similar system call exists on Solaris." but I haven't found it. ## Compatibility with old platforms This was introduced with glibc 2.4 according to the man page. The only information I could find about the minimal version of glibc rust must support is this discussion https://internals.rust-lang.org/t/bumping-glibc-requirements-for-the-rust-toolchain/5111/10 The conclusion, if I understand correctly, is that currently rust supports glibc >= 2.3.4 but the "real" requirement is Centos 5 with glibc 2.5. This PR would make the minimal version 2.4, so this should be fine. ## Benefit I did the following silly benchmark: ```rust use std::io; use std::fs; use std::os::linux::fs::MetadataExt; use std::time::Instant; fn main() -> Result<(), io::Error> { let mut n = 0; let mut size = 0; let start = Instant::now(); for entry in fs::read_dir("/nix/store/.links")? { let entry = entry?; let stat = entry.metadata()?; size += stat.st_size(); n+=1; } println!("{} files, size {}, time {:?}", n, size, Instant::now().duration_since(start)); Ok(()) } ``` On warm cache, with current rust nightly: ``` 1014099 files, size 76895290022, time Duration { secs: 2, nanos: 65832118 } ``` (between 2.1 and 2.9 seconds usually) With this PR: ``` 1014099 files, size 76895290022, time Duration { secs: 1, nanos: 581662953 } ``` (1.5 to 1.6 seconds usually). approximately 40% faster :) On cold cache there is not much to gain because path lookup (which we spare) would have been a cache hit: Before ``` 1014099 files, size 76895290022, time Duration { secs: 391, nanos: 739874992 } ``` After ``` 1014099 files, size 76895290022, time Duration { secs: 388, nanos: 431567396 } ``` ## Testing The tests were run on linux `x86_64` ``` python x.py test src/tools/tidy ./x.py test src/libstd ``` and the above benchmark. I did not test any other target.
☀️ Test successful - status-appveyor, status-travis |
When reading a directory with
read_dir
, querying metadata for a resultingDirEntry
is done by building the whole path and thenlstat
ing it, which requires the kernel to resolve the whole path. Instead, onecan use the file descriptor to the enumerated directory and use
fstatat
. This make the resolving stepunnecessary.
This PR implements using
fstatat
on linux, android and emscripten.Compatibility across targets
fstatat
is POSIX.The man page says "A similar system call exists on Solaris." but I haven't found it.
Compatibility with old platforms
This was introduced with glibc 2.4 according to the man page. The only information I could find about the minimal version of glibc rust must support is this discussion https://internals.rust-lang.org/t/bumping-glibc-requirements-for-the-rust-toolchain/5111/10
The conclusion, if I understand correctly, is that currently rust supports glibc >= 2.3.4 but the "real" requirement is Centos 5 with glibc 2.5. This PR would make the minimal version 2.4, so this should be fine.
Benefit
I did the following silly benchmark:
On warm cache, with current rust nightly:
(between 2.1 and 2.9 seconds usually)
With this PR:
(1.5 to 1.6 seconds usually).
approximately 40% faster :)
On cold cache there is not much to gain because path lookup (which we spare) would have been a cache hit:
Before
After
Testing
The tests were run on linux
x86_64
and the above benchmark.
I did not test any other target.