std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible #51050

Merged (2 commits) on May 31, 2018

symphorien
Contributor

When reading a directory with read_dir, querying metadata for a resulting DirEntry is done by building the whole path and then lstating it, which requires the kernel to resolve the whole path. Instead, one can take the file descriptor of the enumerated directory and call fstatat on the entry name, so the kernel only has to look up the final component instead of resolving the whole path.
This PR implements using fstatat on Linux, Android and Emscripten.
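
For readers less familiar with these calls, here is a minimal sketch of the two approaches as seen from the public API, assuming only the standard library. The helper names `total_via_entry` and `total_via_path` and the temporary directory name are made up for illustration. Both helpers should return the same result; which syscall `DirEntry::metadata()` issues is an implementation detail that this PR changes on some targets.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums file sizes using `DirEntry::metadata()`, the call this PR targets.
/// It does not follow symlinks.
fn total_via_entry(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        total += entry?.metadata()?.len();
    }
    Ok(total)
}

/// Same sum, but builds the full path and stats it explicitly without
/// following symlinks, as the description says the old code did internally.
fn total_via_path(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        total += fs::symlink_metadata(entry?.path())?.len();
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch directory, created only for this demonstration.
    let dir = std::env::temp_dir().join("fstatat_sketch_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.txt"), b"hello")?;
    fs::write(dir.join("b.txt"), b"world!")?;

    // Both approaches observe the same metadata.
    assert_eq!(total_via_entry(&dir)?, total_via_path(&dir)?);

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```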

Compatibility across targets

fstatat is POSIX.

The man page says "A similar system call exists on Solaris." but I haven't found it.

Compatibility with old platforms

fstatat was introduced with glibc 2.4 according to the man page. The only information I could find about the minimum glibc version Rust must support is this discussion: https://internals.rust-lang.org/t/bumping-glibc-requirements-for-the-rust-toolchain/5111/10
The conclusion, if I understand correctly, is that Rust currently supports glibc >= 2.3.4 but the "real" requirement is CentOS 5 with glibc 2.5. This PR would raise the minimum version to 2.4, so this should be fine.
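
The version reasoning above can be checked mechanically. This is only an illustrative sketch: `parse_version` and `satisfies` are hypothetical helpers, and the version strings are the ones quoted in this description, not values queried from a real system.

```rust
/// Parses a dotted version such as "2.3.4" into numeric components.
fn parse_version(s: &str) -> Vec<u32> {
    s.split('.')
        .map(|part| part.parse().expect("numeric version component"))
        .collect()
}

/// Returns true if `available` is at least `required`.
/// `Vec` comparison is lexicographic, so [2, 3, 4] < [2, 4] < [2, 5].
fn satisfies(available: &str, required: &str) -> bool {
    parse_version(available) >= parse_version(required)
}

fn main() {
    let required = "2.4"; // glibc version that introduced fstatat
    for available in ["2.3.4", "2.5"].iter() {
        println!(
            "glibc {} satisfies >= {}: {}",
            available,
            required,
            satisfies(available, required)
        );
    }
}
```

Under these assumptions, the formally supported glibc 2.3.4 would no longer be enough, while the CentOS 5 baseline of 2.5 would be.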

Benefit

I did the following silly benchmark:

use std::fs;
use std::io;
use std::os::linux::fs::MetadataExt;
use std::time::Instant;

fn main() -> Result<(), io::Error> {
    let mut n = 0; // number of directory entries seen
    let mut size = 0; // sum of st_size over all entries, in bytes
    let start = Instant::now();
    for entry in fs::read_dir("/nix/store/.links")? {
        let entry = entry?;
        // The call under test: one metadata query per directory entry.
        let stat = entry.metadata()?;
        size += stat.st_size();
        n += 1;
    }
    println!("{} files, size {}, time {:?}", n, size, start.elapsed());
    Ok(())
}

On a warm cache, with the current Rust nightly:

1014099 files, size 76895290022, time Duration { secs: 2, nanos: 65832118 }

(between 2.1 and 2.9 seconds usually)
With this PR:

1014099 files, size 76895290022, time Duration { secs: 1, nanos: 581662953 }

(1.5 to 1.6 seconds usually).

approximately 40% faster :)
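
To make the comparison easy to reproduce, here is a small sketch of the arithmetic, using the two durations printed above and the midpoints of the quoted ranges. The helper names `time_saved` and `speedup` are made up for illustration. For the single pair of runs shown, the time saved works out to about 23% (a ratio of about 1.31x), while the range midpoints (2.5 s and 1.55 s) give about 38%, which is presumably where the roughly 40% figure comes from.

```rust
use std::time::Duration;

/// Fraction of wall-clock time saved going from `before` to `after`.
fn time_saved(before: Duration, after: Duration) -> f64 {
    1.0 - after.as_secs_f64() / before.as_secs_f64()
}

/// Ratio of the old running time to the new one.
fn speedup(before: Duration, after: Duration) -> f64 {
    before.as_secs_f64() / after.as_secs_f64()
}

fn main() {
    // The two warm-cache runs printed above.
    let before = Duration::new(2, 65_832_118);
    let after = Duration::new(1, 581_662_953);
    println!(
        "single runs: {:.1}% less time, {:.2}x",
        100.0 * time_saved(before, after),
        speedup(before, after)
    );

    // Midpoints of the "usual" ranges: 2.1 to 2.9 s and 1.5 to 1.6 s.
    let before_mid = Duration::from_millis(2500);
    let after_mid = Duration::from_millis(1550);
    println!(
        "range midpoints: {:.1}% less time, {:.2}x",
        100.0 * time_saved(before_mid, after_mid),
        speedup(before_mid, after_mid)
    );
}
```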

On a cold cache there is not much to gain, because the path lookup (which we save) would have been a cache hit anyway:
Before

1014099 files, size 76895290022, time Duration { secs: 391, nanos: 739874992 }

After

1014099 files, size 76895290022, time Duration { secs: 388, nanos: 431567396 }

Testing

The following tests were run on Linux x86_64:

python x.py test src/tools/tidy
./x.py test src/libstd 

and the above benchmark.
I did not test any other target.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

rust-highfive added the S-waiting-on-review label on May 24, 2018
frewsxcv added the T-libs-api label on May 25, 2018
@TimNN
Contributor

TimNN commented May 29, 2018

Triage ping, @kennytm, this PR is waiting for your review.

@kennytm
Member

kennytm commented May 29, 2018

Thanks :) @bors r+

(I've checked that fstatat is supported on macOS 10.10+, so unfortunately it can't be applied to macOS since our minimum is 10.7 :(.)

@bors
Contributor

bors commented May 29, 2018

📌 Commit 96ce56d has been approved by kennytm

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label on May 29, 2018
@bors
Contributor

bors commented May 30, 2018

🔒 Merge conflict

bors added the S-waiting-on-author label and removed the S-waiting-on-bors label on May 30, 2018
kennytm added a commit to kennytm/rust that referenced this pull request May 30, 2018
std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible

When reading a directory with `read_dir`, querying metadata for a resulting `DirEntry` is done by building the whole path and then `lstat`ing it, which requires the kernel to resolve the whole path. Instead, one can take the file descriptor of the enumerated directory and call `fstatat` on the entry name, so the kernel only has to look up the final component instead of resolving the whole path.
This PR implements using `fstatat` on Linux, Android and Emscripten.

## Compatibility across targets
`fstatat` is POSIX.
* Linux >= 2.6.19 according to https://linux.die.net/man/2/fstatat
* android according to https://android.googlesource.com/platform/bionic/+/master/libc/libc.map.txt#392
* emscripten according to https://github.com/kripken/emscripten/blob/7f89560101843198787530731f40a65288f6f15f/system/include/libc/sys/stat.h#L76

The man page says "A similar system call exists on Solaris." but I haven't found it.

## Compatibility with old platforms
`fstatat` was introduced with glibc 2.4 according to the man page. The only information I could find about the minimum glibc version Rust must support is this discussion: https://internals.rust-lang.org/t/bumping-glibc-requirements-for-the-rust-toolchain/5111/10
The conclusion, if I understand correctly, is that Rust currently supports glibc >= 2.3.4 but the "real" requirement is CentOS 5 with glibc 2.5. This PR would raise the minimum version to 2.4, so this should be fine.

## Benefit
I did the following silly benchmark:
```rust
use std::fs;
use std::io;
use std::os::linux::fs::MetadataExt;
use std::time::Instant;

fn main() -> Result<(), io::Error> {
    let mut n = 0; // number of directory entries seen
    let mut size = 0; // sum of st_size over all entries, in bytes
    let start = Instant::now();
    for entry in fs::read_dir("/nix/store/.links")? {
        let entry = entry?;
        // The call under test: one metadata query per directory entry.
        let stat = entry.metadata()?;
        size += stat.st_size();
        n += 1;
    }
    println!("{} files, size {}, time {:?}", n, size, start.elapsed());
    Ok(())
}
```
On a warm cache, with the current Rust nightly:
```
1014099 files, size 76895290022, time Duration { secs: 2, nanos: 65832118 }
```
(between 2.1 and 2.9 seconds usually)
With this PR:
```
1014099 files, size 76895290022, time Duration { secs: 1, nanos: 581662953 }
```
(1.5 to 1.6 seconds usually).

approximately 40% faster :)

On a cold cache there is not much to gain, because the path lookup (which we save) would have been a cache hit anyway:
Before
```
1014099 files, size 76895290022, time Duration { secs: 391, nanos: 739874992 }
```
After
```
1014099 files, size 76895290022, time Duration { secs: 388, nanos: 431567396 }
```
## Testing
The following tests were run on Linux `x86_64`:
```
python x.py test src/tools/tidy
./x.py test src/libstd
```
and the above benchmark.
I did not test any other target.
@symphorien
Contributor Author

I rebased on today's master, but there was no merge conflict...

bors added a commit that referenced this pull request May 30, 2018
Rollup of 12 pull requests

Successful merges:

 - #51050 (std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible)
 - #51123 (Update build instructions)
 - #51127 (Add doc link from discriminant struct to function.)
 - #51146 (typeck: Do not pass the field check on field error)
 - #51147 (Stabilize SliceIndex trait.)
 - #51151 (Move slice::exact_chunks directly above exact_chunks_mut for more con…)
 - #51152 (Replace `if` with `if and only if` in the definition dox of `Sync`)
 - #51153 (Link panic and compile_error docs)
 - #51158 (Mention spec and indented blocks in doctest docs)
 - #51186 (Remove two redundant .nll.stderr files)
 - #51203 (Two minor `obligation_forest` tweaks.)
 - #51213 (fs: copy: Use File::set_permissions instead of fs::set_permissions)

Failed merges:
@kennytm
Member

kennytm commented May 30, 2018

@symphorien Unfortunately the new code failed to compile for Fuchsia, i.e. under #[cfg(target_os = "fuchsia")].

[01:01:24] error[E0609]: no field `dirp` on type `&mut sys::unix::fs::ReadDir`
[01:01:24]    --> libstd/sys/unix/fs.rs:232:52
[01:01:24]     |
[01:01:24] 232 |                 let entry_ptr = libc::readdir(self.dirp.0);
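
For context on why a Linux-only test run would not catch an error like this: code behind `#[cfg(target_os = ...)]` is only type-checked when building for that target. The following is a minimal sketch with hypothetical stand-in types (`InnerSketch`, `ReadDirSketch`), not the actual libstd code, and it assumes the failure came from a target-specific code path that still referred to a field that had been moved.

```rust
// Hypothetical stand-ins; these are not the real libstd types.
struct InnerSketch {
    dirp: usize,
}

struct ReadDirSketch {
    inner: InnerSketch,
}

impl ReadDirSketch {
    // Compiled on every target except Fuchsia.
    #[cfg(not(target_os = "fuchsia"))]
    fn raw_handle(&self) -> usize {
        self.inner.dirp
    }

    // Compiled only for Fuchsia. Writing `self.dirp` here instead would be
    // an E0609 error, but only a Fuchsia build would report it.
    #[cfg(target_os = "fuchsia")]
    fn raw_handle(&self) -> usize {
        self.inner.dirp
    }
}

fn main() {
    let rd = ReadDirSketch {
        inner: InnerSketch { dirp: 7 },
    };
    println!("handle = {}", rd.raw_handle());
}
```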

@symphorien
Contributor Author

This should fix the failure.

@kennytm
Member

kennytm commented May 31, 2018

Thanks again, now let's retry...

@bors r+

@bors
Contributor

bors commented May 31, 2018

📌 Commit 8dec03b has been approved by kennytm

bors added the S-waiting-on-bors label and removed the S-waiting-on-author label on May 31, 2018
@bors
Contributor

bors commented May 31, 2018

⌛ Testing commit 8dec03b with merge 5342d40...

bors added a commit that referenced this pull request May 31, 2018
std::fs::DirEntry.metadata(): use fstatat instead of lstat when possible

@bors
Contributor

bors commented May 31, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: kennytm
Pushing 5342d40 to master...

@bors bors merged commit 8dec03b into rust-lang:master May 31, 2018
Labels: S-waiting-on-bors, T-libs-api