Provide `{to,from}_{ne,le,be}_bytes` functions on integers
#51919
r? @bluss (rust_highfive has picked a reviewer for you, use r? to override) |
I think these methods also make more sense than the |
CC #49792 |
src/libcore/num/mod.rs
///
/// [`to_native_bytes`]: #method.to_native_bytes
#[rustc_deprecated(since = "1.29.0", reason = "method doesn't communicate
intent, use `to_native_bytes`, `to_le_bytes` or `to_be_bytes` instead")]
#[stable(feature = "int_to_from_bytes", since = "1.29.0")]
If this PR is accepted, we had better keep `to_bytes()`/`from_bytes()` unstable or just remove them, instead of stabilizing and immediately deprecating them.
Very fair point! Three different method classes makes sense. |
Hi! I don't know if I'm supposed to add my 2 cents here, but since I can, I will. I've been using the
I'm using this to make a generalized N-D Perlin (and in the future, Simplex) noise library. Since I'm using the hash's bytes as the seed for the pseudorandom generator, I didn't care about endianness when I implemented it. But now that I think of it, specifying the endianness would allow me to have the exact same noise patterns regardless of the platform in which the library is running. So I think, even in my case, this is a better feature than the recently merged Thanks for your hard work 🙇♂️ |
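To illustrate the point about reproducible noise patterns, here is a minimal sketch (not taken from the commenter's library). The helper names `seed_portable` and `seed_native` are hypothetical; only `to_le_bytes` and `to_ne_bytes` are the methods proposed in this PR.

```rust
/// Hypothetical helper: derive seed bytes from a 64-bit hash with a fixed
/// (little-endian) byte order, so every platform gets the same seed.
fn seed_portable(hash: u64) -> [u8; 8] {
    hash.to_le_bytes()
}

/// Hypothetical helper: the same, but using the platform's native byte
/// order. The result differs between little- and big-endian targets.
fn seed_native(hash: u64) -> [u8; 8] {
    hash.to_ne_bytes()
}
```

With `seed_portable`, the hash `0x0102030405060708` always yields the bytes `08 07 06 05 04 03 02 01`, whereas `seed_native` yields that order only on little-endian targets.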
Python implements a similar API on integers:
(Note that the |
This was discussed briefly at @rust-lang/libs triage yesterday but the conclusion was that we're going to leave comments individually here if we feel so. I personally would not want to merge this PR. I think that leveraging the already-stable |
@rfcbot fcp close Let's see what others think though! |
Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
As you can see in this thread, the implied |
This alternative was discussed in #49792 and I mildly preferred the current design in order to avoid the multiplication of API surface. This PR proposes replacing two methods with six. I don’t really mind changing, especially since @tbu-, do you feel that this change would prevent bugs? What do you think of the "intermediate" option of renaming the existing methods but not adding the be/le ones? (Keep documenting them as to be combined with |
Yes, this is the intention behind this change. As you can see above, we already caught one with the PR alone. :)
I think that would be worse than what we currently have, as I think the optimization based on number of functions is misguided. In C, zero-terminated "strings" share the same type, whether they're encoded as UTF-8 or just treated as a bag of bytes. Rust makes the distinction because it is useful for preventing errors: it makes sense to make the user cast between bytes and UTF-8 strings, even if this results in an enormous method duplication between byte slices and strings. As shown with the Python example, other languages consider the endianness of the integers part of the encoding process, too. |
I think I'm personally just not sold on this. I see the cost of six functions, some of which duplicate the functionality of existing functions, as not worth it. Dealing with endianness is inherently tricky and I don't feel like making all the methods longer and more wordy will really reduce the trickiness, but rather just make it less ergonomic to call when you already know which one you'd like. |
I understand this part.
I don't really understand this part. It's
To me, it looks like it decreases the ergonomics of doing the platform-specific thing slightly but also increases the ergonomics of doing the platform-independent thing slightly, and the important part: It brings them to the same level. Currently, doing the platform-independent thing is slightly less ergonomic. Rust wants to encourage good code, if we want to bring ergonomics of the methods into play, then I think an even playing field between platform-independent methods and platform-dependent methods is in line with Rust's objectives. |
I feel that "native" in a name doesn’t really work on its own. It only sort of does in "native endian", and even then only by opposition to little/big endian. What do you all think of Also the more I think about it the more I’m becoming partial to having explicit |
@tbu- to me ergonomics isn't literally how many characters you type but also how easy it is to remember idioms, and remembering that there's endian conversions on integers plus byte<-> int conversions means it's easy to remember how to compose. With everything it can be confusing about knowing which one was the one implemented and there's now a choice to be made of which to do in various circumstances. @SimonSapin I think we could try to find a better name, yeah, but I'd reserve |
I am today starting on my first Rust program. I need to read a binary file format. When reading and writing files or network data, the endianness is an inherent part of the format's or protocol's definition. It seems far more complicated to me to first read an incorrect intermediate value using host endianness and then adjust it to the correct value with bit manipulations. Byte-swapping is a completely unnecessary complication when we can just read it correctly in the first place. I would really appreciate it if conversions of the form 'u32::from_le_bytes([u8; 4])' were made available. |
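As a rough sketch of the use case described above, the following parses a made-up binary header directly with `from_le_bytes`, without any byte swapping. The `Header` layout (4-byte magic, little-endian `u16` version, little-endian `u32` payload length) and the `parse_header` function are hypothetical, chosen only for illustration.

```rust
/// Hypothetical 10-byte header of an imaginary file format:
/// bytes 0..4  magic, bytes 4..6 version (LE), bytes 6..10 payload length (LE).
#[derive(Debug, PartialEq)]
struct Header {
    magic: [u8; 4],
    version: u16,
    payload_len: u32,
}

/// Parse the header from the start of `buf`; returns `None` if `buf` is too short.
fn parse_header(buf: &[u8]) -> Option<Header> {
    if buf.len() < 10 {
        return None;
    }
    Some(Header {
        magic: [buf[0], buf[1], buf[2], buf[3]],
        // The format defines these fields as little-endian, so the byte
        // order is stated once, at the point of decoding.
        version: u16::from_le_bytes([buf[4], buf[5]]),
        payload_len: u32::from_le_bytes([buf[6], buf[7], buf[8], buf[9]]),
    })
}
```

Because the endianness is named in the call itself, the decoded values are the same on every host; a big-endian format would use `from_be_bytes` in the same positions.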
The old issue has already been in FCP, a new issue was opened for the new API.
@SimonSapin Done. |
@bors r+ Thanks! |
📌 Commit 0ddfae5 has been approved by |
…Sapin Provide `{to,from}_{ne,le,be}_bytes` functions on integers If one doesn't view integers as containers of bytes, converting them to bytes necessarily needs the specification of encoding. I think Rust is a language that wants to be explicit. The `to_bytes` function is basically the opposite of that – it converts an integer into the native byte representation, but there's no mention (in the function name) of it being very much platform dependent. Therefore, I think it would be better to replace that method by three methods, the explicit `to_ne_bytes` ("native endian") which does the same thing and `to_{le,be}_bytes` which return the little- and big-endian encodings, respectively.
Rollup of 14 pull requests

Successful merges:

- #51919 (Provide `{to,from}_{ne,le,be}_bytes` functions on integers)
- #52940 (Align 6-week cycle check with beta promotion instead of stable release.)
- #52968 (App-lint-cability)
- #52969 (rustbuild: fix local_rebuild)
- #52995 (Remove unnecessary local in await! generator)
- #52996 (RELEASES.md: fix the `hash_map::Entry::or_default` link)
- #53001 (privacy: Fix an ICE in `path_is_private_type`)
- #53003 (Stabilize --color and --error-format options in rustdoc)
- #53022 (volatile operations docs: clarify that this does not help wrt. concurrency)
- #53024 (Specify reentrancy gurantees of `Once::call_once`)
- #53041 (Fix invalid code css rule)
- #53047 (Make entire row of doc search results clickable)
- #53050 (Make left column of rustdoc search results narrower)
- #53062 (Remove redundant field names in structs)
💥 Test timed out |
`{to,from}_{ne,le,be}_bytes` for unsigned integer types Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963
I would really like to see this be put into a trait rather than functions. As a trait this could be implemented by a builtin This would allow for simple creation of packed (or C structures by ignoring padding) structures in Rust (like an IP header or binary file header) and then the automatic conversion to and from the raw bytes on the wire/file. I have a library to do this now and I use it in many of my projects. I think the ability to be able to handle reading and writing packed structures safely from bytestreams is critical enough to a systems language to justify it being put into the core language rather than being an external library. Especially given it would be almost no code increase to the Rust base (would be a procmacro), and would have no implications to programs unless they decide to use the feature. |
@gamozolabs I think it would be appropriate to publish that library on crates.io, garner adoption and prove out the API. At that point, someone could consider writing an RFC, but I think it's probably premature to jump to that right now. |
* Fix typos and unused lifetimes
* Remove duplicated cast
* As rust-lang/rust#51919 changes function name, use to_ne_bytes instead.
* Remove stabled feature flag
* Update toolchain to latest version
@BurntSushi will this affect |
@jonhoo That's a pretty broad question! Nothing immediately pops out at me. |
Hehe, that's true. I guess I was more wondering whether |
Oh! Hah. That didn't cross my mind at all because byteorder is one of the most conservative crates I maintain, and I won't increase its MSRV for something like this. |
If one doesn't view integers as containers of bytes, converting them to
bytes necessarily needs the specification of encoding.
I think Rust is a language that wants to be explicit. The `to_bytes`
function is basically the opposite of that – it converts an integer into
the native byte representation, but there's no mention (in the function
name) of it being very much platform dependent. Therefore, I think it
would be better to replace that method by three methods, the explicit
`to_ne_bytes` ("native endian") which does the same thing and
`to_{le,be}_bytes` which return the little- and big-endian encodings,
respectively.
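A small sketch of how the three proposed families relate, assuming the method names from this PR's title. The wrapper functions `encodings` and `round_trips` are hypothetical and exist only to make the comparison concrete.

```rust
/// Hypothetical wrapper: return the little-endian, big-endian, and
/// native-endian encodings of `x`, in that order.
fn encodings(x: u32) -> ([u8; 4], [u8; 4], [u8; 4]) {
    (x.to_le_bytes(), x.to_be_bytes(), x.to_ne_bytes())
}

/// Hypothetical wrapper: check that each `from_*_bytes` function inverts
/// the matching `to_*_bytes` method.
fn round_trips(x: u32) -> bool {
    u32::from_le_bytes(x.to_le_bytes()) == x
        && u32::from_be_bytes(x.to_be_bytes()) == x
        && u32::from_ne_bytes(x.to_ne_bytes()) == x
}
```

For `0x12345678`, the little-endian encoding is `78 56 34 12` and the big-endian encoding is `12 34 56 78` on every platform; the native-endian encoding matches whichever of the two the target uses.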