Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space-optimize Option<T> for integral enum T #14540

Closed
ghost opened this issue May 30, 2014 · 10 comments · Fixed by #45225
Closed

Space-optimize Option<T> for integral enum T #14540

ghost opened this issue May 30, 2014 · 10 comments · Fixed by #45225
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@ghost
Copy link

ghost commented May 30, 2014

_Summary_
I propose a space optimization for variables of type Option<E> when E is a nullary, integral enum type.

_Motivation_
There's no need to waste memory for storing a separate tag in variables of type Option<E> if E is an integral enum type and the set of valid values of E does not cover all possible bit patterns. Any bit pattern (of the size of E) that doesn't represent a valid value of type E could be used by the compiler to represent the None value of type Option<E>.

_Details_
Given a nullary, integral enum type E, the compiler should check if some bit pattern exists which does not represent a valid value of type E (the only valid values are the ones determined by the nullary enum variants of E). If such "invalid" bit patterns are found, the compiler should use one of them to represent the None value of type Option<E> and omit storing the tag in variables of type Option<E>. If more than one such "invalid" bit pattern exists, there should be a language defined method to deterministically determine which one of those bit patterns is used to represent the None value. I think the bit pattern of None should be language defined rather than implementation defined in order to make Option<E> values serialized to disk more stable between different compilers / compiler versions.

In determining whether a certain value of such space optimized type Option<E> is None or not, the algorithm should simply check whether or not the binary representation of said value is equal to the binary representation of the language defined "invalid" value.

@erickt
Copy link
Contributor

erickt commented Jun 3, 2014

I whipped up a dummy example to see how this would optimize Option<Result<T, E>> types, and it has a pretty nice 8% speedup: https://gist.github.com/erickt/8a6be5c8a2542eaf0c45. This would be especially helpful for my libserialize rewrite RFC.

@ghost
Copy link
Author

ghost commented Jun 4, 2014

I don't want to comment too much on what should be the language defined method of determining the one "invalid" bit pattern that would get chosen among all the possible ones. But I do believe that if at least one "invalid" bit pattern (interpreted as the underlying integral type of the enum) exists such that it is either larger than the largest enum variant or smaller than the smallest enum variant, then one of those should be guaranteed to get chosen. This enables users to create C-like enums that they use as integers bounded to a certain continuous range of values (by providing their own unsafe iterators for such an enum type). For example, given the following enum:

#[repr(u8)]
enum Value {
    Min = 1,
    Max = 3
}

The language should guarantee that the "invalid" bit pattern used for representing the None value of type Option<Value> is either 0b_0000_0000 (less than Value::Min), or in the range of values [0b_0000_0100, 0b_1111_1111] (greater than Value::Max). That's assuming the underlying integral type of Value is u8.

[Edit]
Ignore this whole comment. When I wrote it I didn't realize that it's undefined behaviour to have an enum set to a discriminant value that's not included in the definition of the enum. Therefore an iterator that traverses from Value::Min to Value::Max in increments of 1, would have to along the way store the non-existing discriminant value 2 into the enum which would be UB. Therefore it would be fine to use value 2 as a the underlying representation for Option::None::<Value>. Although maybe the rule that determines which value is used to represent internally the None case should favor the value 0 the most, because I think comparison against 0 may be faster than comparison against other values on some processors.

@pczarn
Copy link
Contributor

pczarn commented Jun 5, 2014

Don't you think it could be done transitively for all tagged unions and integral enum types?

I have three optimizations on my mind:

  • use free patterns for zero-length variant representations in non-integral enums, including checks for nullable pointers and all of the above
  • allow #[repr(int type)] on non-integral enums
  • use #[packed] on enums to force all possible optimizations

I think the bit pattern of a variant should should be as close to 0 as possible in the order of declaration, just like in integral enums. It could stay undefined or become implementation defined

@huonw
Copy link
Member

huonw commented Jun 5, 2014

use #[packed] on enums to force all possible optimizations

Why wouldn't this be the default? (That is, why would all optimisations be applied by default.)

@ghost
Copy link
Author

ghost commented Jun 7, 2014

Don't you think it could be done transitively for all tagged unions and integral enum types?

I don't really understand the question. I'm sure that there are plenty of other possible optimizations, but they can be implemented irrespective of this proposed optimization and I think they should get their own dedicated github-issues.

This proposed optimization should happen automatically, just like the non-null based space-optimization of Option<&T> and Option<~T> values.

@pczarn
Copy link
Contributor

pczarn commented Jun 13, 2014

@huonw, because it's a space-time tradeoff. Ideally, all possible values of an ADT could be represented within a minimal number of bits. However, values can be moved out of enums, so they should exist somewhere with proper alignment. Matching complex packed enums could still get expensive without simple discriminants:

enum Value {
  A = 20,
  B = 21,
}

// Option<Result<Value, IoErrorKind>>
match maybe_result {
  // matches on {22, _}
  None => a,
  // matches on {18, 0}
  Some(Err(ShortWrite(0))) => b,
  // matches on {x, _} if 20 <= x && x < 22
  Some(Ok(_)) => c,
}

@ghost
Copy link
Author

ghost commented Oct 1, 2014

@pczarn But there is no space-time tradeoff with the space-optimization that is being proposed here.

@pczarn
Copy link
Contributor

pczarn commented Oct 1, 2014

Of course, sorry, I was referring to something else.

@steveklabnik
Copy link
Member

Triage: no changes I'm aware of.

@Mark-Simulacrum
Copy link
Member

I'm going to close this in favor of rust-lang/rfcs#1230 since there's a lot of potential layout optimizations involving enums but this one specific issue isn't key enough that it should be tracked separately.

bors added a commit that referenced this issue Oct 14, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Oct 29, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 13, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 19, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 5, 2023
…gle_brace, r=Veykril

Fix allow extracting function from single brace of block expression

Fix allow extracting function when selecting either `{` or `}`

Fix rust-lang#14514
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants