Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<span> does not have standard layout #874

Closed
ned14 opened this issue Jun 2, 2020 · 41 comments · Fixed by #877
Closed

<span> does not have standard layout #874

ned14 opened this issue Jun 2, 2020 · 41 comments · Fixed by #877
Labels
enhancement Something can be improved fixed Something works now, yay!

Comments

@ned14
Copy link

ned14 commented Jun 2, 2020

I appreciate that <span> is not required to have standard layout, however every other <span> implementation does have standard layout: libstdc++'s, libc++'s, and all the third party library implementations that I am aware of. Except for Microsoft's.

I assume that as usual the ABI ship has sailed, so this will be a wontfix. However, fixing this would be great. Lots of code passes spans to non-C++ code, and not having avoidable UB here would be nice.

@miscco
Copy link
Contributor

miscco commented Jun 2, 2020

@ned14 Are you referring to the empty base class optimization that is applied to span?

@AdamBucior
Copy link
Contributor

I assume that as usual the ABI ship has sailed, so this will be a wontfix.

c++latest has unstable ABI

@miscco
Copy link
Contributor

miscco commented Jun 2, 2020

As I am the culprit here my 2 cents:

Generally I would always prefer composition over inheritance. However, as far as I understand [[no_unique_address]] is currently not supported by MSVC, which is why we went with inheritance.

I do not know whether we could change it given that clang supports it. If the maintainers are ok with [[no_unique_address]] I am happy to provide a patch for that.

@ned14
Copy link
Author

ned14 commented Jun 2, 2020

@miscco I upgraded MSVC, tried compiling my code, and static asserts about loss of standard layout fired everywhere. The cause was that the code detected that MSVC now provides span<T>, and thus used it for the first time.

Re no unique address for span<T>, I assume that such an optimisation would only make sense if the span's size is consteval known, and thus storage for the span could be completely eliminated entirely. I'll be blunt in saying that I don't, personally speaking, see much use for such an optimisation as span's tend to occupy ephemeral storage which the compiler can as-if eliminate storage in any case.

Me personally, though perhaps you should bring in advice from the Microsoft people first, I'd choose C compatibility over a future ABI break caused by MSVC implementing no unique address on the basis that no unique address optimisation for spans really would be very very rarely useful - only where people store spans in long term storage where lifetime escapes analysis, and they have compile-time known lengths. I know I've never written code like that, and if I did, I'd use std::array like any other sane person. But that's my personal opinon only.

@miscco
Copy link
Contributor

miscco commented Jun 2, 2020

Could you check with a different standard library? As far as I can see both libstdc++ (via [[no_unique_address]]) and libc++ (via a specialization without the _Size data member) implement the same empty base class optimization as we do.

Also I can see some use cases of static size spans, e.g. for sliding windows over data etc.

@ned14
Copy link
Author

ned14 commented Jun 2, 2020

Could you check with a different standard library? As far as I can see both libstdc++ (via [[no_unique_address]]) and libc++ (via a specialization without the _Size data member) implement the same empty base class optimization as we do.

We were already up on libc++ and libstdc++ in C++ 20, and were using their span<T> implementations if C++ 20 is available. In fact, they had quality of implementation problems which we reported (for reference, yours was fine, worked first time, apart from standard layout).

@CaseyCarter
Copy link
Member

CaseyCarter commented Jun 2, 2020

Lots of code passes spans to non-C++ code, and not having avoidable UB here would be nice.

Could you clarify this statement? Your request is light on technical details. What code do you believe has defined behavior only with a standard-layout span?

@CaseyCarter CaseyCarter added enhancement Something can be improved info needed We need more info before working on this labels Jun 2, 2020
@cbezault
Copy link
Contributor

cbezault commented Jun 2, 2020

I think the point here is that passing an object to some C code (or whatever other language) is only well defined if the object has standard layout. Presumably @ned14 is doing that with spans.

@sylveon
Copy link
Contributor

sylveon commented Jun 3, 2020

That depends on the internal layout of the span, which is a less than stellar idea.

@cbezault
Copy link
Contributor

cbezault commented Jun 3, 2020

Agreed, and ned mentions as much.

I appreciate that is not required to have standard layout, however every other implementation does have standard layout: libstdc++'s, libc++'s, and all the third party library implementations that I am aware of. Except for Microsoft's.

In general, however, I don't actually see a problem in attempting to make classes have standard layout if there's no reason to not have standard layout. (Please correct me if I'm wrong)

Edit: That's not to say I'm in favor of bending over backwards to support something which isn't in the standard. Just if it's easy and there's no downside, why not?

@ned14
Copy link
Author

ned14 commented Jun 3, 2020

Most of the span implementations use a pointer and a size. The technique I used in Outcome to preserve standard layout was something like:

template<class T, bool pointer_is_consteval, bool size_is_consteval>
struct span_storage
{
  T *_ptr{nullptr};
  size_t _length{0};
};
template<class T, bool pointer_is_consteval>
struct span_storage<T, pointer_is_consteval, true>
{
  T *_ptr{nullptr};
};
template<class T>
struct [[no_unique_address]] span_storage<T, true, true>
{
};

template<class T>
struct span : span_storage ...

So, basically, keep all your member variables inside a single class within the inheritance hierarchy, and standard layout is preserved. Composing member variables by inheritance is what removes standard layout.

@AdamBucior
Copy link
Contributor

Could you give some example of code that needs span to be standard layout?

@miscco
Copy link
Contributor

miscco commented Jun 3, 2020

Ah now I see, https://en.cppreference.com/w/cpp/named_req/StandardLayoutType

We have our members defined in two classes, _Span_extent and span itself.

I have prepared a PR #877 to move to compostition. However, I am unsure if this can be merged before [[no_unique_address]] is available in MSVC

@ned14
Copy link
Author

ned14 commented Jun 3, 2020

Many thanks for the fix!

@ned14 ned14 closed this as completed Jun 3, 2020
@miscco
Copy link
Contributor

miscco commented Jun 3, 2020

It is not yet merged so I would not close the issue

@CaseyCarter CaseyCarter linked a pull request Jun 3, 2020 that will close this issue
@StephanTLavavej
Copy link
Member

I'd also like to see a concrete example, since as @sylveon noted, this seems to be taking a dependency on internal layout, not just the standard layout property.

I also tend to think that if this is so important, it should have been in the Standard (or filed as an LWG issue). We do provide non-Standard guarantees from time to time, but it's not a great habit to get into.

@cbezault
Copy link
Contributor

cbezault commented Jun 3, 2020

Ah I see now, does @ned14's use-case likely (since we still don't have an example) require knowing the internal layout of span? Not just the fact that it's standard layout? That is definitely not something we should make a change for.

@ned14
Copy link
Author

ned14 commented Jun 3, 2020

@StephanTLavavej You can test in consteval for layout compatibility with a C structure easily enough, and static assert if it doesn't match up.

For me personally, my use case is LLFIO where llfio::file_handle::buffers_type is a span<buffer_type>, where buffer_type is identical to span<byte> except that it guarantees that the layout exactly matches a struct iovec, when on POSIX. C code, in turn, calls LLFIO functions (all of which are C-compatible, as per P1095's proposed implementation of P0709, except using Experimental.Outcome as the lightweight exceptions emulation), and it all works just lovely, no horrible C++ bindings needed. I don't strictly speaking need standard layout here, I don't really care about UB here as all the major compilers are well defined in this area. But, if UB can be avoided, that's a nice to have situation.

BTW I did ask WG21 to make span<T> standard layout, and I was told no, as the standard doesn't require layout of implementations without a truly great reason.

@StephanTLavavej
Copy link
Member

Ok, so what you're looking for is a layout guarantee, not standard_layout - and I don't understand why you'd want a specific layout from MSVC's STL which doesn't run on POSIX.

#877 may still be worth considering if we can simplify our inheritance and improve debug codegen, but this issue increasingly sounds like a non-Standard request which we consider to be a non-goal.

@ned14
Copy link
Author

ned14 commented Jun 4, 2020

Ok, so what you're looking for is a layout guarantee, not standard_layout - and I don't understand why you'd want a specific layout from MSVC's STL which doesn't run on POSIX.

The code is portable and runs on Windows. You're overthinking the layout guarantee, we have static asserts which won't compile the code if layout is incompatible. This is a situation of "guarantee through facts on the ground" rather than "guarantee through warranty". And an awful lot of C, and C++, is written on the former (e.g. good luck getting a lot of C code to work on 10 bit char platforms).

#877 may still be worth considering if we can simplify our inheritance and improve debug codegen, but this issue increasingly sounds like a non-Standard request which we consider to be a non-goal.

C++ 20 requires standard layout for C++ types to be interoperable with C types. Yes that's far far too conservative, but that's the rule as in the normative wording.

I think it would be really great if future C++ relaxed the requirements to:

  1. Trivially copyable or move bitcopying.
  2. For any C++ type for which std::is_layout_compatible is true for any other type definable in C.

... and now the standard would match what the compilers have guaranteed for donkey's years now (well apart from the move bitcopying, that's currently still in EWG-I).

@StephanTLavavej
Copy link
Member

You're overthinking the layout guarantee, we have static asserts which won't compile the code if layout is incompatible.

Compatible with what? What code, specifically, is expecting a particular layout? If I understand your comments correctly, on Windows you don't need to be compatible with any external library - you are simply using a span and then providing a guarantee that the layout resembles a struct iovec on POSIX. I would appreciate clarification if I'm misunderstanding.

In #877, we have a question - should the pointer or the length appear first in the representation? The answer from looking at POSIX docs appears to be that the pointer should be first - which is not what that PR was initially doing. Hence my focus on the specific request here.

I am still inclined to accept the PR as a special exception to our usual goals, since it does seem reasonable to expect a pointer-length representation, there is no particular cost to doing so (other than ABI churn, and /std:c++20 is not yet locked down), and there are debug codegen gains to be had from reworking the inheritance. But I would feel better if I clearly understood the scenario.

@frederick-vs-ja
Copy link
Contributor

frederick-vs-ja commented Jun 8, 2020

Alasing a span with an iovec is still an undefined behavior in the C++ standard IMO, even if they are layout compatible, unless the span object and the iovec object are both members of some union.
For standard layout types, the C++ standard currently only additionally defines the behavior of inspecting common initial sequences of different members of union.

@ned14
Copy link
Author

ned14 commented Jun 11, 2020

As this is the small verification program I submitted to libstdc++, I also submit it here for reference, though it's not particularly apposite for non-POSIX:

#include <span>
#include <type_traits>
#include <sys/uio.h>

using buffer_type = std::span<std::byte>;
static_assert(sizeof(buffer_type) == sizeof(struct iovec));
static_assert(std::is_trivially_copyable_v<buffer_type>);
static_assert(std::is_standard_layout_v<buffer_type>);
static_assert(std::is_layout_compatible_v<buffer_type, struct iovec>);

Alasing a span with an iovec is still an undefined behavior in the C++ standard IMO, even if they are layout compatible, unless the span object and the iovec object are both members of some union.

I don't think you understand the use context I have in mind. What I care about is that the compiler, which can see layout compatibility between input buffer sequences and output buffer sequences even though they are of unrelated types, could elide the buffer sequence repack under the as-if rule.

Current compilers won't do that of course, because a syscall causes the compiler to assume that the scatter-gather array contents have escaped analysis, and so therefore the repack is always done. However myself and various people on WG14 and WG21 are working on function annotation to prevent this, and one day soon hopefully, we can reach avoiding having to employ UB in high performance zero copy code.

@frederick-vs-ja
Copy link
Contributor

I don't think you understand the use context I have in mind. What I care about is that the compiler, which can see layout compatibility between input buffer sequences and output buffer sequences even though they are of unrelated types, could elide the buffer sequence repack under the as-if rule.

I mean that "making these types layout compatible according to the C++ standard" itself doesn't reduce UB.

While caring about the compiler, we should say that incompatibility between the old implementation and iovec is not caused by lack of standard-layoutness or layout compatibility in the standard. It's just caused by the order of members and bases.

Even if span with dynamic extent is not standard-layout, e.g. the pointer and the length are not declared in the same class, compilers, including MSVC and other compilers using Itanium ABI, can still make its actual layout same as iovec - for example, let the first base class hold the pointer and the second base class hold the length.

The implementation improvement you proposed is meaningful in practice, though it doesn't actually reduce UB (in the standard) currently. I think your working on function annotation is worthy.

@StephanTLavavej StephanTLavavej removed the info needed We need more info before working on this label Jun 23, 2020
StephanTLavavej added a commit that referenced this issue Jun 24, 2020
Fixes #874.

Co-authored-by: Casey Carter <[email protected]>
Co-authored-by: Stephan T. Lavavej <[email protected]>
@StephanTLavavej StephanTLavavej added fixed Something works now, yay! and removed work in progress labels Jun 24, 2020
ahanamuk pushed a commit to ahanamuk/STL that referenced this issue Jul 1, 2020
Fixes microsoft#874.

Co-authored-by: Casey Carter <[email protected]>
Co-authored-by: Stephan T. Lavavej <[email protected]>
@frederick-vs-ja
Copy link
Contributor

The implementation improvement you proposed is meaningful in practice, though it doesn't actually reduce UB (in the standard) currently. I think your working on function annotation is worthy.

That is exactly the defect of the C++ standard tbh. Technically everything is UB. Calling assembly? UB. Calling any APIs with syscalls? Undefined behavior. printf("Hello World"); undefined behavior under a different exec-charset like EBCDIC. In practice, the standard library itself is implemented as undefined behavior.

In practice, [[gnu::may_alias]] attributes just do work. I think [[gnu::may_alias]] should get just standardized.

You are almost completely wrong. Calling assembly is conditionally supported with implementation-defined semantics and there is no "undefined behavior" specified by the C++ standard. Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.

@frederick-vs-ja
Copy link
Contributor

frederick-vs-ja commented Nov 17, 2020

Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.

It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB.

Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.

The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined".
You should read [intro.compliance]/(6.3) carefully.

What constitutes an interactive device is implementation-defined.

By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.

@frederick-vs-ja
Copy link
Contributor

Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.

It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB.
Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.

The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined".
You should read [intro.compliance]/(6.3) carefully.

What constitutes an interactive device is implementation-defined.

By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.

I am sorry. You are wrong.
Read the discussion we had before. It is even hard to say whether printf or even write is an I/O function
expnkx/fast_io#20 (comment)

In practice, you have to use memory barriers etc to avoid UB since yes, the compiler is free to optimize anything away.

https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/

Linus is technically right.

The *fact* is that gcc documents type punning through unions as the
"right way". You may disagree with that, but putting some theoretical
standards language over the *explicit* and long-time documentation of
the main compiler we use is pure and utter bullshit.

I've said this before, and I'll say it again: a standards paper is
just so much toilet paper when it conflicts with reality. It has
absolutely _zero_ relevance. In fact, I'll take real toilet paper over
standards any day, because at least that way I won't have splinters
and ink up my arse.

The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.

@frederick-vs-ja
Copy link
Contributor

The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.

Linus is right does not depend on personal opinions. He had experiences and made much more code than you. His word is more reliable than you or even the standard.

There are facts. "Personal opinions" is just the excuse for you who disagree with facts.

"(e.g. something described as "implementation-defined" or "unspecified")"

How do these "implementation-defined" or "unspecified" what you said get implemented without undefined behavior? No, they cannot. They must contain undefined behavior.

Then why a third party library is not implementation-defined behavior anymore?

No, those are not facts, just your opinions.
Linus is not so reliable because it has not been shown that he has written enough code according the strict aliasing rule, and hence it has not been shown that he is capable to capable to compare the standard sematic and -fno-strict-aliasing sematic. I'm convinced that Linus's word is more reliable than me about the his code relied on -fwrapv, -fno-strict-aliasing etc., but not about "which sematic is better".
I don't care whether "they" can or cannot, they are permitted to be implemented without "behavior in C/C++". Again we should discuss this elsewhere.

@frederick-vs-ja
Copy link
Contributor

frederick-vs-ja commented Nov 17, 2020

Because it is "may be" undefined. LOL

Such word is not said by the standard. It's concluded by Cubbi (an administrator of cppreference) and modified by me.

P0593R6 (implicit object creation) partially resolved this question. However, the well-definedness of modifying objects via unsigned char/std::byte lvalue is underspecified in C++ standard, so there's unresolved defects IMO.

@frederick-vs-ja
Copy link
Contributor

However, the well-definedness of modifying objects via unsigned char/std::byte lvalue is underspecified in C++ standard, so there's unresolved defects IMO.

You admit the standard library itself is UB. Great!

Nope. Defects (CWG issues) are treated differently IMO.

Linus is not so reliable because it has not been shown that he has written enough code according the strict aliasing rule, and hence it has not been shown that he is capable to capable to compare the standard sematic and -fno-strict-aliasing sematic.

These are facts. Calling POSIX apis is UB.

It's unrelated. Can you show that the implementation of POSIX APIs requires UB?

@FrankHB
Copy link

FrankHB commented Nov 17, 2020

Calling with syscalls roughly has same properties. The difference between ASCII/EBCDIC charset doesn't affect anything here if the code is compiled with correct options.

It is undefined behavior. Calling POSIX APIs is undefined behavior since standard only treats stdio and iostream as I/O. Any other functions, the compiler is free to optimize it away since it is not a part of standard. That is UB.
Implementation defined behavior. Then yes, [[gnu::may_alias]] is implementation-defined behavior, still does not change the fact that is UB.

The standard doesn't say anything like "Any other functions, the compiler is free to optimize it away since it is not a part of standard". Don't misuse items "unspecified" and "undefined".
You should read [intro.compliance]/(6.3) carefully.

What constitutes an interactive device is implementation-defined.

By the way, [[gnu::may_alias]] modifies types and adds non-standard properties, so it may result well-defined behavior in implementation-defined manners. I think [[gnu::may_alias]] is acceptable, partially.

I am sorry. You are wrong.
Read the discussion we had before. It is even hard to say whether printf or even write is an I/O function
expnkx/fast_io#20 (comment)
In practice, you have to use memory barriers etc to avoid UB since yes, the compiler is free to optimize anything away.
https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
Linus is technically right.

The *fact* is that gcc documents type punning through unions as the
"right way". You may disagree with that, but putting some theoretical
standards language over the *explicit* and long-time documentation of
the main compiler we use is pure and utter bullshit.

I've said this before, and I'll say it again: a standards paper is
just so much toilet paper when it conflicts with reality. It has
absolutely _zero_ relevance. In fact, I'll take real toilet paper over
standards any day, because at least that way I won't have splinters
and ink up my arse.

The item "I/O function" was mentioned by you, but not me. Again, not guaranteed by the standard (e.g. something described as "implementation-defined" or "unspecified") doesn't mean undefined. Whether Linus is right depends on personal opinions. And it's not an appropriate place for such discussion here.

Disclaimer The referenced discussions on I/O functions does not imply there is any UB. OTOH, it effectively claims that the standard has a series of (potentially more serious) defects on the basic definitions of I/O. I have submitted it to CWG chair, but there is still no reply. I don't know why, but anyway this is none of the business here.

As of Linus Torvalds' theory: full of BS. Nobody is obligated to take care on any implementation details, which is effectively nothing. The standard implies the conformance as the consensus to refine the things being standardized, which is logically strictly more than nothing.

@StephanTLavavej
Copy link
Member

Please do not use our issues for off-topic discussion.

@microsoft microsoft locked as off-topic and limited conversation to collaborators Nov 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Something can be improved fixed Something works now, yay!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants