Parse raw identifiers. #2857
error = LexingDiagnostic(.unprintableAsciiCharacter, position: position)
}
}
if isEmpty && !scalar.isOperatorStartCodePoint || !scalar.isOperatorContinuationCodePoint {
I think adding parentheses here would help readability.
- if isEmpty && !scalar.isOperatorStartCodePoint || !scalar.isOperatorContinuationCodePoint {
+ if (isEmpty && !scalar.isOperatorStartCodePoint) || !scalar.isOperatorContinuationCodePoint {
Done.
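For readers following the thread, here is a minimal sketch of why the suggestion is purely a readability change: in Swift, `&&` binds more tightly than `||`, so both spellings evaluate identically. The function names below are hypothetical stand-ins for the lexer's condition, with `a`, `b`, and `c` representing `isEmpty`, `isOperatorStartCodePoint`, and `isOperatorContinuationCodePoint`.

```swift
// Hypothetical stand-in for the original, unparenthesized condition.
func unparenthesized(_ a: Bool, _ b: Bool, _ c: Bool) -> Bool {
  return a && !b || !c
}

// Hypothetical stand-in for the suggested, parenthesized condition.
func parenthesized(_ a: Bool, _ b: Bool, _ c: Bool) -> Bool {
  return (a && !b) || !c
}

// Walks all eight input combinations and reports whether the two forms agree.
func formsAgree() -> Bool {
  let values = [false, true]
  for a in values {
    for b in values {
      for c in values {
        if unparenthesized(a, b, c) != parenthesized(a, b, c) {
          return false
        }
      }
    }
  }
  return true
}
```

Since the two forms agree on every input, the parentheses only document the grouping the compiler already applies.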
)
]
)
}
Do we diagnose raw identifiers that only consist of whitespace? As far as I can tell, they are forbidden by the proposal.
We weren't; thanks for catching that. Fixed.
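As an illustration of the check being discussed, here is a rough sketch of detecting a raw identifier body made up only of whitespace. It is not the code from this PR: both function names are hypothetical, and the permitted whitespace set (U+0020, U+200E, U+200F) is taken from the code comment quoted elsewhere in this review.

```swift
/// Hypothetical helper: true for the whitespace code points that the quoted
/// comment says are allowed inside a raw identifier.
func isPermittedRawIdentifierWhitespace(_ scalar: Unicode.Scalar) -> Bool {
  switch scalar.value {
  case 0x0020, 0x200E, 0x200F:
    return true
  default:
    return false
  }
}

/// Hypothetical helper: true if `body` (the text between the backticks) is
/// non-empty and every scalar in it is permitted whitespace. An empty body is
/// a separate problem, so it is not reported here.
func consistsOnlyOfWhitespace(_ body: String) -> Bool {
  return !body.isEmpty
    && body.unicodeScalars.allSatisfy(isPermittedRawIdentifierWhitespace)
}
```

A lexer could run a check along these lines after scanning the body and attach a diagnostic when it returns true.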
This is the parser implementation for SE-0451.
I think I have a hard need for determining if an identifier wrapped in backticks is raw or just an escaped keyword. Having a member such as
// excluding any other ASCII non-printables and Unicode separators. In
// other words, the only whitespace code points allowed in a raw
// identifier are U+0020, and U+200E/200F (LTR/RTL marks).
return (c >= 0x0009 && c <= 0x000D) as Bool
Would it be better to express this as isWhitespace && !isPermittedRawIdentifierWhitespace?
The parser explicitly avoids the use of Unicode.Scalar.Properties because they're not necessarily stable between Unicode versions and may change depending on the Unicode tables that are built into libswiftCore, so we don't want the behavior of the parser to change depending on things like what version of the operating system the compiler is running on.
It's probably exceedingly rare that new whitespace will be added (or worse, that a non-whitespace code point would become whitespace or vice versa), but hardcoding the code points means the behavior is deterministic and we can choose to make breaks via new language modes, if needed in the future.
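To make the trade-off concrete, here is a small sketch contrasting the two approaches. Both function names are hypothetical. The first mirrors only the range check visible in the quoted `return` statement; the second uses the standard library's `Unicode.Scalar.Properties.isWhitespace`, whose answer comes from the Unicode tables shipped with the runtime.

```swift
/// Hypothetical helper mirroring the hardcoded range check quoted above.
/// U+0009...U+000D covers tab, line feed, vertical tab, form feed, and
/// carriage return. The result depends only on the numeric value.
func isForbiddenControlWhitespace(_ scalar: Unicode.Scalar) -> Bool {
  let c = scalar.value
  return c >= 0x0009 && c <= 0x000D
}

/// Table-driven alternative. The result depends on the Unicode data built
/// into the standard library that the program runs against.
func isWhitespaceAccordingToStdlib(_ scalar: Unicode.Scalar) -> Bool {
  return scalar.properties.isWhitespace
}
```

The hardcoded form gives the same answer wherever the compiler runs, which is the determinism argued for above. The table-driven form is shorter to write but ties the result to the runtime's Unicode version.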
I think you can do this today, in a somewhat roundabout fashion, using existing APIs from
...since It's not the most efficient, though, since it kicks off two parsers. I agree it's probably a good idea to have a separate API for this that does the same single pass over the identifier that the lexer does.
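For illustration only, a single-pass classifier might look roughly like the sketch below. It is not an existing swift-syntax API: the enum and function are hypothetical, and the identifier rule is deliberately simplified to ASCII letters, digits, and underscores. A real implementation would need the full Swift identifier grammar, so this sketch would misreport a non-ASCII ordinary identifier as raw.

```swift
/// Hypothetical result type for the sketch below.
enum BacktickedIdentifierKind {
  case escapedIdentifier  // e.g. `class`: an ordinary identifier in backticks
  case rawIdentifier      // e.g. `hello world`: only valid as a raw identifier
}

/// Hypothetical, ASCII-only classifier. Returns nil if `text` is not wrapped
/// in backticks with a non-empty body.
func classifyBacktickedIdentifier(_ text: String) -> BacktickedIdentifierKind? {
  let scalars = Array(text.unicodeScalars)
  guard scalars.count >= 3,
    scalars[0] == "`",
    scalars[scalars.count - 1] == "`"
  else {
    return nil
  }
  var isFirst = true
  // One pass over the body: stop at the first scalar that an ordinary
  // (ASCII) identifier could not contain at that position.
  for scalar in scalars[1..<(scalars.count - 1)] {
    let c = scalar.value
    let isUpper = c >= 0x41 && c <= 0x5A
    let isLower = c >= 0x61 && c <= 0x7A
    let isUnderscore = c == 0x5F
    let isDigit = c >= 0x30 && c <= 0x39
    let isStart = isUpper || isLower || isUnderscore
    let isContinuation = isStart || isDigit
    if isFirst ? !isStart : !isContinuation {
      return .rawIdentifier
    }
    isFirst = false
  }
  return .escapedIdentifier
}
```

Under these assumptions, `classifyBacktickedIdentifier("`class`")` yields `.escapedIdentifier`, while a body containing a space or starting with a digit yields `.rawIdentifier`.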
@swift-ci please test
The parser implementation of SE-0451.