Parse raw identifiers. #2857

allevato · 2024-09-22T13:26:42Z

The parser implementation of SE-0451.

ahoppen · 2025-01-02T09:10:08Z

Sources/SwiftParser/Lexer/Cursor.swift

+          error = LexingDiagnostic(.unprintableAsciiCharacter, position: position)
+        }
+      }
+      if isEmpty && !scalar.isOperatorStartCodePoint || !scalar.isOperatorContinuationCodePoint {


I think adding parentheses here would help readability.

Suggested change

-      if isEmpty && !scalar.isOperatorStartCodePoint || !scalar.isOperatorContinuationCodePoint {
+      if (isEmpty && !scalar.isOperatorStartCodePoint) || !scalar.isOperatorContinuationCodePoint {
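
For readers checking that the suggestion is purely cosmetic: in Swift, `&&` binds more tightly than `||`, so the added parentheses should not change behavior. The following is a minimal sketch that compares the two forms over every combination of inputs; the function names and the placeholder parameters `a`, `b`, `c` are illustrative and stand in for the three conditions in the diff.

```swift
// Placeholders: a ~ isEmpty, b ~ isOperatorStartCodePoint,
// c ~ isOperatorContinuationCodePoint.
func unparenthesized(_ a: Bool, _ b: Bool, _ c: Bool) -> Bool {
  return a && !b || !c
}

func parenthesized(_ a: Bool, _ b: Bool, _ c: Bool) -> Bool {
  return (a && !b) || !c
}

// Exhaustively compare both forms over all eight input combinations.
let values = [false, true]
var allEqual = true
for a in values {
  for b in values {
    for c in values {
      if unparenthesized(a, b, c) != parenthesized(a, b, c) {
        allEqual = false
      }
    }
  }
}
print(allEqual)  // prints "true"
```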

ahoppen · 2025-01-02T09:13:04Z

Tests/SwiftParserTest/translated/EscapedIdentifiersTests.swift

+        )
+      ]
+    )
+  }


Do we diagnose raw identifiers that only consist of whitespace? As far as I can tell, they are forbidden by the proposal.

allevato · 2025-01-06

We weren't; thanks for catching that. Fixed.
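
As an illustration of the kind of check discussed here, the sketch below flags a raw identifier body that consists only of whitespace. It is not the code added in this PR: the helper name is hypothetical, and the set of whitespace code points (U+0020, U+200E, U+200F) is an assumption taken from the code comment quoted in the UnicodeScalarExtensions.swift excerpt of this review.

```swift
// Hypothetical helper, not part of swift-syntax.
// `body` is the text between the backticks.
func isWhitespaceOnlyRawIdentifier(_ body: String) -> Bool {
  // Assumption: the only whitespace code points that can appear in a raw
  // identifier are U+0020 and the LTR/RTL marks U+200E and U+200F.
  let permittedWhitespace: Set<UInt32> = [0x0020, 0x200E, 0x200F]
  let scalars = body.unicodeScalars
  // An empty body is a separate error, so it is not reported here.
  return !scalars.isEmpty
    && scalars.allSatisfy { permittedWhitespace.contains($0.value) }
}

print(isWhitespaceOnlyRawIdentifier("   "))  // prints "true"
print(isWhitespaceOnlyRawIdentifier("a b"))  // prints "false"
```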

grynspan · 2025-01-06T18:40:17Z

I think I have a hard need for determining if an identifier wrapped in backticks is raw or just an escaped keyword. Having a member such as TokenKind.isRawIdentifier would be very useful here.

grynspan · 2025-01-06T18:41:28Z

Sources/SwiftParser/Lexer/UnicodeScalarExtensions.swift

+    // excluding any other ASCII non-printables and Unicode separators. In
+    // other words, the only whitespace code points allowed in a raw
+    // identifier are U+0020, and U+200E/200F (LTR/RTL marks).
+    return (c >= 0x0009 && c <= 0x000D) as Bool


Would it be better to express this as isWhitespace && !isPermittedRawIdentifierWhitespace?

allevato · 2025-01-06

The parser explicitly avoids Unicode.Scalar.Properties because its results are not necessarily stable between Unicode versions: they depend on the Unicode tables built into libswiftCore. We don't want the parser's behavior to change depending on things like which version of the operating system the compiler is running on.

It's probably exceedingly rare that new whitespace will be added (or worse, that a non-whitespace code point would become whitespace or vice versa), but hardcoding the code points means the behavior is deterministic and we can choose to make breaks via new language modes, if needed in the future.
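
To make the trade-off concrete, here is a rough sketch of the two styles side by side. Both property names are hypothetical and neither is the PR's actual code. The hardcoded version mirrors only the range visible in the excerpt above (U+0009 through U+000D); the table-driven version follows the shape suggested in the question and relies on `Unicode.Scalar.Properties.isWhitespace` from the standard library.

```swift
extension Unicode.Scalar {
  // Hardcoded: the answer is fixed in source and cannot change with the
  // Unicode tables shipped in the standard library.
  var isHardcodedForbiddenWhitespace: Bool {
    return (0x0009...0x000D).contains(value)
  }

  // Table-driven: the answer depends on the Unicode data that the
  // running standard library was built with.
  var isTableDrivenForbiddenWhitespace: Bool {
    // Assumption: permitted whitespace is U+0020, U+200E, and U+200F.
    let permitted: Set<UInt32> = [0x0020, 0x200E, 0x200F]
    return properties.isWhitespace && !permitted.contains(value)
  }
}

let tab: Unicode.Scalar = "\t"
let space: Unicode.Scalar = " "
print(tab.isHardcodedForbiddenWhitespace)    // prints "true"
print(space.isHardcodedForbiddenWhitespace)  // prints "false"
```

The two properties agree on these inputs, but only the first is guaranteed to give the same answer on every toolchain and operating system version.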

allevato · 2025-01-06T19:04:46Z

> I think I have a hard need for determining if an identifier wrapped in backticks is raw or just an escaped keyword. Having a member such as TokenKind.isRawIdentifier would be very useful here.

I think you can do this today, in a somewhat roundabout fashion, using existing APIs from SwiftParser:

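import SwiftParser

// Substitute a keyword or raw identifier here, without backticks.
let id = "<some keyword or raw identifier without backticks>"
let quotedID = "`\(id)`"
// True only if both the bare and the backticked spellings are valid member names.
let isKeywordish = id.isValidSwiftIdentifier(for: .memberAccess)
  && quotedID.isValidSwiftIdentifier(for: .memberAccess)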

...since x.class and x.`class` are both valid for escaped keywords, whereas for a raw identifier the unquoted form x.some raw identifier fails the first test even though x.`some raw identifier` is valid. There may be some failure cases I'm missing here, though...

It's not the most efficient approach, though, since it kicks off two parsers. I agree it's probably a good idea to have a separate API for this that makes the same single pass over the identifier that the lexer does.
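
For illustration only, a single-pass classification could look roughly like the sketch below. This is not an existing swift-syntax API: the enum and function names are hypothetical, and the identifier rule is deliberately simplified to ASCII letters, digits, and underscores, whereas real Swift identifiers accept many more Unicode code points.

```swift
// Hypothetical types and names, not part of swift-syntax.
enum BacktickedIdentifierKind {
  case escaped  // an ordinary identifier or keyword, e.g. `class`
  case raw      // needs raw identifier syntax, e.g. `some raw identifier`
}

// Classifies the text between the backticks in one pass.
// Returns nil for an empty body, which is not a valid identifier.
func classify(backtickedBody body: String) -> BacktickedIdentifierKind? {
  if body.isEmpty { return nil }
  var isFirst = true
  for scalar in body.unicodeScalars {
    let v = scalar.value
    let isUpper = (0x41...0x5A).contains(v)
    let isLower = (0x61...0x7A).contains(v)
    let isDigit = (0x30...0x39).contains(v)
    let isStart = isUpper || isLower || v == 0x5F
    // Simplified rule: digits may continue an identifier but not start it.
    let isAllowedHere = isStart || (!isFirst && isDigit)
    if !isAllowedHere {
      return .raw
    }
    isFirst = false
  }
  return .escaped
}

print(classify(backtickedBody: "class") == .escaped)           // prints "true"
print(classify(backtickedBody: "some raw identifier") == .raw)  // prints "true"
```

Unlike the two-parser check, this walks the scalars once and stops at the first character that could not appear in an ordinary identifier.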

allevato · 2025-01-10T19:31:27Z

@swift-ci please test

allevato force-pushed the rich-identifiers branch 2 times, most recently from ccf9744 to 65275bd, on October 23, 2024 13:48

allevato force-pushed the rich-identifiers branch 2 times, most recently from 1d0edfb to 7e873b7, on December 24, 2024 14:00

allevato marked this pull request as ready for review December 24, 2024 14:04

allevato requested review from ahoppen and bnbarham as code owners December 24, 2024 14:04

ahoppen reviewed Jan 2, 2025

Parse raw identifiers.

6c7fc5a

This is the parser implementation for SE-0451.

allevato force-pushed the rich-identifiers branch from 7e873b7 to 6c7fc5a on January 6, 2025 13:56

grynspan reviewed Jan 6, 2025

allevato mentioned this pull request Jan 7, 2025

Support raw identifiers (backtick-delimited identifiers containing non-identifier characters). swiftlang/swift#76636

Open

ahoppen approved these changes Jan 8, 2025
