Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Java21 string template #1010

Closed

Conversation

butterunderflow
Copy link
Contributor

86390d0: Support for string templates and corresponding tests. But won't work, because output tokens of com.sun.tools.javac.parser.Scanner are not sorted by their position, and then cause the verification here failed.
86390d0: Read all tokens from Scanner, and sort them if jdk version >= 21.

Some links may be useful:
How jdk parse string template from tokens: https://github.com/openjdk/jdk/blob/3d9d353edb64dd364925481d7b7c8822beeaa117/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java#L695-L746
How tokens of string templates are builded: https://github.com/openjdk/jdk/blob/3d9d353edb64dd364925481d7b7c8822beeaa117/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java#L342-L420

won't work because the output tokens of `com.sun.tools.javac.parser.Scanner` are not sorted by position
Copy link

google-cla bot commented Dec 15, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Collaborator

@cushon cushon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I had started working on this in parallel before seeing this PR, and had merged a very simple initial handling of string templates in b5feefe. It just passes the entire string template through without formatting it. This approach is more complete, and I would like to merge most of what you have here.

@@ -104,6 +123,8 @@ public static ImmutableList<RawTok> getTokens(
break;
}
if (last < t.pos) {
/* If the current token is not immediately following the previous one,
treat the gap as a token */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent here was to only add tokens for these 'gaps' corresponding to whitespace. I think with this change there's also a gap for the \ that escapes { in the string templates, and this logic is adding in that token.

One alternative I was thinking about was to include the \ in the preceding string fragment token. That also solves the problem I mentioned below where an empty token is leading to a crash in JavaInput.buildToks.

I have a demo of that approach here, what do you think? https://github.com/google/google-java-format/compare/master...cushon:google-java-format:stringtemplate?expand=1#diff-a1167950838171d2ee18097c833af4c3155bfededea60870f677cb8a72738aa1R161

Copy link
Contributor Author

@butterunderflow butterunderflow Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your demo will work without problem.

There's one thing I'm not very sure, will that too tricky to split \ and { into two tokens? For example, can we guarantee that { will follow \ immediately in output?

After some test of your demo, I can't find a negative test case to illustrate this. Even for very long string fragment, \ and { are stay together perfectly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be OK, and that \{ is handled as an escape sequence where it wouldn't be valid to have any separate between the two characters.

/** Lex the input and return a list of {@link RawTok}s. */
public static ImmutableList<RawTok> getTokens(
String source, Context context, Set<TokenKind> stopTokens) {
private static ImmutableList<Token> readAllToken(String source, Context context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the test cases that were added for I981, I am seeing an issue where this returns tokens with empty string values for some string templates, which crash JavaInput.buildToks. I think the issue is that for string templates like "foo\{X}\{Y}bar" there are three string fragment tokens, and the middle one is empty.

java.lang.StringIndexOutOfBoundsException: Index 0 out of bounds for length 0

...
	at java.base/java.lang.String.charAt(String.java:1555)
	at com.google.googlejavaformat.java.JavaInput.buildToks(JavaInput.java:385)
	at com.google.googlejavaformat.java.JavaInput.buildToks(JavaInput.java:335)
	at com.google.googlejavaformat.java.JavaInput.<init>(JavaInput.java:277)
	at com.google.googlejavaformat.java.Formatter.getFormatReplacements(Formatter.java:270)
	at com.google.googlejavaformat.java.Formatter.formatSource(Formatter.java:257)
	at com.google.googlejavaformat.java.Formatter.formatSource(Formatter.java:223)


String frag = fragIt.next();
if (!fragIt.hasNext()) {
token(frag + "\"");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this handles unicode escapes. The approach in the other change was to just pass through the token for string fragments:

which is similar to how string literals worked previously:

String sourceForNode = getSourceForNode(node, getCurrentPath());

@butterunderflow
Copy link
Contributor Author

Thank you for this detailed reviewing!

@butterunderflow
Copy link
Contributor Author

Should I close this PR and wait for yours to merge? There are some details in the original PR that weren't considered, and I found they're already handled in your fork.

@cushon
Copy link
Collaborator

cushon commented Dec 21, 2023

Should I close this PR and wait for yours to merge? There are some details in the original PR that weren't considered, and I found they're already handled in your fork.

Thanks for taking a look! If the changes I made look OK I will go ahead and merge that version (and credit you).

copybara-service bot pushed a commit that referenced this pull request Dec 21, 2023
The initial implementation passed through the entire string unmodified, this allows formatting the Java expressions inside the `\{...}`.

See #1010

Co-authored-by: butterunderflow <[email protected]>
PiperOrigin-RevId: 592889073
copybara-service bot pushed a commit that referenced this pull request Dec 21, 2023
The initial implementation passed through the entire string unmodified, this allows formatting the Java expressions inside the `\{...}`.

See #1010

Co-authored-by: butterunderflow <[email protected]>
PiperOrigin-RevId: 592889073
copybara-service bot pushed a commit that referenced this pull request Dec 21, 2023
The initial implementation passed through the entire string unmodified, this allows formatting the Java expressions inside the `\{...}`.

See #1010

Co-authored-by: butterunderflow <[email protected]>
PiperOrigin-RevId: 592940163
@cushon
Copy link
Collaborator

cushon commented Dec 21, 2023

This has been merged in 38de9c4. Thanks again for the contribution!

@cushon cushon closed this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants