-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interactivity API: Move Store's data encoding to the echo
call
#51974
Interactivity API: Move Store's data encoding to the echo
call
#51974
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
We can fix another potential problem. See Dennis' comment below.
echo "<script id=\"wp-interactivity-store-data\" type=\"application/json\">$store</script>"; | ||
echo sprintf( | ||
'<script id="wp-interactivity-store-data" type="application/json">%s</script>', | ||
wp_json_encode( self::$store ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we ought to also escape potentially ambiguous characters
wp_json_encode( self::$store, JSON_HEX_TAG | JSON_HEX_AMP )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context (Dennis explained this to me privately), a user could inject {"content": "this is a </script>"}
and it will close the script element unexpectedly. Or a malicious user could inject {"content": "this is a </script><script>malicious();</script>"}
and execute malicious code.
Using JSON_HEX_TAG
turns that into {"content": "this is a \u003C/script\u003E"}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add a small PHPUnit test for this, and it turned out a malicious script can't be injected that way because trailing slashes are escaped by default (there is a flag to deactivate this behavior, see JSON_UNESCAPED_SLASHES
). 🤷
The example above actually becomes as follows:
{"content":"this is a <\/script><script>malicious();<\/script>"}
As <\/script>
is not a valid closing tag, the <script>
remains open, and the malicious code cannot be interpreted and executed.
However, this already happens with trailing slashes anywhere! E.g.,
{ "url": "http://mydomain.com/page/123" }
turns into
{"url":"http:\/\/mydomain.com\/page\/123"}
So json_encode
seems to be secure by default (in that regard), although it is modifying strings in a way that we are not handling correctly in the Interactivity API runtime.
Now, my questions:
- Should we stay with the json_encode's default behavior (secure) and, in the client, process the
<script>
content to convert all\/
to/
and then parse the result? - Or should we use the above flags, along with
JSON_UNESCAPED_SLASHES
, to avoid processing the<script>
content in the client anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting.
process the <script> content to convert all / to / and then parse the result?
JSON.parse
seems to be handling these cases well, including the dangerous script
tag:
JSON.parse('{"url":"http:\/\/mydomain.com\/page\/123"}').url
// outputs 'http://mydomain.com/page/123'
JSON.parse('{"content":"this is not a <\/span><span>malicious<\/span> content"}').content
// outputs 'this is not </span><span>malicious</span> content'
JSON.parse('{"content":"this is a <\/script><script>malicious();<\/script> script"}').content
// outputs 'this is a \x3C/script>\x3Cscript>malicious();\x3C/script> script'
So unless we want to allow script
tags in the state strings (which we don't), I don't think we need to do anything in terms of client parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Re)learning stuff about JSON after years and years of using it. 😄
Yup, backslash escaping in strings is part of the JSON specs, so JSON.parse
already handles them correctly. As stated in the specification (link):
"A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes."
So, you are right; nothing to do in terms of client parsing.
So unless we want to allow
script
tags in the state strings (which we don't), I don't think we need to do anything in terms of client parsing.
Actually, when you do console.log
(i.e., printing the string to users in the client), the output of that string is the expected one:
a </script><script>malicious();</script> script
Again, nothing to do in this regard. I guess the code is secure, after all? I do not want to add any security breaches, although I don't see issues with the current implementation. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems mostly fine; thanks for testing everything. I've tried myself and can't figure out how to break it the way I thought I could. I've asked for external review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Dennis!
I've asked for external review.
Should we wait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's wise to be cautious :) There's no penalty to using JSON_HEX_TAG
other than making the raw HTML slightly harder for a human to read.
I know of one way the current output (without the JSON_HEX_TAG
flag) can be broken: an opening comment tag.
What content the HTML spec allows inside <script>
tags (even those that have a non-executable type
attribute value) is really odd.
Try saving the following as wat.php
and loading it in a browser:
<!DOCTYPE html>
<html>
JavaScript is weird
<!-- First Script -->
<script type="application/json"><?php echo json_encode( ["one" => "two<!-- <script three"] ); ?></script>
<!-- Second Script -->
<script>document.write('but fun')</script>
right?
</html>
At first glance, it looks like it should generate a DOM that, to a human, would read "JavaScript is weird but fun right?". Instead, you'll see that it reads "JavaScript is weird right?". The problem is the <!-- <script
in the first script. When a browser sees a byte sequence like that inside a <script>
tag, it will keep that script tag open until it finds a closing comment tag (-->
) followed by a closing script tag (</script>
) (with anything between the two).
In the example above, that rule means that the browser only "sees" one script tag, and it has the following contents:
{"one":"two<!-- <script three"}</script>
<!-- Second Script -->
<script>document.write('but fun')
(Viewing the page source of wat.php in your browser or inspecting the DOM will make this clear.)
I don't know if it's possible for a byte sequence like <!-- <script
to appear in the data being JSON encoded here, but let's do
wp_json_encode( self::$store, JSON_HEX_TAG | JSON_HEX_AMP )
as suggested anyway. JSON_HEX_TAG
will prevent any of the above weirdness. JSON_HEX_AMP
isn't at all related, but it may prevent some further weirdness if anyone is trying to serve these pages as XHTML (a very very odd thing to do these days :)). There may be a couple other odd edge cases where JSON_HEX_AMP
is handy too - I don't remember (though they probably only come up in old browsers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did confirm that using echo json_encode( [ 'one' => '<!-- <script' ] )
results in the unwanted output to the browser. thanks for remembering that case, @mdawaffe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request has changed or added PHP files. Please confirm whether these changes need to be synced to WordPress Core, and therefore featured in the next release of WordPress. If so, it is recommended to create a new Trac ticket and submit a pull request to the WordPress Core Github repository soon after this pull request is merged. If you're unsure, you can always ask for help in the #core-editor channel in WordPress Slack. Thank you! ❤️ View changed files❔ lib/experimental/interactivity-api/class-wp-interactivity-store.php ❔ phpunit/experimental/interactivity-api/class-wp-interactivity-store-test.php |
3b33a6d
to
9176b2f
Compare
I messed up the branch with some merges, so I had to force-push again. Sorry for any trouble. 🙇 |
Flaky tests detected in 9176b2f. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5625683399
|
@@ -71,7 +61,9 @@ static function render() { | |||
if ( empty( self::$store ) ) { | |||
return; | |||
} | |||
$store = self::serialize(); | |||
echo "<script id=\"wp-interactivity-store-data\" type=\"application/json\">$store</script>"; | |||
echo sprintf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using native WordPress functions like wp_register_script
that support inline scripts? They give better control over where the code is going rendered in the document, and plugin authors can use filtering if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would work, you can connect it with the existing interactivity script handle and put it before or after depending on the actual use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks. Added to the roadmap 🙂👍️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gziolo, do you know if wp_add_inline_script
works with application/json
scripts? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DAreRodz, most likely, you would need to use WP hooks to inject that custom type.
@adamsilverstein or @westonruter, what would you advise to handle in this special case? Is it something that we should bake into wp_add_inline_script
in WP core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of WordPress 6.3, such inline scripts get printed via \WP_Scripts::get_inline_script_tag()
which finally makes use of wp_get_inline_script_tag()
. This function fortunately includes a wp_inline_script_attributes
filter for manipulating the inline script's attributes, such as adding a nonce
or even change the type
to application/json
. Nevertheless, since this is a low-level function it isn't aware of the script handle specifically, although there is an id
attribute that is passed. So if the script handle was interactivity
and the script was a before
inline script, this should work:
add_filter(
'wp_inline_script_attributes',
static function ( $attributes ) {
if ( isset( $attributes['id'] ) && 'interactivity-js-before' === $attributes['id'] ) {
$attributes['type'] = 'application/json';
}
return $attributes;
}
);
I'm not sure whether it makes sense to add a more direct API to change the type of such inline scripts without seeing whether there are more such use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter, thank you so much for the detailed explanation. The id
should be enough in this case as it is standardized based on the script handle and always included for a few major WP releases.
I'm not sure whether it makes sense to add a more direct API to change the type of such inline scripts without seeing whether there are more such use cases.
Let's see if wp_add_inline_script
needs to be extended. If it's one of case, then the filter is more than enough.
What?
Moves
wp_json_encode()
to theWP_Interactivity_Store
'srender
method (next toecho
), thus escaping as late as possible.Also, removes the unneeded
serialize
method.Props to @danielwrobert. 😄
Why?
To conform with WP security guiding principles.
Testing Instructions
PHP unit tests should pass.