Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cssom-view] elementFromPoint, elementsFromPoint, and caretPositionFromPoint should not return an element inside a shadow tree #556

Open
rniwa opened this issue Sep 30, 2016 · 49 comments

Comments

@rniwa
Copy link

rniwa commented Sep 30, 2016

elementFromPoint and elementsFromPoint should not return an element inside a shadow tree. Instead, it should look for the highest shadow host of the element and return that instead so that it doesn't leak an element in shadow trees.

See the shadow DOM specification.

More precisely, once these methods are added on DocumentOrShadowRoot interface, then we need to retarget the element we found against the context object.

@rniwa
Copy link
Author

rniwa commented Sep 30, 2016

@annevk @hayatoito @zcorpan

@rniwa
Copy link
Author

rniwa commented Sep 30, 2016

Okay, there's another issue with the current definition either in CSS OM or shadow DOM specification. When the result of hit testing results in a Text node which is a direct child of ShadowRoot, what should we return? It looks like Chrome currently returns null but that seems broken. Given the definition of retargeting, it should be returning the shadow host instead.

But none of this is clear because either specification really says anything about in which tree the node is found (e.g. composed tree versus flat tree; probably the latter) and when & where the retargeting happens.

@rniwa
Copy link
Author

rniwa commented Oct 1, 2016

Another edge case to consider is when the hit testing results in a Text node which is assigned to a slot. I think we should return the slot when elementFromPoint is called from within the shadow tree and the shadow host when it's called outside the shadow tree.

@rniwa
Copy link
Author

rniwa commented Oct 4, 2016

I'm going to implement the proposed behavior in https://bugs.webkit.org/show_bug.cgi?id=162882 because whatever Chrome/Blink currently implements is awfully broken.

@zcorpan zcorpan changed the title elementFromPoint and elementsFromPoint should not return an element inside a shadow tree [cssom-view] elementFromPoint and elementsFromPoint should not return an element inside a shadow tree Oct 4, 2016
kisg pushed a commit to paul99/webkit-mips that referenced this issue Oct 5, 2016
https://bugs.webkit.org/show_bug.cgi?id=162882

Reviewed by Chris Dumez.

Source/WebCore:

Add elementFromPoint to ShadowRoot's prototype as specified at:
https://www.w3.org/TR/shadow-dom/#extensions-to-the-documentorshadowroot-mixin
with changes proposed at w3c/csswg-drafts#556

Added TreeScope::retargetToScope which implements

This patch also factors DocumentOrShadowRoot.idl out of Document and ShadowRoot interfaces to better match
the latest DOM specification: https://dom.spec.whatwg.org/#mixin-documentorshadowroot

Test: fast/shadow-dom/Document-prototype-elementFromPoint.html

* CMakeLists.txt:
* DerivedSources.make:
* WebCore.xcodeproj/project.pbxproj:
* dom/Document.cpp:
(WebCore::Document::nodeFromPoint): Moved to TreeScope.
(WebCore::Document::elementFromPoint): Moved to TreeScope.
* dom/Document.h:
* dom/Document.idl: Moved elementFromPoint and activeElement to DocumentOrShadowRoot.idl.
* dom/DocumentOrShadowRoot.idl: Added.
* dom/EventPath.cpp:
(WebCore::RelatedNodeRetargeter::checkConsistency): Use newly added TreeScope::retargetToScope.
* dom/ShadowRoot.idl: Moved activeElement to DocumentOrShadowRoot.idl.
* dom/TreeScope.cpp:
(WebCore::TreeScope::retargetToScope): Added. Implements https://dom.spec.whatwg.org/#retarget efficiently.
Instead of checking whether A (node) is a shadow-including inclusive ancestor of B (this scope) at each
parent, find the lowest ancestor which contains both A and B, and return the self-inclusive ancestor of B
in that tree. To find the lowest common ancestor in O(n), traverse all ancestors of A and B separately and
do a top-down traversal. The last tree scope in which A's ancestor and B's ancestor match is the lowest
common ancestor.
(WebCore::TreeScope::nodeFromPoint): Moved from Document.
(WebCore::TreeScope::elementFromPoint): Moved from Document. Use retargetToScope and parentInComposedTree
instead of parentNode and ancestorInThisScope to match the semantics proposed in
w3c/csswg-drafts#556
* dom/TreeScope.h:

LayoutTests:

Add a W3C style testharness.js test for elementFromPoint on ShadowRoot.

* fast/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint-expected.txt: Added.
* fast/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html: Added.

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@206795 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@rniwa rniwa changed the title [cssom-view] elementFromPoint and elementsFromPoint should not return an element inside a shadow tree [cssom-view] elementFromPoint, elementsFromPoint, and caretPositionFromPoint should not return an element inside a shadow tree Dec 7, 2016
@rniwa
Copy link
Author

rniwa commented Dec 7, 2016

caretPositionFromPoint is even weirder. It’s not clear to me what should happen to an offset when the node is retargeted.

@hayatoito
Copy link
Member

hayatoito commented Dec 7, 2016

Hmm. It is not good that we would not have interoperable implementations.

@yosinch FYI. He should know better than me about how Blink implemented these APIs.

@zcorpan
Copy link
Member

zcorpan commented Feb 27, 2018

So given WICG/webcomponents#735 (comment) it appears that chromium and gecko are moving towards what WebKit implemented. Is that right @rakina @smaug---- ?

It seems the spec needs to cover at least these things, in addition to the above:

But none of this is clear because either specification really says anything about in which tree the node is found (e.g. composed tree versus flat tree; probably the latter) and when & where the retargeting happens.

Where is flat tree defined?

@hayatoito
Copy link
Member

Where is flat tree defined?

flat tree is defined here: https://drafts.csswg.org/css-scoping/#flattening

@smaug----
Copy link

smaug---- commented Feb 28, 2018

So given WICG/webcomponents#735 (comment) it appears that chromium and gecko are moving towards what WebKit implemented.

I'm not quite sure, at least based on test results. Like Gecko returns null for text under shadowroot, since that is what is most logical (host is after all above ShadowRoot).
Current Nightly (with shadowdom pref) gives same test results as blink in https://wpt.fyi/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html

(random note, it is weird to have wpt for methods which behavior is mostly undefined.)

@rakina
Copy link
Member

rakina commented Mar 1, 2018

Blink's behavior is moving towards WebKit's implementation. A patch landed about a month ago but it's not reflected on wpt.fyi yet I guess (it seems wpt.fyi is still using Chrome 63?)

@smaug----
Copy link

And what is that behavior? Need to get the spec changed before making random changes to implementations.

@rakina
Copy link
Member

rakina commented Mar 2, 2018

For elementFromPoint, we retarget the hit test result againts context object (document/shadow root the elementFromPoint method is called on).

For elementsFromPoint, we retarget the hit test result againts context object, one-by-one and only include elements once in the list (no duplicates).

More details on cases with text nodes and slots etc, it is as the test at https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/external/wpt/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html?rcl=9f75a2842405ed0f378d8a4e60d6de2568704880&l=43 says, and I guess this WPT should be removed as long as the spec is not clear yet?

@hayatoito
Copy link
Member

hayatoito commented Mar 2, 2018

@rakina

My guess is that Blink's HitTest never returns an element which doesn't have LayoutBox.
If the result of HitTest is a text node, we have to adjust it to its parent element in LayoutTree.
If we honer that "We never returns an element which doesn't have LayoutBox", we shouldn't return a slot, even if a text node is assigned to the slot, as a result of adjustment, because slot's default style is "display: contents". We must return slot's parent in LayoutTree in that case.

From this perspective, the test looks correct to me.

@tabatkins
Copy link
Member

If the result of HitTest is a text node, we have to adjust it to its parent element in LayoutTree.

What's its parent element? The nearest ancestor that's an element in the flat tree? Or something else? (Sorry, I'm just not sure which spec concept the LayoutTree is approximating.)

@hayatoito
Copy link
Member

hayatoito commented Mar 16, 2018

The nearest ancestor that's an element in the flat tree? Or something else?

Let me clarify my intention:

  • Yeah, basically, the nearest ancestor that's an element in the flat tree. That's right.
  • However, it it doesn't have a Layout Box, e.g. <slot> elements with display:content, we should skip it and continue to traverse the flat tree up until it find an element which has a Layout Box.

I don't know any good terminology to express this. :(

@annevk
Copy link
Member

annevk commented Mar 16, 2018

@tabatkins LayoutTree is almost certainly the box tree.

It's not immediately clear to me however why hit testing wouldn't immediately go from the box in the box tree to the associated node in the flat tree/node tree, and then from there you find the nearest ancestor element.

It seems a little weird to have a special element hit testing primitive that finds a box corresponding to an element. At least I hadn't heard of that concept before.

@tabatkins
Copy link
Member

tabatkins commented Mar 16, 2018

Properly, hit-testing applies to fragments; they're what results from layout. (Boxes don't have positions or sizes, they're the result of applying box-construction only.) (explanation)

The relevant question is exactly how the traversal is done. Here's one method:

  1. Let |frag| be the fragment at |point|.
  2. If |frag| was generated by a text run (rather than a box), walk its ancestors in the fragment tree until you find a fragment generated by a box, and set |frag| to that.
  3. Let |box| be the box that generated |frag|.
  4. If |box| is an anonymous box or was generated by a pseudo-element, walk its ancestors in the box tree until you find a box generated by an element, and set |box| to that.
  5. Let |element| be the element that generated |box|.
  6. If |element| isn't allowed to be returned in this context due to shadow-hiding, walk its ancestors in the flat tree until you find one that's allowed to be returned and that generated a box, and set |element| to that.
  7. Return |element|.

@hayatoito Is this what we do? If not, what differs?

@phistuck
Copy link
Contributor

phistuck commented May 18, 2018

@tabatkins - for non-shadow-allowing contexts, that may return a display: contents element, right? Should it?

@tabatkins
Copy link
Member

No, it can't, because display:contents elements can't create boxes, so they can't create fragments, so they won't get selected by step 3.

@phistuck
Copy link
Contributor

phistuck commented May 18, 2018

@tabatkins - but they can be/contain shadow DOM, that does contain boxes and fragments and so on...

<ShadowDOM style="display: contents">
  <ShadowDOM-div>Hi</ShadowDOM-div>
</ShadowDOM>

And document.elementFromPoint(...) for the H in "Hi".

@tabatkins
Copy link
Member

@hayatoito Oh yeah, duh, you're right, they can just be merged with that change. So:

  1. Let |frag| be the top-most fragment at |point| that responds to pointer events. (Or all of them for elementsFromPoint(), etc.)
  2. If |frag| was generated by a text run (rather than a box), walk its ancestors in the fragment tree until you find a fragment generated by a box, and set |frag| to that.
  3. Let |box| be the box that generated |frag|.
  4. If |box| is an anonymous box, or was generated by a pseudo-element, or was generated by an element that's shadow-hidden from the call's context, walk its ancestors in the box tree until you find a box that doesn't match any of those conditions, and set |box| to that.
  5. Return the element that generated |box|.

@rniwa Taking your example, and assuming the slot was still its default display:contents, then the element returned would be the div. You click on the "hi" fragment, find that it was generated by a text run, then walk the box tree upwards until you find the div; this can then be returned.

If the slot was display:block or something, then if you called shadowRoot.elementFromPoint(), you'd return the slot; if you called document.elementFromPoint(), the slot would fail the condition in step 4, so we'd keep walking upwards until we found the div and return that.

@rniwa
Copy link
Author

rniwa commented Oct 17, 2018

I don't think that matches the author expectation. If a hit testing was done on a text assigned to a slot, then I want to know that the content inside a slot is picked, not on a shadow host. Whether an element generates a CSS box or not doesn't seem like an important distinction from the developer ergonomics standpoint of view.

With the proposed behavior, the author who wants to detect whether a given point is inside a slot or not, would have to check whether a node / element returned by these functions are assigned to some slot. It's even worse. If it was a descendent of another element with display: contents outside the shadow tree, we'd have to walk up the tree to find if the ancestor, which is a direct child of the shadow root is assigned to a slot or not. That seems like a lot of extra complexity for developers.

@hayatoito
Copy link
Member

hayatoito commented Oct 18, 2018

I see, however, your concern is also applied to the following case, right?

<div id=a>
   <div id=b style="display: contents">
     foo
   </div>
   <div id=c style="display: contents">
     bar
   </div>
</div>

In this case, we can't tell whether the point is on "foo" text node or "bar" text node.
elementFromPoint always returns <div id=a> in either case, I think.
So your concern can be converted into more general one, I'm afraid.

I think what we want here is nodeFromPoint, instead of tweaking elementFromPoint, if we want to resolve this general concern.

@tabatkins
Copy link
Member

Right.

So the deal is, we're excluding display:contents elements (of which slots usually are) for a reason - there's no way to click on such an element, because it doesn't exist on the page. You can click on its contents, sure, but that's something different.

Here's an example of why this makes sense: if you call elementsFromPoint(), you'll often get the full ancestor chain of the clicked element, because usually descendants are contained within the geometry of ancestors. But if you use abspos or something to move an element outside the ancestor's geometry, then when you click on the child the ancestor does not show up, because it's not underneath the mouse click. If the parent of the abspos is display:contents, should it be returned or not? It doesn't have any dimensions; if we return it in the list, and people then check its bounds, they won't contain the point that was clicked.

(For example, see http://software.hixie.ch/utilities/js/live-dom-viewer/saved/6290.)

If we don't return such an element in this situation, we shouldn't return it from elementFromPoint(), just because it happens to be the parent node of some clicked text; elementFromPoint() should always return the first element of the array returned by elementsFromPoint().

So yeah, insofar as capturing a useful notion of what text was clicked is important (and I agree it is), what you want is nodeFromPoint(), so we can return the text node, rather than trying to do some tricky nonsense with elementFromPoint() to return an element that is literally not at the specified point.

@rniwa
Copy link
Author

rniwa commented Oct 18, 2018

In this case, we can't tell whether the point is on "foo" text node or "bar" text node.

UA can certainly tell which text node given all major UAs support text selection via mouse drag.

Here's what I'm suggesting. Instead of look for the first element in CSS box tree, find the node and the return its parent in the composed tree if the node is not an element (obviously after considering which node should be visible to which tree).

@tabatkins
Copy link
Member

UA can certainly tell which text node given all major UAs support text selection via mouse drag.

By "we", Hayato was referring to the user of the API; they wouldn't be able to tell which text was at the provided point (given the suggested behavior), since they'd just get the container element.

Instead of look for the first element in CSS box tree, find the node and the return its parent in the composed tree if the node is not an element (obviously after considering which node should be visible to which tree).

That doesn't seem to address my points:

  • How is that consistent with elementsFromPoint()? (Should a display:contents ancestor be in the list or not? Why or why not?)
  • How is it consistent with getBoundingClientRect() (which returns a 0x0 rect at 0,0 for display:contents elements)?
  • What makes display:contents special here, given that a similar problem exists without shadow DOM at all - any markup with intermixed raw text and sibling elements doesn't tell people which text node is at the specified point, since it'll just report the parent element.

Defining the algorithm as suggested, where it finds the first element generating a box generating a fragment containing the specified point, gives us good, consistent answers to the first two bullet points. Adding a nodeFromPoint(), as Hayato suggested, solves the third bullet point in all of its variations - you can always tell which text node is at the specified point, regardless of how its surrounding/ancestor elements are styled.

@rniwa
Copy link
Author

rniwa commented Oct 26, 2018

Rough consensus at TPAC F2F: Add an option to include 'display: contents', implement what @hayatoito and @tabatkins proposed, and add nodeFromPoint().

@tabatkins
Copy link
Member

So the "option to include display: contents" would, I guess, run the algorithm normally, then also run a display:contents seeker, and return whichever is closer to the starting fragment in flat-tree order?

The display:contents seeker would be something like:

  1. Let |frag| be the top-most fragment at |point| that responds to pointer events.
  2. Walk |frag|’s inclusive ancestors in the fragment tree until you find a fragment generated by a box-tree node that was generated by a DOM text node or DOM element node. Let |DOM node| be that DOM text node or DOM element node.
  3. If |DOM node| is not an element, set |DOM node| to the first of its ancestors that is an element.
  4. Return |DOM node|.

Then nodeFromPoint() would be the same algorithm, just without step 4.

@smaug----
Copy link

We still need a proper algorithm here, especially for the cases where the first non-display:contents element is inside some other shadow tree. In that case elementFromPoint needs to find the first ancestor which is accessible to the ShadowRoot on which the method is called.

@tabatkins
Copy link
Member

Ugh, I rewrote the algorithm several times and ended up accidentally killing the "search for a shadow-exposed node" step. Assume that happens between steps 3 and 4.

@emilio
Copy link
Collaborator

emilio commented Dec 24, 2018

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

For example, the following HTML:

<!doctype html>
<style>
  html, body { margin: 0; }
  div {
    width: 100px;
    height: 100px;
    background: red;
    position: relative;
  }
  span {
    display: inline-block;
    width: 100px;
    height: 100px;
    background: green;
  }
  div::after {
    content: "";
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    display: block;
    background: rgba(0, 0, 0, .2);
  }
</style>
<div><span></span></div>
<script>
  console.log(document.elementsFromPoint(50, 50));
</script>

Will log the <div> twice in Gecko, WebKit and Blink.

@SebastianZ
Copy link
Contributor

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

As far as I can see, CSSOM View doesn't say anything about generated content. So I guess that needs to be clarified, too.

Also, as an author I'd expect that it is somehow indicated that the first element is actually the pseudo-element and not the <div>, but let me know if that's out of scope for this issue.

Sebastian

@rakina
Copy link
Member

rakina commented Jan 25, 2019

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

Since this is not directly related to the issue topic (elementFromPoint etc behavior w/ Shadow DOM), I suggest this to be discussed in another issue.

@rakina
Copy link
Member

rakina commented Jan 25, 2019

So it looks like we agree to use the steps described by Tab here (with some changes to consider the shadow-hidden-ness) and here. Should I/anybody interested make a spec change PR with those steps then?

@tabatkins @rniwa @smaug----

@rniwa
Copy link
Author

rniwa commented Jan 25, 2019

@rakina As @smaug---- mentioned in #556 (comment), we still need a well defined algorithm to handle display: contents. I think we should have both algorithms nailed since there appears to be a lot of edge cases & details that we keep stumbling upon.

@rakina
Copy link
Member

rakina commented Jan 25, 2019

@rniwa What else do you think is missing from #556 (comment)?

For the shadow handling part, we can do something like the retarget algorithm, but working with flat tree instead. So while the node's root is a shadow root and node is not a flat-tree descendant of the context object, we set the node as its shadow host. (I don't know where flat-tree is defined in the specs though, cc @hayatoito )

@rniwa
Copy link
Author

rniwa commented Jan 27, 2019

Right, something like the retarget algorithm over flat tree is what we need. What we're missing is the precise definition / algorithm of what that is. #556 (comment) is insufficient because we don't really retarget the first ancestor in the flat tree for example. It would be great if you or someone else can come up with a precise algorithm for this. Unfortunately, I'm caught up in other urgent matters at the moment so won't have a time to do it myself.

FYI, CSS Scoping Module Level 1 defines flat tree.

@rniwa
Copy link
Author

rniwa commented Jul 7, 2022

These functions continue to behave differently in Blink vs WebKit. It's a major interoperability concern for shadow DOM at this point.

@josepharhar
Copy link
Contributor

It looks like there is a WPT here: https://wpt.fyi/results/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html

@rniwa does it reflect the behavior you desire? Are there any other relevant WPTs you are aware of?

@rniwa
Copy link
Author

rniwa commented Jan 13, 2023

@hober
Copy link
Member

hober commented Jan 17, 2023

CSS Fragmentation doesn't define "fragment tree." @tabatkins, is there an equivalent concept?

@JRJurman
Copy link

@rniwa @rakina - I notice that the DocumentOrShadowRoot-prototype-elementFromPoint.html has not been updated since this discussion.

Should the test be updated to reflect what is written in the WebKit repo, or is there still an agreement to reach here? I'm happy to copy over the WebKit implementation of the tests, if that reflects what we want going forward, just let me know, and I can do that work.

@zac-semolina-solutions
Copy link

I recently found myself needing to write an engine-independent DocumentOrShadowRoot.elementsFromPoint() method. See https://github.com/semolina-solutions/layer-aware-pointer-events/blob/211c8c1c983f641e45e374f0d56040670ccd869e/lib/index.ts#L223.

While I'm naive to the internal implementation details in each engine, I found that Gecko's elementsFromPoint() output differs in two ways at the time of writing:

  1. Gecko will return elements only within that DOM, rather from the html element.
  2. When an element with pointer-events: none styling is queried from its Shadow DOM in Gecko, it will always be included in the resulting list, despite itself not being a potential event target.

Whether or not elements in open shadow DOMs should be included in higher-level elementsFromPoint() calls is somewhat arbitrary IMO, although including them would avoid needing to repeatedly and expensively query from each shadow root to build up the full picture. It'd be advantageous to have a deep = true setting to pierce through all open shadow DOMs below.

It'd be nice however to address difference no. 2, i.e. the inconsistency of a shadow root host element being included in the ShadowRoot.elementsFromPoint() when it's styled to ignore pointer events. It's good that it's included queries of higher-level DOMs, because it indicates that "there's something in this shadow DOM that is reactive to pointer events".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests