Planning for Table of Content Block Functionality and Heading IDs #22874

Open
itsjusteileen opened this issue Jun 3, 2020 · 12 comments
Labels
[Feature] Document Outline An option that outlines content based on a title and headings used in the post/page [Type] Discussion For issues that are high-level and not yet ready to implement. [Type] New API New API to be used by plugin developers or package users.

Comments

@itsjusteileen
Contributor

This issue continues a discussion, started during a Core Editor chat, about the functionality of a Table of Contents (TOC) block. Several PRs/issues currently propose possible solutions:

  • Add Table of Contents block (dynamic rendering + hooks version), PR #21234
  • "Table of Contents" Block, PR #11047
  • PR #15426 (closed)

From a technical point of view, when working with a TOC block, how should headings and next-page tags that are not in blocks be counted, and how do we determine whether those items precede the current block? Counting Heading blocks is relatively easy, but counting all headings in the HTML is more difficult, and counting all headings in the HTML that precede the current block seems impossible in some situations. The challenge is compounded when the headings are inside a dynamic block.
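For Heading blocks alone, "does this heading precede the current block?" reduces to a depth-first, pre-order walk of the block tree, which visits blocks in the same order their markup appears in the post. The TypeScript sketch below assumes a simplified block shape (`clientId`, `name`, `attributes`, `innerBlocks`, loosely modeled on editor block objects) and uses `core/table-of-contents` as a hypothetical block name. It deliberately ignores headings written as raw HTML, which is exactly the gap described above.

```typescript
// Simplified block shape (assumption: loosely modeled on editor block objects).
interface Block {
  clientId: string;
  name: string;
  attributes: Record<string, unknown>;
  innerBlocks: Block[];
}

interface HeadingRef {
  clientId: string;
  level: number;
  content: string;
}

// Collect core/heading blocks that appear before `targetClientId` in document order.
// Pre-order traversal (parent first, then children) matches serialization order.
function headingsBefore(blocks: Block[], targetClientId: string): HeadingRef[] {
  const found: HeadingRef[] = [];
  let reachedTarget = false;

  const visit = (list: Block[]): void => {
    for (const block of list) {
      if (reachedTarget) return;
      if (block.clientId === targetClientId) {
        reachedTarget = true;
        return;
      }
      if (block.name === "core/heading") {
        const level = block.attributes.level;
        found.push({
          clientId: block.clientId,
          level: typeof level === "number" ? level : 2, // Heading block's default level
          content: String(block.attributes.content ?? ""),
        });
      }
      visit(block.innerBlocks);
    }
  };

  visit(blocks);
  return found;
}

// Example post: the <h2> inside the Custom HTML block is invisible to this walk.
const post: Block[] = [
  { clientId: "a", name: "core/heading", attributes: { level: 2, content: "Intro" }, innerBlocks: [] },
  {
    clientId: "g",
    name: "core/group",
    attributes: {},
    innerBlocks: [
      { clientId: "b", name: "core/heading", attributes: { level: 3, content: "Details" }, innerBlocks: [] },
      { clientId: "h", name: "core/html", attributes: { content: "<h2>Raw</h2>" }, innerBlocks: [] },
      { clientId: "toc", name: "core/table-of-contents", attributes: {}, innerBlocks: [] },
    ],
  },
  { clientId: "c", name: "core/heading", attributes: { content: "Later" }, innerBlocks: [] },
];

console.log(headingsBefore(post, "toc").map((h) => h.content)); // [ 'Intro', 'Details' ]
```

Because nesting is handled by the recursion, a ToC placed inside a Group still sees headings that precede it inside that Group.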

Resolving these questions impacts:

  • Completion of the TOC block
  • Addressing edge cases in the heading level checker from PR #22650 (Heading block: add heading level checker)
  • Applying the same logic to the existing content structure tool in the top toolbar
  • Some Full Site Editing (FSE) blocks, such as Site Title or Post Title, when their HTML is not set as an h1

Specific challenges that need feedback are:

  • Should the TOC block only support Heading Blocks?
  • How is support added for 3rd-party heading blocks?
  • Should there be some kind of generic "headings" API for blocks to hook into that is abstracted away from HTML?
  • What is involved in supporting third-party heading blocks?

Possible solutions include:

  • Something like allowing a block to declare one of its attributes as contributing to the document outline, and abstracting the outline/table of contents behind a getOutline function rather than only fetching core/heading blocks. @mtias
  • Potentially adding a dedicated block API for third-party blocks. @youknowriad
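One way to read the attribute-declaration idea: each block type registers which of its attributes carry outline text and level, and a generic `getOutline` walks the tree without special-casing `core/heading`. Everything below is hypothetical (`registerOutlineSupport`, `OutlineDeclaration`, and the `acme/fancy-heading` block are illustrative names, not existing Gutenberg APIs); it is only a TypeScript sketch of the shape such an API could take.

```typescript
// Hypothetical declaration a block type could make about its outline role.
interface OutlineDeclaration {
  textAttribute: string;   // attribute holding the heading text
  levelAttribute?: string; // attribute holding a 1-6 level, if any
  defaultLevel: number;    // used when the level attribute is absent
}

interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerBlocks: Block[];
}

interface OutlineItem {
  blockName: string;
  level: number;
  text: string;
}

const outlineRegistry = new Map<string, OutlineDeclaration>();

function registerOutlineSupport(blockName: string, decl: OutlineDeclaration): void {
  outlineRegistry.set(blockName, decl);
}

// Generic outline: any registered block type contributes, in document order.
function getOutline(blocks: Block[]): OutlineItem[] {
  const items: OutlineItem[] = [];
  for (const block of blocks) {
    const decl = outlineRegistry.get(block.name);
    if (decl) {
      const rawLevel = decl.levelAttribute ? block.attributes[decl.levelAttribute] : undefined;
      items.push({
        blockName: block.name,
        level: typeof rawLevel === "number" ? rawLevel : decl.defaultLevel,
        text: String(block.attributes[decl.textAttribute] ?? ""),
      });
    }
    items.push(...getOutline(block.innerBlocks));
  }
  return items;
}

registerOutlineSupport("core/heading", { textAttribute: "content", levelAttribute: "level", defaultLevel: 2 });
// A hypothetical third-party heading block that names its attributes differently.
registerOutlineSupport("acme/fancy-heading", { textAttribute: "title", levelAttribute: "headingLevel", defaultLevel: 2 });

const outline = getOutline([
  { name: "core/heading", attributes: { content: "Setup" }, innerBlocks: [] },
  { name: "acme/fancy-heading", attributes: { title: "Options", headingLevel: 3 }, innerBlocks: [] },
  { name: "core/paragraph", attributes: { content: "Body text" }, innerBlocks: [] },
]);
```

In this design the ToC block would consume `getOutline` and stay unaware of which block types supplied the items.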

@ZebulanStanphill @mtias @youknowriad @MichaelArestad contributed to the original conversation. Additional feedback here is welcome.

@ZebulanStanphill ZebulanStanphill added the [Type] Tracking Issue Tactical breakdown of efforts across the codebase and/or tied to Overview issues. label Jun 4, 2020
@ZebulanStanphill
Member

If we create some kind of document outline API, we should probably include page break (<!--nextpage-->) data in it, so you can easily determine what page a block would appear on. That's one of the challenges I've run into with the Table of Contents block PR.

@youknowriad youknowriad added [Type] Discussion For issues that are high-level and not yet ready to implement. and removed [Type] Tracking Issue Tactical breakdown of efforts across the codebase and/or tied to Overview issues. labels Jun 4, 2020
@mtias mtias added the [Feature] Document Outline An option that outlines content based on a title and headings used in the post/page label Jun 8, 2020
@ZebulanStanphill ZebulanStanphill added the [Type] New API New API to be used by plugin developers or package users. label Jun 17, 2020
@mahnunchik

I'm looking forward to having it live.

@mcsf
Contributor

mcsf commented Jul 9, 2020

Should the TOC block only support Heading Blocks?

I strongly recommend starting with just Heading blocks. This greatly simplifies things, both for the product and for the implementation, and removes hurdles to getting started. The question of whether there is an opportunity for supporting other blocks (perhaps via an API at the level of the block type or of the block proper) or for supporting HTML-level indexing of heading tags (in my opinion, something to avoid) can then be explored separately and on top of a finished base.

@mcsf
Contributor

mcsf commented Jul 9, 2020

Something like allowing a block to declare one of its attributes as contributing to the outline of a document and abstracting away outline/table of contents in a getOutline rather than just getting core/heading.

There are many parallels with the optional HTML anchor feature in core blocks. Recently, #23197 extended this feature to all static core blocks, and it's notable how everything hinges on block types adhering to the feature with a simple supports declaration. One can imagine something similar with ToC:

"supports": {
  "tableOfContents": true
}

Any block type declaring the above would be picked up by a ToC hook. This could mean that such blocks automatically sport a control to include them in the ToC, or it could mean a more subtle experience (e.g. adding an HTML anchor to a block whose type declares tableOfContents: true automatically adds the block to the ToC).
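A minimal sketch of the "subtle" variant: the ToC collects any block whose type declares `supports.tableOfContents` and that has a non-empty `anchor` attribute (the attribute used by the HTML anchor feature). The `tableOfContents` support key is the hypothetical one proposed above, and the plain `Map` of block-type settings stands in for the real block type registry.

```typescript
// Stand-in for a registered block type's settings (only the parts used here).
interface BlockTypeSettings {
  supports?: { anchor?: boolean; tableOfContents?: boolean };
}

interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerBlocks: Block[];
}

interface TocLink {
  blockName: string;
  href: string;
}

// Collect links to anchored blocks whose type opts in to the ToC.
function collectTocLinks(blocks: Block[], blockTypes: Map<string, BlockTypeSettings>): TocLink[] {
  const links: TocLink[] = [];
  const walk = (list: Block[]): void => {
    for (const block of list) {
      const optedIn = blockTypes.get(block.name)?.supports?.tableOfContents === true;
      const anchor = block.attributes.anchor;
      if (optedIn && typeof anchor === "string" && anchor !== "") {
        links.push({ blockName: block.name, href: `#${anchor}` });
      }
      walk(block.innerBlocks);
    }
  };
  walk(blocks);
  return links;
}

const blockTypes = new Map<string, BlockTypeSettings>([
  ["core/heading", { supports: { anchor: true, tableOfContents: true } }],
  ["core/image", { supports: { anchor: true } }], // anchors allowed, but not opted in
]);

const links = collectTocLinks(
  [
    { name: "core/heading", attributes: { anchor: "install", content: "Install" }, innerBlocks: [] },
    { name: "core/heading", attributes: { content: "No anchor yet" }, innerBlocks: [] },
    { name: "core/image", attributes: { anchor: "diagram" }, innerBlocks: [] },
  ],
  blockTypes,
);
```

Only the anchored heading qualifies: the second heading has no anchor, and the image's type has not opted in.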

@ZebulanStanphill
Member

@mcsf There's a bit of a problem with "just supporting Heading blocks" in the case of the Table of Contents block. That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.

The Table of Contents block also needs to support paginated posts properly, and this also currently has to be done two different ways depending on if you're in the editor or the front-end.

Right now, the Table of Contents block works perfectly on the front-end, but relies entirely on HTML parsing (which definitely isn't a performant way to handle it). I can't even change the PHP implementation to only work with core Heading blocks, because there's no concept of blocks anymore at that point. The only way to get the necessary data would be through something kinda like the block context system, and no such API relating to headings and page breaks currently exists.

So as far as I can tell, it's not possible to provide a shippable Table of Contents block right now. There is no clean, simple solution, because what the block tries to do requires data that is currently only available by creating temporary clones of the post in memory to parse and scan for specific HTML tags and comment strings.

As far as I can tell, the Table of Contents block needs a table of contents API.

Specifically, here's what the Table of Contents block needs to know in both the editor and the front-end:

  • What headings exist in the entire post?
  • What is the level of each heading?
  • What is the content of each heading?
  • What is the anchor of each heading?
  • What page is each heading on?
  • What page will I be on in the front-end? (Necessary to support only showing headings from the current page.)

To supply this data, Heading blocks will likely have to provide the API with:

  • Their heading level.
  • Their content.
  • Their anchor/id, if any.

Page Break blocks will likely have to tell the API that they mark the start of a new page, and therefore all blocks following them should be considered to be on page 2 (or 3, and so on).
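The requirements above fit in one record per heading. A sketch, assuming a simplified editor-style block shape and treating every `core/nextpage` (Page Break) block as the start of a new page; `OutlineEntry`, `buildPagedOutline`, and `headingsOnPage` are hypothetical names. Showing only the current page's headings then becomes a simple filter.

```typescript
interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerBlocks: Block[];
}

// One entry per heading: level, content, anchor, and the page it lands on.
interface OutlineEntry {
  level: number;
  content: string;
  anchor: string | null;
  page: number;
}

function buildPagedOutline(blocks: Block[]): OutlineEntry[] {
  const entries: OutlineEntry[] = [];
  let page = 1;
  const walk = (list: Block[]): void => {
    for (const block of list) {
      if (block.name === "core/nextpage") {
        page += 1; // every block after this marker is on the next page
        continue;
      }
      if (block.name === "core/heading") {
        const { level, content, anchor } = block.attributes;
        entries.push({
          level: typeof level === "number" ? level : 2,
          content: typeof content === "string" ? content : "",
          anchor: typeof anchor === "string" && anchor !== "" ? anchor : null,
          page,
        });
      }
      walk(block.innerBlocks);
    }
  };
  walk(blocks);
  return entries;
}

// "What page will I be on?" becomes a parameter: keep only that page's headings.
function headingsOnPage(outline: OutlineEntry[], currentPage: number): OutlineEntry[] {
  return outline.filter((entry) => entry.page === currentPage);
}

const outline = buildPagedOutline([
  { name: "core/heading", attributes: { content: "Part one", anchor: "one" }, innerBlocks: [] },
  { name: "core/nextpage", attributes: {}, innerBlocks: [] },
  { name: "core/heading", attributes: { level: 3, content: "Part two" }, innerBlocks: [] },
]);
```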

All of the data requirements I have just listed are absolutely necessary to make the Table of Contents block work. If any one of these is not provided by some sort of API, then the block has to resort to messy HTML parsing.

(Remember, you can't just provide a list of Heading block clientIds to the API, because the blocks no longer exist at the dynamic rendering stage, so you can't just pull their data during PHP rendering.)

@mcsf
Contributor

mcsf commented Jul 9, 2020

That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.

I don't follow; why is the ToC back end not consuming the output of the PHP block parser? Even if the server can't parse as fully as the block editor (stage I is block demarcation and explicit attribute parsing; stage II is full attribute sourcing, validation, migration, and is JS-only), there should be enough to get us started, and it will be much faster and safer than ad-hoc parsing of HTML.

Things like pagination support are not necessarily trivial, but would fall into place as soon as we can use the proper parser on the server to clearly identify — always relying on blocks, not HTML — what is a heading, what is a page boundary, and what else is heading-like.
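To make the server-side route concrete: PHP's `parse_blocks()` returns, for each block, an array with `blockName`, `attrs`, `innerBlocks`, `innerHTML`, and `innerContent`, where `attrs` holds only what was serialized in the comment delimiter (and `blockName` is null for freeform HTML between blocks). For the core Heading block this matters: `level` defaults to 2 and is only written to the delimiter when it differs, while `content` and the HTML-anchor `anchor` are sourced from markup, so they have to be read from the block's own `innerHTML`. The sketch models that output as TypeScript types rather than PHP arrays; the regexes are a deliberately naive stand-in for proper HTML handling, scoped to one block's markup rather than the whole post.

```typescript
// TypeScript model of one element of PHP's parse_blocks() output.
interface ParsedBlock {
  blockName: string | null;       // null for freeform HTML between blocks
  attrs: Record<string, unknown>; // only attributes serialized in the delimiter
  innerBlocks: ParsedBlock[];
  innerHTML: string;
  innerContent: (string | null)[];
}

interface ServerHeading {
  level: number;
  text: string;
  anchor: string | null;
}

function headingsFromParsedBlocks(blocks: ParsedBlock[]): ServerHeading[] {
  const out: ServerHeading[] = [];
  for (const block of blocks) {
    if (block.blockName === "core/heading") {
      // `level` is absent from the delimiter when it equals the default.
      const level = typeof block.attrs.level === "number" ? block.attrs.level : 2;
      // `anchor` comes from the id attribute in this block's own markup.
      const idMatch = /\sid="([^"]*)"/.exec(block.innerHTML);
      // Naive text extraction: strip tags, leave HTML entities as-is.
      const text = block.innerHTML.replace(/<[^>]*>/g, "").trim();
      out.push({ level, text, anchor: idMatch ? idMatch[1] : null });
    }
    out.push(...headingsFromParsedBlocks(block.innerBlocks));
  }
  return out;
}

const parsed: ParsedBlock[] = [
  {
    blockName: "core/heading",
    attrs: {},
    innerBlocks: [],
    innerHTML: '\n<h2 id="intro">Intro</h2>\n',
    innerContent: ['\n<h2 id="intro">Intro</h2>\n'],
  },
  { blockName: null, attrs: {}, innerBlocks: [], innerHTML: "\n\n", innerContent: ["\n\n"] },
  {
    blockName: "core/heading",
    attrs: { level: 3 },
    innerBlocks: [],
    innerHTML: "\n<h3>Sub <em>part</em></h3>\n",
    innerContent: ["\n<h3>Sub <em>part</em></h3>\n"],
  },
];

const serverHeadings = headingsFromParsedBlocks(parsed);
```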


What page will I be on in the front-end? (Necessary to support only showing headings from the current page.)

This might be something that the (environment-agnostic) block context API nicely solves.

@ZebulanStanphill
Member

Hmm... I'd forgotten about the PHP block parser. Thanks for reminding me. You're right that I could use that on the PHP implementation. I'm currently not using it because my current implementation is still trying to support 3rd party heading blocks. If I switch to sourcing the data from block attributes, I have to drop support for all headings outside of the core Heading block.

It's also worth noting that even headings in our own Custom HTML block will be ignored by a Table of Contents implementation that only checks Heading block attributes. My thinking was that if we had a table of contents API, we could at least update the Custom HTML block to provide data to the API so they would work as expected.

Would a Table of Contents block that only supports core Heading and Next Page blocks be acceptable? It feels kind of wrong to ship it without 3rd party block support. But if desired, I can update my PR to work that way.

Still, though, it seems less than ideal to parse the whole post for block data whenever it encounters a Table of Contents block.

@ZebulanStanphill
Member

Also, I'm not certain that post pagination info can be provided through the block context API. If a whole post is considered a single source of data, how can it provide different answers to "what page am I on?"... it seems like you'd have to use "Page" blocks to divide up the post, rather than marker points like the current Next Page block. But maybe the block context API is more powerful than I think?

@ZebulanStanphill
Member

Having thought about this for a while, it's clear to me now that block context can't solve this. Block context provides data from a parent to its children, but in the case of page breaks, there's no parent to provide this info.

If we were to redesign WordPress from scratch, paginated posts could have been implemented via a "Page" block that would contain all the content that goes on that page. However, that's not how things are. Page breaks are determined at the seam between one and the other via the <!--nextpage--> comment tag. Even if the Page Break block provided block context, it wouldn't be able to provide it to anything. Block context is parent-to-child, not sibling-to-sibling.
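The parent-to-child limitation can be shown with a toy model (not the real block context implementation) in which context flows only from a block to its descendants. With a hypothetical `core/page` container, each heading inherits its page number; with today's flat `core/nextpage` markers, nothing is inherited because the marker is only a sibling.

```typescript
interface Block {
  name: string;
  attributes: Record<string, unknown>;
  innerBlocks: Block[];
}

type Context = Record<string, unknown>;

interface Resolved {
  name: string;
  context: Context;
}

// Toy model: each block sees only the context its ancestors provide.
function resolveContext(blocks: Block[], inherited: Context = {}): Resolved[] {
  const result: Resolved[] = [];
  for (const block of blocks) {
    result.push({ name: block.name, context: inherited });
    // Only the hypothetical core/page container provides context (its page number).
    const provided: Context =
      block.name === "core/page" ? { ...inherited, page: block.attributes.number } : inherited;
    result.push(...resolveContext(block.innerBlocks, provided));
  }
  return result;
}

const heading = (text: string): Block => ({
  name: "core/heading",
  attributes: { content: text },
  innerBlocks: [],
});

// Today: the page break is a sibling of the headings, so it cannot provide context.
const flat = resolveContext([
  heading("A"),
  { name: "core/nextpage", attributes: {}, innerBlocks: [] },
  heading("B"),
]);

// Hypothetical: page containers wrap their content and pass the page number down.
const nested = resolveContext([
  { name: "core/page", attributes: { number: 1 }, innerBlocks: [heading("A")] },
  { name: "core/page", attributes: { number: 2 }, innerBlocks: [heading("B")] },
]);
```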

I don't want to prematurely abandon a potential path forward, though... so here's a question: would it be feasible to deprecate the <!--nextpage--> tag and recommend that users use the aforementioned hypothetical "Page" block? That would allow block context to be used to easily solve the pagination issue. However, I fear it might be considered too incompatible with existing posts. But then again, if you're intending to add a Table of Contents block, you're already actively editing existing content, so maybe the cost isn't as big as it seems?

This still doesn't solve the headings issue, however. As far as I can tell, we have to support 3rd-party heading blocks. Even within core, the Heading block isn't the only reasonable place to put an <h1>-<h6> element. It's just as valid to put one in a Custom HTML block, isn't it? And there are already other blocks in core like Site Title that have to use heading elements. For these reasons, I still think a document outline API is required to solve the issue with headings.

@mcsf
Contributor

mcsf commented Jan 21, 2021

I think we have to accept that trade-offs will be made, and make a choice we can be happy with. Otherwise, this feature will crumble under the weight of its requirements.

My own opinion is that we should optimise for:

  1. Picking up Heading blocks
  2. Handling pagination

and that this can come at the expense of:

  1. Prematurely devising APIs for heading-like blocks/tokens
  2. Supporting all PHP-generated scenarios

The choices above are in order of preference. So I think it's better to ditch premature APIs than to ditch support for dynamic content. This makes it easier to let the editor itself generate a static ToC, but I think we can still leverage existing hooks in the WP back end and make sure the ToC is present at the top of each page. For example:

$pages = apply_filters( 'content_pagination', $pages, $post );

— in class-wp-query.php
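As a rough model of that flow, written in TypeScript rather than WordPress PHP: the post content is split into pages on `<!--nextpage-->` (the real code does some extra normalization that this skips), and a `content_pagination`-style callback prepends each page's ToC. The outline entries, `renderToc`, `paginateWithToc`, and the `toc` markup are hypothetical; the ToC is assumed to come from block data generated in the editor rather than from parsing each page's HTML.

```typescript
const NEXTPAGE = "<!--nextpage-->";

interface OutlineEntry {
  level: number;
  text: string; // assumed to be HTML-escaped already
  anchor: string;
  page: number;
}

function renderToc(entries: OutlineEntry[]): string {
  if (entries.length === 0) return "";
  const items = entries
    .map((e) => `<li class="toc-level-${e.level}"><a href="#${e.anchor}">${e.text}</a></li>`)
    .join("");
  return `<nav class="toc"><ul>${items}</ul></nav>`;
}

// Stand-in for a content_pagination callback: one ToC at the top of each page.
function paginateWithToc(content: string, outline: OutlineEntry[]): string[] {
  return content.split(NEXTPAGE).map((pageHtml, index) => {
    const page = index + 1;
    return renderToc(outline.filter((e) => e.page === page)) + pageHtml;
  });
}

const pages = paginateWithToc(
  '<h2 id="a">A</h2><p>one</p><!--nextpage--><h2 id="b">B</h2><p>two</p>',
  [
    { level: 2, text: "A", anchor: "a", page: 1 },
    { level: 2, text: "B", anchor: "b", page: 2 },
  ],
);
```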

@ZebulanStanphill
Member

Just to be clear, do you think we should support 3rd-party heading blocks or not? There are already many plugins that add some variation of an "advanced heading" block, including:

And this isn't taking into account any other blocks that use headings, like accordion blocks.

This makes it easier to let the editor itself generate a static ToC, but I think we can still leverage existing hooks in the WP back end and make sure the ToC is present at the top of each page.

I don't think I understand what you're trying to say here? My Table of Contents block can be placed anywhere from the start of the page to the very end, and there can be multiple instances of it. (This is useful for allowing each page of a paginated post to have its own table of contents.)

It's also worth pointing out that the reason my Table of Contents block is dynamic is that altering the static output every time a heading changed resulted in two undo steps being created rather than just one. Pressing undo just once would change the table of contents, but not the heading. So unless someone can come up with an alternative solution there, the Table of Contents block has to be completely dynamic.

I do agree it would be best to try and solve this problem without introducing new APIs if possible. To that end, I've tried my best to complete the Table of Contents block in #21234, and at the moment the implementation certainly works in all likely situations, but I am concerned about the performance of the block, and there are a few edge cases that I can't handle without adding even more performance overhead. If you have any suggestions on how to proceed there, let me know.

@mcsf
Contributor

mcsf commented Jan 29, 2021

It's also worth pointing out that the reason my Table of Contents block is dynamic is that altering the static output every time a heading changed resulted in two undo steps being created rather than just one. Pressing undo just once would change the table of contents, but not the heading. So unless someone can come up with an alternative solution there, the Table of Contents block has to be completely dynamic.

I think it's fine to keep it dynamic, as long as the block in the editor still accurately represents the final output. That said, just to touch on the undo question — in case you aren't familiar with it yet — __unstableMarkNextChangeAsNotPersistent may be the answer.
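The undo problem, and why marking a change as non-persistent would help, can be modeled with a toy history stack. This is only a model of the idea, not how the block editor store implements `__unstableMarkNextChangeAsNotPersistent`: here, a change marked that way is folded into the current undo level instead of creating a new one, so a single undo reverts both the heading edit and the automatic ToC update that followed it.

```typescript
interface Doc {
  heading: string;
  toc: string[];
}

// Toy undo history; not the block editor's actual implementation.
class ToyHistory {
  present: Doc;
  private past: Doc[] = [];
  private nextIsNotPersistent = false;

  constructor(initial: Doc) {
    this.present = initial;
  }

  markNextChangeAsNotPersistent(): void {
    this.nextIsNotPersistent = true;
  }

  change(next: Doc): void {
    if (this.nextIsNotPersistent) {
      // Fold into the current undo level: nothing new is pushed to `past`.
      this.nextIsNotPersistent = false;
    } else {
      this.past.push(this.present);
    }
    this.present = next;
  }

  undo(): void {
    const previous = this.past.pop();
    if (previous) this.present = previous;
  }
}

// Simulate: user edits a heading, the ToC syncs, then the user presses undo once.
function editHeadingThenSyncToc(markSync: boolean): Doc {
  const history = new ToyHistory({ heading: "Intro", toc: ["Intro"] });
  history.change({ heading: "Overview", toc: ["Intro"] }); // user edits the heading
  if (markSync) history.markNextChangeAsNotPersistent();
  history.change({ heading: "Overview", toc: ["Overview"] }); // ToC updates automatically
  history.undo();
  return history.present;
}
```

With `markSync` set to false, one undo leaves the document in the mismatched state described above (new heading, old ToC entry); with it set to true, heading and ToC revert together.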

I do agree it would be best to try and solve this problem without introducing new APIs if possible. To that end, I've tried my best to complete the Table of Contents block in #21234, and at the moment the implementation certainly works in all likely situations, but I am concerned about the performance of the block, and there are a few edge cases that I can't handle without adding even more performance overhead

Thanks for the work you're doing there. I've been meaning to review, and I think it's a great feature, but haven't found enough time yet.

Just to be clear, do you think we should support 3rd-party heading blocks or not? There are already many plugins that add some variation of an "advanced heading" block, including:

In the long run, the editor should understand that, beyond core/heading, certain blocks act as headings, thus automatically allowing a Table of Contents block to pick them out from the content. But I don't think it's something that needs to be solved before we can implement a ToC block, and my point was that the current discussions around ToC should focus on the most correct implementation we can design without compromising for third-party blocks.

Other efforts out there, such as semantic template parts (#27337), deal with a similar ontological problem. Even if the domain is very different — templates and template parts — it's something to keep an eye on and learn from.

As always, the duty and luxury with Gutenberg is that we're building for the long run. So we can afford to take time to get some of these things right. I mean, just look at how many times we've visited footnotes (#1890) over nearly four years! So, to distill my original message: let's start by building a ToC block that is good in that it works well, feels right, and treats user data well. Only then should we worry about widening the reach of that feature.
