-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eRFC: Add JSON backend to Rustdoc #2963
Conversation
I've been working with @P1n3appl3 on this and have to say that I've been very impressed with how much attention to detail they put into it. This is definitely one of the most well-documented RFCs I've seen, which of course is fitting for a rustdoc RFC :) |
stabilize, we could accidentally stabilize the internal structures of `rustdoc`. We have tried | ||
to avoid this by introducing a mirror of `rustdoc`'s AST types which exposes as few compiler | ||
internals as possible by stringifying or not including certain fields. | ||
- Adding JSON output adds *another* thing that must be kept up to date with language changes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would it take to generate rustdoc's current HTML from a JSON IR? It could be a way to ensure that changes are applied uniformly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to use the JSON output as the "source of truth" and make the HTML renderer consume that, we'd just slap serde derives on the clean
types so that we could round trip them rather than creating another set of types. The issue then is that you probably wouldn't want to make that JSON a public interface since it would have to expose some compiler internal types that the HTML renderer happens to use, and it would still be completely unstable.
I think it's worth keeping the JSON types as a separate representation because they have a different purpose than the clean
types. Since they're supposed to be a relatively stable public interface, they don't need to be updated to add support every time an unstable feature is added to the compiler. It also allows us to drop some information for the sake of making the JSON easier to handle.
----------|---------|------------------------------------------------------------------------------ | ||
`name` | String | The name of the crate. If `--crate-name` is not given, based on the filename. | ||
`version` | String | (*Optional*) The version string given to `--crate-version`, if any. | ||
`includes_private` | bool | Whether or not the output includes private items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to comment about being able to find that out from the items listed, but I realized you can't - if all the items are public, then this tells you that there were no private items, as opposed to hiding the private items.
No change necessary, just reasoning it out :)
Name | Type | Description | ||
----------|---------|------------------------------------------------------------------------------ | ||
`crate_num` | int | A number corresponding to the crate this Item is from. Used as an key to the `extern_crates` map in [Crate](#Crate). A value of zero represents an Item from the local crate, any other number means that this Item is external. | ||
`path` | [String] | The fully qualified path (e.g. `["std", "io", "lazy", "Lazy"]` for `std::io::lazy::Lazy`) of this Item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this work for generics in qualified paths (std::vec::Vec<T>
)? Or for associated items (std::vec::Vec::into_iter
, <std::vec::Vec as IntoIterator>::IntoIter
? Or opaque types (fn f() -> impl Iterator
)?
More things that could break this at QPath
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These paths are only to the actual items corresponding to IDs, so if you saw Vec<T>
in some function signature it'd give you the ID for Vec
and when you look it up in paths
you'd get the ItemSummary
with path = ["std", "vec", "Vec"]
. The info about the generic params would be contained at the use site (in the Type representation for Vec<T>
). If it's a local type then you can look it up in index
to get more info about what generic params it takes and their bounds, but for something like Vec
you don't get that info in your crates json blob. You'd have to generate the json for std
and go find it in there yourself (probably by using the path from ItemSummary
).
For reference here's what you get in the example above (with most fields removed for brevity)
pub fn f<T>(arg: Vec<T>) { ... }
{
"name": "f",
"kind": "function",
"inner": {
"decl": {
"inputs": [
[
"arg",
{
"kind": "resolved_path",
"inner": {
"name": "Vec",
"id": "5:4236",
"args": {
"angle_bracketed": {
"args": [ { "type": { "kind": "generic", "inner": "T" } } ],
"bindings": []
}
},
"param_names": []
}
}
]
],
"output": null
},
"generics": {
"params": [
{ "name": "T", "kind": { "type": { "bounds": [] } } }
],
"where_predicates": []
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense. How would you handle <T as Trait>::AssociatedItem
though? Normalize the type before generating the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the "kind" for the type would be "qualified_path" which has the necessary info.
----------|---------|------------------------------------------------------------------------------ | ||
`crate_num` | int | A number corresponding to the crate this Item is from. Used as an key to the `extern_crates` map in [Crate](#Crate). A value of zero represents an Item from the local crate, any other number means that this Item is external. | ||
`name` | String | The name of the Item, if present. Some Items, like impl blocks, do not have names. | ||
`span` | [Span](#Span) | (*Optional*) The source location of this Item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would this be useful for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though we don't output the source files themselves like the HTML backend, an alternative frontend could get the source and use these to link to it.
text/0000-rustdoc-json.md
Outdated
Name | Type | Description | ||
--------------|----------|------------------------------------------------------------------------- | ||
`type` | [Type](#Type) | The type of this static. | ||
`expr` | String | The stringified expression that this static is assigned to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a weird thing to expose in the public API (and is also very underspecified). What's the rationale for including it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to constants, you might want to show the value a static is assigned to:
pub static controller: Controller = Controller::new("some publicly useful arg", 42);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's mostly for pretty printing, so I think it should be marked as unstable and subject to change. Rustc has no stability guarantee on -Z unpretty=*
anywhere else, I don't think we should add it here.
### `kind == "macro"` | ||
|
||
A `macro_rules!` declarative macro. Contains a single string with the source representation of | ||
the macro with the patterns stripped, for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about macros 2.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot they were a thing. Due to my unfamiliarity with those and proc macros I've left them out for now.
text/0000-rustdoc-json.md
Outdated
|
||
Name | Type | Description | ||
-----------|----------|---------------------------------------------------------------------------- | ||
`inputs` | [(String, [Type](#Type))] | A list of parameter names and their types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not sufficient to describe function signatures in Rust - you could have arbitrary patterns.
fn f((_, x): (usize, usize)) { ... }
fn g(MyStruct { field1: usize, .. }) { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah didn't know you could destructure stuff like that in function args. I think it's still fine though, patterns just get stringified like names so you end up with ["(_, x)", <Type representation of tuple of usizes>]
and ["MyStruct { field1: usize, .. }", <Type representation of MyStruct>]
for those examples. It would be a problem if the consumer of the json wanted to know the names x
and field1
because they'd have to parse arbitrary patterns, but I can't think of a case where they need that info off the top of my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with having that in the output, but it should be marked as unstable and subject to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to marking the format of the value as unstable and subject to change, but I think we could stabilize its existence and that it's a reasonable thing to put in the rendered docs. (Here and elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that seems reasonable. "You can display this to the end-user but you can't depend on parsing it".
the JSON. Leaving them unresolved seems preferable but it would mean that consumers have to do | ||
markdown parsing to replace them with actual links. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not possible in general to resolve intra-doc links with the format specified in this RFC. That requires knowing which paths are in scope where the item is documented, not all of which are documented or public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about the fact that being able to write unqualified paths means that scoping info is required. Maybe it would be possible to resolve them to their fully qualified module paths and modify the markdown to output those ("docstring referencing [Iterator]"
-> "docstring referencing [Iterator](trait@foo::Iterator)"
or something)? Alternatively we could have another field with the list of IDs or fully qualified paths corresponding to each link in the markdown block in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trait@foo::Iterator
This is still ambiguous, because you could have multiple crates with the same name, e.g. tokio 0.1
and tokio 0.2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an alternative is to have intra-doc links as a separate output field? Something like
{
"intra-doc-links": {
"struct@foo::Iterator": 0,
}
}
where the value is the ID. That assumes the linked item is part of the output though, which isn't always true for cross-crate re-exports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case how about we include a list of IDs corresponding to the resolved items from the intradoc links in the docstring. That way the consumer can use those to look up the items in path
or index
and generate links to them using their crate_num
which solves the versioning issue.
Another general thing I want to mention besides nitpicks: A lot of the format here seems to be based around |
Yeah the representation was chosen both because it was easy to implement with a bunch of Are there specific things about |
Yes, but they're not really relevant to this RFC. |
text/0000-rustdoc-json.md
Outdated
`docs` | String | The extracted documentation text from the Item. | ||
`attrs` | [String] | The attributes (other than doc comments) on the Item, rendered as strings. | ||
`links` | [[ID](#ID)] | A list of items corresponding to any intra-doc links in `docs` in order of appearance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are broken links? For the following code, I think rustdoc will internally store [<Def Id of f>]
, so you'd have no way to know whether that should belong to f
or to MissingItem
.
/// Link to [`MissingItem`], [`f`]
pub fn f() {}
I think a map from the name to the id is more resilient. Alternatively we could store null
to mean an unresolved link, but that requires the consumer of the output to do some more work on their end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a mapping does sound better, i was worried at first that we couldn't use a map because of duplicate keys but that isn't an issue because there's no way to have the same name resolve to different stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm now that you say that I'm not actually sure it's true ... consider:
pub struct S;
/// Link to [S]
pub fn f() {}
pub mod inner {
pub struct S;
/// Link to [S]
pub fn g();
}
Maybe we should include the ID of the item being documented as one of the keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're still fine, this mapping is just for a single item's docstring so in that context it could only refer to one of them.
text/0000-rustdoc-json.md
Outdated
@@ -366,9 +368,9 @@ Name | Type | Description | |||
`is_unsafe` | bool | Whether this impl is for an unsafe trait. | |||
`generics` | [Generics](#Generics) | Information about the impl's type parameters and `where` clauses. | |||
`provided_trait_methods` | [String] | The list of names for all provided methods in this impl block. This is provided for ease of access if you don't need more information from the `items` field. | |||
`trait` | [Type](#Type) | The trait being implemented or `null` if the impl is "inherent". | |||
`trait` | [Type](#Type) | (*Optional*) The trait being implemented or `null` if the impl is "inherent". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explain what inherent means?
pub struct S;
impl S { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i definitely had to look that up when i read the code.
text/0000-rustdoc-json.md
Outdated
`for` | [Type](#Type) | The type that the impl block is for. | ||
`items` | [[ID](#ID)] | The list of method, constant, and typedef items contained in this impl block. | ||
`items` | [[ID](#ID)] | The list of associated items contained in this impl block. | ||
`negative` | bool | Whether this is a negative impl (e.g. `!Sized` or `!Send`). | ||
`synthetic` | bool | Whether this is an impl that's implied by the compiler (for autotraits). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`synthetic` | bool | Whether this is an impl that's implied by the compiler (for autotraits). | |
`synthetic` | bool | Whether this is an impl that's implied by the compiler (for autotraits, e.g. `Send` or `Sync`). |
text/0000-rustdoc-json.md
Outdated
`docs` | String | The extracted documentation text from the Item. | ||
`attrs` | [String] | The attributes (other than doc comments) on the Item, rendered as strings. | ||
`links` | [[ID](#ID)] | A list of items corresponding to any intra-doc links in `docs` in order of appearance. | ||
`attrs` | [String] | The attributes (other than doc comments) on the Item, rendered as strings (e.g. `["#[inline]", "#[test]"]`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`attrs` | [String] | The attributes (other than doc comments) on the Item, rendered as strings (e.g. `["#[inline]", "#[test]"]`). | |
`attrs` | [String] | The [unstable](#Unstable) stringified attributes (other than doc comments) on the Item (e.g. `["#[inline]", "#[test]"]`). |
…eGomez Add `--output-format json` for Rustdoc on nightly This enables the previously deprecated `--output-format` flag so it can be used on nightly to host the experimental implementation of [rfc/2963](rust-lang/rfcs#2963). The actual implementation will come in later PRs so for now there's just a stub that gives you an ICE. I'm _pretty_ sure that the logic I added makes it inaccessible from stable, but someone should double check that. @tmandry @jyn514
I suggest adding format version to the JSON output, and to the command line flag, requesting the output, similar to how This might be overly cautious, however -- with |
The worst that happens is that the version is permanently set to |
This seems like it would be great for detecting breaking changes in crates. One thing I would find incredibly useful, which it doesn't seem like the JSON output currently exposes, would be when the anonymous return type of an // Does the future returned by this function implement Send?
async fn my_async_fn() {
...
}
// Does the future returned by this function implement Send? What if T is not Send?
async fn my_async_fn<T>() -> T {
...
} |
Type inference is mutually incompatible with ignoring type errors in function bodies: rust-lang/rust#75100. So this is not possible, or at least extremely difficult. Is there a reason you can't return |
That's unfortunate.
We can't do that on an async fn without converting it to an |
Hm, that does sound useful. Theoretically I think it's possible to do, but it sounds like the implementation would be difficult right now. I think this would be a useful thing to add in the future. That does raise the question of what the process should be for extending the JSON output in non-breaking ways in the future. An FCP on the PR feels to me like the right amount of "weight" to give something like this. What do people on @rust-lang/rustdoc think? |
Honestly? I don't expect there to be many consumers of the JSON data. I'd like to not have it be stable, but be careful about changes (and perhaps version the JSON data). This might be a controversion proposal. |
If we keep it permanently unstable, what's the point of the RFC? Or are you suggesting we stabilize the existence of a JSON backend but not the content? That doesn't seem very useful ...
Getting off-topic wrt the RFC, but I want to push back against this. Even the current half-broken implementation for ignoring type errors is straining the query system to its limits. Difficult is an understatement - you'd need to mangle typeck as badly as I mangled name resolution in rust-lang/rust#73566. It would be at least multi-month project. |
Yeah, we stabilize the existence of the backend but not the format. This can still be done in a useful way by effectively having a version field as metadata, and we can stabilize that. I'm wary of stabilizing the format with no upgrade path because this might preclude us implementing some rustdoc features in the future. |
Oh, I think I see. We stabilize the flag and say if this has a Yeah, that sounds good to me :) And maybe require an FCP for any new version of the JSON to make sure it's not changing needlessly. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Non-objection, but thought it might be worth noting: It may be worth investigating JSON-LD and its suitability with respect to this goal, especially where this may or may not concern docs.rs and rust-lang's hosted standard library documentation. |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
Merged, tracking issue at rust-lang/rust#76578 . Please help fill it in with steps! |
There's a way to use this data beyond documentation, to verify whether crate's public API adheres to semver compatibility: https://github.com/Enselic/cargo-public-api And since this is becoming the way to learn about crate's types, I wonder if it could be extended to also include information about the ABI (such as type sizes, alignment, field offsets). For libraries semver-compat of their ABI may be as important as API. |
ABI is inherently unstable and changes with new versions of the rust compiler. I don't think that information belongs in rustdoc. If someone wants to make a custom rustc_driver that emits ABI info, that seems fine. |
I realize that's a scope creep from perspective of rustdoc, but from perspective of a general-purpose compiler-info-exporting tool (that this is close to becoming), it's not a big stretch.
|
Also you can't rely on rustdo for that because it only provides the items which match the provided |
Of course, that's an inherent limitation of the language. The combination of cfg's can be arbitrarily complex and mutually-exclusive, so I don't think it's even possible to get information for all combinations at once. But it still makes sense to gather it in multiple passes for a set of supported platforms. |
(I do think there is a potential extension to (I also think we may need a holistic way to start talking about "public api" for both docs and diagnostics; e.g. there's a case to be made for |
This is not the right approach I think. Also, this is a limitation which appears later on in the compilation stages and rustdoc relies on them (the later stages), meaning that the information you're looking for has already been stripped. Getting this information is complicated but not impossible, however, rustdoc is definitely not the right solution for it. |
Is there anything that could consume this JSON? Would rustdoc itself be able to convert the JSON to HTML? |
There are currently 12 packages on crates.io that consume rustdoc JSON: https://crates.io/crates/rustdoc-types/reverse_dependencies And many more if you count dependents of dependents, private projects, projects that don't depend on |
AFAIK one of the main motivations would be alternate documentation viewers (e.g. terminal / nicely formatted LaTeX documents?). Personally, going out of the terminal to view documentation is a bit of a pain. Ideally I'd like a nice vim-keybinded TUI for quickly searching across a crate's documentation. I am thinking of creating one but I'm limited by this RFC. |
At Tauri we're also keeping a close eye on this. We bridge a lot of our Rust APIs over to JS using FFI so they can be accessed on either the JavaScript side or the Rust side. Since that means we would have a lot of people reading both JavaScript and Rust documentation at the same time we're looking to build our own reference docs on the website to make it easier for people to switch between the two languages while still having a similar UI/UX. |
Rendered
This RFC is an extension of one from 2018 started by @QuietMisdreavus. I've been working on an implementation with mentorship from @tmandry and @betamos and some of the necessary refactoring has already been done in coordination with @GuillaumeGomez.
I'm still working on the reference level docs (pardon the TODOs) but I'd love feedback on the proposed design and unresolved questions.