-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON comments-- again #1571
Comments
Well, if I absolutely had to do this, I guess I'd:
Something like:
You'll notice right away that this isn't good enough. What about an input like Ok, so the "hidden" comments object/array needs to store arrays of comments. Oof. I mean, it's getting complicated. Can we simplify? Do we even have a specification for what JSON-with-comments is? Are we sure that whatever that spec is, no one will come along and ask for some variation we didn't think of? Where do we stop? Next issue: what if we need to select an input format per-input file? Do we want to derive the format from a file extension? Now things are really getting complicated. It's kind of a morass. If you want to flesh this out and send a design and PR, we can look at it and maybe compromise. But I have no energy and no desire to do any of this work at this time, so unless you're willing to put in the work knowing that we might ultimately reject it, well, then I think jq is just not the tool for you unless your external tool can get good enough to deal with it. Can you just... bug people who are using JSON-with-comments and get them to stop and adopt a schema that embeds commentary more like:
|
Moar crazy inputs:
|
Will respond re technical question separately. Re the question of whether we can bug people to incorporate comments into "proper" JSON: I understand that not including comments in the JSON spec is not a jq decision. However, to instead bug everyone to use the workaround, that's kind of like saying to all your programmers: "our compiler doesn't support comments, because the people who designed C++ decided not to, so when you write that code please put all comments into dummy string variables." Totally doable, and reasonable given the spec-- but sometimes the specs leave a lot of value on the table. Just like compilers often implement extensions that are not required by the corresponding programming language's formal spec (and which often subsequently then get incorporated into the spec), I'd think there's much value in having jq offering a "comment" extension even though it's not part of the JSON spec. For example, consider the below-- which is a poor example that I only gave 30 seconds of thought to but hopefully is illustrative enough-- and how much the use of the fake comment approach severely waters down the value of comments. This commented version:
is just a WHOLE lot more presentable and understandable than:
One of these (the former) is clear enough to put into technical documentation, verbatim. The other is not, and you can't really say to your technical doc reader, "well the reason we do this weird comment thing is that the JSON spec doesn't support any other way, so the onus unfortunately gets passed on to you to suffer the consequences". If we forget about decisions of the past and ask ourselves: when we want to communicate to another human being what our JSON is doing, would there be a better way to do so than using JSON with some comments in it? Maybe yes in many cases, but for many many cases, personally, I would say the answer is no, and I would guess that many would agree, for the same reason that we see (and that best practices recommend) code snippets everywhere that have embedded comments. |
@mewalig I'm sorry but that response is not helpful. JSON is a standard. There's NO standard for JSON-with-comments to my knowledge. A helpful response would point to such a standard. Second, people who make up their own JSON extensions and then rely on them are making a terrible mistake if they don't provide a way to get valid, unextended JSON out. Third, JSON is NOT a document format. I would not advise editing JSON-encoded data by hand. I don't really care how ugly it looks to express comments without extensions -- it's the UI that matters. If JSON is your (or some third party app's) UI, well, I guess that's a problem, but why should be everyone else's problem? I'm still open to considering something here, but nothing without a spec, full stop. |
Hold your horses please, I said at the top of that response that I would respond to the technical aspect separately |
Regarding technical comments: First: "can we simplify"? absolutely. I think even the following fairly narrow scope would be immensely useful:
*At least initially, though it wouldn't be that hard to add support for this if the rest of the proposed scope was already working Note that what I am proposing is that comments can be attached not just to dictionaries or arrays, but also to terminal values. Furthermore, comments do not attach directly to a dictionary or array, but rather, attach to the opening or closing bracket of the dictionary or array, or to one of the items in the dictionary or array. Generally speaking, the guiding principle I have in mind is that the comments would attach in a 1-1 fashion (to the extent a corresponding comment exists) with each line in a pretty-printed output. It would be conceivable to allow a comment to attach to a dictionary key name as well, but I don't think that's necessary-- certainly not for an initial iteration at least. So using above examples:
would be unacceptable because "comment 2" and "comment 1" are back-to-back (just to clarify-- I don't think we can't support this later-- only that for the initial, narrow scope, we make a decision not to) However,
would yield:
Similarly,
would be unacceptable input because of the back-to-back comments, but
would yield:
In addition, I may be able to make these changes to the jq code base, but I'd sure prefer to know before putting that time in that there is at least some sign of possibility of it being merged into the official base. At the least it would not be hard to describe the above in a formal grammar. |
Re your response: "A helpful response would point to such a standard". Please let me know if you think this point is not already addressed. I've provided a description of a standard and would be happy to provide a formal grammar for it if it looks like there is a reasonable possibility of it being used. "Second, people who make up their own JSON extensions and then rely on them are making a terrible mistake if they don't provide a way to get valid, unextended JSON out." I am not suggesting that, but rather that jq could parse C/C++-style comments and then have an option to include or not include them in the output "I don't really care how ugly it looks to express comments without extensions -- it's the UI that matters. If JSON is your (or some third party app's) UI, well, I guess that's a problem, but why should be everyone else's problem?" I am not suggesting we use JSON as a UI, but rather that it would be valuable to support comment parsing for the same reasons that, though we don't use C as a UI, it's still useful to support comments in C, and same for HTML. Furthermore, a UI is different from technical documentation and code snippets. As an example, for the same reason code snippets that contain comments have value, JSON snippets that contain comments have value, and are more valuable when the comments do not need to be manually removed in order to use the snippet. Sure we can use Swagger or OpenAPI etc to describe JSON structure but it isn't the same as seeing some samples with comments. Just to reiterate: I am just stating my opinion on how jq could be more useful to a wider range of circumstances and users, and how that might be implemented. Please do not take these suggestions as anything more than a desire to be helpful. |
Ok, thanks for addressing the technical issues. I'm not interested in attaching comments to tokens. The problem is that that makes the internal representation of all values "fatter", whereas for objects and arrays it's not so bad because they already have a lot of overhead. So comments have to only be attachable to "slots" in objects and arrays, and perhaps the whole object/array itself. This means: no top-level comments. That's the furthest I'd be willing to go. But this is not very general, and so it's just ugly, so I think "never mind". How about this: an JSON-with-comments transformer that lifts the input schema to make room for comments and produces an equivalent output in a new schema with comments as proper JSON values. E.g.,
or maybe use objects:
|
I think that could be useful if it could convert both ways e.g. A-with-C-comments => B-with-JSON-comments can be reversed back e.g. B-with-JSON-comments => A-with-C-comments (or a filtered version of A). If we only supported comments attached to objects, then, is there an efficient way to attach a comment to any individual element at least of a dictionary? For arrays, it's a bit less of an issue because it's common for each element of an array to be the same type, so a single comment would apply equally to every element-- but the same isn't true of dictionaries. Another approach that might have the benefit of not fattening any structures, not impacting performance except when the option is used and lots of comments are present, and might also work with individual tokens, might be to store comments in a separate heap, with a 1-way reference to the token it attaches to, and whenever a token is output, the comment store is checked for a match. Obviously, doing a lookup on every token print scales terribly; however, I think we get lucky here because the use cases where one wants to process JSON that contains comments, versus one where one wants to process large quantities of JSON that contain large quantities of commments, are probably almost always mutually exclusive, and the exceptions (I would be surprised if any exist in the entire world) are simply not what this would be intended to address. So it would be entirely reasonable limit the target scenario to small data sets, and if implemented as a rbtree the performance would probably be unnoticeable for any real-world use cases. Meanwhile, the impact would be nil for JSON that does not contain comments (or when the use-comments option is turned off). Also, it could be a pretty clean implementation, and none of the existing data structures need to be touched-- would just add the comment to the global store when parsed, and look for the corresponding token in the global store when printing. But if multiple filters run on separate threads or otherwise would cause issues in addressing shared memory, I can see how this could be problematic |
jq does not keep track of "tokens" once a The idea for hiding comments in an object was to have them in a separate object hidden inside the real one, and accessible with some special function. It's very tricky though. Lots of generality gets lost. E.g., the ability to find paths to values would not extend to comments, nor would the ability to use assignment operators. It'd be much to easy to lose track of comments too: I think I'd rather have a filter (possibly as a parser inside jq!) that converts JSON-with-comments to JSON with lifted-schema as shown earlier. Such a filter would, of course, be reversible. It would complicate jq programs dealing with such inputs, but I don't mind that, and the complication would not be a big deal, and no generality would be lost. |
ok, I think I understand generally but the specifics in terms of how it would affect input/output are hazy (due to my own limitations). So jq would output all comments as its own isolated object, and if you wanted to re-incorporate them, you would save that output as a variable, and then use another jq filter to re-combine the comments with the output? Rather than me try to understand every aspect of what happens in-between, maybe easiest to describe a few input/output scenarios. For the moment I'll use
to yield:
to yield:
Would this be achievable with the approach you have in mind? |
With the schema-lifting format converter you could put comments just about anywhere and yet manage to represent where they were in the input (though maybe not line number and column number). And you could print out the same way. One way to use this might be:
These examples assume no new command-line options to jq, just jq functions to do parsing and formatting of alternative encodings than JSON. This would extend to YAML, XML, ... Each non-JSON format would have to have a corresponding schema to represent the external data. For example, XML would parse into arrays where each element is an object with the following keys: For JSON with comments the schema would be roughly as described above, with every value in the original wrapped in an object that contains the value in a |
I see, that sounds interesting... so the schema-lifting functionality would be some sort of limited parser generator that, based on the input grammar (schema), generates a parser on the fly to then parse a non-JSON (presumably UTF-8) input into a JSON structure, and could apply equally well to commented JSON as well as XML YAML etc? |
Yes. I think that's cleaner. It would be nice to be able to process XML in jq -- jq as a pithier XSLT/XPath, what's not to like? |
That sounds very cool. Out of curiosity, do you already have a syntax in mind for defining the schema or an existing parser generator that you would use? |
@mewalig I don't yet have a way to express schema designs, not at this time anyways. |
@mewalig Suggestions as to how to express such a schema would be welcomed! |
I initially had a very long response to this. But my thoughts kept changing as I tried to translate them into pseudo-code. In the end, I think that the best way to really figure this out would be to take 2-4 target use cases, such as pure JSON, XML, YAML, and commented-C, and flesh out exactly how they might be represented, and compare / contrast based on those examples. It is too hard (at least for me) to consider all of the potential issues and solutions in my head-- I have to get it out in writing with actual examples to arrive at an opinion. Let's start with pure JSON, which definitely should be within scope. At the least, this should flush out the minimal set of parse rule related actions that would need to be supported. For no other reason than that I personally like bison grammar, I have used something like that below. Note that I have left out the lex rules for STR and NUMBER. As you must know given that jq has to parse UTF-8 and JSON number syntax, the rules aren't complex but they also can't be expressed in a single rule. I'm not sure what the best approach would be for this, but one simplification that I think is reasonable would be to only support grammars that parse UTF-8 input.
|
Not really a "standard", but some useful links:
This format is widely used within VS Code, which is quite useful since you can annotate your config file with comments. Indicated by this GitHub linguist PR, it's also used by other tools like (I can live with it not being first-class-handled by |
Why can't |
Just leaving this comment so I get notified of further discussion, so that when the inevitable comment-supporting fork of |
cc me |
+1 |
Got tired of waiting. Here's a pull request that simply strips comments of the form #, // and /*. I'm 99% sure I missed something because otherwise that was too easy.
yields
Or this:
yields
|
@MikeTaylor there it is |
@liquidaty wrote:
Without meaning to imply anything about the prospects of your contribution being accepted(*), one thing to consider is whether there should be an option to turn this functionality on or off. Backward compatibility and perhaps other considerations would argue for the default behavior not to change. Another issue is what the flag should be called. "--strict" is tempting, but given that your contribution does not address the known strictness issues, "--strict" might be premature. -- |
@pkoppstein I totally agree that it would be better to make this an option, and I think it would be easy to do so and would be happy to make that further enhancement. I just wanted to first get a sense of whether there was in fact any chance of it being accepted-- in either form-- before going that extra step. Personally, I think Agreed re maintenance being lacking. We already use our own a |
@liquidaty - I'll confine this post to the proposed "--allow-comments" enhancement. Without some kind of positive signal from @stedolan or one of the maintainers, I'd be quite pessimistic about any enhancement of the type envisioned here being incorporated into the official jq repository in the near term. However, I would suggest continuing the discussion with a view to an enhancement being incorporated into a fork or gojq. One point of discussion is consistency between jq's two JSON parsers (i.e. the regular one, and the "streaming" parser). Another point of discussion concerns comments in JSON within a jq program. Currently, I believe it's the case that any JSON that is accepted as input to a jq program will also be accepted as a jq program, or as a component of a jq program. It might be tricky to preserve consistency (between all three parsers) if only because any changes to parser.y can be tricky. In fact, allowing '//'-style comments within such embedded JSON is problematic in principle, because of the inherent ambiguity this introduces in expressions like:
(This could be interpreted as being equivalent to the jq expression Agreed, it might be best to circumvent the issue by simply leaving the existing jq parser as-is; in this case, I believe one should nevertheless ensure that the three parsers handle "JSON-with-#-comments" in the same way. But perhaps that is already the case? |
@liquidaty - Thanks for offering to help support a well-maintained "fork" of jq. The last "commit" to the jq repository by a maintainer appears to have been in May 2022, and to my knowledge, the maintainers have not given any encouragement about maintaining jq for quite some time. It seems to me that unless such encouragement is offered in the near term, we've just about reached the point where it would indeed be reasonable to create a "successor fork", that would only incorporate changes that are made with a view to having them incorporated in the stedolan repository. Ideally, such changes would also either reflect changes that have already been adopted in gojq, or would be acceptable to @itchyny for potential adoption in gojq. Obviously, maintaining jq at a high level presents a wide variety of challenges. Perhaps the place to start is to ensure that the initial team of official maintainers has all the required technical skills? cc: @stedolan @nicowilliams @wtlangford @dtolnay @leonid-s-usov @wader @tst2005 |
@pkoppstein thanks for your comments. This response is to the first of your last two posts.
I'm not that familiar with the jq internals but from what I can tell, both cases are handled properly. The modification, which is made in All that said, if the PR was accepted, I think it would be extremely important for at least some basic tests to be added to verify it works as expected (and for those tests to be part of CI)
TL DR now this is handled. It's not clear to me whether this should be treated as a separate issue, but I suppose it would be confusing otherwise so let's just agree it's one and the same. I've updated the PR with a tiny (4-line) change to lexer.l that adds C-style comment stripping (obviously, this could also be subject to an opt-in option) (the updated PR also includes the related derived changes to lexer.c and lexer.l). As for C++-style comments, it seems to me that the JQ command syntax is fundamentally incompatible. So, we could i) just never strip them, ii) only strip them outside of a JQ command, or iii) maybe only strip them if inside a JSON component of a JQ comment. My own thoughts are that ii) is preferable and this is what the PR reflects, though this subtlety isn't as important to me as the overall topic of "let's support some form of JSON comment stripping" and I'd defer to whatever the more popular view was on this subtlety. |
Perhaps we could compromise with a CLI flag to specify the precise format. We continue defaulting to parsing with the classic, rudimentary JSON AST. If the user adds a flag like Unfortunately, there don't appear to be many JSON5 transformation CLI or formatting tools. jq could be extended to be that tool. |
@mcandre I got tired of waiting for this and wrote a minimal json5 to json transformer a while ago. YMMV https://github.com/olix0r/j5j |
@olix0r , @mcandre , @nicowilliams , @pkoppstein, [update: adding @itchyny] Can we take a first step forward on this and commit to having something in 1.7? #2548 for example is a fully-QC-test-passing (at least as of a few months ago), super light / non-invasive update that moves the ball forward at least on a portion of this collection of issues that is very cut, dry and non-controversial. |
Yes! Please! |
@liquidaty can you rebase that PR? |
sharing my basic version, https://gist.github.com/metaory/70f15c242ca5c423638b6fcea843171d Caution full spec is NOT covered Basic JSONC ParserNote it removes
Caution its NOT covering
TL;DR
|
@metaory I do not believe a regex can ever fully work to parse comments. Using yours for example, this fails: Anyway a long time ago I submitted a fully-functional PR with tests and pretty much everything else needed to be drop-in and after multiple rounds of making every requested adjustment to have it merged in (other than the last one which I'm about to respond to), it hasn't gone anywhere so good luck. |
@nicowilliams Thanks for asking but no, I'm not going to, and here's why:
To be clear, I am more than happy to help as much as I can if a decision-maker is driving the process. Alternatively, if a core maintainer is willing to commit in writing that "Yes, if you do this, the PR will be merged into the next release"-- then I will reconsider. Until then, I've grown weary of jumping through hoops with no end in sight. |
I rather a solution that can be a single line and covers my requirements over something that's +100 lines and covers the full spec I already stated it doesn't cover the full spec, in most cases we just want single line comments, |
I'm not sure if something that can cover most cases in a single line sed needs to be part of jq, |
thanks I've updated the readme, honestly most cases can be a single line sed, for anything more accurate and spec compliant, pipe it in from one of many well known spec compliant parsers |
Arguably, many uses cases of |
converting something that is NOT JSON to JSON should be on jq as well? this is exactly that. not far from yaml, toml, and so on |
well...
From https://json5.org/ So, Chromium, Apple etc. are supporting it, but hey, @metaory knows better. |
@SchulteMarkus not sure where you're reading that "jq knows better"? i think there is an agreement amongst the current maintainers that support JSON5 comments in input JSON ( |
I am very sorry, I thought @metaory would be a member of jq (I updated the original comment). |
not super thrilled with mentions in this context! personally I live by the Unix philosophy, and have seen many projects slowly broadening their scope, losing composability having said all that, I agree, I might have been too far, supporting JSON5 comments is not as far as YAML/TOML. JQ ftw 🔥 |
Yet @SchulteMarkus brings up an entirely valid point. The initial strong objections from the It is silly for the jq project to continue to delay catching up, especially for this particular issue when the work is already done, with the effect of leaving |
The jq tool is following the spec too closely. I don't agree with the spec. But following the spec can help to reduce breaking changes. jq is one of few tools that can serve as syntax validating linters to catch data bugs before they hit production. We have Crockford's unhelpful opinionated spec to thank for that decision so long ago. It's well and past time for the industry to adopt JSON (5) and ditch the original spec. Microsoft has already done this, allowing comments in VSCode JSON configuration files. |
In my opinion, the friction comes from many users (like me) perceiving There's a [closed] VScode issue to request moving it's config from To me, comments are typically the MOST IMPORTANT information any file can contain, because the comment was written specifically for human consideration. So if a jsonc/json5 input file contained comments, I would expect the comment to always be output, and optionally silenced via some extra flag. A quick look at the I did see that there's is a public domain C library for parsing JSON/JSON5. https://github.com/WaterJuice/JsonLib Super interesting problem! For now I just prefix the key name with
|
I dont think VSCode and Microsoft is in anyway a role model to look up to... |
Yes Windows is the mainstream choice as well, doesnt make it the industry gold standard |
If someone is interested i hacked together a JSON5 implementation for jq here https://github.com/wader/json5.jq. It's mostly a stripped down and modified version of jqjq. |
I know this issue has been discussed before and the view of the jq team is that this should be in an external tool. I've written a poor external tool to show that some of this can be done externally, but it runs into limits that I don't believe can be overcome as an external tool (without re-implementing all of jq which isn't really an external tool then). So, I'm opening a new issue to specifically discuss:
The two reasons I wrote the external tool were a) to provide a quick and dirty hack for limited use cases and b) to explore the limits of what can be done using an external tool.
I understand there are differing views on whether JSON should support comments at all. This discussion is targeting use cases where we have already decided that having comments in JSON would be useful and are now evaluating what options are available to support them.
There are many characteristics and flavors of comment support, including:
Regarding comment placement and existence, I would propose that the comment is "attached" to its immediately-preceding token in the input JSON. If/when that corresponding token is output, the comment is output with it. This would apply when the JSON output is filtered as well.
As for pros/cons, I think the most important "pro" is to keep jq focused strictly to the JSON spec (and therefore keep the code base etc streamlined), and the most important "cons" are:
I have posted a partial proof-of-concept tool at https://github.com/mewalig/json-comments. All it does is strip comments out, save them to a file, and then add them back in. In between those steps, jq can be used to pretty-print the output, so the end result can be pretty-printed output with comments. Because it is proof-of-concept, it only supports ASCII and C++-style comments, but those limitations are easy enough to remove and what I'd classify as trivial.
The most important limitation is that it only adds comments back into equivalent JSON. In other words, it can't be used with a jq filter other than e.g. "."-- obviously this is a big limitation.
Which brings us to the final topic: limitations of an external tool and potential solutions. Some potential solutions:
struct jv
is augmented to support comment information associated with that valueIt has been proposed in other discussions to use an external tool, but I don't see how that could possibly work (or be less work) given the above issues, unless we are content to give up the combination of filtering + comment support.
Thoughts?
The text was updated successfully, but these errors were encountered: