Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect push/pop order when combining meta scopes, content scopes and pushing multiple things #51

Closed
trishume opened this issue Mar 18, 2017 · 12 comments

Comments

@trishume
Copy link
Owner

This is the bug that is causing at least some, probably most to all of the ASP tests to fail (cc @keith-hall).

The model of the parser outputting only the pushes and pops doesn't hold up to well in some cases. @keith-hall fixed the case of the weird behaviour of set, but I found another.

Consider the following section of the ASP syntax:

  class_definitions:
    - match: '\b(?i:Class)\s'
      scope: storage.type.asp
      push: [inside_class, class_name]

  class_name:
    - meta_scope: meta.class.asp meta.class.identifier.asp
    # ...
    - match: '{{identifier}}'
      scope: entity.name.class.asp
      pop: true
    # ...

  inside_class:
    - meta_content_scope: meta.class.asp meta.class.body.asp

Sublime Text passes the following test:

' SYNTAX TEST "Packages/ASP/ASP.sublime-syntax"
 Class TestClass2 Public Sub TestSub () Response.Write("wow") End Sub End Class
'^^^^^ meta.class.asp meta.class.identifier.asp storage.type.asp
'      ^ meta.class.asp meta.class.body.asp meta.class.asp meta.class.identifier.asp
'                ^ meta.class.asp meta.class.body.asp

Notice how when the meta_content_scope of inside_class is introduced after Class, it is introduced under meta.class.asp meta.class.identifier.asp.

syntect does not do this. It pushes the inside_class meta_content_scope on top, but then after TestClass2 when it pops the class_name context it pops 2 scopes, thinking it is popping the meta_scope of class_name, but actually the inside_class scopes are on top of the stack.

I see two ways of approaching this:

  • Make pushes like set is in Keith's set branch where they pop the early meta scopes and then re-apply them in the right order.
  • Make a breaking change to the API and scrap the idea of the parser outputting operations. Have stacks be linked lists instead of vectors, and have the parser output regions of text with a scope stack, where the scope stacks share tails. Unfortunately this would require a bunch of reference counting. It may be faster, it may be slower, I'm not sure. It would also be a bunch of work.

I'm inclined to take the first route. My worry is that if done simply it will lead to a bunch of redundant popping and pushing the same thing again in the case when only one item is pushed. It's probably a good idea to have a fast path for that case, although it would probably require a bunch more code.

@trishume
Copy link
Owner Author

I've done some work in the fix-test-bugs branch that makes things better. It cherry-picks fixes from @keith-hall's set branch and then touches them up a bit.

It doesn't fix this issue but it includes the fix to set that a fix for this issue could be modeled on.

@keith-hall
Copy link
Collaborator

keith-hall commented Mar 19, 2017

Nice one for investigating this - I'm glad my branch was helpful. I also prefer the first approach because I like the idea of the parser outputting operations :) And it would sure be a shame to have to rewrite everything just for one edge case.

@trishume
Copy link
Owner Author

^ I pushed a commit that I think fixes this bug to my branch, but I don't know what the performance impact is, and it didn't fix as many tests as I thought.

It fixed one ASP test, and possibly some tests in other languages but there's still lots of other failures.

One thing I'm investigating now is that my model of how the syntax-level scopes are applied is borked. The ones that are done with things like scope: text.html.asp at the top level. I have them acting like meta_content_scopes on the main context, but that doesn't seem to be how Sublime does it. I haven't quite figured out what the behaviour is supposed to be yet.

@keith-hall
Copy link
Collaborator

From my testing, I think the top level scope behaviour is:

  • for the top level syntax, the scope is never popped off the stack unless clear_scopes is used
  • pushing (or setting to) another syntax's main context will cause that syntax's top level scope to be pushed onto the stack, until the main context is no longer on the stack - i.e. like a meta content scope - so what you have now is correct from that regard
  • including (as opposed to pushing or setting) another syntax's main context will never cause that syntax's top level scope to be pushed onto the stack
  • pushing, setting or including a syntax's non-main context will never cause that syntax's top level scope to be pushed onto the stack

@trishume
Copy link
Owner Author

trishume commented Mar 21, 2017

@keith-hall ok cool that matches my previous understanding except for the first point, which is the new thing I discovered. I'll need to think of how to implement that in a clean way, but it shouldn't be too difficult.

At the moment I can't think of anything nice though. The way I have it now is nice in that it is just like any other meta_content_scope, and I'm not sure how to cleanly stop it from being popped in only that case.

@keith-hall
Copy link
Collaborator

I wonder if you could cheat and not deal with it in the parser, but instead don't let the scope stack become empty from a Pop operation, always keep one scope.

@trishume
Copy link
Owner Author

@keith-hall that would possibly work in most cases but wouldn't work correctly in all due to the ability of clear_scopes to clear the topmost scope.

I think in terms of reductions it works more like there being two different main contexts: one which is referred to by other files which has meta_content_scope of the file type but otherwise the same as in the yaml, and another that is used for the top level file that has meta_content_scope of the file type, and an empty match pattern that pushes a new context that is the main context in the yaml file as is.

@keith-hall
Copy link
Collaborator

That's a good way of looking at it @trishume :) probably my idea would have caused duplicate scopes to appear if main is set back into ;)


in terms of other minor syntax test failures, interestingly ST is able to ignore empty subpatterns sublimehq/Packages#830
and I noticed that fixing this LaTeX pattern to use a + instead of a * prevents the test from failing in syntect

apart from that, all other syntaxes except ASP and PHP are passing with flying colours, so great work! (I didn't look into the Markdown failure, but it was a nasty syntax with \G all over the place, and has been vastly improved in the Default Packages repo master branch so I'm not worried about it.)

@trishume
Copy link
Owner Author

Hmmm that's interesting, I wonder how Sublime does that ignoring empty patterns thing. It's possible the Rust and LaTeX cases you mention are separate mechanisms, but I think they might be the same:

  • Somehow ST gets the Regex engine to not match the empty string unless the pattern is the empty string. I get how they could do this with their engine, but do they manage to get Oniguruma to do it? If you add some backrefs and lookaheads to force Oniguruma does it continue to ignore empty matches?
  • It ignores empty matches of non-empty patterns, which effectively means that pattern is always ignored and never triggered because it will always prefer matching empty unless it is greedy.
  • Just regex parsing pattern rewrites, replace || with | when not in a character set. Replace * with + when it is applied to the only thing in a pattern.

@keith-hall
Copy link
Collaborator

I tried changing the LaTeX pattern [A-Za-z[:digit:]-]* to (?<=.)[A-Za-z[:digit:]-]*, and the Rust pattern \.\.\.|\.\.||[-=<>&|!~@?+*/%^''#$]|\b_\b to \.\.\.|\.\.|(?<=.)|[-=<>&|!~@?+*/%^''#$]|\b_\b and the syntax tests still pass in ST, even though it uses Oniguruma.
This would therefore imply it is not as simple as replacing || with | when not in a character set (and ignoring \||). I could understand if ST skipped empty matches that don't change the stack, but to skip an empty match inside a pattern and use an alternation that comes after the empty match intrigues me. Unless Oniguruma has an option to always try to find the longest match? Otherwise, I would expect such match patterns to reduce parsing performance.

Is this behaviour something that syntect should try to emulate? I would argue that it isn't that important - very rarely are match patterns written like this, and I've fixed the Rust one, and can submit a PR to fix the LaTeX one too if we want.

@keith-hall
Copy link
Collaborator

I've been looking at the ASP syntax a bit more, and I can't work out how it works in ST... And yet, I'm the one that wrote that syntax definition!

Specifically, this match pattern never matches in syntect, because the with_prototype from the HTML-ASP definition takes precedence, which makes sense.

So I had a go at re-writing the syntax def to avoid the prototype conflict, but unfortunately the result in syntect is the same. The only changes are to the `inside_block` and `html_inside_block` contexts and with a new `resume_inside_block` context. Click on this paragraph to see the new end of the `ASP.sublime-syntax` file.
  inside_block:
    - match: '%>'
      scope: punctuation.section.embedded.end.inside-block.asp
      push: [resume_inside_block, html_inside_block]
      with_prototype: # the with_prototype only applies to the last pushed context i.e. html_inside_block - https://forum.sublimetext.com/t/dev-build-3111/19240/17
        - match: '(?=<%)'
          pop: true

  resume_inside_block:
    - match: '(?=<%)'
      set:
        - match: '<%=?'
          scope: punctuation.section.embedded.begin.inside-block.asp
          pop: true
      # with_prototype: # in ST, it works without this prototype. In syntect, it still doesn't work even with this prototype...
      #   - match: '<%=?'
      #     scope: punctuation.section.embedded.begin.inside-block.asp
      #     pop: true

  html_inside_block:
    - clear_scopes: true
    - meta_scope: text.html.asp
    - match: '\s+$' # eat whitespace so that the lookahead on the next match pattern can match the next line if appropriate
      push:
        - match: ^
          pop: true
    - match: '(\s*")?(?=[^<>]*>|\s+\w+=\s*")' # if the next occurrence of a < or > is a >, we are inside a tag. If it looks like an attribute is being defined, we are probably in a tag
      scope: string.quoted.double.html punctuation.definition.string.end.html
      push: # use a push and an include so that the root scope (text.html.basic) isn't applied
        - meta_scope: meta.tag.after-embedded-asp.any.html
        - include: scope:text.html.basic#tag-stuff
      with_prototype:
        - match: '(?=<%)'
          pop: true
        - match: '>'
          scope: punctuation.definition.tag.end.html
          set: scope:text.html.asp#html
          with_prototype:
            - match: '(?=<%)'
              pop: true
    - match: ''
      push: scope:text.html.asp#html

So it seems we're missing some special rules that ST has related to with_prototype handling. I've proved that meta_include_prototype doesn't affect with_prototype, so it's not that...

@trishume
Copy link
Owner Author

@keith-hall weird. Good detective work.

One thing I might try to do is disassemble Sublime and see if I can figure out how it calls Oniguruma functions to see if it is using some fancy technique I don't know of. I may also re-read the Oniguruma docs to find all the options.

As for with_prototype the fact that the current ASP syntax works in Sublime is super weird and conflicts totally with my understanding of how with_prototype should work. I wonder if it has something to do with clear_scopes, maybe it also clears prototypes? I'm no longer sure at all that I understand how with_prototype should work, I'll have to do some experimenting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants