Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the last few syntax tests #59

Open
3 of 4 tasks
trishume opened this issue Apr 4, 2017 · 10 comments
Open
3 of 4 tasks

Fix the last few syntax tests #59

trishume opened this issue Apr 4, 2017 · 10 comments
Labels

Comments

@trishume
Copy link
Owner

trishume commented Apr 4, 2017

After #55 fixed most syntax test exposed bugs, there's a few left. Now the syntax tests for almost all languages pass perfectly. The following languages still fail:

If you plan on attempting to diagnose or fix one of these bugs, create an issue for it and reference it here. I will do the same.

I don't foresee myself personally doing much work on this for the next little while, but I will review code and discuss and may get around to it eventually.

@keith-hall
Copy link
Collaborator

I'm investigating the Markdown test failure under #62

@keith-hall
Copy link
Collaborator

I've logged a new issue at #104 which details why the ASP syntax tests are still failing, and an additional bug which doesn't show up in the official ST default packages repo.

@keith-hall
Copy link
Collaborator

I think the reason why the LaTeX syntax has a failure in syntect is due to how ST and syntect optimize the matching - ST seems to check "does this regex pattern match at this point", and if no patterns match, it advances the "point" to the next character and does all the work again, whereas syntect caches the next match position for each regex. When there are dodgy regex patterns in the syntax definition that consume no characters, syntect doesn't see that actually there is a match further along that will consume characters, and instead just moves on to the next pattern match that "does something".

i.e. we can see from the operations that the pattern that matches the empty string matches before the close brace pattern that pops the context

blank match `[A-Za-z[:digit:]-]*` at column 27 on line `\usepackage[args]{mypackage, anotherpackage}`
\usepackage[args]{mypackage, anotherpackage}
^ +meta.preamble.usepackage.latex
\usepackage[args]{mypackage, anotherpackage}
^ +keyword.control.preamble.latex
\usepackage[args]{mypackage, anotherpackage}
^ +punctuation.definition.backslash.latex
\usepackage[args]{mypackage, anotherpackage}
 ^ pop 1 [<punctuation.definition.backslash.latex>]
\usepackage[args]{mypackage, anotherpackage}
           ^ pop 1 [<keyword.control.preamble.latex>]
\usepackage[args]{mypackage, anotherpackage}
           ^ +meta.group.bracket.latex
\usepackage[args]{mypackage, anotherpackage}
           ^ +punctuation.definition.group.bracket.begin.latex
\usepackage[args]{mypackage, anotherpackage}
            ^ pop 1 [<punctuation.definition.group.bracket.begin.latex>]
\usepackage[args]{mypackage, anotherpackage}
                ^ +punctuation.definition.group.bracket.end.latex
\usepackage[args]{mypackage, anotherpackage}
                 ^ pop 1 [<punctuation.definition.group.bracket.end.latex>]
\usepackage[args]{mypackage, anotherpackage}
                 ^ pop 1 [<meta.group.bracket.latex>]
\usepackage[args]{mypackage, anotherpackage}
                 ^ +meta.preamble.usepackage.latex
\usepackage[args]{mypackage, anotherpackage}
                 ^ +meta.group.brace.latex
\usepackage[args]{mypackage, anotherpackage}
                 ^ +punctuation.definition.group.brace.begin.latex
\usepackage[args]{mypackage, anotherpackage}
                  ^ pop 1 [<punctuation.definition.group.brace.begin.latex>]
\usepackage[args]{mypackage, anotherpackage}
                  ^ pop 3 [<meta.preamble.usepackage.latex>, <meta.preamble.usepackage.latex>, <meta.group.brace.latex>]
\usepackage[args]{mypackage, anotherpackage}
                  ^ +meta.preamble.usepackage.latex
\usepackage[args]{mypackage, anotherpackage}
                  ^ +meta.group.brace.latex
\usepackage[args]{mypackage, anotherpackage}
                  ^ +support.class.latex
\usepackage[args]{mypackage, anotherpackage}
                           ^ pop 1 [<support.class.latex>]
\usepackage[args]{mypackage, anotherpackage}
                                           ^ +punctuation.definition.group.brace.end.latex (NOTE: the `,` is col 27)
\usepackage[args]{mypackage, anotherpackage}
                                            ^ pop 1 [<punctuation.definition.group.brace.end.latex>]
\usepackage[args]{mypackage, anotherpackage}
                                            ^ pop 2 [<meta.preamble.usepackage.latex>, <meta.group.brace.latex>]

I know we've already agreed that the syntax definition should be fixed and not syntect, and I just want to state that I still strongly agree with this, I just wanted to share more detail about why this difference arises - it seems silly to remove our optimization or have extra code for working around poorly coded regex patterns.

@trishume
Copy link
Owner Author

trishume commented Dec 1, 2017

@keith-hall Interesting, thanks for the investigation.

I have a hope that perhaps in a few places Sublime uses a different model than syntect and that if we figure out that model and switch to it our code will get cleaner, more correct, and also possibly faster. For example #124 is much better now with the new comments, but it still makes me uneasy that all that special-case logic is necessary, it makes me wonder if Sublime does things in a way that doesn't require a bunch of special-cases. This will probably take a bunch of thinking and testing though, which I don't have time for yet, probably not till at least late December. The method of using "does this regex match at this point" that you mention sounds to me like it would be slower, but I can see how I might be wrong and it might be either just as fast or faster.

Ideally I think we should still strive for compatibility instead of just fixing the upstream packages, because it increases the likelihood of third party syntaxes working without modification, but it's definitely easier to fix upstream and perfect compatibility is a lot of effort that at least I personally have limited time to work on.

@keith-hall
Copy link
Collaborator

My knowledge of how Sublime works internally is of course not really knowledge at all but educated guesswork, based on my experiences and some posts made by sublimehq - that said, I know that Sublime, when reading syntax definitions, creates a "regex cache" file, which contains all regex patterns in the syntax definition. However, there are some patterns it can never cache - and those are the pop patterns that make use of backrefs. So I'm pretty sure that Sublime doesn't "special case" it - whenever a "target" context is pushed from a match pattern that had capture groups, it most likely stores the text that was captured and at that point in time / for that parse state: "injects" those into any pop patterns in the root level of that context that use backrefs. Therefore it it would be the same code for with_prototype as well as push/set. And because it flattens all regexs into a cache, it maybe doesn't have to worry about includes needing separate logic (#104), as the patterns from those contexts are just seen as being direct children of the context.

I'd also be interested to see how performance would differ for the point by point approach.
Compatibility should of course be an end goal, but for edge cases like this I would personally prioritize clean code and performance for now :)

@jskinner
Copy link

I haven't looked at Syntect at all, but FWIW Sublime Text does cache the next matching position for each regex (at least when we're using the Onigurama fallback, there's no need for this with sregex).

Only regexes associated with push, set, and embed actions are allowed to match against zero length input, other actions use ONIG_OPTION_FIND_NOT_EMPTY.

@keith-hall
Copy link
Collaborator

Thanks Jon, useful bit of info to know 👍 that should resolve the differences in how syntect parses LaTeX compared to ST 🎆

@Enselic Enselic added the bug label Nov 11, 2021
victor-gp added a commit to victor-gp/cmd-help-sublime-syntax that referenced this issue Mar 6, 2022
To prevent the detection of plain explanation lines as section headings,
caused by the previous 2 commits.

40 chars are a lot but some section names are really long, e.g.: grep's
"Pattern selection and interpretation:" (37 chars).

I was stuck here for a long while cause I tried to write the
length-measuring lookahead like '^(?=.{,40}\n)', where {,40} is a
quantifier that means "at least 0 but not more than 40 times" according
to the Oniguruma regex reference [1].

But it seems that matching zero length inputs is not allowed in some
cases [2], so {1,40} is the right way to write it.

[1] https://raw.githubusercontent.com/kkos/oniguruma/v6.9.1/doc/RE
[2] trishume/syntect#59 (comment)
@CGMossa
Copy link
Contributor

CGMossa commented Nov 27, 2023

I'd like to ask, is this part of fixing these tests:

failures:
    highlighting::highlighter::tests::can_parse
    highlighting::highlighter::tests::can_parse_with_highlight_state_from_cache
    highlighting::highlighter::tests::test_ranges
    highlighting::theme_set::tests::can_parse_common_themes
    html::tests::strings
    html::tests::tricky_test_syntax
    parsing::parser::tests::can_compare_parse_states
    parsing::parser::tests::can_parse_backrefs
    parsing::parser::tests::can_parse_includes
    parsing::parser::tests::can_parse_issue25
    parsing::parser::tests::can_parse_preprocessor_rules
    parsing::parser::tests::can_parse_simple
    parsing::parser::tests::can_parse_yaml
    parsing::syntax_set::tests::can_load

Because these currently fail on main.

@keith-hall
Copy link
Collaborator

"syntax tests" refers to http://www.sublimetext.com/docs/syntax.html#testing, specifically the test files in the sublimehq/Packages repo which syntect has as a submodule. All Rust tests should be working...

@CGMossa
Copy link
Contributor

CGMossa commented Nov 27, 2023

Thanks! I believe what happened was that locally, I didn't run

git submodule init
git submodule update

That helped with a bunch of tests; I'm sorry for diverging the discussion here. Although, I think this little instruction should be available somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants