Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use restricted Damerau-Levenshtein distance for diagnostics #108200

Merged

Conversation

jhpratt
Copy link
Member

@jhpratt jhpratt commented Feb 18, 2023

This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a restricted implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.

This was first brought up on IRLO when it was noticed that the diagnostic for prinltn! (transposed L and T) was print! and not println!. Only a single existing UI test was effected, with the result being an objective improvement.

I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.

r? @estebank

@rustbot label +A-diagnostics +C-enhancement

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Feb 18, 2023
@est31
Copy link
Member

est31 commented Feb 18, 2023

Great PR, when reading rustc's suggested names, I always felt like there was room to improve.

cargo also uses the same distance algorithm, stored in src/cargo/util/lev_distance.rs. Maybe after this PR is merged one could think about filing a PR there as well.

@steffahn
Copy link
Member

The documentation of lev_distance_with_substrings still talks about ordinary “Levenshtein distance” even though it calls the lev_distance function whose implementation this PR changes.

@jhpratt
Copy link
Member Author

jhpratt commented Feb 18, 2023

@steffahn I'm thinking about just changing it to be a generic "edit distance" reference. There's no reason the caller should care about the exact algorithm, after all.

@rust-cloud-vms rust-cloud-vms bot force-pushed the restricted-damerau-levenshtein-distance branch from 983e522 to 20282c1 Compare February 19, 2023 04:23
@jhpratt
Copy link
Member Author

jhpratt commented Feb 19, 2023

Addressed all comments so far. I moved the functions and modules to correspond to a general "edit distance" rather than Levenshtein or Damerau-Levenshtein. In doing so I left a comment about the current implementation for anyone reading to know what it is.

One thing I ran across in doing this was a diagnostic for macro_rules typos. It had a limit of 3, which I've reduced to 2. I did this because the original issue was regarding a transposition, which changes the distance from 2 to 1 with this PR.

@steffahn
Copy link
Member

Just so I understand your comment correctly, you are saying that a limit of 3 instead of 2 for macros would produce only (or mostly) unhelpful additional suggestions, thus 2 is better?

@jhpratt
Copy link
Member Author

jhpratt commented Feb 19, 2023

Note: this isn't for macros, it's a special diagnostic for macro_rules (as it's a contextual keyword). The original issue is #91227, with #91337 being the original implementation. The issue would be solved with 2 (current) or 1 (with this PR), but one was added to that.

I don't have strong feelings on it either way; it merely seemed appropriate.

@steffahn
Copy link
Member

Ah, thanks for the clarification and links. I only wanted to understand what this was about.

@tmiasko
Copy link
Contributor

tmiasko commented Feb 19, 2023

Apart from the new UI test, can you also add unit tests that would distinguish the implementation from the Levenshtein distance, and unrestricted Damerau–Levenshtein distance?

@jhpratt
Copy link
Member Author

jhpratt commented Feb 19, 2023

@tmiasko Done.

@tmiasko
Copy link
Contributor

tmiasko commented Feb 20, 2023

@bors r+

@bors
Copy link
Contributor

bors commented Feb 20, 2023

📌 Commit ab4c0dd has been approved by tmiasko

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 20, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 20, 2023
…tein-distance, r=tmiasko

Use restricted Damerau-Levenshtein distance for diagnostics

This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a _restricted_ implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.

This was first brought up [on IRLO](https://internals.rust-lang.org/t/18227) when it was noticed that the diagnostic for `prinltn!` (transposed L and T) was `print!` and not `println!`. Only a single existing UI test was effected, with the result being an objective improvement.

~~I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.~~

r? `@estebank`

`@rustbot` label +A-diagnostics +C-enhancement
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2023
…iaskrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#108124 (Document that CStr::as_ptr returns a type alias)
 - rust-lang#108171 (Improve building compiler artifacts output)
 - rust-lang#108200 (Use restricted Damerau-Levenshtein distance for diagnostics)
 - rust-lang#108259 (remove FIXME that doesn't require fixing)
 - rust-lang#108265 ("`const` generic" -> "const parameter")

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 226ce31 into rust-lang:master Feb 20, 2023
@rustbot rustbot added this to the 1.69.0 milestone Feb 20, 2023
@jhpratt jhpratt deleted the restricted-damerau-levenshtein-distance branch February 20, 2023 22:17
notriddle added a commit to notriddle/rust that referenced this pull request Mar 11, 2023
Based on rust-lang#108200, for the same
rationale.

> This replaces the existing Levenshtein algorithm with the
> Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change
> (a transposition) instead of two (a deletion and insertion). More
> specifically, this is a restricted implementation, in that "ca" to "abc"
> cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the
> middle of a transposition. I believe that errors like that are sufficiently
> rare that it's not worth taking into account.

Before this change, searching `prinltn!` listed `print!` first, followed
by `println!`. With this change, `println!` matches more closely.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 12, 2023
… r=GuillaumeGomez

rustdoc: use restricted Damerau-Levenshtein distance for search

Based on rust-lang#108200, for the same rationale.

> This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a restricted implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.

Before this change, searching [`prinltn!`] listed `print!` first, followed by `println!`. With this change, `println!` matches more closely.

[`prinltn!`]: https://doc.rust-lang.org/nightly/std/?search=prinltn!
bors added a commit to rust-lang/cargo that referenced this pull request Apr 14, 2023
Use restricted Damerau-Levenshtein algorithm

This uses the same implementation as the one used in rustc, so review should be simple. As with rust-lang/rust#108200, the module and function names have been changed to be implementation-agnostic.

[Reference](https://github.com/rust-lang/rust/blob/13d1802b8882452f7d9d1bf514a096c5c8a22303/compiler/rustc_span/src/edit_distance.rs) for rustc's current implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants