Use restricted Damerau-Levenshtein distance for diagnostics #108200
Great PR, when reading rustc's suggested names, I always felt like there was room to improve. cargo also uses the same distance algorithm, stored in |
The documentation of |
@steffahn I'm thinking about just changing it to be a generic "edit distance" reference. There's no reason the caller should care about the exact algorithm, after all.
983e522 to 20282c1
Addressed all comments so far. I moved the functions and modules to correspond to a general "edit distance" rather than Levenshtein or Damerau-Levenshtein. In doing so I left a comment about the current implementation for anyone reading to know what it is. One thing I ran across in doing this was a diagnostic for |
Just so I understand your comment correctly, you are saying that a limit of 3 instead of 2 for macros would produce only (or mostly) unhelpful additional suggestions, thus 2 is better?
Note: this isn't for macros, it's a special diagnostic for I don't have strong feelings on it either way; it merely seemed appropriate. |
Ah, thanks for the clarification and links. I only wanted to understand what this was about.
Apart from the new UI test, can you also add unit tests that would distinguish the implementation from the Levenshtein distance and from the unrestricted Damerau–Levenshtein distance?
@tmiasko Done.
@bors r+
…tein-distance, r=tmiasko Use restricted Damerau-Levenshtein distance for diagnostics This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a _restricted_ implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account. This was first brought up [on IRLO](https://internals.rust-lang.org/t/18227) when it was noticed that the diagnostic for `prinltn!` (transposed L and T) was `print!` and not `println!`. Only a single existing UI test was affected, with the result being an objective improvement. ~~I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.~~ r? `@estebank` `@rustbot` label +A-diagnostics +C-enhancement
…iaskrgr Rollup of 5 pull requests Successful merges: - rust-lang#108124 (Document that CStr::as_ptr returns a type alias) - rust-lang#108171 (Improve building compiler artifacts output) - rust-lang#108200 (Use restricted Damerau-Levenshtein distance for diagnostics) - rust-lang#108259 (remove FIXME that doesn't require fixing) - rust-lang#108265 ("`const` generic" -> "const parameter") Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
Based on rust-lang#108200, for the same rationale. > This replaces the existing Levenshtein algorithm with the > Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change > (a transposition) instead of two (a deletion and insertion). More > specifically, this is a restricted implementation, in that "ca" to "abc" > cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the > middle of a transposition. I believe that errors like that are sufficiently > rare that it's not worth taking into account. Before this change, searching `prinltn!` listed `print!` first, followed by `println!`. With this change, `println!` matches more closely.
… r=GuillaumeGomez rustdoc: use restricted Damerau-Levenshtein distance for search Based on rust-lang#108200, for the same rationale. > This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a restricted implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account. Before this change, searching [`prinltn!`] listed `print!` first, followed by `println!`. With this change, `println!` matches more closely. [`prinltn!`]: https://doc.rust-lang.org/nightly/std/?search=prinltn!
Use restricted Damerau-Levenshtein algorithm This uses the same implementation as the one used in rustc, so review should be simple. As with rust-lang/rust#108200, the module and function names have been changed to be implementation-agnostic. [Reference](https://github.com/rust-lang/rust/blob/13d1802b8882452f7d9d1bf514a096c5c8a22303/compiler/rustc_span/src/edit_distance.rs) for rustc's current implementation.
This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a restricted implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.
This was first brought up on IRLO when it was noticed that the diagnostic for `prinltn!` (transposed L and T) was `print!` and not `println!`. Only a single existing UI test was affected, with the result being an objective improvement.
~~I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.~~
r? @estebank
@rustbot label +A-diagnostics +C-enhancement
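For readers who want to see the distinction drawn in the description, here is a minimal sketch of a restricted Damerau-Levenshtein distance (also known as optimal string alignment). It is not rustc's actual implementation: the names `edit_distance` and `closest` are hypothetical, the sketch uses a full dynamic-programming table for clarity, and it omits the distance limit discussed in the conversation.

```rust
/// Restricted Damerau-Levenshtein (optimal string alignment) distance.
/// Illustrative sketch only; the name and signature are assumptions,
/// not the API added by this PR.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of `a`
    // and the first j chars of `b`.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for (i, row) in d.iter_mut().enumerate() {
        row[0] = i; // delete all i chars
    }
    for (j, cell) in d[0].iter_mut().enumerate() {
        *cell = j; // insert all j chars
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // substitution or match

            // Transposition of two adjacent characters. Because this only
            // looks back at d[i - 2][j - 2], nothing can be inserted between
            // the swapped pair, which is what makes the variant "restricted".
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}

/// Hypothetical helper: pick the candidate closest to `typo`.
pub fn closest<'a>(typo: &str, candidates: &[&'a str]) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .min_by_key(|candidate| edit_distance(typo, candidate))
}
```

With this sketch, "ab" to "ba" costs 1, where plain Levenshtein would give 2, and "ca" to "abc" costs 3, where unrestricted Damerau-Levenshtein would give 2. Those two pairs are the kind of cases that unit tests can use to tell the three algorithms apart. For the motivating typo, `prinltn` is at distance 1 from `println` and distance 2 from `print`, so `closest("prinltn", &["print", "println"])` selects `println`.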