SCTK 30.1.0 detects classpath-exception-2.0 based only on word "classpath" in Java comments #2769

mjherzog · 2021-11-26T18:17:52Z

Description

Please leave a brief description of the bug or feature request:

SCTK reports a license_score=7.33 for a set of Java files based only on the word "classpath" in comments.
The match details are:
matched_rule_identifier = classpath-exception-2.0_5.RULE
matched_rule_matcher = 2-aho
matched_rule_length = 2
matched_rule_match_coverage = 2
matched_rule_relevance = 11

Since the word "classpath" is likely to appear frequently in Java comments, it would be good to avoid this false positive.

What version of scancode-toolkit was used to generate the scan file? version 30.1.0

pombredanne · 2021-11-27T06:40:02Z

This makes sense, but the rule at https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/data/rules/classpath-exception-2.0_5.RULE has this text: "classpath exception"

Would you have a file with the problematic detection to link or attach?

pombredanne · 2021-11-27T07:07:09Z

Never mind, I can see the issue now. For instance, this C++ snippet from https://github.com/SanDisk-Open-Source/SSD_Dashboard/blob/f0240a983544a86989eec80a9a5210f2b14fa1c1/uefi/gcc/gcc-4.6.3/libjava/gnu/classpath/jdwp/natVMVirtualMachine.cc#L280:

	    using namespace gnu::classpath::jdwp::exception;
	    throw new InvalidLocationException ();

is detected by this rule with matched_text: classpath::[jdwp]::exception; and it should not be detected when there are extra words between "classpath" and "exception".
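To make the "extra words in between" condition concrete, here is a minimal Python sketch of a continuity check. It is not ScanCode's implementation: tokenize(), matched_positions() and is_continuous() are hypothetical helpers written for this illustration, and the tokenizer is a deliberate simplification. It only shows why the C++ line above aligns with the two rule words with a gap, while a genuine notice aligns without one.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplification, not ScanCode's tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_positions(query_tokens, rule_tokens):
    """Align rule tokens to query positions in order, allowing gaps.

    Return the list of matched query positions, or None if a rule token
    cannot be found after the previous one.
    """
    positions = []
    start = 0
    for token in rule_tokens:
        try:
            pos = query_tokens.index(token, start)
        except ValueError:
            return None
        positions.append(pos)
        start = pos + 1
    return positions


def is_continuous(positions):
    """True when the matched positions are adjacent, with no unmatched word in between."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


rule = tokenize("classpath exception")
spurious = tokenize("using namespace gnu::classpath::jdwp::exception;")
genuine = tokenize("GPL-2.0 with the Classpath exception")

for name, query in (("spurious", spurious), ("genuine", genuine)):
    positions = matched_positions(query, rule)
    print(name, positions, "continuous" if is_continuous(positions) else "has gaps")
```

In the C++ line, "jdwp" sits between the two rule words, so a rule that requires a continuous match would reject it, while the genuine notice passes.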

Rename all match filter functions to use more explicit names.
Refactor function to set the lines as a LicenseMatch method.
Add misc. new and improved license detection rules.
Improve the order in which some match filters are processed. For instance, this helps to ensure that non-spurious smaller matches are not merged and discarded in short spurious matches too early.

Refine non-continuous matches filter for #2769
Rename filter_if_only_known_words_rule() to filter_non_continuous_matches()
Also rename "continuous" Rule field to "is_continuous"
Add new filter_short_matches_scattered_on_too_many_lines() filter. This works by discarding some short matches that are scattered on too many lines to be a correct match.
Improve overlapping filter for two-token matches that precede or follow longer matches and overlap only on the word "license". In these cases, the two-token matches may be spurious and may be discarded.
Add new and improved license detection rules, and improve existing license metadata.
Improve code formatting and logging.
Move model field comments to help text on the model field definitions, such as License and Rule. This will help generate API documentation later.

Signed-off-by: Philippe Ombredanne <[email protected]>
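The idea behind discarding short matches scattered over too many lines can be sketched as follows. This is an illustration only: the Match class, the filter_scattered_short_matches() name, the max_short_length threshold and the "more lines than matched tokens" criterion are all assumptions made for this example, and the actual filter_short_matches_scattered_on_too_many_lines() in ScanCode may use different criteria.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical, simplified stand-in for a license match."""
    rule_id: str
    matched_length: int  # number of matched tokens
    start_line: int
    end_line: int

    @property
    def line_span(self):
        """Number of lines covered by the match, inclusive."""
        return self.end_line - self.start_line + 1


def filter_scattered_short_matches(matches, max_short_length=4):
    """Split matches into (kept, discarded).

    Assumed criterion: a match is discarded when it is short (at most
    max_short_length tokens) and spans more lines than it has matched tokens.
    """
    kept, discarded = [], []
    for match in matches:
        is_short = match.matched_length <= max_short_length
        too_scattered = match.line_span > match.matched_length
        if is_short and too_scattered:
            discarded.append(match)
        else:
            kept.append(match)
    return kept, discarded


# Made-up matches for illustration only.
matches = [
    Match("two-words-over-five-lines", matched_length=2, start_line=10, end_line=14),
    Match("two-words-on-one-line", matched_length=2, start_line=20, end_line=20),
    Match("long-notice", matched_length=30, start_line=1, end_line=40),
]
kept, discarded = filter_scattered_short_matches(matches)
print("kept:", [m.rule_id for m in kept])
print("discarded:", [m.rule_id for m in discarded])
```

Long matches are left alone in this sketch because a full license text legitimately spans many lines; only short matches are cheap to produce by accident.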

mjherzog added bug license scan labels Nov 26, 2021

pombredanne added a commit that referenced this issue Dec 23, 2021

Add test for continuous detection #2769

bb54959

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne added a commit that referenced this issue Dec 23, 2021

Rename only_known_words to continuous #2769

ff7c148

This applies the renaming done in the code to the actual rules.

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne mentioned this issue Mar 5, 2022

RFC: a plan for false positive license detection #2878

Open
