False-positive proprietary-license finding in Guava source code #2865

Open
sschuberth opened this issue Feb 15, 2022 · 10 comments

@sschuberth
Collaborator

sschuberth commented Feb 15, 2022

Description

Scanning https://github.com/google/guava/blob/v31.0.1/guava/src/com/google/common/graph/StandardValueGraph.java#L36 results in a false-positive license finding of proprietary-license, although no license declaration is present at all. (Thanks to @PatteSI for finding this.)

The matched text just says

changes to the graph (if the graph is mutable) but may not be modified by the user.

I guess the words "modified by the user" are what trigger the finding. What is a bit disturbing, however, is that the license score is 100.0 for this match, so ScanCode is fully confident that this is a license match, and we cannot get rid of it by adjusting --license-score.

How To Reproduce

scancode --license --json-pp - StandardValueGraph.java
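
To see which matches and scores sit behind such a finding, the JSON output can be post-processed. The following is a minimal sketch, not part of the original report. It assumes the result layout used around ScanCode 30.x, where each entry under `files` carries a `licenses` list with `key`, `score`, `start_line` and `end_line`; other versions may structure this differently. The helper name `license_matches` and the sample data are made up for illustration.

```python
import json


def license_matches(scan):
    """Return (path, key, score, start_line, end_line) for every license match.

    Assumes a ScanCode 30.x-style result: scan["files"][i]["licenses"][j].
    Missing fields come back as None instead of raising.
    """
    rows = []
    for entry in scan.get("files", []):
        for match in entry.get("licenses", []):
            rows.append((
                entry.get("path"),
                match.get("key"),
                match.get("score"),
                match.get("start_line"),
                match.get("end_line"),
            ))
    return rows


# Illustrative data shaped like the finding described above; not real scanner output.
sample = json.loads("""
{
  "files": [
    {
      "path": "StandardValueGraph.java",
      "licenses": [
        {"key": "proprietary-license", "score": 100.0,
         "start_line": 36, "end_line": 36}
      ]
    }
  ]
}
""")

for row in license_matches(sample):
    print(row)
```

For a real run, the scan result would be written to a file instead of stdout and loaded with `json.load` in place of the sample.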

System configuration

Ubuntu Linux 18.04

@pombredanne
Member

pombredanne added a commit that referenced this issue Feb 15, 2022
Reference: #2865
Reported-by: Patrick Kutter @PatteSI
Reported-by: Sebastian Schuberth @sschuberth
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Member

@sschuberth @PatteSI : there you go : dade204 :)

@sschuberth
Collaborator Author

Thank you for the quick fix @pombredanne! I believe @PatteSI has a few more similarly obvious cases, and will eventually report them as separate issues.

@pombredanne
Member

Thank you for the quick fix @pombredanne! I believe @PatteSI has a few more similarly obvious cases, and will eventually report them as separate issues.

Excellent! The more the better! @PatteSI I could also show you how to fix these if you like!

@PatteSI

PatteSI commented Feb 16, 2022

@pombredanne Thank you for the quick "fix".
However, I wanted to ask whether you think this is the right approach: adding exceptions for specific wordings in specific source files instead of making the matching patterns more specific or improving the rules.
Unfortunately, after my first investigation I have to disagree with @sschuberth: in our projects, using ScanCode 30.1.0, we have literally hundreds of such trivial false positives with the message reference "LicenseRef-scancode-unknown-*". We actually have our own curation team that currently does nothing but globally curate those false positives for all our teams. They are not aware that those findings are considered a "bug" and should be fixed here as exceptions rather than in an ORT yml curation file.
In all of those cases the known OSS license is usually clearly declared and correctly detected in the header. ScanCode then continues to scan the comments and usually matches something "unknown".
In my opinion, in such cases it is rather unlikely (though I guess theoretically possible) that an OSS license is declared in the header while a later comment indicates that certain parts are under an unknown proprietary license instead. I would be interested in a real example of an OSS library that does that.
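
The pattern described here (a known license detected in the header, followed by "unknown" matches further down in the same file) can be turned into a simple triage heuristic. This is only a sketch of that idea as a post-processing step outside ScanCode, not something ScanCode itself does. It assumes each match is a dict with `key` and `start_line`, and that unknown-type keys can be recognized by the prefixes given in `unknown_prefixes`; both the function name and the prefixes are assumptions for illustration.

```python
def likely_false_positives(matches, unknown_prefixes=("unknown", "proprietary-license")):
    """Return unknown-type matches that start after the first known-license match.

    `matches` is the list of license matches for ONE file, each a dict with
    at least "key" and "start_line". If the file has no known-license match
    at all, nothing is flagged, because there is no header license to rely on.
    """
    def is_unknown(match):
        # str.startswith accepts a tuple of prefixes.
        return match["key"].startswith(unknown_prefixes)

    known_lines = [m["start_line"] for m in matches if not is_unknown(m)]
    if not known_lines:
        return []
    first_known = min(known_lines)
    return [m for m in matches if is_unknown(m) and m["start_line"] > first_known]


# Illustrative input: a header license followed by a match inside a comment.
file_matches = [
    {"key": "apache-2.0", "start_line": 2},
    {"key": "proprietary-license", "start_line": 36},
]
for flagged in likely_false_positives(file_matches):
    print(flagged["key"], flagged["start_line"])
```

Matches flagged this way are candidates for review, not proven false positives; the theoretical case of a differently licensed part inside the file remains possible.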
1. I suggest not rating those findings with a 100.0 license score.
2. It would be nice to reduce the number of false positives for "unknown" licenses by improving the matching rules. Unfortunately, I do not know how the existing rules were created in the first place or how they could be improved further.
3. Maybe it is possible to let ScanCode users configure some kind of maximum score for "unknown" findings. Obviously you can never be 100% sure that those unknown findings are an actual match for an unknown proprietary license, and in my personal experience they are most frequently false positives.
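
Suggestion 3 can be approximated today as a post-processing step on the scan results. The sketch below is hypothetical and is not an existing ScanCode option. It assumes each match is a dict with `key` and `score`, and that "unknown" findings can be identified by a key prefix; the function name, the prefix and the default cap of 50.0 are arbitrary choices for illustration.

```python
def cap_unknown_scores(matches, max_score=50.0, prefix="unknown"):
    """Return copies of `matches` with the score of unknown-type findings capped.

    Matches whose "key" starts with `prefix` get min(score, max_score);
    all other matches are copied unchanged. The input list is not modified.
    """
    capped = []
    for match in matches:
        match = dict(match)  # shallow copy so the caller's data stays intact
        if match["key"].startswith(prefix):
            match["score"] = min(match["score"], max_score)
        capped.append(match)
    return capped


# Illustrative input only.
findings = [
    {"key": "unknown-license-reference", "score": 100.0},
    {"key": "apache-2.0", "score": 100.0},
]
for finding in cap_unknown_scores(findings):
    print(finding["key"], finding["score"])
```

With capped scores, a threshold such as the one set via --license-score could then separate these findings from confident matches, which is not possible while they are reported at 100.0.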

If you do not agree with my suggestions, maybe you or @sschuberth have an idea how we could provide you with a list of all our globally curated false positives. I believe it would be possible to create ScanCode rule exceptions from the global curation file that our curation team is building up. Still, I think the other options mentioned would be the better way forward here.

@PatteSI

PatteSI commented Feb 17, 2022

Never mind, I think you were already aware of the problematic "unknown" license detection behavior: #1675. I guess this covers my point 2. I am not sure how helpful points 1 and 3 would be. However, the issue with false "unknown" detections still seems to be big.

@pombredanne
Member

@PatteSI @sschuberth I am putting together the outline of a "False" plan at #2878
And I will ping all interested parties so we can find the best solution.
One thing that would be really critical is to get more examples and test cases so we can better discern patterns and resolve this in the most efficient way.

@pombredanne
Member

pombredanne commented Mar 3, 2022

@PatteSI I had forgotten to thank you for the detailed suggestions.

@pombredanne
Member

@PatteSI re:

One thing that would be really critical is to get more examples and test cases so we can better discern patterns to resolve this in the most efficient way

gentle ping... have you looked into providing me the data you have about curations so that we can fix detection in ScanCode for everyone?
Thanks!

@PatteSI

PatteSI commented Jul 24, 2023

Hello @pombredanne. It's been a while.
I think a lot has been done since this issue was first created, and a lot of RFCs have been produced.
I consider #2878 to be where the main discussion is going on.
We can close this issue and move the discussion there.

Anyway, I again asked our team here at Bosch whether we can provide a list of our global curations that contain many false positives, but I doubt this will happen soon (or at all) due to compliance issues.
