Follow references to license in another file: ScanCode reports the "SEE LICENSE IN <filename>" text in an NPM package.json as "Unknown" #1364
/cc @tsteenbe
I think it should report a license, and yes, "Unknown" is not great. I have some plans to actually dereference any such detection in npm and beyond. For a start, license detection rules have a new field called "referenced_filenames", which is a list of filename references. See https://github.com/nexB/scancode-toolkit/search?l=YAML&q=%22referenced_filenames%22 Based on that, the logic will then be something more or less this way: ... For special cases such as the npm "license" attribute, which uses a special convention, the logic would be to parse that directly and inject that logic in the code that does declared license normalization: https://github.com/nexB/scancode-toolkit/blob/fd0a95a04658178b8b4e74351bb84a392b618383/src/packagedcode/licensing.py#L68
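A minimal sketch of the "parse that directly" idea for the npm convention, assuming the "license" value is a plain string. `parse_npm_license_reference` is a hypothetical helper for illustration, not the actual code in packagedcode/licensing.py. It only extracts the filename that a detection could then record in something like "referenced_filenames".

```python
import re
from typing import Optional

# Hypothetical pattern for npm's convention for non-SPDX licenses:
# a "license" value of the form "SEE LICENSE IN <filename>".
_SEE_LICENSE_IN = re.compile(
    r"^\s*SEE\s+LICENSE\s+IN\s+(?P<filename>\S.*?)\s*$",
    re.IGNORECASE,
)


def parse_npm_license_reference(license_value: str) -> Optional[str]:
    """Return the referenced filename, or None when the value does not
    follow the "SEE LICENSE IN <filename>" convention."""
    match = _SEE_LICENSE_IN.match(license_value or "")
    if not match:
        return None
    return match.group("filename")


print(parse_npm_license_reference("SEE LICENSE IN LICENSE"))  # LICENSE
print(parse_npm_license_reference("MIT"))  # None
```

Returning None for anything else leaves regular SPDX expressions such as "MIT" to the existing normalization path.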
I was briefly thinking about something like that, too, but felt it was overkill in this particular case as usually not only ...
At this stage, most license rules have been tagged with a ...
@akugarg Here's an example npm package, see https://github.com/mongodb-js/vscode/blob/master/package.json#L34. Here we have to create a ...
Btw, instead of a post-scan plugin, this could be a ... The ... Also, we can focus on adding this just for files in the same directory, with proper tests, and then move on to adding more complicated cases in packagedcode.
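A minimal sketch of the same-directory case, assuming scanned files are identified by POSIX-style paths relative to the scan root. `resolve_in_same_directory` is a hypothetical helper, not a ScanCode API; it compares exact filenames among the siblings of the referencing file.

```python
import posixpath
from typing import Iterable, List


def resolve_in_same_directory(
    referencing_path: str,
    referenced_filenames: Iterable[str],
    scanned_paths: Iterable[str],
) -> List[str]:
    """Return the scanned paths that sit in the same directory as
    referencing_path and whose name is one of referenced_filenames."""
    parent = posixpath.dirname(referencing_path)
    wanted = set(referenced_filenames)
    found = []
    for path in scanned_paths:
        if path == referencing_path:
            continue
        # Siblings share the same parent directory; compare exact names only.
        if posixpath.dirname(path) == parent and posixpath.basename(path) in wanted:
            found.append(path)
    return found


paths = ["pkg", "pkg/package.json", "pkg/LICENSE", "pkg/lib/LICENSE"]
print(resolve_in_same_directory("pkg/package.json", ["LICENSE"], paths))
# ['pkg/LICENSE']
```

Nested files such as pkg/lib/LICENSE are deliberately ignored here; those "more complicated cases" would need their own rules.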
From chat@gitter Akanksha: Ayan: Akanksha:
akugarg@8681972
See here for functions related to the Resource object, including how to get the parent resource. Have a look at the other functions of this class too.
This should still be open:
@akugarg @AyanSinhaMahapatra ^ FYI
Only follow license references that match an exact filename
In #2616 we introduced matching the path of referenced_filenames based on a matching filename or path suffix. This removes the path suffix matching, which is problematic: before this, we were using .endswith(path), and this led to weird and incorrect license dereferences.
Signed-off-by: Philippe Ombredanne <[email protected]>
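The difference between the two matching strategies in this commit message can be sketched as follows. Both functions are hypothetical illustrations of the behavior described, not the code from #2616 or from this commit.

```python
import posixpath


def matches_by_suffix(path: str, referenced_filename: str) -> bool:
    # Old, problematic behavior: any path that merely ends with the reference.
    return path.endswith(referenced_filename)


def matches_exact_filename(path: str, referenced_filename: str) -> bool:
    # New behavior: the last path segment must be exactly the referenced name.
    return posixpath.basename(path) == referenced_filename


paths = ["pkg/LICENSE", "pkg/GPL-LICENSE", "pkg/docs/SUBLICENSE"]
print([p for p in paths if matches_by_suffix(p, "LICENSE")])
# ['pkg/LICENSE', 'pkg/GPL-LICENSE', 'pkg/docs/SUBLICENSE']
print([p for p in paths if matches_exact_filename(p, "LICENSE")])
# ['pkg/LICENSE']
```

The suffix check dereferences "See LICENSE" to unrelated files such as GPL-LICENSE, which is the kind of incorrect dereference the commit removes.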
Improve license referenced_filenames handling #1364
Signed-off-by: Philippe Ombredanne <[email protected]>
The new https://github.com/nexB/scancode-toolkit/blob/d1e725d3603a8f96c25f7e3f7595c68999b92a67/src/licensedcode/detection.py is what's needed to complete this.
Out of curiosity, are you planning to make following these kinds of references work only when ScanCode is run with ...
This is a feature of license detection in general, so this is not only for ... In the context of a package scan, the accuracy should be much better since we work from structured data.
@sameer1046 FYI |
@sschuberth actually this is mostly there (at least in the develop branch):
$ cat license-ref/COPYING
$ cat license-ref/ref
$ scancode -l --license-text --license-text-diagnostics --yaml - license-ref/
yields for ...
headers:
  - tool_name: scancode-toolkit
    tool_version: 30.1.0
    options:
      input:
        - license-ref/
      --license: yes
      --license-text: yes
      --license-text-diagnostics: yes
      --unknown-licenses: yes
      --yaml: '-'
    notice: |
      Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
      OR CONDITIONS OF ANY KIND, either express or implied. No content created from
      ScanCode should be considered or used as legal advice. Consult an Attorney
      for any legal advice.
      ScanCode is a free software code scanning tool from nexB Inc. and others.
      Visit https://github.com/nexB/scancode-toolkit/ for support and download.
    start_timestamp: '2022-01-13T080858.894595'
    end_timestamp: '2022-01-13T080900.669069'
    output_format_version: 2.0.0
    duration: '1.7744977474212646'
    message:
    errors: []
    extra_data:
      spdx_license_list_version: '3.15'
      OUTDATED: 'WARNING: Outdated ScanCode Toolkit version! You are using an outdated
        version of ScanCode Toolkit: 30.1.0 released on: 2021-09-24. A new version is
        available with important improvements including bug and security fixes, updated
        license, copyright and package detection, and improved scanning accuracy. Please
        download and install the latest version of ScanCode. Visit https://github.com/nexB/scancode-toolkit/releases
        for details.'
      files_count: 2
files:
  - path: license-ref
    type: directory
    licenses: []
    license_expressions: []
    percentage_of_license_text: '0'
    scan_errors: []
  - path: license-ref/COPYING
    type: file
    licenses:
      - key: apache-2.0
        score: '100.0'
        name: Apache License 2.0
        short_name: Apache 2.0
        category: Permissive
        is_exception: no
        is_unknown: no
        owner: Apache Software Foundation
        homepage_url: http://www.apache.org/licenses/
        text_url: http://www.apache.org/licenses/LICENSE-2.0
        reference_url: https://scancode-licensedb.aboutcode.org/apache-2.0
        scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE
        scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml
        spdx_license_key: Apache-2.0
        spdx_url: https://spdx.org/licenses/Apache-2.0
        start_line: 1
        end_line: 1
        matched_rule:
          identifier: apache-2.0_65.RULE
          license_expression: apache-2.0
          licenses:
            - apache-2.0
          referenced_filenames: []
          is_license_text: no
          is_license_notice: no
          is_license_reference: no
          is_license_tag: yes
          is_license_intro: no
          has_unknown: no
          matcher: 1-hash
          rule_length: 4
          matched_length: 4
          match_coverage: '100.0'
          rule_relevance: 100
        matched_text: 'license: apache 2.0'
    license_expressions:
      - apache-2.0
    percentage_of_license_text: '100.0'
    scan_errors: []
  - path: license-ref/ref
    type: file
    licenses:
      - key: unknown-license-reference
        score: '100.0'
        name: Unknown License file reference
        short_name: Unknown License reference
        category: Unstated License
        is_exception: no
        is_unknown: yes
        owner: Unspecified
        homepage_url:
        text_url:
        reference_url: https://scancode-licensedb.aboutcode.org/unknown-license-reference
        scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE
        scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.yml
        spdx_license_key: LicenseRef-scancode-unknown-license-reference
        spdx_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE
        start_line: 1
        end_line: 1
        matched_rule:
          identifier: unknown-license-reference_91.RULE
          license_expression: unknown-license-reference
          licenses:
            - unknown-license-reference
          referenced_filenames:
            - COPYING
          is_license_text: no
          is_license_notice: no
          is_license_reference: yes
          is_license_tag: no
          is_license_intro: no
          has_unknown: yes
          matcher: 1-hash
          rule_length: 8
          matched_length: 8
          match_coverage: '100.0'
          rule_relevance: 100
        matched_text: This is free software. See COPYING for details.
      - key: apache-2.0
        score: '100.0'
        name: Apache License 2.0
        short_name: Apache 2.0
        category: Permissive
        is_exception: no
        is_unknown: no
        owner: Apache Software Foundation
        homepage_url: http://www.apache.org/licenses/
        text_url: http://www.apache.org/licenses/LICENSE-2.0
        reference_url: https://scancode-licensedb.aboutcode.org/apache-2.0
        scancode_text_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.LICENSE
        scancode_data_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/apache-2.0.yml
        spdx_license_key: Apache-2.0
        spdx_url: https://spdx.org/licenses/Apache-2.0
        start_line: 1
        end_line: 1
        matched_rule:
          identifier: apache-2.0_65.RULE
          license_expression: apache-2.0
          licenses:
            - apache-2.0
          referenced_filenames: []
          is_license_text: no
          is_license_notice: no
          is_license_reference: no
          is_license_tag: yes
          is_license_intro: no
          has_unknown: no
          matcher: 1-hash
          rule_length: 4
          matched_length: 4
          match_coverage: '100.0'
          rule_relevance: 100
        matched_text: 'license: apache 2.0'
    license_expressions:
      - unknown-license-reference
      - apache-2.0
    percentage_of_license_text: '100.0'
    scan_errors: []
To finish this story, using the new "Detection" object, which can merge multiple detections without losing details, will logically "merge" the two license expressions above into a single expression.
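The logical merge described above can be sketched on plain lists of license expressions like the `license_expressions` of license-ref/ref. This is a hypothetical illustration, not ScanCode's Detection object: it swaps the placeholder for the referenced file's expressions, drops duplicates, and joins the rest with AND. It assumes simple string handling; a real implementation would more likely parse expressions properly, e.g. with the license-expression library.

```python
from typing import List


def merge_referenced_expressions(
    file_expressions: List[str],
    referenced_expressions: List[str],
    placeholder: str = "unknown-license-reference",
) -> str:
    """Replace the placeholder with the expressions found in the referenced
    file (keeping it if nothing was resolved), drop duplicates while keeping
    order, and combine what remains with AND."""
    merged: List[str] = []
    for expression in file_expressions:
        if expression == placeholder and referenced_expressions:
            candidates = referenced_expressions
        else:
            candidates = [expression]
        for candidate in candidates:
            if candidate not in merged:
                merged.append(candidate)
    if len(merged) == 1:
        return merged[0]
    # Parenthesize compound expressions so AND does not change their meaning.
    return " AND ".join(f"({e})" if " " in e else e for e in merged)


# license-ref/ref references COPYING, which was detected as apache-2.0.
print(merge_referenced_expressions(["unknown-license-reference", "apache-2.0"], ["apache-2.0"]))
# apache-2.0
```

Keeping the placeholder when the referenced file yields nothing avoids silently turning an unresolved reference into "no license".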
We have support for this now, but the approach will be refined in the next release, so I am moving this there instead.
This is now supported comprehensively, and merged. Closing this!
For non-SPDX / proprietary licenses, NPM suggests using a value of "SEE LICENSE IN <filename>" for the "license" key in package.json. If the license file is called LICENSE, that text ends up being "SEE LICENSE IN LICENSE". At least ScanCode 2.9.7 reports this string as an "Unknown" license when scanning package.json. IMO, that text should not trigger a license finding in this case.