
Fix extracting license information for pypi packages #518

Merged
mpcen merged 2 commits into clearlydefined:master from qtomlinson:qt/fix_lgpl on Apr 13, 2023


@qtomlinson
Collaborator

No description provided.

spdx-correct had trouble parsing LGPL licenses. An earlier change,
"Fixed handling of GNU LGPL licenses in spdx-correct", addressed this by
patching spdx-correct with a file in the /patches directory. However,
the Dockerfile did not include the /patches directory, so that fix never
took effect in the Docker deployment.

A recent release of spdx-correct appears to resolve the LGPL issues the
patch was meant to fix, so this change upgrades spdx-correct to the most
recent version. LGPLv2 and LGPLv2+ are still not identified correctly,
so a patch is added for those specific cases.

The Dockerfile is also updated so that the patch takes effect in the
container deployment.
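
As an illustration of special-casing the two identifiers that remain
unrecognized, here is a minimal sketch. It is not the patch from this
pull request: the function name normalizeLicense, the lookup table, and
the target SPDX identifiers (LGPL-2.0-only, LGPL-2.0-or-later) are
assumptions chosen for the example, and the fallback parameter merely
stands in for a general corrector such as spdx-correct.

```javascript
// Hypothetical special cases: shorthand LGPL names mapped to SPDX identifiers.
// The target identifiers are an assumption, not taken from the actual patch.
const LGPL_SPECIAL_CASES = new Map([
  ['lgplv2', 'LGPL-2.0-only'],
  ['lgplv2+', 'LGPL-2.0-or-later']
])

// Check the special cases first, then defer to a general corrector.
// `fallback` is a placeholder for something like spdx-correct.
function normalizeLicense(raw, fallback = () => null) {
  if (typeof raw !== 'string') return null
  const key = raw.trim().toLowerCase()
  if (LGPL_SPECIAL_CASES.has(key)) return LGPL_SPECIAL_CASES.get(key)
  return fallback(raw)
}
```

Checking the special cases before the general corrector keeps the
override small and easy to drop once upstream handles these names.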

Test cases:
        "url": "cd:/pypi/pypi/-/pycountry/22.3.5"
        "url": "cd:/pypi/pypi/-/chardet/5.1.0"
        "url": "cd:/pypi/pypi/-/PyGObject/3.42.0"

The registry data carries license information in two places: the
info.classifier entries, which are already used for extraction, and
info.license. The latter can supply the license when the classifiers
contain no license information.
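
The classifier-first lookup with a fallback to info.license can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the crawler's code: the function name
extractDeclaredLicense is hypothetical, the classifiers are assumed to
be an array of strings under info.classifiers with license entries
prefixed by "License ::", and the raw strings are returned without any
SPDX normalization.

```javascript
// Assumed prefix of license classifiers,
// e.g. "License :: OSI Approved :: MIT License".
const LICENSE_PREFIX = 'License ::'

// Return the raw declared license string, or null if none is found.
function extractDeclaredLicense(registryData) {
  const info = (registryData && registryData.info) || {}
  const classifiers = Array.isArray(info.classifiers) ? info.classifiers : []

  // Prefer classifiers: keep the last "::" segment of each license entry.
  const fromClassifiers = classifiers
    .filter(c => typeof c === 'string' && c.startsWith(LICENSE_PREFIX))
    .map(c => c.split('::').pop().trim())
    .filter(Boolean)
  if (fromClassifiers.length > 0) return fromClassifiers[0]

  // Fall back to the free-text info.license field.
  const license = typeof info.license === 'string' ? info.license.trim() : ''
  return license || null
}
```

For example, a package whose classifiers list only programming
languages but whose info.license is "ISC" would yield "ISC" here.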

Test cases:
pypi/pypi/-/dnspython/1.11.0
pypi/pypi/-/pytorch-ignite/0.5.0.dev20220727
pypi/pypi/-/mitmproxy-wireguard/0.1.10
@qtomlinson marked this pull request as ready for review on April 11, 2023 23:12
@qtomlinson
Collaborator Author

@mpcen ready for review

Contributor

@jeffwilcox left a comment

Makes sense. Will share this with Manny to take a look as well.

@mpcen merged commit a28303b into clearlydefined:master on Apr 13, 2023
qtomlinson pushed a commit to qtomlinson/crawler that referenced this pull request Feb 6, 2024
Fix extracting license information for pypi packages
@qtomlinson deleted the qt/fix_lgpl branch on February 6, 2024 04:15
3 participants