Support Custom PyPI-Compatible Repositories for Package Metadata #258
+699
−1,007
Summary
This PR fixes python-inspector's ability to fetch package metadata from custom PyPI-compatible repositories (Artifactory, Nexus, DevPI, etc.). Previously, the tool would successfully resolve dependencies but fail to populate the `packages` array with metadata, resulting in missing licenses, checksums, and download URLs.
Problem Statement
When using python-inspector with custom repository indexes (via `--index-url`):
- `resolved_dependencies_graph` was correctly populated
- `packages` array remained empty
Root Cause
File: `src/python_inspector/package_data.py`
The `get_pypi_data_from_purl()` function hardcoded `https://pypi.org/pypi` instead of deriving the base URL from the custom index. When internal packages don't exist on PyPI.org, the fetch returns 404 and the package is silently excluded from output.
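For illustration only, the pre-fix behaviour can be sketched as follows. The function and constant names are hypothetical and are not the actual code in `package_data.py`. The URL layout is PyPI's JSON API (`/pypi/<name>/<version>/json`). Because the base is fixed, the index passed via `--index-url` never affects where metadata is fetched from:

```python
# Hypothetical sketch of the pre-fix behaviour; names are illustrative and
# do not mirror src/python_inspector/package_data.py.
PYPI_JSON_BASE = "https://pypi.org/pypi"  # hardcoded base URL


def json_api_url_before_fix(name: str, version: str, index_url: str) -> str:
    # index_url is accepted but never used: this is the bug being illustrated
    return f"{PYPI_JSON_BASE}/{name}/{version}/json"


url = json_api_url_before_fix(
    "internal-package", "1.0.0", index_url="https://repo.example.com/simple"
)
print(url)  # https://pypi.org/pypi/internal-package/1.0.0/json
```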
Additional Challenge: URL Path Variations
Custom repositories may use different URL structures between their PEP 503 Simple API and JSON API endpoints:
- Simple API: `https://repo.example.com/simple/../packages/hash/file.whl`
- JSON API: `https://repo.example.com/pypi/hash/file.whl`
Matching by full URL fails when paths differ between endpoints.
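The mismatch can be reproduced with the two example URLs using only the standard library. Even after `posixpath.normpath` resolves the `..` segment, the paths still differ. The last path segment (the filename) is identical:

```python
import posixpath
from urllib.parse import urlparse

simple_url = "https://repo.example.com/simple/../packages/hash/file.whl"
json_url = "https://repo.example.com/pypi/hash/file.whl"

# Normalise the Simple API path so "simple/.." collapses away
simple_path = posixpath.normpath(urlparse(simple_url).path)  # "/packages/hash/file.whl"
json_path = urlparse(json_url).path                           # "/pypi/hash/file.whl"

print(simple_url == json_url)    # False
print(simple_path == json_path)  # False: even normalised paths differ
print(posixpath.basename(simple_path) == posixpath.basename(json_path))  # True
```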
Solution
This PR implements three key improvements:
1. Dynamic Base URL Derivation
Modified `get_pypi_data_from_purl()` to derive the JSON API base URL from the provided repositories, mapping Simple API URLs (`/simple`) to JSON API URLs (`/pypi`).
2. Universal Filename-Based Matching
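Here is a minimal sketch of the base-URL derivation. It assumes the repository serves its JSON API next to its Simple API, with the trailing `/simple` segment replaced by `/pypi`, which is the layout PyPI.org itself uses. `derive_json_api_base()` is a hypothetical name, and real repositories may lay out their endpoints differently:

```python
from typing import Optional
from urllib.parse import urlparse, urlunparse

DEFAULT_JSON_API_BASE = "https://pypi.org/pypi"


def derive_json_api_base(index_url: Optional[str]) -> str:
    """Hypothetical helper: map a PEP 503 index URL to a JSON API base URL."""
    if not index_url:
        # No custom index: keep the PyPI.org default
        return DEFAULT_JSON_API_BASE
    parsed = urlparse(index_url)
    path = parsed.path.rstrip("/")
    if path.endswith("/simple"):
        # Assumption: ".../simple" is served next to ".../pypi"
        path = path[: -len("/simple")] + "/pypi"
    # Otherwise the path is kept as-is (also an assumption of this sketch)
    return urlunparse(parsed._replace(path=path, params="", query="", fragment=""))


for index in (None, "https://pypi.org/simple", "https://repo.example.com/simple/"):
    print(derive_json_api_base(index))
# https://pypi.org/pypi
# https://pypi.org/pypi
# https://repo.example.com/pypi
```

The metadata URL would then be `f"{base}/{name}/{version}/json"`. With no custom index, the sketch falls back to `https://pypi.org/pypi`, so the PyPI.org case behaves as before.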
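Instead of comparing full URLs (which vary by repository), extract the filename from each URL and use it as the match key.
This approach does not depend on repository-specific URL structures, because the same file keeps the same filename whichever endpoint serves it.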
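The following sketches filename-based matching. `file_match_key()` is a hypothetical stand-in for the PR's `get_file_match_key()`, whose exact rules may differ. Here the key is the percent-decoded last path segment. The `url` and `digests` fields follow the shape of entries in the `urls` list of PyPI's JSON API, and the digest values are placeholders:

```python
import posixpath
from typing import Dict, List, Optional
from urllib.parse import unquote, urlparse


def file_match_key(url: str) -> str:
    """Hypothetical match key: the filename at the end of the URL path.

    urlparse() drops any query string and "#sha256=..." fragment, and
    unquote() decodes percent-escapes such as %2B for "+".
    """
    return unquote(posixpath.basename(urlparse(url).path))


def match_json_entry(link: str, json_urls: List[Dict]) -> Optional[Dict]:
    """Return the JSON API entry describing the same file as `link`, if any."""
    by_key = {file_match_key(entry["url"]): entry for entry in json_urls}
    return by_key.get(file_match_key(link))


json_urls = [
    {"url": "https://repo.example.com/pypi/hash/internal_package-1.0.0-py3-none-any.whl",
     "digests": {"sha256": "abc123"}},
    {"url": "https://repo.example.com/pypi/hash/internal_package-1.0.0.tar.gz",
     "digests": {"sha256": "def456"}},
]
link = ("https://repo.example.com/simple/../packages/hash/"
        "internal_package-1.0.0-py3-none-any.whl#sha256=abc123")
print(match_json_entry(link, json_urls)["digests"]["sha256"])  # abc123
```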
3. Relative URL Resolution
Added URL resolution for repositories whose JSON API returns relative file URLs, so that each file URL is resolved to an absolute download URL.
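One way to resolve relative file URLs is `urllib.parse.urljoin`, which applies RFC 3986 reference resolution against the URL of the JSON document that was fetched. This is a sketch: `absolutize()` is a hypothetical name, and the PR's own implementation may differ:

```python
from urllib.parse import urljoin


def absolutize(file_url: str, json_api_url: str) -> str:
    """Hypothetical helper: make a file URL from a JSON API response absolute.

    Absolute URLs pass through unchanged; relative ones (root-relative or
    with "../" segments) are resolved against the JSON document's own URL.
    """
    return urljoin(json_api_url, file_url)


api = "https://repo.example.com/pypi/internal-package/1.0.0/json"
print(absolutize("/packages/hash/file.whl", api))
# https://repo.example.com/packages/hash/file.whl
print(absolutize("../../../packages/hash/file.whl", api))
# https://repo.example.com/packages/hash/file.whl
print(absolutize("https://files.example.com/file.whl", api))
# https://files.example.com/file.whl
```

When no other base information is available, RFC 3986 treats the URL a document was retrieved from as its base. That is why the sketch joins against the JSON API URL rather than the index URL.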
Changes Made
Modified Files
- Added `get_file_match_key()` utility function
- Updated `get_pypi_data_from_purl()` to support custom repositories
- … `urlunparse`)
New Files
Testing
Unit Tests
```shell
pytest tests/test_package_data.py -v
# ===== 14 passed, 11 warnings in 0.60s =====
```
All tests cover:
Integration Testing
Tested with a custom Artifactory repository:
Before fix:
```json
{
  "packages": [],
  "resolved_dependencies_graph": {
    "pkg:pypi/internal-package@1.0.0": [...]
  }
}
```
After fix:
```json
{
  "packages": [
    {
      "name": "internal-package",
      "version": "1.0.0",
      "license_expression": "Apache-2.0",
      "download_url": "https://repo.example.com/packages/...",
      "sha256": "abc123...",
      "md5": "def456..."
    }
  ],
  "resolved_dependencies_graph": {
    "pkg:pypi/internal-package@1.0.0": [...]
  }
}
```
Tested scenarios:
Benefits
✅ Universal compatibility: Works with PyPI.org, Artifactory, Nexus, DevPI, and any PEP 503-compliant repository that also exposes a PyPI-style JSON API
✅ Backward compatible: Existing PyPI.org usage unchanged
✅ Complete metadata: Populates licenses, checksums, and download URLs for all packages
✅ ORT integration: Enables SBOM generation with custom repositories
✅ Future-proof: Independent of repository-specific URL structures
Compatibility
No breaking changes. Existing users with PyPI.org workflows are unaffected.