Proposal: Add new PURL type scid (Software Component IDentification)
Background
The current list of PURL types covers a wide range of package ecosystems. However, there remains a significant gap in the ability to identify non-ecosystem software components, particularly commercial, proprietary, and standalone open source software that is not distributed through a package manager or authoritatively available on a repository with a supported PURL type, such as GitHub.
While the SWID (Software Identification) PURL type has historically been used in asset and inventory contexts to identify installed software, it has several drawbacks:
- It is overly complex for general use.
- It is tightly coupled to specific enterprise and government implementations.
- It requires fields such as tagId that are not always meaningful or available in modern DevOps and security pipelines.
Proposal
Introduce a new PURL type: scid, an acronym for Software Component Identification, pronounced /skɪd/.
This PURL type is intended to cover:
- Commercial and proprietary software (e.g., Acme Database Server)
- Standalone open source projects (e.g., the Linux Kernel)
- Internally developed applications that do not belong to a recognized package ecosystem
- Binary-only or installable software where no manifest-based or registry-driven packaging exists
It avoids SWID complexity by not requiring a tagId, and instead supports a simpler identifier model aligned with the core PURL structure:
pkg:scid/<domain>/<vendor>/<component>@<version>?arch=<arch>&edition=<edition>&target=<target>&locale=<locale>
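As a rough illustration of the template above, the following sketch assembles a scid PURL string from its parts. The function name `build_scid_purl` is hypothetical, the percent-encoding is simplified, and no existing PURL library is used; sorting qualifiers by key follows the general PURL convention rather than anything specific to this proposal.

```python
from urllib.parse import quote


def build_scid_purl(domain, vendor, component, version=None, qualifiers=None, subpath=None):
    """Assemble a pkg:scid PURL string (hypothetical helper, illustrative only)."""
    required = {"domain": domain, "vendor": vendor, "component": component}
    for label, value in required.items():
        if not value:
            raise ValueError(f"{label} is required")
    for label in ("domain", "vendor"):
        if "/" in required[label]:
            # A slash here would add an extra namespace segment.
            raise ValueError(f"{label} must not contain '/'")

    # Percent-encode each path segment; unreserved characters stay as-is.
    purl = "pkg:scid/" + "/".join(quote(v, safe="") for v in required.values())
    if version:
        purl += "@" + quote(version, safe="")
    if qualifiers:
        # Emit qualifiers sorted by key so the same input always yields the same string.
        pairs = [f"{key}={quote(value, safe='')}" for key, value in sorted(qualifiers.items()) if value]
        if pairs:
            purl += "?" + "&".join(pairs)
    if subpath:
        segments = [s for s in subpath.split("/") if s]
        purl += "#" + "/".join(quote(s, safe="") for s in segments)
    return purl


print(build_scid_purl("acme-industries.com", "Acme", "analytics-suite", "5.2.1"))
print(build_scid_purl("acme.com", "Acme Robotics", "robot-os", "2.3.0",
                      {"locale": "en-GB", "arch": "arm64"}))
```

Spaces in a vendor name such as Acme Robotics are percent-encoded, so the vendor stays a single namespace segment.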
Field Breakdown
| Field | PURL Component | Required | Description |
|---|---|---|---|
| domain | namespace | Yes | Internet domain name of the entity making the claim (e.g., publisher, distributor). |
| vendor | namespace | Yes | Human-readable name of the software creator or vendor (e.g., Microsoft, Acme Corp). |
| component | name | Yes | Name of the actual software product, project, or application. |
| version | version | Optional | Version string identifying the specific release, tag, or snapshot of the product. |
| arch | qualifiers | Optional | CPU architecture the software was built for (e.g., x86_64, arm64). |
| edition | qualifiers | Optional | Specific edition of the product (e.g., community, enterprise). |
| target | qualifiers | Optional | Target software environment (e.g., windows, java, android). |
| locale | qualifiers | Optional | Language of the software, expressed as an ISO 639-1 code (e.g., en) or a full BCP 47 language tag (e.g., en-GB). |
| | subpath | Optional | The subpath optionally identifies a specific module, subcomponent, or file within the broader software product. |
NOTE: The namespace must contain two and only two segments: the first segment represents the domain, and the second segment represents the vendor.
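The two-segment namespace rule in the note above could be enforced by a parser along these lines. This is a minimal sketch, not a full PURL parser: the name `parse_scid_purl` and the returned dictionary layout are assumptions made for illustration, and it expects reserved characters inside values to be percent-encoded already.

```python
from urllib.parse import unquote


def parse_scid_purl(purl):
    """Split a pkg:scid PURL into the proposed fields (hypothetical helper)."""
    prefix = "pkg:scid/"
    if not purl.startswith(prefix):
        raise ValueError("not a scid PURL")
    rest = purl[len(prefix):]

    # Peel off the optional parts from right to left: subpath, qualifiers, version.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    path, _, version = rest.partition("@")

    # Exactly three path segments: domain and vendor (namespace), then component (name).
    segments = [unquote(s) for s in path.split("/")]
    if len(segments) != 3 or not all(segments):
        raise ValueError("expected <domain>/<vendor>/<component>")
    domain, vendor, component = segments

    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key] = unquote(value)

    return {
        "domain": domain,
        "vendor": vendor,
        "component": component,
        "version": unquote(version) or None,
        "qualifiers": qualifiers,
        "subpath": unquote(subpath) or None,
    }


parsed = parse_scid_purl("pkg:scid/acmerobotics.org/Acme/robot-os@2.3.0?arch=arm64&locale=en-GB")
print(parsed["domain"], parsed["vendor"], parsed["component"], parsed["version"])
```

A PURL with a missing domain or vendor segment, such as pkg:scid/Acme/robot-os@2.3.0, is rejected with a ValueError rather than guessed at.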
Why Both Domain and Vendor Are Required
The domain and vendor fields in the scid PURL type serve distinct but complementary purposes. Both are required to ensure accurate attribution, long-term traceability, and disambiguation across organizations and software products.
1. Domain Names Are Leased, Not Owned
A domain name (e.g., acme.com) is not a permanent identifier. Domains can:
- Expire, be abandoned, or not be renewed.
- Change ownership over time, sometimes without any continuity of the original organization.
- Be shared, squatted, or sold in secondary markets.
Relying solely on the domain as an identifier introduces ambiguity. For example, if two unrelated companies have used the domain acme.com at different points in time, associating that domain with a product like HelloWorld could lead to incorrect assumptions about origin or authorship.
By requiring a separate vendor field (e.g., Acme Robotics), the PURL preserves the human-recognizable identity of the software creator, independent of domain lifecycle or ownership.
2. Vendors Outlive Their Domains (and Vice Versa)
Organizations may go out of business, dissolve, or be acquired. In such cases:
- Their domains may be decommissioned or transferred to another entity.
- Their products may still be widely deployed or archived for compliance, security, or historical analysis.
Including a vendor field ensures that the name of the software creator remains intact and attributable, even if the domain is no longer valid or resolvable.
Conversely, some domains are retained for legacy reasons, but the actual vendor identity may have changed or become ambiguous. In both scenarios, having both fields offers a more complete and resilient reference.
3. Disambiguation of Identical or Similar Vendor Names
Many organizations have similar or identical names. For example:
pkg:scid/acme-industries.com/Acme/analytics-suite@5.2.1
pkg:scid/acmerobotics.org/Acme/robot-os@2.3.0
Here, Acme is the vendor name in both cases, but the domain clearly differentiates between the two entities. Without both fields, there is no reliable way to determine which “Acme” is responsible for a given product.
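To make the ambiguity concrete, the sketch below groups the two example PURLs by vendor name and reports any vendor that appears under more than one domain. The helper `publisher_key` and the `inventory` list are hypothetical, and the parsing is deliberately minimal.

```python
from collections import defaultdict


def publisher_key(purl):
    """Return (domain, vendor) from the two-segment namespace of a pkg:scid PURL."""
    prefix = "pkg:scid/"
    if not purl.startswith(prefix):
        raise ValueError("not a scid PURL")
    # The first two path segments form the namespace; the remainder is ignored here.
    domain, vendor, _component_and_rest = purl[len(prefix):].split("/", 2)
    return domain, vendor


inventory = [
    "pkg:scid/acme-industries.com/Acme/analytics-suite@5.2.1",
    "pkg:scid/acmerobotics.org/Acme/robot-os@2.3.0",
]

# Collect every domain seen for each vendor name.
domains_by_vendor = defaultdict(set)
for purl in inventory:
    domain, vendor = publisher_key(purl)
    domains_by_vendor[vendor].add(domain)

# A vendor name seen under several domains cannot identify a publisher on its own.
ambiguous = {v: sorted(d) for v, d in domains_by_vendor.items() if len(d) > 1}
print(ambiguous)  # {'Acme': ['acme-industries.com', 'acmerobotics.org']}
```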
4. Cross-organizational Relationships
In some cases, the entity responsible for distributing or asserting the identity of the software may differ from the software’s creator. For example:
pkg:scid/distributor.com/AcmeCorp/appsuite@1.0.0
- distributor.com represents the domain of the organization asserting the software’s metadata.
- AcmeCorp remains the recognized vendor or creator of the application.
This distinction is important in compliance, inventory, and contractual contexts.
5. Consistency with Existing Standards and Enterprise Systems
The inclusion of both domain and vendor in the scid PURL type aligns with how enterprise systems already track and manage software. These two fields are foundational in existing IT Asset Management (ITAM), Software Asset Management (SAM), and discovery and inventory tools used in medium-to-large organizations.
6. Interoperable Successor to CPE
One of the core design goals of this PURL type is to enable interoperability with existing systems that utilize legacy identifiers such as CPE. While CPE provides a structured method of identifying software, it suffers from centralization, a rigid schema, and a heavy reliance on manual human curation. These characteristics make it difficult to scale in modern, dynamic environments where new software products, forks, and distributions emerge continuously.
In contrast, the PURL format is inherently decentralized and URI-friendly, enabling toolchains, vendors, and open source communities to generate identifiers independently without requiring central registry approval. Despite this shift, scid PURLs aim to retain semantic compatibility with CPE by using comparable fields and qualifiers, ensuring they can be adopted by inventory, vulnerability, and compliance systems that currently rely on CPE naming conventions.
| CPE Field | scid PURL Equivalent | Notes |
|---|---|---|
| part | (implicit in PURL context) | CPE uses a (application), o (OS), h (hardware); scid PURL assumes software |
| vendor | vendor | Human-readable vendor name |
| product | component | Software product, project, or component name |
| version | version | Version string |
| update | (not directly mapped) | Could be embedded in version or excluded for simplicity |
| edition | edition (qualifier) | Directly mapped |
| language | locale (qualifier) | locale supports ISO 639-1 and BCP 47 |
| sw_edition | edition (qualifier) | May be combined under edition in PURL |
| target_sw | target (qualifier) | Target software environment (e.g., java, windows) |
| target_hw | arch (qualifier) | CPU architecture (e.g., x86_64, arm64) |
| other | (not mapped) | Use additional PURL qualifiers |
scid PURLs align with CPE parts a (application) and o (operating system). For hardware (h), future systems should rely on GS1 standards like GMN and GTIN. There’s no need to reinvent established identifiers.
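The mapping table can be read as a conversion recipe. The sketch below turns a CPE 2.3 formatted string into a scid PURL under several assumptions that are not part of the proposal: the function name `cpe23_to_scid` is hypothetical, the domain must be supplied by the caller because CPE has no equivalent field, sw_edition takes precedence over edition when both are set, update and other are dropped, and backslash-escaped colons inside CPE values are not handled.

```python
from urllib.parse import quote

# Attribute order of a CPE 2.3 formatted string, after the "cpe:2.3" prefix.
CPE_ATTRS = ("part", "vendor", "product", "version", "update", "edition",
             "language", "sw_edition", "target_sw", "target_hw", "other")


def cpe23_to_scid(cpe, domain):
    """Convert a CPE 2.3 formatted string to a pkg:scid PURL (illustrative only)."""
    fields = cpe.split(":")
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError("expected a CPE 2.3 formatted string with 11 attributes")
    attrs = dict(zip(CPE_ATTRS, fields[2:]))

    def value(name):
        # "*" (ANY) and "-" (NA) carry no concrete value to map.
        raw = attrs[name]
        return None if raw in ("*", "-") else raw

    if attrs["part"] == "h":
        raise ValueError("hardware (part h) is out of scope for scid")
    if not value("vendor") or not value("product"):
        raise ValueError("vendor and product are required")

    purl = "pkg:scid/" + "/".join(
        quote(s, safe="") for s in (domain, value("vendor"), value("product")))
    if value("version"):
        purl += "@" + quote(value("version"), safe="")

    qualifiers = {
        "arch": value("target_hw"),
        "edition": value("sw_edition") or value("edition"),  # assumed precedence
        "locale": value("language"),
        "target": value("target_sw"),
    }
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in sorted(qualifiers.items()) if v)
    return purl + ("?" + query if query else "")


print(cpe23_to_scid("cpe:2.3:a:acme:database_server:5.2.1:*:*:en:enterprise:windows:x64:*",
                    "acme.com"))
```

Because CPE vendor values are normalized lowercase tokens rather than human-readable names, a real converter would likely need a lookup from CPE vendor to the display name used in the vendor segment.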