New PURL type for non-packaged software #516

@stevespringett

Proposal: Add new PURL type scid (Software Component IDentification)

Background

The current list of PURL types covers a wide range of package ecosystems. However, it cannot identify software that lives outside those ecosystems: commercial, proprietary, and standalone open source software that is neither distributed through a package manager nor authoritatively hosted on a repository with a supported PURL type, such as GitHub.

While the SWID (Software Identification) PURL type has historically been used in asset and inventory contexts to identify installed software, it:

  • is overly complex for general use,
  • is tightly coupled to specific enterprise and government implementations, and
  • requires fields such as tagId that are not always meaningful or available in modern DevOps and security pipelines.

Proposal

Introduce a new PURL type: scid, an acronym for Software Component Identification, pronounced /skɪd/.

This PURL type is intended to cover:

  • Commercial and proprietary software (e.g., Acme Database Server)
  • Standalone open source projects (e.g., the Linux Kernel)
  • Internally developed applications that do not belong to a recognized package ecosystem
  • Binary-only or installable software where no manifest-based or registry-driven packaging exists

It avoids SWID complexity by not requiring a tagId, and instead supports a simpler identifier model aligned with the core PURL structure:

pkg:scid/<domain>/<vendor>/<component>@<version>?arch=<arch>&edition=<edition>&target=<target>&locale=<locale>

Field Breakdown

| Field | PURL Component | Required | Description |
|---|---|---|---|
| domain | namespace | Yes | Internet domain name of the entity making the claim (e.g., publisher, distributor). |
| vendor | namespace | Yes | Human-readable name of the software creator or vendor (e.g., Microsoft, Acme Corp). |
| component | name | Yes | Name of the actual software product, project, or application. |
| version | version | Optional | Version string identifying the specific release, tag, or snapshot of the product. |
| arch | qualifiers | Optional | CPU architecture the software was built for (e.g., x86_64, arm64). |
| edition | qualifiers | Optional | Specific edition of the product (e.g., community, enterprise). |
| target | qualifiers | Optional | Target software environment (e.g., windows, java, android). |
| locale | qualifiers | Optional | Language of the software, expressed as an ISO 639-1 code (e.g., en) or a full BCP 47 language tag (e.g., en-GB). |
| subpath | subpath | Optional | Identifies a specific module, subcomponent, or file within the broader software product. |

NOTE: The namespace must contain two and only two segments: the first segment represents the domain, and the second segment represents the vendor.
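
As a rough illustration of the field breakdown, the sketch below assembles a scid PURL string from its parts. The function name build_scid_purl is hypothetical and not part of the proposal. Sorting qualifiers by key and percent-encoding each value follow general PURL conventions, and rejecting a slash in the domain or vendor is one assumed way to enforce the two-segment namespace rule.

```python
from urllib.parse import quote


def build_scid_purl(domain, vendor, component, version=None, subpath=None, **qualifiers):
    """Assemble a pkg:scid/... string (hypothetical helper, not part of the proposal)."""
    # The namespace must be exactly <domain>/<vendor>, so neither part may
    # be empty or contain a slash that would add a third segment.
    for label, value in (("domain", domain), ("vendor", vendor)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a single non-empty namespace segment")
    if not component:
        raise ValueError("component is required")

    purl = "pkg:scid/" + "/".join(quote(part, safe="") for part in (domain, vendor, component))
    if version:
        purl += "@" + quote(version, safe="")

    # Qualifiers (arch, edition, target, locale, ...) are emitted sorted by key;
    # unset ones are skipped.
    pairs = sorted((key.lower(), value) for key, value in qualifiers.items() if value)
    if pairs:
        purl += "?" + "&".join(f"{key}={quote(value, safe='')}" for key, value in pairs)
    if subpath:
        purl += "#" + quote(subpath.strip("/"), safe="/")
    return purl


print(build_scid_purl("acme.com", "Acme Corp", "db-server", "1.0",
                      edition="enterprise", arch="x86_64"))
```

Note that a vendor name containing spaces, such as Acme Corp, is percent-encoded in the resulting string.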

Why Both Domain and Vendor Are Required

The domain and vendor fields in the scid PURL type serve distinct but complementary purposes. Both are required to ensure accurate attribution, long-term traceability, and disambiguation across organizations and software products.


1. Domain Names Are Leased, Not Owned

A domain name (e.g., acme.com) is not a permanent identifier. Domains can:

  • Expire, be abandoned, or lapse without renewal.
  • Change ownership over time, sometimes without any continuity of the original organization.
  • Be shared, squatted, or sold in secondary markets.

Relying solely on the domain as an identifier introduces ambiguity. For example, if two unrelated companies have used the domain acme.com at different points in time, associating that domain with a product like HelloWorld could lead to incorrect assumptions about origin or authorship.

By requiring a separate vendor field (e.g., Acme Robotics), the PURL preserves the human-recognizable identity of the software creator, independent of domain lifecycle or ownership.


2. Vendors Outlive Their Domains (and Vice Versa)

Organizations may go out of business, dissolve, or be acquired. In such cases:

  • Their domains may be decommissioned or transferred to another entity.
  • Their products may still be widely deployed or archived for compliance, security, or historical analysis.

Including a vendor field ensures that the name of the software creator remains intact and attributable, even if the domain is no longer valid or resolvable.

Conversely, some domains are retained for legacy reasons, but the actual vendor identity may have changed or become ambiguous. In both scenarios, having both fields offers a more complete and resilient reference.


3. Disambiguation of Identical or Similar Vendor Names

Many organizations have similar or identical names. For example:

pkg:scid/acme-industries.com/Acme/analytics-suite@5.2.1
pkg:scid/acmerobotics.org/Acme/robot-os@2.3.0

Here, Acme is the vendor name in both cases, but the domain clearly differentiates between the two entities. Without both fields, there is no reliable way to determine which “Acme” is responsible for a given product.
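
A minimal sketch of how a consumer might apply this: split each PURL into its parts and compare the (domain, vendor) pair rather than the vendor label alone. The helper name split_scid_namespace is hypothetical, and the parsing is deliberately simplified compared with a full PURL parser.

```python
from urllib.parse import unquote


def split_scid_namespace(purl):
    """Return (domain, vendor, component) from a scid PURL string (hypothetical helper)."""
    prefix = "pkg:scid/"
    if not purl.startswith(prefix):
        raise ValueError("not a scid PURL")
    # Strip the subpath, then the qualifiers, then the version.
    path = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path = path.rsplit("@", 1)[0]
    segments = path.split("/")
    if len(segments) != 3:
        raise ValueError("expected exactly <domain>/<vendor>/<component>")
    return tuple(unquote(segment) for segment in segments)


a = split_scid_namespace("pkg:scid/acme-industries.com/Acme/analytics-suite@5.2.1")
b = split_scid_namespace("pkg:scid/acmerobotics.org/Acme/robot-os@2.3.0")

print(a[1] == b[1])    # True: the vendor label alone cannot tell them apart
print(a[:2] == b[:2])  # False: the (domain, vendor) pair can
```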


4. Cross-organizational Relationships

In some cases, the entity responsible for distributing or asserting the identity of the software may differ from the software’s creator. For example:

pkg:scid/distributor.com/AcmeCorp/appsuite@1.0.0
  • distributor.com represents the domain of the organization asserting the software’s metadata.
  • AcmeCorp remains the recognized vendor or creator of the application.

This distinction is important in compliance, inventory, and contractual contexts.


5. Consistency with Existing Standards and Enterprise Systems

The inclusion of both domain and vendor in the scid PURL type aligns with how enterprise systems already track and manage software. These two fields are foundational in existing IT Asset Management (ITAM), Software Asset Management (SAM), and discovery and inventory tools used in medium-to-large organizations.


6. Interoperable Successor to CPE

One of the core design goals of this PURL type is to enable interoperability with existing systems that utilize legacy identifiers such as CPE. While CPE provides a structured method of identifying software, it suffers from centralization, a rigid schema, and a heavy reliance on manual human curation. These characteristics make it difficult to scale in modern, dynamic environments where new software products, forks, and distributions emerge continuously.

In contrast, the PURL format is inherently decentralized and URI-friendly, enabling toolchains, vendors, and open source communities to generate identifiers independently without requiring central registry approval.

Despite this shift, scid PURLs aim to retain semantic compatibility with CPE by using comparable fields and qualifiers, ensuring they can be adopted by inventory, vulnerability, and compliance systems that currently rely on CPE naming conventions.

| CPE Field | scid PURL Equivalent | Notes |
|---|---|---|
| part | (implicit in PURL context) | CPE uses a (application), o (OS), h (hardware); scid PURL assumes software |
| vendor | vendor | Human-readable vendor name |
| product | component | Software product, project, or component name |
| version | version | Version string |
| update | (not directly mapped) | Could be embedded in version or excluded for simplicity |
| edition | edition (qualifier) | Directly mapped |
| language | locale (qualifier) | locale supports ISO 639-1 and BCP 47 |
| sw_edition | edition (qualifier) | May be combined under edition in PURL |
| target_sw | target (qualifier) | Target software environment (e.g., java, windows) |
| target_hw | arch (qualifier) | CPU architecture (e.g., x86_64, arm64) |
| other | (not mapped) | Use additional PURL qualifiers |

scid PURLs align with CPE parts a (application) and o (operating system). For hardware (h), future systems should rely on GS1 standards like GMN and GTIN. There’s no need to reinvent established identifiers.
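
The mapping above could be applied mechanically, roughly as sketched below for a CPE 2.3 formatted string. The function name cpe23_to_scid is hypothetical. Several choices are assumptions rather than part of the proposal: the domain must be supplied by the caller because CPE has no equivalent, update and other are dropped, sw_edition is preferred over edition when both are set, and the CPE vendor token is copied as-is even though it is a normalized token rather than a human-readable name.

```python
import re
from urllib.parse import quote

CPE_ATTRS = ("part", "vendor", "product", "version", "update", "edition",
             "language", "sw_edition", "target_sw", "target_hw", "other")


def cpe23_to_scid(cpe, domain):
    """Translate a CPE 2.3 formatted string to a scid PURL (hypothetical helper)."""
    # Split on colons that are not backslash-escaped.
    fields = re.split(r"(?<!\\):", cpe)
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError("expected a CPE 2.3 formatted string")
    # "*" (ANY) and "-" (NA) carry no value; remove backslash escapes from the rest.
    attrs = {name: None if value in ("*", "-") else re.sub(r"\\(.)", r"\1", value)
             for name, value in zip(CPE_ATTRS, fields[2:])}
    if attrs["part"] not in ("a", "o"):
        raise ValueError("scid covers applications and operating systems only")
    if not attrs["vendor"] or not attrs["product"]:
        raise ValueError("vendor and product are required")

    purl = "pkg:scid/" + "/".join(
        quote(part, safe="") for part in (domain, attrs["vendor"], attrs["product"]))
    if attrs["version"]:
        purl += "@" + quote(attrs["version"], safe="")
    qualifiers = {
        "arch": attrs["target_hw"],
        "edition": attrs["sw_edition"] or attrs["edition"],  # assumed precedence
        "locale": attrs["language"],
        "target": attrs["target_sw"],
    }
    query = "&".join(f"{key}={quote(value, safe='')}"
                     for key, value in sorted(qualifiers.items()) if value)
    return purl + ("?" + query if query else "")


print(cpe23_to_scid(
    "cpe:2.3:a:acme:database_server:5.2.1:*:*:en:enterprise:windows:x64:*", "acme.com"))
```

A hardware CPE (part h) is rejected rather than translated, in line with the scope described above.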
