Skip to content

Templatize concrete path keys from OpenAPI discovery#877

Open
fmhall wants to merge 1 commit into
mainfrom
mason/templatize-openapi-paths
Open

Templatize concrete path keys from OpenAPI discovery#877
fmhall wants to merge 1 commit into
mainfrom
mason/templatize-openapi-paths

Conversation

@fmhall
Copy link
Copy Markdown
Member

@fmhall fmhall commented May 14, 2026

Summary

Some OpenAPI providers (notably war-tracker.com) deliberately use concrete path keys like /api/v1/events/397003 instead of /api/v1/events/{event_id} so their discovery probes hit a real URL. Their canonical templates live in the operation's description or in info.x-guidance. We previously stored those concrete keys verbatim, producing ugly resource rows with real IDs baked into them (see image in the kickoff message).

What changed

  • New templatizePath(url, hints) utility (apps/scan/src/lib/discovery/path-template.ts) with three-tier resolution:
    1. Passthrough — URLs that already contain {...} segments are left alone.
    2. Mining — scans the operation summary/description and the doc-level info.x-guidance for templated path patterns. When one matches our concrete URL's prefix, we adopt the provider's own parameter names (e.g. {event_id}, {iso2}, {slug}).
    3. Heuristic fallback — for unmatched concrete-looking segments: numeric / UUID / long hex → {id}; a segment immediately following an {id}{slug}. Ambiguous segments (e.g. events, accounts) are left alone to avoid false positives.
  • Plumbed canonicalUrl through DiscoveredResourceregisterResourcesFromDiscoveryregisterResource. The DB stores canonicalUrl when present, falls back to the concrete probe URL otherwise.
  • deprecateStaleResources now compares against the same form that gets stored.
  • fetchDiscoveryDocument requests GuidanceMode.Always so the mining step has the full guidance text regardless of length.
  • 26 unit tests, including a parametrized run across the full war-tracker resource set.

Why this design

  • Provider-preferred names: war-tracker uses {event_id} and {iso2} — by mining their spec we adopt those names verbatim, rather than rewriting everything to our generic {id} / {code}.
  • No breakage of the standard case: specs that already use {param} templates pass through untouched.
  • Conservative heuristics: we only auto-template segments that strongly look like identifiers. A new provider with /api/v1/widgets/123 works out of the box; a provider with /api/v1/widgets/blue won't get blue falsely turned into {slug}.
  • Future-proof: any provider that documents canonical templates anywhere in their spec text gets handled automatically.

Examples (war-tracker)

Before After
/api/v1/events/397003 /api/v1/events/{event_id}
/share/397003 /share/{event_id}
/share/397003/strike /share/{event_id}/{slug}
/media/397003 /media/{event_id}
/region/middle-east /region/{slug}
/country/UA /country/{iso2}
/event-type/drone-strike /event-type/{slug}

Test plan

  • pnpm test:run — 115 tests pass (26 new for templatizePath)
  • pnpm types:check — clean
  • pnpm lint — clean
  • Re-run war-tracker discovery in staging and verify resource rows show templated paths
  • Confirm that an existing standard-OpenAPI origin (already-templated keys) is unaffected

🤖 Generated with Claude Code

Some providers (e.g. war-tracker.com) deliberately use concrete path
keys like `/api/v1/events/397003` instead of `/api/v1/events/{event_id}`
so their discovery probes hit a real URL. The canonical templates live
in the operation summary/description or in `info.x-guidance`.

We previously stored those concrete keys verbatim, producing resource
rows with real IDs in them.

Add a `templatizePath` utility that:
  1. Returns already-templated URLs unchanged.
  2. Mines the operation summary and doc-level guidance for path
     patterns containing `{param}` placeholders, then matches them
     against the concrete URL prefix to derive provider-preferred
     parameter names (e.g. `{event_id}`, `{iso2}`).
  3. Falls back to safe heuristics for unmatched concrete segments
     (numeric / UUID / long hex → `{id}`; slug after `{id}` → `{slug}`).

Plumb a `canonicalUrl` field through `DiscoveredResource` and
`registerResource`. Use it for both DB upsert and stale-resource
deprecation comparison.

Request `GuidanceMode.Always` from the discovery package so the mining
step has access to the full guidance text regardless of length.
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
x402scan Ready Ready Preview, Comment May 14, 2026 4:40pm

Request Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant