Replace sed post-processing in 2pydantic with gen-pydantic template override #410

Merged
candleindark merged 1 commit into linkml-conversion from drop-sed-in-2pydantic
May 14, 2026

Conversation

@candleindark
Member

Summary

  • Replace the sed -E 's,[a-z]+COLON,,g' post-processing in the 2pydantic hatch script with an enum.py.jinja override passed to gen-pydantic via --template-dir.
  • The substitution now happens at the point where enum labels are emitted (via pv.label.split("COLON") | last), rather than as blind text replacement over the whole generated file.
  • This removes the dependency on a system sed from the Pydantic generation pipeline, which makes the "Validate the data instances against the Pydantic models generated from the LinkML schema" task in #408 (Add LinkML behavior tests for required: False -> True slot_usage refinement) easier to implement, e.g. when generating the Pydantic models in-process or in environments where shelling out to sed is awkward.

Why

The old pipeline was:

gen-pydantic --black dandischema/models.yaml | sed -E -e 's,[a-z]+COLON,,g' > dandischema/models_linkml.py

gen-pydantic emits enum members whose Python identifiers carry LinkML's namespace-prefix munging — e.g. dandiCOLONOpenAccess = "dandi:OpenAccess". The sed stripped the <prefix>COLON portion file-wide.
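
As a rough illustration of what that file-wide substitution did, the Python sketch below applies the same regular expression to a line shaped like the example above. The class name `AccessType` and the surrounding layout are assumptions made for illustration; only the `dandiCOLONOpenAccess` member is taken from this description.

```python
import re

# Same pattern as the sed expression: one or more lowercase letters, then COLON.
PREFIX_COLON = re.compile(r"[a-z]+COLON")


def strip_prefix_munging(text: str) -> str:
    """Remove every <prefix>COLON run anywhere in the text, like the old sed step."""
    return PREFIX_COLON.sub("", text)


# Hypothetical fragment of generated output (class name and layout are assumed).
raw = 'class AccessType(str, Enum):\n    dandiCOLONOpenAccess = "dandi:OpenAccess"\n'
cleaned = strip_prefix_munging(raw)
print(cleaned)
```

The enum value `"dandi:OpenAccess"` is left alone because it contains a literal colon rather than the `COLON` token; only the identifier changes.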

Doing this at the template level is more precise:

  • it only touches the place where the prefix is known to be a munging artifact (the enum member label), not arbitrary file text;
  • the intent is explicit in the template rather than hidden in a regex pipeline;
  • it drops the runtime dependency on a system sed, which is useful for #408 (Add LinkML behavior tests for required: False -> True slot_usage refinement): that work validates data instances against the generated Pydantic models, and we'd rather not have the generation step shell out.
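
The precision argument above can be sketched in plain Python. This is not the actual `enum.py.jinja` override, which is not reproduced in this description; `member_name` and `render_member` are hypothetical stand-ins for the template expression `pv.label.split("COLON") | last`, and the trailing comment text is invented to show a case where the two approaches would differ.

```python
import re


def member_name(label: str) -> str:
    """Plain-Python stand-in for the template expression label.split("COLON") | last."""
    return label.split("COLON")[-1]


def render_member(label: str, value: str, note: str) -> str:
    """Emit one enum member line; only the identifier is transformed."""
    return f'    {member_name(label)} = "{value}"  # {note}'


# Label-level approach: the note (hypothetical text) is emitted untouched.
line_new = render_member(
    "dandiCOLONOpenAccess", "dandi:OpenAccess", "was dandiCOLONOpenAccess"
)

# File-wide approach: emit the raw label, then regex-replace over the whole line.
raw_line = '    dandiCOLONOpenAccess = "dandi:OpenAccess"  # was dandiCOLONOpenAccess'
line_old = re.sub(r"[a-z]+COLON", "", raw_line)

print(line_new)
print(line_old)
```

In this contrived case the regex also rewrites the comment, while the label-level transform leaves it as written. On the real schema no such text exists, which is why the two pipelines agree there.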

Verification

Generated models_linkml.py two ways from dandischema/models.yaml (restored from the linkml-auto-converted branch):

  1. old: gen-pydantic --black … | sed -E 's,[a-z]+COLON,,g'
  2. new: gen-pydantic --black --template-dir tools/linkml_conversion_tools/pydantic_templates …

diff of the two outputs is empty — byte-identical. All 116 COLON occurrences in the raw gen-pydantic output are preceded by a lowercase prefix, so split("COLON") | last is equivalent to the sed on this schema.
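
One way to check the lowercase-prefix condition on raw gen-pydantic output is sketched below. The helper name `count_colon_occurrences` and the sample text are assumptions, not part of this PR, and the check only looks at the character immediately before each `COLON`; the empty diff remains the actual evidence of equivalence.

```python
import re


def count_colon_occurrences(text: str) -> tuple[int, int]:
    """Return (total COLON tokens, COLON tokens directly preceded by a lowercase letter)."""
    total = len(re.findall(r"COLON", text))
    prefixed = len(re.findall(r"(?<=[a-z])COLON", text))
    return total, prefixed


# Hypothetical sample; the last line has an uppercase character before COLON.
sample = "dandiCOLONOpenAccess\nspdxCOLONMIT\nXCOLONY\n"
total, prefixed = count_colon_occurrences(sample)
print(total, prefixed)

# On such a label the two approaches would diverge:
print("XCOLONY".split("COLON")[-1])          # split keeps only the last segment
print(re.sub(r"[a-z]+COLON", "", "XCOLONY"))  # regex finds nothing to strip
```

When both counts are equal, as reported for the 116 occurrences in this schema, no label falls into the divergent uppercase-prefix case.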

Test plan

  • Compared old vs new output on dandischema/models.yaml from linkml-auto-converted — byte-identical.
  • On the linkml-auto-converted branch (or one stacked on top), run hatch run linkml-auto-converted:2pydantic and confirm dandischema/models_linkml.py is unchanged from before this PR.
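
The byte-identical comparison in the test plan can also be done without `diff`. The sketch below is one possible way, not the procedure actually used for this PR; `outputs_identical` and the temporary file names are hypothetical.

```python
import tempfile
from pathlib import Path


def outputs_identical(old_path: Path, new_path: Path) -> bool:
    """True when both files contain exactly the same bytes."""
    return old_path.read_bytes() == new_path.read_bytes()


# Demonstration on throwaway files standing in for the two generated outputs.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "models_old.py"
    new = Path(tmp) / "models_new.py"
    old.write_bytes(b'OpenAccess = "dandi:OpenAccess"\n')
    new.write_bytes(b'OpenAccess = "dandi:OpenAccess"\n')
    same = outputs_identical(old, new)

    # A single extra trailing newline is enough to break byte identity.
    new.write_bytes(b'OpenAccess = "dandi:OpenAccess"\n\n')
    different = outputs_identical(old, new)

print(same, different)
```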

… override

The `2pydantic` hatch script previously stripped LinkML's namespace-prefix
munging from enum member names with `sed -E 's,[a-z]+COLON,,g'` over the
whole generated file. Replace that with an `enum.py.jinja` override passed
via `gen-pydantic --template-dir`, so the substitution happens at the
exact place the labels are emitted (using `pv.label.split("COLON") | last`)
rather than as blind text replacement after the fact.

Verified byte-identical output against the previous `sed`-based pipeline
on `dandischema/models.yaml` from `linkml-auto-converted` (all 116 `COLON`
occurrences are preceded by a lowercase prefix, so `split("COLON") | last`
is equivalent to the `sed` substitution on this schema).

Co-Authored-By: Claude Code 2.1.141 / Claude Opus claude-opus-4-7 <noreply@anthropic.com>
@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.83%. Comparing base (08e2d11) to head (23666ae).

Additional details and impacted files
@@                  Coverage Diff                  @@
##           linkml-conversion     #410      +/-   ##
=====================================================
+ Coverage              96.67%   97.83%   +1.16%     
=====================================================
  Files                     20       19       -1     
  Lines                   2436     2407      -29     
=====================================================
  Hits                    2355     2355              
+ Misses                    81       52      -29     
Flag Coverage Δ
unittests 97.83% <ø> (+1.16%) ⬆️

@candleindark candleindark merged commit a0cdc4a into linkml-conversion May 14, 2026
82 of 84 checks passed
@candleindark candleindark deleted the drop-sed-in-2pydantic branch May 14, 2026 06:39