Skip to content

Feature/sem review#19

Open
antiartificial wants to merge 13 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review
Open

Feature/sem review#19
antiartificial wants to merge 13 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review

Conversation

@antiartificial
Copy link
Copy Markdown

@antiartificial antiartificial commented Mar 8, 2026

Adds a few new features:

sem review and sem log work with historical state whereas sem graph and sem impact work with current state. Signature change detection (new capability, powers the [signature changed] / [body only] labels in review and changelog).

  • sem log full history of any function (entity), following it through renames and moves across commits
  • sem changelog auto-categorized release notes from a commit range with a suggested semver bump
  • sem review groups changes by impact (API surface, internal, config) with dependency counts and a risk assessment

Other items in this PR:

  • This includes a slight fix to the version string handling, previously this was hardcoded and could become stale, now reflects what's in the Cargo.toml.
  • Test coverage for all three new modules (19 new tests)
  • README updated with documentation for the new commands

Example usage/output

sem review --from HEAD~2 --to HEAD # last two commits

  ┌─ Internal Changes ──────────────────────────────────
  │  ∆ function   getSupportedMimeType      [modified]
  │  ∆ variable   types                     [modified]
  └───────────────────────────────────────────────────────

  Summary: 2 internal
  Risk: low (internal/config changes only, no public API impact)

Example usages:
sem log --entity authenticateUser # finds all references to entity (ex. a function)
sem log --entity login --file src/auth.ts # constrains search to specific file

Example output:

>sem log --entity getSupportedMimeType                                                                                                              (base)

  prism-web/src/lib/audio.ts :: function :: getSupportedMimeType

  349f65f  2026-02-18  Aaron Barton  [modified]       Fix Brave misdetected as Safari: prefer webm over mp4 MIME type
  45b819f  2026-02-06  antiartifici  [added]          Add Prism: voice-driven Socratic interview simulator

Example usages:
sem changelog --from v1.2.0 --to HEAD # ex changes spanning a tag range (or commits)
sem changelog --from v1.0.0 --to v2.0.0 --format markdown

Example output:

sem changelog --from HEAD~1 --to HEAD                                                                                                              (base)

## Unreleased — 2026-03-08

### Internal
  · `getSupportedMimeType` — function modified in prism-web/src/lib/audio.ts
  · `types` — variable modified in prism-web/src/lib/audio.ts

Suggested version bump: PATCH (no new public API, no breaking changes)

@antiartificial antiartificial marked this pull request as ready for review March 8, 2026 23:31
@rs545837
Copy link
Copy Markdown
Member

rs545837 commented Mar 9, 2026

Thanks I will take a look at this asap.

@rs545837
Copy link
Copy Markdown
Member

rs545837 commented Mar 9, 2026

Replied on the discussion with more context on how this relates to inspect and the broader roadmap. The sem log and sem changelog modules look good and fill real gaps in the CLI. sem review overlaps significantly with inspect which does entity-level triage with graph-based risk scoring and LLM integration.

Going to review the code in detail. Main things I want to check: how the signature detection interacts with sem-core's existing structural_hash (which already distinguishes cosmetic vs structural changes), and whether the log module's git history walking could live in sem-core as a library function so inspect and weave can use it too.

Palanikannan1437

This comment was marked as off-topic.

Copy link
Copy Markdown
Contributor

@Palanikannan1437 Palanikannan1437 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature set — review, changelog, and log are solid additions. Left some inline comments. The two I'd want addressed before merge are the signature breaking classification (return type changes + language-aware param handling) and the entity name collision in log history tracking. The rest are suggestions for follow-ups.

Also we could discuss more on what should stay here vs in inspect as @rs545837 mentioned above

pub fn is_breaking(&self) -> bool {
matches!(
self,
SignatureChangeKind::ParamsRemoved { .. }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ReturnTypeChanged should be in here too — changing a return type will break every caller that depends on the old type. Right now it silently gets classified as non-breaking, which means changelog/semver will miss real breakage.

pub fn get_log(&self, limit: usize) -> Result<Vec<CommitInfo>, GitError> {
/// Walk commits lazily from `to` (or HEAD) backward, stopping at `from`.
/// The callback returns `true` to continue or `false` to stop early.
pub fn for_each_commit(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice API — the lazy callback with early-stop is a good design for entity log. One thing to consider for later: the callback receives CommitInfo with sha as a String, but callers like build_entity_log immediately re-resolve that SHA back into trees via resolve_tree. Passing richer data (or at least the Oid + parent) would avoid that round-trip.

@antiartificial
Copy link
Copy Markdown
Author

I'll have more time to digest and reflect this evening, thank you for your eyes and input.

@Palanikannan1437
Copy link
Copy Markdown
Contributor

Hey @antiartificial did you get some time to check them out?

@antiartificial
Copy link
Copy Markdown
Author

I've had some guests from out of town. Catching up!

@antiartificial
Copy link
Copy Markdown
Author

antiartificial commented Mar 17, 2026

Replied on the discussion with more context on how this relates to inspect and the broader roadmap. The sem log and sem changelog modules look good and fill real gaps in the CLI. sem review overlaps significantly with inspect which does entity-level triage with graph-based risk scoring and LLM integration.

Going to review the code in detail. Main things I want to check: how the signature detection interacts with sem-core's existing structural_hash (which already distinguishes cosmetic vs structural changes), and whether the log module's git history walking could live in sem-core as a library function so inspect and weave can use it too.

Thank you, the idea with sem review was to offer an offline first no LLM, would you be interested in merging the logic here, keeping sem review as the offline baseline?

the structural_hash and analyze_signature_change operate independently, I feel they compliment one another but don't coordinate. build_entity_log already uses both for formatting only plus signature analysis for the detail.

structural_hash ast level answers "did anything meaningful change, or was it just formatting or comments?" acts as a fast pre-filter.
analyze_signature_change function signature only (params and return type) answers "how exactly did the signature change? is it breaking"

sem review uses structural_change to label things "cosmetic" but doesn't run signature analysis so a review will show a parameter removal as just "modified" and it won't say "parameter removed (breaking). The changelog and log commands do run signature analysis. if inspect wants breaking code change detection in its triage:

  1. call the analyze_signature_change function directly
  2. use the changeling/log output which includes signature details

structural_hash is the "did anything real change" gate and signature analysis is the "what exactly changed in the API contract"

As far as some of the feedback: targeted single-entity parsing instead of computing a full semantic diff for all changed files in a commit parse only the tracked file and compare only the tracked entity and shared "analyzed change" structure when calling build_review and reclassifying everything into changelog categories and using an intermedia structure. I'm happy to tackle these in this effort if you don't mind it, or happy to do a follow up PR for each.

  Groups entity-level changes into API surface (cross-file dependents),
  internal (body-only, deletions, no cross-file refs), and config/data
  categories. Includes approximate dependent counts from the entity graph
  and a risk heuristic (low/medium/high). Supports terminal and JSON output.
Classifies entity-level changes into Keep-a-Changelog categories
(Breaking, Added, Changed, Removed, Internal) with semver bump
suggestion. Includes Conventional Commits parsing from commit
messages and supports terminal, markdown, and JSON output formats.
Also adds get_log_range() to GitBridge for commit message access.
Re-parses before/after entity content with tree-sitter to extract
parameter lists and classify changes as breaking (params removed or
reordered) vs non-breaking (params added, body-only). Supports
TypeScript, JavaScript, Python, Rust, Go, Java, C, C++, and more.
Integrated into `sem changelog` for automatic breaking change detection.
Add README sections for entity history tracking, semantic review,
and changelog generation. Update language table with Swift, Elixir,
Bash, and Vue. Document sem diff file comparison mode.
19 new tests covering risk assessment, config classification,
entity resolution, changelog generation, and modification
classification. All 83 tests pass.
- ReturnTypeChanged now classified as breaking (breaks callers)
- ParamsAdded label changed to "signature changed" (can't verify defaults)
- Entity log matches file_path + entity_name to prevent cross-file misattribution
- Replace --follow (always true, no way to disable) with --no-follow flag
- Remove aggressive has_breaking_commits && dependent_count >= 5 heuristic
- compute_semver considers removed public entities with dependents as MAJOR
- Add --full flag to sem changelog and sem review to show all changes
…s diff.rs, review.rds, changelog.rs consolidated normalize_extts. Config changesi n risk assessment, now considers modified config/data as medium risk instead of silently falling through to low. Richer CommitInfo.
…ommon::open_git_or_exit(). Replaces raw ANSI escapes with colored crate for consistency. Simplifies resolve_file_changes match bloxes to a git_or_exit helper. find_conventional_type now uses prebuilt HashMap<&str, &str> for o(1)instead of o(n)
@rs545837
Copy link
Copy Markdown
Member

Hey @antiartificial, sorry for the slow reply!

Took another look and the signature classification and entity disambiguation are both handled nicely, good work on those.

Here's what I'd like to do: let's merge sem log and sem changelog from this PR, but leave out sem review for now since it overlaps with inspect and I want to think through the right boundary there.

Could you split out the sem review parts so we can merge the rest? Once that's done this is good to go.

@antiartificial
Copy link
Copy Markdown
Author

@rs545837 no problem, we are all busy, certainly I can proceed with sem log and changelog sans review.

@rs545837
Copy link
Copy Markdown
Member

Awesome sounds great.

sem log and sem changelog are ready to merge; sem review is deferred
until the boundary with inspect is clarified. The parser/review module
is retained as an internal dependency of the changelog pipeline.
Copy link
Copy Markdown

@inspect-review inspect-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inspect review

Triage: 175 entities analyzed | 0 critical, 0 high, 68 medium, 107 low
Verdict: standard_review

Findings (1)

  1. [low] analyze_signature_change misclassifies return-type additions/removals as BodyOnly: when before_params == after_params and before_ret != after_ret, it only returns ReturnTypeChanged if both are Some. If one side is None (return type added/removed), it falls through to SignatureChangeKind::BodyOnly, which is incorrect.
    Evidence (crates/sem-core/src/parser/signature.rs):
if before_params == after_params {
    if before_ret != after_ret {
        if let (Some(br), Some(ar)) = (before_ret, after_ret) {
            return SignatureChangeKind::ReturnTypeChanged { before: br, after: ar };
        }
    }
    return SignatureChangeKind::BodyOnly;
}

Reviewed by inspect | Entity-level triage found 0 high-risk changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants