feat: add ground truth assertion support to Goal Success Rate evaluator#180

Open
ybdarrenwang wants to merge 2 commits into strands-agents:main from ybdarrenwang:feature/gt-gsr

Conversation

@ybdarrenwang
Collaborator

Description

This PR enhances the GoalSuccessRateEvaluator to support assertion-based evaluation mode alongside the existing conversation-analysis mode. The evaluator now accepts ground truth assertions via metadata and validates agent behavior against these explicit success criteria.
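To make the two evaluation modes concrete, here is a minimal standalone sketch. It is illustrative only: the real GoalSuccessRateEvaluator lives in the strands-agents evals package, and the class shapes, field names, and the `assertions` metadata key below are assumptions based on this description, not the actual API.

```python
# Illustrative sketch only -- not the strands-agents implementation.
# Class shapes and the "assertions" metadata key are assumptions.
from dataclasses import dataclass, field


@dataclass
class EvaluationData:
    conversation: list[str]                       # user/agent turns to analyze
    metadata: dict = field(default_factory=dict)  # may carry ground truth assertions


class GoalSuccessRateEvaluator:
    """Scores goal success from the conversation alone or, when ground
    truth assertions are supplied via metadata, by validating the agent's
    final response against each explicit success criterion."""

    def evaluate(self, data: EvaluationData) -> float:
        assertions = data.metadata.get("assertions")
        if assertions:
            # Assertion mode: fraction of assertions satisfied by the
            # final agent turn (a trivial substring check stands in for
            # the real validation logic).
            final = data.conversation[-1].lower()
            passed = sum(1 for a in assertions if a.lower() in final)
            return passed / len(assertions)
        # Conversation-analysis mode: an LLM judge in the real evaluator,
        # stubbed here as a trivial heuristic.
        return 1.0 if data.conversation else 0.0
```

In assertion mode the score degrades gracefully: two of four satisfied assertions yields 0.5 rather than an all-or-nothing verdict.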

Related Issues

#95

Documentation PR

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Co-authored-by: Kang Zhou <kangzhou1991@gmail.com>
Co-authored-by: Subramanian Chidambaram <subbu10123@gmail.com>
Contributor

@poshinchen poshinchen left a comment


  1. I would prefer the assertions to be named expected_criteria / expected_goal, to align with what other evaluators have and to avoid confusion with deterministic assertions.
    1.2. Add expected_criteria: str | None = None to EvaluationData and Case
  2. How useful is the additional_context?
  3. Can't the prompt be goal_success_rate_v1? What's the major difference?

@ybdarrenwang
Collaborator Author

  > 1. I would prefer the assertions to be named expected_criteria / expected_goal, to align with what other evaluators have and to avoid confusion with deterministic assertions.
  >   1.2. Add expected_criteria: str | None = None to EvaluationData and Case
  > 2. How useful is the additional_context?
  > 3. Can't the prompt be goal_success_rate_v1? What's the major difference?

  1. That makes sense. I will incorporate it in the next revision.
  2. @kangISU @subbu10123, can you answer this?
  3. I don't think so. goal_success_rate and goal_success_rate_with_assertions are fundamentally different prompts; in the future their versions will have to evolve separately.

@kangISU
Collaborator

kangISU commented Mar 27, 2026

  • How useful is the additional_context?

This is for flexibility. Customers may want to include additional context for their specific use cases or domains, to further guide the evaluation. However, it can be removed if flexibility is not a high priority here.

…ntext and unnecessary async func

Co-authored-by: Kang Zhou <kangzhou1991@gmail.com>
Co-authored-by: Subramanian Chidambaram <subbu10123@gmail.com>
