Threat mitigation: Runtime inspection of tool outputs for indirect prompt injection

The threat model identifies indirect prompt injection via tool outputs 
(content agents read from web, email, APIs) as a risk. Current mitigations 
focus on sandboxing and permission scoping, which contain the blast radius 
but don't detect the attack itself.

I built mlayer-guard, a runtime detection API that inspects tool outputs 
for injection before the agent acts on them. Available as an OpenClaw 
skill and as a REST API.

Benchmarked on public datasets:
- 98% detection on InjecAgent (ACL 2024, N=300)
- Zero false positives on Deepset (N=343)
- 94.1% on WildGuard (N=971)


The skill adds zero tokens to agent context — detection happens externally.

Demo: https://hidylan.ai/demo
OpenClaw skill: https://github.com/dmilstein-match/mlayer-guard-openclaw

Happy to discuss how this maps to specific threat cards in the model, 
or how it could complement the existing mitigations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Threat mitigation: Runtime inspection of tool outputs for indirect prompt injection #28

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Threat mitigation: Runtime inspection of tool outputs for indirect prompt injection #28

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions