The Chainlink CRE SDK (v1.0.9) was newly released during the hackathon window. We pioneered early adoption patterns including `ConfidentialHTTPClient` integration, synchronous `.result()` chaining for WASM compatibility, and `ConsensusAggregationByFields` for BFT consensus on AI verdicts.
Challenges solved:
- The `.result()` pattern (no async/await) was not intuitive. CRE compiles workflows to WASM, so standard JavaScript async patterns break compilation. We had to rewrite all SDK calls to use the synchronous `.result()` chaining pattern.
- `ConfidentialHTTPClient` uses a completely different request format than `HTTPClient`: `multiHeaders` with `{ values: ['...'] }` objects instead of flat strings, `bodyString` instead of `body`, and `vaultDonSecrets` for template injection. This wasn't obvious from the docs and required reading the SDK source code.
- `ConsensusAggregationByFields` required understanding that DON nodes must get identical results for `identical` aggregation to succeed. With AI models, even `temperature: 0` can produce slight variations, so we had to carefully design our response parsing to extract only deterministic fields (the verdict string, not the full reasoning); see the sketch after this list.
- `runtime.now()` instead of `Date.now()`, `runtime.log()` instead of `console.log()`: WASM runtime constraints that aren't documented prominently but break consensus if violated.
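To make the determinism constraint concrete, here is a minimal sketch of the parsing approach; the response fields (`verdict`, `riskScore`, `reasoning`) are illustrative stand-ins, and only `runtime.log()` and the identical-results requirement come from the SDK behavior described above:

```ts
// DON nodes must return byte-identical values for `identical` field
// aggregation to reach consensus, so we reduce the AI response to only
// deterministic fields and drop free-form reasoning, which can vary
// across nodes even at temperature: 0.
interface ConsensusVerdict {
  verdict: 'APPROVED' | 'DENIED';
  riskScore: number;
}

function toConsensusFields(
  rawResponse: string,
  runtime: { log(msg: string): void },
): ConsensusVerdict {
  const parsed = JSON.parse(rawResponse);
  const result: ConsensusVerdict = {
    verdict: parsed.verdict === 'DENIED' ? 'DENIED' : 'APPROVED', // normalized string only
    riskScore: Math.round(Number(parsed.riskScore)),              // integer, no float drift
  };
  // parsed.reasoning is intentionally discarded: it varies across nodes.
  runtime.log(`consensus fields: ${JSON.stringify(result)}`); // runtime.log(), not console.log()
  return result;
}
```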
How we overcame it: Read the SDK source on GitHub, studied the example workflows in the CRE docs, and built incrementally — standard HTTP first, then added Confidential HTTP behind a feature flag, then behavioral scoring.
The dashboard used Next.js 15.3.3 with Turbopack. Production builds (`next build`) failed with cryptic errors that didn't appear in development (`next dev`).
Pain points:
- `export const dynamic = 'force-dynamic'` is silently ignored in `'use client'` files. This caused prerender errors on every page that used React hooks: `TypeError: Cannot read properties of undefined (reading 'env')` and `Cannot read properties of null (reading 'useContext')`.
- The fix required splitting every page into a thin server component wrapper (with the `dynamic` export) and a separate client component; see the sketch below. We refactored `page.tsx`, `presentation/page.tsx`, and `not-found.tsx` into this pattern.
- `NODE_ENV=development` was set as a Windows system environment variable, poisoning `next build` into using the dev runtime even when we explicitly set production mode. Fixed with `cross-env NODE_ENV=production` in the build script.
- Bun's dependency resolution accidentally upgraded Next.js from 15.3.3 to 15.5.12 when we ran `bun add -d cross-env`, which broke webpack (`WebpackError is not a constructor`). Had to pin Next.js back to `15.3.3`.
- Next.js 15.3.3 has a known bug with Pages Router 404/500 fallback generation during `next build`. Creating a `pages/` directory with custom error pages made things worse (whack-a-mole). Ultimately accepted this as unresolvable: `next dev` works fine and judges will use that.
How we overcame it: Systematic debugging — isolated each prerender failure, traced it to the 'use client' + dynamic export conflict, then applied the server/client split pattern consistently across all pages.
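The server/client split pattern, as a minimal sketch (file and component names are illustrative):

```tsx
// app/page.tsx: thin server component wrapper. The `dynamic` export is
// honored here; it would be silently ignored in a 'use client' file.
import HomeClient from './home-client';

export const dynamic = 'force-dynamic';

export default function Page() {
  return <HomeClient />;
}
```

```tsx
// app/home-client.tsx: all hooks live in the client component.
'use client';

import { useState } from 'react';

export default function HomeClient() {
  const [ready, setReady] = useState(false);
  return <button onClick={() => setReady(!ready)}>{ready ? 'ready' : 'loading'}</button>;
}
```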
The 7-dimension behavioral anomaly engine was the most complex piece of original logic.
Pain points:
- Sequential Probing detection needed to flag monotonically increasing values (a binary-search pattern) without false-positiving on legitimate ascending trades. The detection parameters were tuned through extensive testing.
- Cumulative Drift detection compares a rolling average against a "frozen origin" baseline. The challenge was determining when to freeze the origin: too early and the baseline is unreliable, too late and an attacker can poison it. The freeze window was calibrated through experimentation.
- Behavioral profile state isolation between test phases. Large-value attacks shifted the accumulated behavioral profile so dramatically that subsequent subtle attacks weren't detected. Fixed by implementing a max-score merge strategy.
- BigInt precision overflow in the evaluate API route: large wei values overflowed standard number conversions. Fixed by splitting the calculation into safe ranges (sketched after this list).
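For the BigInt overflow, a minimal sketch of the safe-range split, assuming amounts arrive as wei `bigint` values (function name is illustrative):

```ts
// Number represents integers exactly only up to 2^53 - 1, so converting a
// large wei value directly silently loses precision. Split it into a
// whole-ETH part and a sub-ETH remainder, then combine; any rounding in the
// remainder sits far below the precision of the final ETH figure.
const WEI_PER_ETH = 10n ** 18n;

function weiToEth(wei: bigint): number {
  const whole = wei / WEI_PER_ETH; // whole ETH units, safely within Number range
  const frac = wei % WEI_PER_ETH;  // remainder, contributes < 1 ETH
  return Number(whole) + Number(frac) / 1e18;
}

// A 22-digit wei amount that a direct Number() conversion cannot represent exactly:
weiToEth(1234567890123456789012n); // ≈ 1234.5678901234568
```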
How we overcame it: Built a full dry-run test harness that replayed all demo scenarios programmatically, caught failures, and fixed root causes rather than tweaking thresholds.
A fundamental design tension: blockchain storage is inherently public, but we wanted agents to be unable to see their own guardrails.
Pain points:
- Initially, our docs and README implied that ALL policy thresholds are hidden via Confidential Compute. But on-chain policy parameters (`maxTransactionValue`, `approvedContracts`, `blockedFunctions`) are stored in `SentinelGuardian.sol` as public state: any agent can call `getAgentPolicy()` and read its own limits.
- This meant our Confidential Compute story was technically inaccurate for Layer 1.
- We had to rethink the narrative: Layer 1 is transparent compliance (like publishing regulatory limits), while Layers 2 and 3 are confidential evaluation (behavioral weights, AI prompts, anomaly thresholds stay inside the TEE).
How we overcame it: Embraced the tension as a feature. The three-layer architecture is specifically designed so that knowing Layer 1's rules doesn't help bypass Layers 2 and 3. An agent can read its policy limits from the contract, but it can't see the behavioral scoring weights, anomaly thresholds, or AI evaluation prompts. Updated all docs to be precise about what each layer protects.
`PolicyLib.checkAll()` runs 7 independent validation checks, each needing access to policy parameters and action data.
Pain points:
- Solidity's 16-variable stack limit ("stack too deep") made it impossible to pass all parameters as individual function arguments. Created the `CheckParams` struct to batch parameters.
- The `processVerdict()` function does a lot: ABI decode, 7 policy checks, stat updates, incident logging, severity classification, challenge window creation, and 4+ event emissions. Gas optimization required careful ordering: short-circuit on the first policy failure rather than running all checks.
- Dynamic arrays (`approvedContracts[]`, `blockedFunctions[]`) in storage are expensive. `registerAgent()` costs ~180K gas due to deep-copying dynamic arrays. Accepted this as a one-time registration cost.
- The circular incident buffer (max 100 per agent) required careful index management to avoid unbounded storage growth; see the indexing sketch after this list.
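The indexing idea, sketched in TypeScript for brevity (the on-chain version lives in Solidity; names are illustrative):

```ts
// A capped incident log: the write index wraps at MAX_INCIDENTS so storage
// stays bounded, while a monotonically growing counter preserves ordering.
const MAX_INCIDENTS = 100;

interface Incident {
  timestamp: number;
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
}

class IncidentBuffer {
  private slots: Incident[] = [];
  private total = 0; // total incidents ever recorded, never decreases

  push(incident: Incident): void {
    this.slots[this.total % MAX_INCIDENTS] = incident; // overwrites oldest once full
    this.total++;
  }

  // Most-recent-first, at most MAX_INCIDENTS entries.
  recent(): Incident[] {
    const count = Math.min(this.total, MAX_INCIDENTS);
    const out: Incident[] = [];
    for (let i = 1; i <= count; i++) {
      out.push(this.slots[(this.total - i) % MAX_INCIDENTS]);
    }
    return out;
  }
}
```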
How we overcame it: Designed PolicyLib as a pure library with no storage, used structs to batch parameters, and ordered checks from cheapest to most expensive so the common case (simple value check fails) exits early.
The demo runs 13 scenarios across 3 phases. Every scenario had to produce the exact expected result (APPROVED or DENIED) consistently.
Pain points:
- The evaluation server maintains accumulated behavioral profiles per agent. Running Phase 2 attacks (massive values) before Phase 3 edge cases shifted the profiles, causing Phase 3 scores to be unreliable.
- The demo script (v6) and dashboard UI got out of sync as we iterated. Button labels, scenario counts, and phase descriptions diverged.
- Behavioral resets between runs required a dedicated `/behavioral/reset` endpoint and a `bun run behavioral:reset` script; a minimal sketch follows this list.
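A minimal sketch of such an endpoint using `Bun.serve`; the port, route shape, and in-memory profile store are illustrative, not our exact server:

```ts
// In-memory behavioral profiles keyed by agent address (illustrative store).
const profiles = new Map<string, { scores: Record<string, number>; samples: number }>();

Bun.serve({
  port: 3001,
  fetch(req) {
    const url = new URL(req.url);
    // POST /behavioral/reset clears accumulated profiles between demo runs,
    // so each run starts from a clean behavioral baseline.
    if (req.method === 'POST' && url.pathname === '/behavioral/reset') {
      const cleared = profiles.size;
      profiles.clear();
      return Response.json({ ok: true, cleared });
    }
    return new Response('Not found', { status: 404 });
  },
});
```

The `behavioral:reset` package script can then be a one-liner that POSTs to this route.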
How we overcame it: Built a Node.js dry-run script that programmatically replayed all 13 scenarios, verified expected verdicts, and caught regressions. Aligned the demo script with the dashboard UI rather than the reverse. Added the max-score merge to ensure either the accumulated profile or the deterministic scorer could catch attacks.
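The max-score merge mentioned above, as a sketch (dimension names and the 0-to-1 score range are illustrative):

```ts
// Merge the accumulated-profile scores with the deterministic scorer's output
// by taking the per-dimension maximum, so whichever detector fires wins even
// if a large earlier attack has drifted the accumulated profile.
type DimensionScores = Record<string, number>;

function mergeMaxScores(accumulated: DimensionScores, deterministic: DimensionScores): DimensionScores {
  const merged: DimensionScores = { ...accumulated };
  for (const [dim, score] of Object.entries(deterministic)) {
    merged[dim] = Math.max(merged[dim] ?? 0, score);
  }
  return merged;
}

// Example: the drifted profile underreports sequential probing, but the
// deterministic scorer still flags it; the merge keeps the stronger signal.
mergeMaxScores(
  { sequentialProbing: 0.2, cumulativeDrift: 0.7 },
  { sequentialProbing: 0.9, cumulativeDrift: 0.1 },
); // => { sequentialProbing: 0.9, cumulativeDrift: 0.7 }
```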
The entire project was developed on Windows 10, which introduced platform-specific friction.
Pain points:
- `NODE_ENV=production` syntax in npm scripts doesn't work on Windows. Required the `cross-env` package (e.g., `"build": "cross-env NODE_ENV=production next build"`).
- Path separators (backslash vs forward slash) caused occasional issues with Foundry and Bun.
- Git line ending warnings (CRLF vs LF) on every commit — cosmetic but annoying.
- PowerShell vs Bash syntax differences when running commands.
How we overcame it: Used Bun (cross-platform) as the primary runtime, added cross-env for env vars, and used Git Bash for shell operations.
Scope itself was a challenge: building a full-stack project (Solidity contracts + CRE workflow + behavioral engine + Next.js dashboard with 4 tabs + 10-slide presentation + AI evaluation server + agent simulators + documentation) as a solo developer.
Key decisions:
- Deterministic AI evaluation for demo reliability: The CRE workflow is production-ready with pluggable AI endpoints (Anthropic Claude + OpenAI GPT-4). For the testnet phase, a deterministic evaluation engine implements the same API contracts, ensuring repeatable demo results. Production deployment connects real endpoints via Vault DON secret injection; the integration paths are fully designed and implemented behind the `enableConfidentialCompute` feature flag.
- Tenderly over a live testnet: A Virtual TestNet with funded accounts means no faucet hunting, instant transactions, and the Simulation API for the dashboard's drag-and-drop simulator.
- Feature flags over conditional compilation: The `enableConfidentialCompute` config flag lets us switch between standard and confidential HTTP without code changes; see the sketch after this list.
- Documentation as a feature: Invested heavily in README, TECHNICAL.md, ARCHITECTURE.md, CRE_INTEGRATION.md, CONFIDENTIAL-COMPUTE.md, SECURITY_MODEL.md, and INTEGRATION-GUIDE.md. This differentiates the submission from projects that are technically strong but poorly explained.
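A sketch of the flag dispatch: `enableConfidentialCompute` and the request field differences come from our code and the SDK notes earlier, but the client stubs and the `sendRequest(...).result()` shape are simplified stand-ins, not the verified CRE SDK API:

```ts
// Stand-in client stubs so the sketch is self-contained; the real classes
// come from the CRE SDK and their exact signatures may differ.
class HTTPClient {
  sendRequest(req: object) { return { result: () => JSON.stringify(req) }; }
}
class ConfidentialHTTPClient {
  sendRequest(req: object) { return { result: () => JSON.stringify(req) }; }
}

// Gate the HTTP capability on the feature flag. The two clients take
// differently shaped requests: ConfidentialHTTPClient wants multiHeaders and
// bodyString (plus vaultDonSecrets for template injection inside the TEE),
// while HTTPClient takes flat headers and body. Synchronous .result()
// chaining keeps both paths WASM-compatible.
function callAIEndpoint(
  config: { enableConfidentialCompute: boolean },
  url: string,
  payload: string,
): string {
  if (config.enableConfidentialCompute) {
    return new ConfidentialHTTPClient()
      .sendRequest({
        url,
        method: 'POST',
        multiHeaders: { 'Content-Type': { values: ['application/json'] } },
        bodyString: payload,
        vaultDonSecrets: { API_KEY: 'ai_api_key' },
      })
      .result();
  }
  return new HTTPClient()
    .sendRequest({ url, method: 'POST', headers: { 'Content-Type': 'application/json' }, body: payload })
    .result();
}
```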