One architectural decision shapes the cost of every AI AppSec deployment: does the agent derive security facts on its own, or work from precomputed, deterministic evidence?
The Endor Labs research team ran a controlled benchmark to find out. Same model, same 34 AppSec prompts, same step budget, across two agent configurations and 12 large open-source projects. One agent had access to precomputed Endor Labs evidence. The other worked from the local repo and public web research.
The results:
- 91.7% fewer tokens (6.6M vs 79.5M)
- 2.8x faster completion
- 77.6% fewer tool calls (1,380 vs 6,165)
- More predictable costs (1.8x spread in per-project token use vs a 4x swing)
The 12x token gap comes down to reconnaissance. The unequipped agent averaged 14 tool calls per question (grep calls, manifest reads, directory listings), hand-rolling software composition analysis from scratch. The evidence-equipped agent averaged 3, all high-level security-aware lookups.
Cost also scales with codebase size when there's no evidence. The same 34 prompts cost 2.7M tokens on a small project and 10.5M on a large one. With evidence, the range was 0.33M–0.61M tokens regardless of project size.
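For readers who want to check the arithmetic, here is a minimal sketch that recomputes the headline ratios from the raw figures quoted in this post. The helper names (`percent_reduction`, `spread`) are illustrative only and are not part of the benchmark tooling.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (1 - improved / baseline) * 100


def spread(high: float, low: float) -> float:
    """Ratio between the most and least expensive run."""
    return high / low


# Totals quoted in this post (tokens and tool calls across all runs).
tokens_with_evidence, tokens_without = 6.6e6, 79.5e6
calls_with_evidence, calls_without = 1_380, 6_165

token_reduction = round(percent_reduction(tokens_without, tokens_with_evidence), 1)
call_reduction = round(percent_reduction(calls_without, calls_with_evidence), 1)
token_gap = round(spread(tokens_without, tokens_with_evidence), 1)

# Per-project token cost for the same 34 prompts (smallest vs largest).
spread_without = round(spread(10.5e6, 2.7e6), 1)
spread_with_evidence = round(spread(0.61e6, 0.33e6), 1)

print(f"Token reduction: {token_reduction}%")                  # 91.7%
print(f"Tool-call reduction: {call_reduction}%")               # 77.6%
print(f"Token gap: {token_gap}x")                              # 12.0x
print(f"Spread without evidence: {spread_without}x")           # 3.9x
print(f"Spread with evidence: {spread_with_evidence}x")        # 1.8x
```

The per-project spreads work out to roughly 3.9x and 1.8x, which is where the "4x swing" and "1.8x spread" in the results list come from.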
And without a call graph, agents get things wrong confidently. One agent spent 22x the tokens on a single reachability question and produced a well-cited, completely incorrect answer.
The takeaway: agents are good at synthesis once the facts are in front of them. Asking them to excavate those facts themselves is expensive and unreliable.
Read the full whitepaper →