Cobalt Pentesting Report: AI Systems Show 2.5x Higher Severe-Flaw Density Than Legacy Apps

What happened

Cobalt's 2026 State of Pentesting Report found that 32% of AI and LLM findings in penetration tests are rated high-risk, roughly 2.5 times the 13% rate observed in traditional enterprise security tests. The report also found that LLM vulnerabilities have the lowest remediation rate of any application type tested (38%), and that one in five organizations reported an LLM security incident in the past year.

Why it matters

This is the first large-scale empirical evidence quantifying how much riskier the attack surface of AI systems is than that of legacy applications: prompt injection, insecure plugins, and excessive agent permissions can create blast radii that span multiple internal systems. The 38% remediation rate suggests that development teams lack established patterns for fixing AI-specific vulnerabilities, unlike traditional injection flaws, for which playbooks are mature.

Applicability

Organizations deploying or evaluating LLM-integrated systems should factor this 2.5x risk multiplier into security budgets, prioritize AI-specific penetration testing, and train development teams on AI vulnerability remediation patterns, particularly for agentic systems with tool-use capabilities.
