TELUS Digital GenAI Safety Benchmark: Every Model Tested Was Exploitable, with Attack Success Rates from 1.3% to 93% Across 620,000+ Adversarial Attacks

What happened

TELUS Digital's Fuel iX Applied Research team published the GenAI Safety Model Benchmark 2026 on May 26, covering 34 models from 10 providers across more than 620,000 adversarial attack evaluations and 15 risk categories. Key findings: every single model was exploitable; attack success rates ranged from 1.3% (best) to 93% (worst); the majority of production-popular models exceeded 40% attack success rate. Three attack categories broke every model tested including top performers: privacy/personal data exploitation, fraud/financial scams, and cybersecurity threat generation. Small models (≤10B parameters) failed to resist attacks 86% of the time. A novel 'refuse-but-engage' behavior was identified — models declining but then continuing to assist with the underlying harmful topic — classified as a distinct exploitable vulnerability class.

Why it matters

The benchmark shifts the model safety conversation from 'which model is safest' to 'what is your specific attack surface given your deployment context.' The refuse-but-engage finding is particularly actionable: models deployed in customer service, financial advisory, or compliance workflows that exhibit this pattern provide no real safety boundary. The finding that Chinese-origin models show no meaningful safety difference from Western models once size is controlled also removes a commonly-cited but unsupported sourcing heuristic.

Action needed

Download the full TELUS Digital benchmark and map your deployed models against the 15 attack categories. Specifically test for refuse-but-engage behavior in your production deployment context — not just in generic safety evaluations. Establish continuous red-teaming as a release gate rather than a one-time pre-launch check, especially when upgrading model versions or changing fine-tuning.

TELUS Digital GenAI Safety Benchmark: Every Model Tested Was Exploitable, with Attack Success Rates from 1.3% to 93% Across 620,000+ Adversarial Attacks

What happened

Why it matters

Action needed

Sources