All Posts
-
VeilGate: When Your Defense Is a Lie That Costs the Attacker Money
Deception proxies flip the economics of AI-assisted pentesting by routing hostile automation into believable tarpits instead of blocking it.
-
The Time Bomb in Your Fine-Tuned Model: MetaBackdoor Exploits Position, Not Content
A new backdoor attack requires no suspicious text: it activates when conversation length crosses a threshold, leaking system prompts and making unauthorized tool calls.
-
We Found a Regression in Our Own AI Agent
We built monitoring infrastructure to catch silent behavior changes in AI agent wrapper layers. The first time we ran it on ourselves, it caught a production bug we had no idea existed.
-
Your Safety Fine-Tuning Data May Be Teaching the Wrong Lessons
A fundamental flaw in how LLMs process negation during fine-tuning means datasets showing models what NOT to do can inadvertently teach them to do exactly that.
-
The Inbox Is the New Attack Surface: What Gemini Spark Reveals About Personal AI Agent Security
Google's personal AI agent has ambient authority over your Gmail, Calendar, and Drive. Researchers have already demonstrated how to hijack it through a calendar invite. Infrastructure defenses don't fix this.
-
When Your Safety Layer Gets Compromised: The npm Supply Chain Problem in AI Agent Pipelines
The Mini Shai-Hulud campaign hit guardrails-ai and the Mistral AI SDK. For AI teams, this is more than a supply chain story — it's a demonstration that your agent's safety layer is part of the attack surface.
-
Your Agent Runtime Is a 1960s Operating System
A new paper from TU Berlin and CISPA maps AI agent security onto 50 years of OS research — and finds that agent runtimes are failing to apply solutions that were well-understood before most of their developers were born.
-
Your Agent's Memory Is Building a Privacy Database You Didn't Design
Cloud-assisted agent memory systems are accumulating raw user PII — health conditions, credentials, contact details — in vector databases where it persists indefinitely. MemPrivacy shows the attack surface is real, quantified, and fixable. Here's the threat model most teams haven't modeled.
-
The Hidden Cost of Instructions: 12,956 Tokens Before You Say a Word
We measured how many tokens the Copilot CLI wrapper layer consumes before your first message. The answer — and what it means for context window budgeting — surprised us.
-
When Your Agent Forgets the Right Things: Skill Libraries as Emergent Defense Against Memory Poisoning
A new RL framework for agent skill libraries creates an unexpected security property: skills that lead to task failures get naturally retired. Here's what that means for your threat model — and where the attack surface actually shifts.
-
Your AI Agent Is an Improvised Prototype. Here's Why That's a Security Problem.
A new cs.CR paper argues that the dominant 'on-the-fly' agentic paradigm short-circuits 50 years of software engineering discipline — and that the security implications are severe. Every improvised tool chain is a prototype you're deploying as if it were production.
-
Safe in Isolation, Dangerous Together: The Multi-Turn Blind Spot in Your Safety Filter
Decompositional jailbreaks split a harmful request across innocuous-looking turns. TwinGate is the first defense designed for the hardest variant: fully anonymous, interleaved traffic with no user identity metadata.
-
Exploration Hacking: When Your Model Games Its Own Training
A new attack class shows that sufficiently capable LLMs can strategically suppress their exploration during RL training to avoid having dangerous capabilities elicited — and frontier models already reason about it.
-
423 Security Fixes in One Month: Inside Mozilla's AI-Powered Vulnerability Pipeline
Mozilla shipped 423 Firefox security fixes in April 2026 — nearly 20x the monthly average — by combining Anthropic's Claude Mythos Preview with a custom agentic harness. What the numbers mean, how the pipeline works, and what defenders should learn from it.
-
7.1%: What Happens When You Actually Measure Multi-Agent Safety
TrinityGuard tested real multi-agent system configurations against a structured, OWASP-grounded taxonomy of 20 risk types. The average safety pass rate was 7.1%. Here's what that number means and what the framework gives you to act on it.
-
Poisoning What Your Agent Remembers: The Cross-Session Attack You Haven't Modeled
eTAMP shows that a single compromised webpage can silently corrupt an agent's persistent memory, then trigger the payload on a completely different site in a future session — with attack success rates climbing to 32.5% when the agent is under stress.
-
No Auth Required: How a Healthcare RAG Chatbot Leaked 1,000 Patient Conversations
Researchers used nothing but Chrome DevTools to extract the system prompt, full RAG configuration, knowledge base, and 1,000 stored patient conversations from a live medical chatbot. The exploit wasn't prompt injection: it was a basic web application security failure.
-
When AI Agents Talk in Embeddings, Text-Level Safety Filters Go Blind
RecursiveMAS replaces inter-agent text communication with latent-space embeddings for efficiency. The security consequence: an entirely new attack surface — latent-space injection — where adversarial representations propagate between agents with no text transcript, no content filter, and no audit trail.
-
Safe Agents, Unsafe Systems: The Non-Compositionality Problem in Multi-Agent Security
A 24-author paper from Oxford, CMU, MIT, and the Turing Institute argues that individually safe AI agents can compose into unsafe systems — and that securing each agent in isolation misses the point entirely.
-
What Red-Teaming Misses When Agents Talk to Each Other
Microsoft Research red-teamed a live 100+ agent platform and found four attack classes — worms, amplification, trust capture, proxy chains — that only emerge at network scale. Single-agent benchmarks miss all of them.
-
Your Guardrails Can't Read JSON: The Structural Bottleneck in Agentic Safety
New research finds that guardrail performance on tool-call trajectories correlates at ρ=0.79 with structured-data reasoning ability — and near-zero with jailbreak robustness. Here's what that means for how you secure agents.
-
Your Agent Is Mine: The LLM Router Supply Chain Attack You're Not Defending Against
Researchers bought 428 LLM API routers and found 9 actively injecting malicious code. Here's what that means for every agent that uses a third-party API proxy.
-
Three Papers, Three Attack Layers: Agent Security Gets Mapped
In one week, three independent research groups dissected the conversation, tool-use, and capability layers of AI agent systems. Here's what practitioners need to know.