Cybersecurity

A $1,000 AI agent found 21 zero-days in FFmpeg, some 23 years old

Adrian Kessler

An autonomous AI agent read roughly 1.5 million lines of FFmpeg’s C source and came back with 21 working zero-day vulnerabilities, each one paired with a reproducible input that triggers it. FFmpeg is the engine that decodes video and audio inside web browsers, media players, phones and smart TVs, so a hole in it is a hole almost everywhere.

For anyone who has ever opened a video link, that is the part that matters. FFmpeg almost never appears on screen, but it runs underneath VLC, Chrome, countless Android apps and the back ends that process uploads on the largest platforms. A bug in one of its parsers can, in principle, be reached by a single malicious file: a clip, a stream, a subtitle track crafted to crash the program or run code on whatever device is decoding it.

The agent comes from DepthFirst AI, a security startup that built a system to hunt memory-safety bugs without a human reading the code first. According to the company, the full run cost roughly $1,000, a figure it pointedly frames as about 10 percent of what Anthropic spent when its Claude Mythos model swept major software for vulnerabilities earlier this year. The claim under the price tag is the real story. Finding genuine, exploitable bugs in critical infrastructure is becoming cheap enough to do almost on a whim.

The 21 findings are mostly the classic wounds of old C code: heap and stack buffer overflows, integer overflows and underflows. They sit in the parts of FFmpeg that ingest untrusted data, including the MPEG-TS demuxer, the VP9 decoder, several RTP depacketizers, the swscale scaler and the DASH and AVI demuxers. Those are exactly the components that touch a file or a network stream before anything else does.
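To make the bug class concrete, here is a minimal sketch in C of how an integer overflow in a size calculation can turn into a heap buffer overflow. It is an illustration, not code from FFmpeg or from DepthFirst's reports: the function names and the idea of a header carrying a record count and record size are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical parser helpers, not FFmpeg code. Both compute how many
 * bytes to allocate for `count` records of `record_size` bytes each,
 * where both values come from an untrusted file header. */

/* Unsafe pattern: the 32-bit product silently wraps modulo 2^32, so a
 * huge `count` can yield a tiny allocation size. A later loop that
 * writes `count` records would then run past the end of the buffer. */
uint32_t unchecked_table_bytes(uint32_t count, uint32_t record_size)
{
    return count * record_size;
}

/* Checked pattern: reject any header whose product does not fit in
 * 32 bits. Returns 0 and stores the size on success, -1 on overflow. */
int checked_table_bytes(uint32_t count, uint32_t record_size, uint32_t *out)
{
    if (record_size != 0 && count > UINT32_MAX / record_size)
        return -1;
    *out = count * record_size;
    return 0;
}
```

With a count of 0x10000001 and a record size of 16, the unchecked version returns 16, while the checked version refuses the input. A crafted file only has to supply the right pair of numbers, which is why these bugs cluster in demuxers and decoders.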

One of the flaws had been sitting in the codebase since 2003. A stack overflow tied to a service-description table, now tracked as CVE-2026-39214, went unnoticed for 23 years through countless code reviews and audits. DepthFirst’s first batch of identifiers runs from CVE-2026-39210 to CVE-2026-39218, with the remaining issues fixed but not yet numbered. That a machine surfaced in days what two decades of human eyes missed is the uncomfortable headline for the security profession.

The FFmpeg haul landed the same week Google shipped Chrome 149, which patched a record 429 vulnerabilities in a single release. More than 100 were rated critical or high, most of them use-after-free bugs and cases where the browser trusted input it should have checked. The worst, CVE-2026-10881, is an out-of-bounds read and write in Chrome’s ANGLE graphics layer with a severity score of 9.6 out of 10. A crafted web page could use it to break out of the browser’s sandbox and run code on the machine, and Google paid the researcher who reported it $97,000.

Two numbers, 21 and 429, tell the same story from opposite ends. Vulnerability research is industrialising. Whether the finder is an AI agent or a well-funded bug-bounty program, the volume of discovered flaws is climbing far faster than the number of people available to fix them.

That volume is also where the hype meets reality. AI bug-hunting has a false-positive problem, because a model can confidently describe a vulnerability that does not actually exist, or one no attacker could ever trigger. When Anthropic announced that Claude Mythos had found thousands of zero-days across major operating systems and browsers, critics noted that the headline number rested on a far smaller set of manually reviewed cases, and read the announcement as a sales pitch as much as a research result. DepthFirst says its agent is built to avoid exactly this, with guardrails that stop it from inventing the conditions a bug needs and a requirement that every finding arrive with input that provably reaches the flaw. The reproducible proof of concept is what separates a real report from noise.

Even verified bugs create a problem, though. FFmpeg is maintained largely by volunteers, and a sudden flood of machine-generated reports, however accurate, moves the bottleneck from finding flaws to triaging and patching them. The cost of discovery is collapsing while the cost of the human response is not. A tool that can produce 21 valid bugs for $1,000 can also produce them faster than a small team can responsibly absorb.

For now the FFmpeg flaws are fixed in the project’s source, with the outstanding CVE numbers still being assigned, and Chrome 149 is rolling out to users automatically over the coming days. DepthFirst has signalled that FFmpeg was a demonstration rather than an endpoint, and that other widely used open-source libraries are next in line for the same treatment. The next time an AI agent reads a million lines of code that quietly runs on billions of devices, the only real question is whether the humans on the other side can keep up.

