DeepSeek V4 launches as the largest open-source AI — at a fifth of GPT-5’s cost

The Chinese lab that shook Silicon Valley with R1 just dropped the follow-up. V4-Pro is the biggest open-weights model ever built, runs natively on Chinese chips, and charges roughly one-fifth of what OpenAI and Anthropic ask for comparable work.
Susan Hill

DeepSeek has released a preview of V4-Pro and V4-Flash, a pair of open-weights language models that together stake the lab's biggest bet yet: that million-token context is no longer a capability problem, only an efficiency one. V4-Pro packs 1.6 trillion total parameters with 49 billion activated per query, enough to process an entire codebase or a book-length document in a single prompt. With it, open-source AI has closed most of the gap to frontier closed models on math, coding, and agentic tasks, at a fraction of the cost.

Both models ship under an MIT license with weights already live on Hugging Face. V4-Flash is the efficient sibling at 284 billion total parameters and 13 billion active, small enough that a quantized version may fit on a high-end laptop. V4-Pro is the flagship, at 865 gigabytes on disk, aimed at cloud deployment and research labs. Both share the same one-million-token context window — a leap that matches Google’s Gemini and doubles what most competing open models offer.

The architectural move is a technique DeepSeek calls Hybrid Attention, which combines two compression methods to cut memory costs so aggressively that V4-Pro uses 27% of the compute and 10% of the cache that V3.2 needed at the same context length. V4-Flash pushes that further still. In practical terms: serving a million-token prompt with V4-Pro is now cheaper than serving a 100,000-token one with the previous generation.
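That last claim works out if you assume, as a simplification, that cache cost scales linearly with context length. The sketch below uses the 10% cache ratio reported above; a hypothetical `cache_cost` helper stands in for whatever the real serving stack measures. On cache alone a million-token V4-Pro prompt merely breaks even with a 100,000-token V3.2 prompt; the 27% compute figure is what tips it into being cheaper overall.

```python
# Sketch of the cache-cost claim, assuming KV-cache memory grows
# linearly with context length. The 0.10 ratio is DeepSeek's reported
# cache figure for V4-Pro vs. V3.2; absolute units are arbitrary.

def cache_cost(tokens: int, ratio: float = 1.0) -> float:
    """Relative cache cost: tokens held in cache, scaled by a compression ratio."""
    return tokens * ratio

v32_100k = cache_cost(100_000)           # previous generation, 100k-token prompt
v4pro_1m = cache_cost(1_000_000, 0.10)   # V4-Pro, 1M-token prompt at 10% cache

# Under this linear model, the two cache footprints are equal;
# the separate 27% compute reduction makes the V4-Pro request cheaper.
print(v4pro_1m == v32_100k)  # → True
```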

The price disruption is where this release lands hardest. V4-Flash is priced at $0.14 per million input tokens, undercutting even OpenAI’s GPT-5.4 Nano. V4-Pro is $1.74 per million input and $3.48 per million output — a third of what Anthropic charges for Claude Opus 4.7 and a fifth of what OpenAI charges for GPT-5.5. On coding benchmarks V4-Pro reaches a Codeforces rating of 3,206, which DeepSeek says would place it 23rd among human competitive programmers.
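At the quoted preview rates, the arithmetic for a single request is straightforward. A minimal sketch using the V4-Pro prices above (the request sizes are illustrative, not from DeepSeek):

```python
# Cost of one V4-Pro request at the quoted preview prices:
# $1.74 per million input tokens, $3.48 per million output tokens.

PRICE_IN = 1.74 / 1_000_000   # dollars per input token
PRICE_OUT = 3.48 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# A full million-token prompt with a 10,000-token answer:
cost = request_cost(1_000_000, 10_000)
print(f"${cost:.4f}")  # → $1.7748
```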

The geopolitical subtext matters as much as the benchmarks. DeepSeek optimized V4 to run on Huawei’s Ascend 950 chips and on silicon from Chinese AI specialist Cambricon, and declined to give Nvidia or AMD early access for tuning — a reversal of standard industry practice. The release is a commercial stress test for China’s domestic AI hardware stack, which has been operating under US export restrictions for years.

There are real caveats. V4 is a preview release, not a production version, and independent third-party benchmarking has not been completed. DeepSeek’s own report concedes the model trails GPT-5.4 and Gemini 3.1 Pro by roughly three to six months on frontier capability. Its predecessor R1 was banned or restricted in multiple US states, Australia, Taiwan, South Korea, Denmark, and Italy within weeks of launch — V4 faces the same regulatory exposure in those markets. US defense contractors are prohibited from using DeepSeek under the 2026 NDAA unless the Pentagon grants a waiver.

Access is open now for anyone outside those restricted zones. DeepSeek’s web chatbot exposes V4-Pro via Expert Mode and V4-Flash via Instant Mode at no charge, and developers can call the API by setting the model name to deepseek-v4-pro or deepseek-v4-flash.
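A minimal call might look like the sketch below, using only the standard library. The model names come from the article; the endpoint URL and payload shape are assumptions based on DeepSeek's current OpenAI-compatible chat completions API and may differ for V4.

```python
# Sketch of a V4 API request. Model names are from the release notes;
# the endpoint and JSON shape assume DeepSeek keeps its current
# OpenAI-compatible chat completions API (an assumption, not confirmed).
import json
import os
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": model,  # "deepseek-v4-pro" or "deepseek-v4-flash"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.deepseek.com/chat/completions",  # assumed endpoint
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        },
    )

req = build_request("deepseek-v4-pro", "Summarize this codebase.")
# urllib.request.urlopen(req) would send it; left out so the sketch
# runs without an API key.
```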

The release shipped exactly one year after DeepSeek-R1 rattled global AI markets on January 20, 2025 — the timing is deliberate. Final API pricing outside the preview window is still pending, and the older deepseek-chat and deepseek-reasoner endpoints retire on July 24, 2026, when all traffic routes to V4.
