
On August 5, 2025, OpenAI released two open-weight models, gpt-oss-120B and gpt-oss-20B, its first open-weight release since GPT-2. The models are optimized for NVIDIA's Blackwell GB200 NVL72 rack, delivering up to 1.5 million tokens per second, and they outperform comparable open models on benchmarks such as AIME 2025 (98.7%) and GPQA (90%). Built on a Mixture-of-Experts architecture with Chain-of-Thought reasoning, they pair strong performance with efficient deployment. Published under the Apache 2.0 license, they are broadly accessible, and their release positions OpenAI as a company that no longer only leads in proprietary LLMs but also leads in democratizing AI with powerful, readily available open models.
High-Speed AI Models Designed for Powerful Reasoning Workloads
The gpt-oss-120B and gpt-oss-20B models were built for reasoning-intensive tasks, leveraging Mixture-of-Experts (MoE) and Chain-of-Thought (CoT) reasoning to elevate performance across benchmarks. gpt-oss-120B has 117 billion parameters, with 5.1 billion active at any time, while gpt-oss-20B has 21 billion parameters with 3.6 billion active. Both models are optimized for NVIDIA's Blackwell GB200 NVL72, where they hit 1.5 million tokens per second using 4-bit NVFP4 precision, a new standard for high-efficiency inference.
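The efficiency claim follows directly from MoE sparsity: only a small fraction of each model's parameters participates in any single forward pass. A quick back-of-the-envelope check using the reported figures (the per-token routing details are not part of this article; only the ratios below are):

```python
# Active-parameter fraction per token for each gpt-oss model,
# using the parameter counts reported in the release.
models = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}

for name, p in models.items():
    fraction = p["active_b"] / p["total_b"]
    print(f"{name}: {fraction:.1%} of parameters active per token")
# gpt-oss-120b: 4.4% of parameters active per token
# gpt-oss-20b: 17.1% of parameters active per token
```

The larger model is the sparser of the two, which is why its compute cost per token grows far more slowly than its total parameter count.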
On benchmarks, the models shine: gpt-oss-120B hits 98.7% on AIME 2025 and 90% on GPQA, outperforming similar open models. It also exceeds GPT-4o in healthcare-specific tasks, suggesting real-world viability. The 120B model fits on a single 80GB GPU, and the 20B runs on just 16GB of memory, making both accessible for research labs and local inference.
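The single-GPU claims are consistent with 4-bit weights: at roughly half a byte per parameter, the weight footprints land under the stated memory budgets. A rough sketch, deliberately ignoring KV cache, activations, and the small overhead of NVFP4 scale factors:

```python
# Rough weight-memory estimate for 4-bit (NVFP4) quantization:
# 4 bits = 0.5 bytes per parameter. Ignores KV cache, activations,
# and quantization scale-factor overhead, so real usage is somewhat higher.
BYTES_PER_PARAM_4BIT = 0.5

def weight_gb(total_params_billions: float) -> float:
    """Approximate weight footprint in GB at 4-bit precision."""
    return total_params_billions * BYTES_PER_PARAM_4BIT

gb_120b = weight_gb(117)  # 58.5 GB -> fits on a single 80 GB GPU
gb_20b = weight_gb(21)    # 10.5 GB -> fits in 16 GB of memory
print(gb_120b, gb_20b)
```

Both numbers leave headroom under the 80 GB and 16 GB figures cited above, which is what makes local and single-accelerator inference plausible.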
Matching the performance of OpenAI's own proprietary models, such as o3-mini and o4-mini, marks a move toward open competitiveness. These models are not merely experimental tools but viable, production-ready options; their fast inference, deployment flexibility, and strong reasoning accuracy make them attractive to build on.
Broad Compatibility Enables Democratized AI Access at Scale
Released under the permissive Apache 2.0 license, OpenAI’s new models are built for maximum accessibility. Developers can run them on Hugging Face, Ollama, llama.cpp, and FlashInfer with support across CUDA, vLLM, and TensorRT-LLM libraries. They’re also compatible with Apple Metal, Windows ONNX, and AMD’s new Ryzen AI Max+ 395, which runs gpt-oss-120B at 30 tokens per second locally.
The models are optimized for diverse environments: consumer laptops, data centers, and enterprise-grade inference stacks, bridging accessibility and performance. According to OpenAI, over 6.5 million developers in more than 250 countries are now able to experiment, fine-tune, or deploy these models commercially. AMD and NVIDIA support underscores the hardware-software synergy needed to move AI forward.
These tools are not just for tech giants; indie developers and startups now have access to reasoning models that rival top-tier commercial offerings. Whether for AI agents, coding assistants, healthcare tools, or education platforms, these models support rapid innovation without vendor lock-in. With full documentation, open weights, and platform guides, OpenAI’s approach could shift the power dynamic in AI from centralized, gated ecosystems to community-driven, open collaboration. It’s not just a model release; it’s a redistribution of capabilities.
A Defining Step Toward Global, Equitable AI Innovation
OpenAI’s gpt-oss-120B and gpt-oss-20B are more than models; they’re a strategy to make powerful AI available to everyone. Their release revives the open-source ethos in an era dominated by closed systems. With strong performance, efficient deployment, and broad compatibility, they empower developers across every region and budget. The NVIDIA and AMD support highlights a growing consensus: AI’s future isn’t just about speed or scale; it’s about access. As competition rises, OpenAI’s move may trigger a wave of open releases and usher in a new phase of innovation built not in secret, but in the open, with everyone at the table.