OpenAI released gpt-oss-120b and gpt-oss-20b, its first open-weight LLMs since GPT-2, engineered for local execution via MXFP4 quantization. Key architectural updates include Rotary Position Embeddings (RoPE), SwiGLU activations, Mixture-of-Experts (MoE) layers with fewer but larger experts, and Grouped Query Attention (GQA) with alternating sliding-window attention layers. Compared to Qwen3, gpt-oss features a wider architecture, bias units in its attention layers, and learned ‘attention sinks’ that stabilize attention over long contexts. The models are trained for controllable reasoning effort via the system prompt, letting users trade inference cost against accuracy, in contrast to Qwen3’s dedicated reasoning and non-reasoning variants. Despite a tendency to hallucinate, gpt-oss shows strong benchmark performance, in some cases comparable to OpenAI’s proprietary GPT-5, signaling a significant advancement in powerful, locally deployable open-weight models.
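For illustration, here is a minimal PyTorch-style sketch of one way a learned attention sink can work: a learned per-head logit is appended to the attention scores so the softmax can park probability mass on a slot that is not a real token. The function name, tensor shapes, and the omission of causal masking and GQA/sliding-window details are simplifying assumptions for this sketch, not gpt-oss's actual implementation.

```python
import torch

def attention_with_learned_sink(q, k, v, sink_logit):
    """Scaled dot-product attention with a learned per-head 'sink' logit.

    q, k, v:     (batch, heads, seq, head_dim)
    sink_logit:  (heads,) learned parameter (hypothetical name)
    """
    b, h, q_len, d = q.shape
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / d**0.5      # (b, h, q, k)

    # Append one learned sink logit per head as an extra "position",
    # so softmax mass can flow somewhere other than the real tokens.
    sink = sink_logit.view(1, h, 1, 1).expand(b, h, q_len, 1)
    scores = torch.cat([scores, sink], dim=-1)                   # (b, h, q, k+1)

    probs = scores.softmax(dim=-1)[..., :-1]                     # drop the sink weight
    return torch.einsum("bhqk,bhkd->bhqd", probs, v)             # (b, h, q, head_dim)
```

The effect is that attention weights over the actual tokens no longer have to sum to one, which is one motivation given for attention sinks when handling long contexts.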