Comparison of OpenAI Language Models (May 2025)

The number of different models available from OpenAI has grown rapidly, and their naming conventions—well—are anything but intuitive. Between versions like GPT-4, GPT-4o, GPT-4.1, and their various “mini” and “turbo” siblings, it’s easy to lose track of what each model actually offers. This comparison aims to bring clarity to the current landscape as of May 2025.

Each model is compared in terms of release date, model size, training data scope, pricing, multimodal capabilities, performance/latency characteristics (including context window), availability in ChatGPT, fine-tuning support, and notable benchmark performance. The table below summarizes the key details:

| Model (Release Date) | Model Size | Training Data & Cutoff | API Pricing, per 1K tokens (Prompt / Completion) | Multimodal Capabilities | Context & Performance | ChatGPT Availability | Fine-tuning | Notable Benchmarks |
|---|---|---|---|---|---|---|---|---|
| GPT‑3 (Jun 11, 2020) | 175B params | ~300B tokens from web, books, etc.; cutoff Oct 2019 | $0.0200 (combined) | Text only | 2K-token context; fast inference for its size | N/A (pre-ChatGPT; API only) | Yes – base GPT-3 models fine-tunable via API | MMLU ~43% (few-shot); basic code generation only |
| GPT‑3.5 (Nov 2022) | ~175B (improved GPT-3) | Expanded dataset; cutoff ~Jan 2022 | $0.0015 / $0.0020 (4K ctx); $0.0030 / $0.0040 (16K ctx) | Text only | 4K context (16K optional); low latency, optimized for chat | Yes – powered ChatGPT Free at launch | Yes – available since 2023 | MMLU 70.0%; GSM8K 57.1%; HumanEval ~48% (pass@1) |
| GPT‑4 (Mar 14, 2023) | Not disclosed (rumored ~1T params) | Broad corpus of text + code; cutoff ~2021–22 (estimated) | $0.03 / $0.06 (8K ctx); $0.06 / $0.12 (32K ctx) | Text + image (GPT-4V) | 8K context (32K variant); slower inference than GPT-3.5 due to model size | Yes – ChatGPT Plus (Mar 2023) | No at launch (fine-tuning arrived later, with GPT-4o) | MMLU 86.4%; GSM8K 92.0%; bar exam ~90th percentile; HumanEval ~87% (few-shot) |
| GPT‑4o (“GPT-4 Omni”, May 2024) | Not disclosed | Enhanced GPT-4 with refreshed data; cutoff Oct 2023 (plus live web access) | $0.01 / $0.03 (128K ctx) | Text, image, audio | 128K context; optimized for speed (near-human response time) | Yes – in ChatGPT (from May 2024) | Yes – available since Aug 2024 | MMLU 88.7% (outperforms original GPT-4); strong multilingual performance; HumanEval ~87.8% (on par with GPT-4) |
| GPT‑4o Mini (Jul 2024) | Not disclosed (smaller 4o variant) | Trained like GPT-4o (text, image, audio); cutoff Oct 2023 | $0.00015 / $0.0006 (128K ctx) | Text, image, audio (as GPT-4o) | 128K context; faster and cheaper, aimed to replace GPT-3.5 | Yes – replaced GPT-3.5 in ChatGPT Free | Yes – fine-tuning available | MMLU 82.0%; HumanEval 75.6% – strong for its size, but below full GPT-4/GPT-4o |
| GPT‑4.5 (preview, Feb 2025) | Not disclosed | Interim GPT-4-series model; knowledge to mid-2024 | Similar to GPT-4 pricing (preview access; to be deprecated) | Text + image | 128K context (like 4o); improved accuracy, not fully optimized | No – API preview only (superseded by GPT-4.1) | No – not offered for this short-lived preview | MMLU ~90.8%; SWE-bench 38% (well behind GPT-4.1’s 54.6%) |
| GPT‑4.1 (Apr 14, 2025) | Not disclosed | Latest GPT-4-series, coding focus; cutoff Jun 2024 | $0.0020 / $0.0080 (1M ctx) – 5× cheaper than GPT-4o | Text + image (focus on text/code) | 1,048,576-token (1M) context; reduced latency despite higher capability | No – API only (not in the ChatGPT UI yet) | Planned (not yet available as of May 2025) | SWE-bench 54.6% (state of the art); MMLU 90.2%; follows instructions far better (MultiChallenge 38.3% vs 27.8% for GPT-4o); strong long-context performance (Video-MME 72% vs 65% for GPT-4o) |
| GPT‑4.1 Mini (Apr 2025) | Not disclosed (intermediate size) | Same training as GPT-4.1; cutoff Jun 2024 | $0.0004 / $0.0016 | Text + image (like GPT-4.1) | 1M-token context; mid-tier speed and cost | No – API only (for faster, cheaper queries) | TBD (expected, but not available at launch) | MMLU 87.5%; SWE-bench 23.6% (far below full GPT-4.1, above GPT-4o Mini) |
| GPT‑4.1 Nano (Apr 2025) | Not disclosed (smallest 4.1 model) | Same training as GPT-4.1; cutoff Jun 2024 | $0.0001 / $0.0004 | Text + image (like GPT-4.1) | 1M-token context; fastest GPT-4.x model (lowest latency) | No – API only (ultra-low-cost, real-time tasks) | TBD (likely in future) | MMLU 80.1%; SWE-bench ~10% (very limited coding ability; trades accuracy for speed/cost) |
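To make the per-1K-token prices concrete, the cost of a single API call is simply prompt tokens times the prompt rate plus completion tokens times the completion rate. Below is a minimal sketch using a few of the rates from the table above (the price dictionary and model labels are illustrative; always verify against OpenAI’s current pricing page before relying on these numbers):

```python
# Sketch: estimate the USD cost of one API request from per-1K-token rates.
# Rates are (prompt, completion) as listed in the table above -- treat them
# as a snapshot, not authoritative pricing.
PRICES_PER_1K = {
    "gpt-4o":       (0.0100, 0.0300),
    "gpt-4.1":      (0.0020, 0.0080),
    "gpt-4.1-mini": (0.0004, 0.0016),
    "gpt-4.1-nano": (0.0001, 0.0004),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD for a single request, given token counts for each side."""
    prompt_rate, completion_rate = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate

# Example: a 2,000-token prompt that yields a 500-token answer.
for model in PRICES_PER_1K:
    print(f"{model}: ${request_cost(model, 2000, 500):.4f}")
```

At these rates, the same 2,000-in / 500-out request costs $0.035 on GPT-4o but $0.008 on GPT-4.1 and well under a cent on the Mini and Nano tiers, which is the economic argument behind the smaller variants.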

Footnotes: GPT-4 “Turbo” refers to a faster, cheaper variant of GPT-4 introduced in late 2023 and since superseded by GPT-4o. All pricing is for the OpenAI API (as of May 2025) and is listed per 1K tokens (prompt vs completion); ChatGPT’s user-facing models do not charge per token but correspond to the same underlying models. Fine-tuning support marked “planned” indicates OpenAI’s stated intent to allow fine-tuning (e.g. for GPT-4.1) once the model is stable. Benchmarks: MMLU = Massive Multitask Language Understanding; GSM8K = grade-school math word problems; HumanEval = Python coding problems; SWE-bench = real-world software-engineering tasks. GPT-4 and newer models generally outperform older ones on these benchmarks, with GPT-4.1 setting new records in coding and long-context understanding. GPT-4 and GPT-4o have achieved parity or better on many academic and professional exams (for example, GPT-4 ranks in the top 10% on the bar exam, versus the bottom 10% for GPT-3.5). GPT-4o’s “Omni” multimodal architecture enabled integrated vision and audio processing, and GPT-4.1 builds on that foundation with extreme context length and improved instruction following. All models above (except the GPT-4.5 preview) are generally available via OpenAI’s API, with ChatGPT availability noted where applicable.
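The jump from GPT-4’s 8K window to GPT-4.1’s 1,048,576-token window changes what fits in a single request. As a rough sanity check, English text averages around four characters per token, so dividing character count by four gives a ballpark token estimate (this heuristic is an assumption; exact counts require a tokenizer such as OpenAI’s tiktoken). A quick fit check might look like this:

```python
# Sketch: rough check of whether a text fits a model's context window.
# Uses the common ~4 characters/token heuristic for English; use a real
# tokenizer (e.g. tiktoken) for exact counts.
CONTEXT_WINDOWS = {
    "gpt-3":   2_048,
    "gpt-3.5": 4_096,
    "gpt-4":   8_192,
    "gpt-4o":  128_000,
    "gpt-4.1": 1_048_576,
}

def estimated_tokens(text: str) -> int:
    """Ballpark token count via the ~4 chars/token heuristic."""
    return max(1, len(text) // 4)

def fits(model: str, text: str, reserve_for_reply: int = 1_000) -> bool:
    """True if the text, plus room for a reply, likely fits the window."""
    return estimated_tokens(text) + reserve_for_reply <= CONTEXT_WINDOWS[model]

manuscript = "x" * 2_000_000  # ~500K tokens, roughly a long book
print(fits("gpt-4o", manuscript))   # overflows a 128K window
print(fits("gpt-4.1", manuscript))  # fits comfortably in 1M
```

The `reserve_for_reply` margin matters in practice: the context window is shared between prompt and completion, so a prompt that exactly fills the window leaves no room for the model to answer.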