The OpenAI platform has evolved rapidly in recent months, with new models, tools, and API endpoints reshaping how developers integrate language models into their applications. If you’re weighing the Chat Completions API against the Responses API, or wondering what to do with the still-in-beta Assistants API, you’re not alone.
This post aims to clarify the differences, explain the technical implications, and help you choose the right API interface for your use case.
Why this matters now
- Timely and relevant: OpenAI’s platform is evolving fast. With the introduction of the Responses API (and the deprecation roadmap for the Assistants API), developers are making critical decisions about which endpoint will serve as a stable, future-proof foundation.
- Comprehensive but actionable: the documentation is spread across changelogs, API references, and announcements. This post synthesizes it into a practical comparison.
- Developer-focused: we go beyond the marketing and highlight the technical differences (state management, tool execution, memory, customization, file handling) that directly impact your engineering implementation.
- Migration insights: if you’ve built on the Assistants API or plan to use OpenAI’s hosted tools (code interpreter, file search, web browsing), understanding the transition to the Responses API is crucial.
The quick summary
Here’s how the three API interfaces stack up:
Aspect | Chat Completions API (/v1/chat/completions) | Responses API (/v1/responses) | Assistants API (/v1/assistants) |
---|---|---|---|
Supported Models | Designed for GPT-3.5 and GPT-4 series chat models (e.g. gpt-3.5-turbo, gpt-4). The latest models, such as GPT-4.5, are also available via this endpoint. Does not support older instruct-only models (those use the legacy completions endpoint). | Supports any model that uses the chat-completion interface (the same models as the Chat API, e.g. GPT-4, GPT-3.5). Introduced as the “Responses API”, an evolution of the Completions API with a chat-like structure. Largely backward-compatible with Chat Completions patterns and continues to support those chat models. (Older text-only models like text-davinci-003 remain on the legacy completions endpoint.) | Also uses GPT-3.5, GPT-4 and newer models under the hood. The Assistants API beta continues to receive new model updates (including the GPT-4.5 preview) until it is sunset. It was introduced to let developers use chat models with additional capabilities (tools, memory) via persistent assistant instances. |
State Management | Stateless – no server-side memory. The conversation history must be managed by the developer and sent with each request as an array of messages. Each API call is one turn (prompt → completion). | Optional statefulness – by default it can work like the Chat API (stateless, with a messages list), but it also supports built-in conversation memory. You can set “store”: true to have OpenAI store the conversation, then include a previous_response_id on the next call to continue the thread without resending the full history (see the sketch just after this table). This simplifies multi-turn conversations. It also supports multiple turns within a single call when using tools (the model can take several steps internally before the final answer). | Stateful – conversation history is stored on the server as part of a Thread object. You create an Assistant and then start Threads under it; each thread persists message history automatically. There is no need to resend context; the API truncates older messages when the context limit is reached. Each user query is a “Run” within a thread (one turn per API call). |
System Instructions | Accepts a system message in the prompt to guide model behavior (e.g. role/profile). This must be included with each conversation (since there is no persistent memory). Maximum length depends on the model’s context size (e.g. ~4k or 16k tokens). | Supports system instructions similarly (either via an initial message in the input list or a dedicated instructions field). When using stateful mode, the system message can be stored once and retained in context for subsequent turns. (The API uses a unified message/item format for input.) In “raw” usage, you can also send a single prompt string via the input field for simplicity, but structured messages with roles are supported. | Persistent system instructions – each Assistant can be created with a long system prompt/definition (up to ~256k characters) that persists across all threads. This allows extensive customization of the assistant’s behavior and knowledge base. Developers set it once when creating the assistant, rather than sending it every time. |
Tool Use – Function Calling | Yes (JSON function calling) – you can define functions for the model to call. Supported in GPT-4 and GPT-3.5 chat models (since June 2023). The model can decide to return a JSON object to invoke a function, which the developer then executes, returning the result via another API call (see the sketch after the Chat Completions example below). All tool logic (function execution) is handled client-side. | Yes – supports the same function-calling mechanism for developer-defined tools. In addition, the Responses API was built to handle tool use more seamlessly: it can orchestrate multiple tool calls and model prompts in one request (the model can call a tool, get the result, and continue). This reduces manual loop logic for the developer. Function calls to external APIs still need developer implementation (as with the Chat API). | Yes (enhanced) – the Assistants API introduced an improved function-calling interface. You can register tools, and the model can invoke them via function calls. However, execution of third-party functions is not handled by OpenAI’s servers – the developer receives the function call and must execute it. (In practice, this is similar to the Chat API’s approach, just within the assistant/thread framework.) |
Tool Use – OpenAI Tools | None built-in. The Chat API doesn’t provide native tools – only what you implement via function calling. (E.g. to do web search or code execution, the developer must integrate those capabilities themselves.) | Multiple built-in tools available. The Responses API can leverage OpenAI-hosted tools by simply listing them in the request: for example, Web Search for up-to-date information, File Search for retrieval over your uploaded documents, and Computer Use (a sandboxed browser/OS environment the model can operate). These tools can be invoked by the model as needed, and the API handles their execution within a single response cycle. (Note: built-in tools like search are available only with certain models, e.g. GPT-4o/4o-mini support web search.) | OpenAI-hosted tools supported. The Assistants API (beta) introduced Code Interpreter and File Search as built-in tools the assistant can use directly. Code Interpreter lets the assistant run Python code in a sandbox (hosted by OpenAI) to solve math, analysis, or coding tasks. File Search provides semantic search over files you’ve uploaded (via an integrated vector store). These tools are invoked by the model as needed during a conversation (OpenAI handles the actual code execution and file retrieval on its side). (Web Search was not explicitly in v1, but Assistants v2 added tools like web and computer use to align with the Responses API.) |
Multi-Modal Inputs (Vision) | Limited support. The Chat Completions API originally accepted only text. As of 2024–2025, multi-modal models such as GPT-4 Vision and GPT-4o (“Omni”) can accept images, but support depends entirely on the model. By May 2025, the newest models (GPT-4o, GPT-4.5) could be used with image inputs via the API (e.g. sending an image alongside a user message), but only for models that specifically support vision. There is no native image handling in older models. | Yes (built-in). The Responses API was designed for multimedia input. It treats inputs as a list of “items” which can include text or images. You can send images directly in a request (along with text) when using a vision-capable model like GPT-4o. The model can then analyze the image and respond with text. (Audio/voice input is expected to be supported in the future as well.) | Yes (via vision models). The Assistants API supports image input if the underlying model does. For example, an Assistant could use GPT-4o (vision) or GPT-4.5, allowing the user to upload an image to a thread for the model to analyze. Assistants API v2 and the new Agents platform were evolving to handle multimodal data similarly to Responses. (Voice input/output was not directly handled in v1; the focus was on text and files.) |
Fine-Tuning & Custom Models | Supported for certain models. Developers can fine-tune base models like gpt-3.5-turbo (with system and example messages) and then use the fine-tuned model via this endpoint. Fine-tuned models are invoked by their custom model name. GPT-4 fine-tuning is expected but (as of May 2025) not widely available. The Chat API does not itself change – you just specify the fine-tuned model name and use it as normal. | Supported (with chat models). Since Responses uses the same model pool as Chat, any fine-tuned chat model can be used here as well. (The legacy completions endpoint supports fine-tuned GPT-3 models like Davinci.) There is no separate fine-tune mechanism unique to Responses – it relies on the underlying model’s fine-tuning capability. In practice, developers can fine-tune a GPT-3.5 model and then call it through /v1/responses. | Supported (implicitly). You cannot fine-tune via the Assistants API itself, but you can create an Assistant that uses a fine-tuned model as its base. For example, if you fine-tuned a model via the normal API, you could specify that model ID when configuring your assistant. Assistants also allow extensive system prompts (which cover many customization needs without full fine-tuning). Fine-tuning was less emphasized given the ability to store large instructions and documents in the assistant’s context. |
File Uploads & Retrieval | No direct integration. Developers can use the separate /v1/files endpoint to upload data, but the Chat API won’t automatically use it – you’d have to implement retrieval yourself (e.g. embed and search your files, then feed relevant text into the prompt manually). The Chat API itself doesn’t manage files or long-term knowledge. | Integrated retrieval via the File Search tool. You can upload files to OpenAI (creating a vector store of embeddings) and then use the file_search tool in a Responses request to have the model query those files. The Responses API manages chunking and injecting relevant file content into the conversation. File size limits are generous (up to 512 MB per file and 5M tokens, with storage priced at $0.10/GB/day). This makes retrieval-augmented generation (RAG) much easier to implement than with the Chat API. | Integrated file vectors. The Assistants API similarly lets an assistant access uploaded files via its File Search tool. Each Assistant (or each thread) can have an associated vector store of documents (up to 10k files, 100 GB per project). When the assistant needs information, it can use the file_search tool to retrieve chunks from those files (with default chunking parameters and limits managed by the API). Developers upload files (PDFs, text, etc.) and the assistant can “remember” and search them on demand. |
Notable Limitations | – No built-in tools or memory: all multi-step reasoning or tool use must be handled by the client (multiple API calls, external function execution). – Text-only (for most models): no native handling of images or audio in prompt/response, except via specialized models and workarounds. – Developer overhead: you must manage conversation state and context truncation yourself, which can become complex as conversations grow. | – Requires chat models: cannot use older completion-only models (it’s meant as a superset of Chat Completions). – Newer API: as a recently introduced interface (2025), some features are still evolving (e.g. a built-in Code Interpreter was planned but not immediately available at launch). – Tool-specific model support: certain tools only work with particular models (for instance, web search is only enabled for GPT-4 variants, not GPT-3.5). – Complexity: the Responses API is powerful but introduces new concepts (input items, stored responses, etc.). It is largely backwards-compatible with Chat, however, and intended to eventually replace the Assistants beta. | – Beta platform: the Assistants API was (as of 2024–2025) a beta feature. It lacks the polish of the standard APIs and will be deprecated by mid-2026 in favor of Responses. – Overhead: using it requires managing multiple objects (Assistant, Thread, Run) via different endpoints, which can be more complex to integrate than a single /responses call. – Availability: some features were limited to certain accounts during the beta. Also, any stored data (conversations, files) remains until deleted, which has privacy implications (though OpenAI does not train on it by default). |
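To make the optional statefulness described in the table concrete, here is a minimal sketch of a multi-turn exchange with the Responses API. It assumes the official openai Python SDK (v1.x); store and previous_response_id are the documented parameters for server-side conversation state, but treat the snippet as illustrative rather than production-ready.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Turn 1: ask a question and let OpenAI store the response server-side
first = client.responses.create(
    model="gpt-4o",
    input="Summarize the plot of Hamlet in two sentences.",
    store=True,
)

# Turn 2: continue the conversation without resending the history
followup = client.responses.create(
    model="gpt-4o",
    input="Now rewrite that summary as a limerick.",
    previous_response_id=first.id,
)
print(followup.output_text)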
What’s new?
The Responses API is OpenAI’s latest interface, designed as a superset of the Chat Completions API. It adds:
- Built-in tools like web search, file search, code execution sandbox (without requiring developer-implemented functions)
- Optional server-side memory: you can store conversation state so you don’t have to send the entire history every turn
- A more flexible input format for multi-modal data (text + images)
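To illustrate the last bullet, here is a minimal sketch of sending an image alongside text, assuming a vision-capable model such as gpt-4o; the image URL is a placeholder, and the content-part names (input_text, input_image) follow the Responses API reference at the time of writing.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="gpt-4o",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is shown in this image?"},
                {"type": "input_image", "image_url": "https://example.com/photo.jpg"},  # placeholder URL
            ],
        }
    ],
)
print(response.output_text)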
At the same time, OpenAI announced that the Assistants API (launched earlier as a beta platform for agents with tools and memory) will be sunset by mid-2026. The Responses API will inherit its features in a more streamlined, lower-overhead form.
How to choose?
Use Chat Completions API if:
- You want a simple, stateless interface
- You prefer full control over conversation history management
- You are not using OpenAI-hosted tools or server-side memory
- You already have an integration built on Chat Completions and want to keep using GPT-3.5 and GPT-4 models without changes
This is the most stable, least opinionated interface. Perfect for lightweight bots, apps that need deterministic control over context, or any scenario where you don’t need server-side state or built-in tools.
Example Python snippet:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # or rely on the OPENAI_API_KEY env var

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how can you help me?"},
    ],
)
print(response.choices[0].message.content)
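The comparison table above notes that all tool logic on this endpoint runs client-side. Here is a hedged sketch of that loop using standard function calling: the model emits a tool call, your code executes it (get_weather is a hypothetical stand-in you would implement yourself), and you send the result back for a final answer.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
first = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool

# Execute the function yourself (stubbed here), then hand the result back to the model
args = json.loads(tool_call.function.arguments)
result = {"city": args["city"], "forecast": "sunny, 21°C"}  # placeholder result

messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)})

final = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
print(final.choices[0].message.content)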
Use Responses API if:
- You want to leverage OpenAI-hosted tools (web search, code execution, file retrieval) without implementing them externally
- You want optional server-side memory to simplify multi-turn conversations
- You want vision input (images) alongside text
- You’re starting a new project and want to future-proof against platform evolution
The Responses API is OpenAI’s long-term direction. It’s powerful for agentic applications, assistants, RAG systems, and multi-modal chat.
Example Python snippet:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

response = client.responses.create(
    model="gpt-4o",
    input="What's the weather like in Berlin?",
    tools=[{"type": "web_search_preview"}],  # hosted web search tool (type name per the current API reference)
)
print(response.output_text)  # SDK helper that concatenates the text output
Use Assistants API only if:
- You’ve already built on Assistants API and need time to migrate
- You rely on its object model (Assistant → Thread → Run)
Otherwise, new projects should start directly on the Responses API, since OpenAI plans to consolidate features there.
Example Python snippet:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

assistant = client.beta.assistants.create(
    name="My Assistant",
    instructions="You are a friendly assistant.",
    model="gpt-4o",
)
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Tell me a joke.",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first = the assistant's reply
Migration insights for Assistants API users
If you’re already using Assistants API, the transition path to Responses API will be crucial. Current differences include:
Feature | Assistants API | Responses API |
---|---|---|
Threads (conversation state) | Built-in | Optional (store + previous_response_id) |
System instructions | Stored once per Assistant | Passed per request (instructions field) |
Tool execution | OpenAI-hosted | OpenAI-hosted |
Function execution | Developer-executed | Developer-executed |
File retrieval | Built-in File Search | Built-in File Search |
Expect OpenAI to provide migration utilities to translate Assistants → Responses in 2025–2026.
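In the meantime, a rough mapping already works today: an Assistant’s stored instructions become a per-request instructions field, and a Thread becomes a chain of stored responses. A minimal sketch, assuming the openai Python SDK and the parameters shown in the table above:
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# Assistants pattern: instructions live on the Assistant, turns live in a Thread.
# Responses pattern: pass instructions per request and chain turns server-side
# via store=True and previous_response_id (as in the earlier multi-turn sketch).
response = client.responses.create(
    model="gpt-4o",
    instructions="You are a friendly assistant.",  # was Assistant.instructions
    input="Tell me a joke.",                        # was a Thread message + Run
    store=True,
)
print(response.output_text)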
Practical recommendations
- New projects → Use Responses API for future-proofing, built-in tools, and flexible input.
- Existing Chat Completions users → Stay where you are, or migrate to the Responses API if you want hosted tools or server-side memory.
- Assistants API users → Plan for migration by 2026.
If your project requires vision, code execution, retrieval, or OpenAI-hosted tools, Responses API is the best entry point today.
If you need a simple, stateless, text-only model call: Chat Completions API remains solid and well-supported.
Final thoughts
In a platform that’s adding powerful tools but also deprecating some paths, clarity is key. This comparison aims to save developers time, reduce architectural mistakes, and align implementation choices with OpenAI’s roadmap.
Feel free to share this guide with your team or peers.
Last updated: May 2025