Natural Language Is the JSON of AI Agents

Language strives for simplicity - until machines use it as plumbing. AI agents may work better when text moves to the boundary, not the core.

Human language is not simple. It is simple for humans.

That distinction matters. We often treat language as the most natural interface in the world because, for us, it is. We compress thoughts into words, rely on shared context, leave out almost everything, and still usually understand each other well enough. Language is lossy, ambiguous and full of shortcuts. It works because human minds are not isolated processors exchanging raw packets. We share bodies, environments, histories, metaphors, rituals, expectations and social cues.

In that sense, language is a brilliant evolutionary interface for minds that cannot directly share internal state.

But it is not obviously a good machine protocol.

This is the point that makes recent work on latent-space agent collaboration so interesting. Today’s AI agent systems often look like small bureaucracies. A planner writes a plan. A critic writes a critique. A researcher writes a report. A solver reads all of it and writes a final answer. Every intermediate step is serialized into natural language, decoded into tokens, passed around, re-embedded, and interpreted again by another model.

It is readable. It is convenient. It is also theatrical overhead.

RecursiveMAS, one example of this latent-space work, highlights the oddity of this arrangement. If no human is reading the intermediate memos, why must agents communicate through full prose at all? Why should one model translate its internal state into English, only for another model to translate that English back into a hidden representation?

The machine is not thinking in English. English is the boundary layer.

This does not mean language is useless. Quite the opposite. Language is the most powerful general interface we have for human-facing AI systems. It lets us ask, inspect, guide, debug and correct. It is the reason large language models became widely usable rather than remaining exotic neural machinery. But there is a difference between a user interface and an internal bus.

Software history has already taught us this lesson.

For decades, serious program-to-program communication often meant compact, structured, machine-oriented formats: raw structs, shared memory, binary RPC, ASN.1, XDR, CORBA, ONC RPC, DCOM, and later Protocol Buffers, Avro and FlatBuffers. These systems were not built for casual human reading. They were built to move structured information efficiently and predictably between machines.

Then the web changed the trade-off.

Suddenly, readability mattered enormously. Developers wanted to inspect requests with curl. Firewalls liked HTTP. Logs became easier when payloads were text. Systems written in different languages needed loose coupling. XML and SOAP made enterprise integration verbose but inspectable. JSON later became the simpler winner: still text, still wasteful in a strict performance sense, but good enough, friendly enough, and universal enough to dominate.

We normalized strange rituals. Numbers became strings and then numbers again. Escaping rules became infrastructure. Encoding bugs became a fact of life. Systems parsed text not because text was the most efficient internal representation, but because human readability and interoperability were worth the cost.
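The "numbers became strings and then numbers again" ritual is easy to see in miniature. The following is a minimal sketch, not a benchmark: the `readings` values are made up for illustration, and the binary layout (four little-endian float64 values) is an assumption chosen only to contrast with the text form.

```python
import json
import struct

# Hypothetical sample data: four floating-point readings.
readings = [0.1 + 0.2, 1 / 3, 2 / 3, 3.141592653589793]

# Text path: floats are printed as decimal strings, then parsed back into floats.
text_payload = json.dumps(readings).encode("utf-8")
from_text = json.loads(text_payload.decode("utf-8"))

# Binary path: the same four values packed as little-endian float64 (8 bytes each).
binary_payload = struct.pack("<4d", *readings)
from_binary = list(struct.unpack("<4d", binary_payload))

print(len(text_payload), "bytes as JSON text")
print(len(binary_payload), "bytes as packed binary")
```

Both paths recover the same numbers here. The text form is larger and has to be parsed, but it is the one a person can read in a log or a terminal, which is exactly the trade the web made.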

Natural language in agent systems is following a similar path.

Text-based agent collaboration is the JSON phase of AI agents. It is flexible, inspectable and easy to prototype. You can read the planner’s output. You can inspect the critic’s complaint. You can log the tool call rationale. You can show a trace to a developer, a manager or a user. This is useful. Early systems need this kind of observability.

But it should not be confused with architectural destiny.

Once agent systems mature, the internal hot path will not necessarily remain human-readable. If two AI systems can exchange dense hidden states, embeddings or learned intermediate representations, forcing them to communicate through English may be like forcing two CPUs to shout instructions at each other in full sentences instead of exchanging cache lines.

That image is deliberately absurd. So is much of today’s agent chatter.

A model generates a paragraph. Another model consumes the paragraph. Then another model summarizes the paragraph. Then another critiques the summary. Each step looks intelligent because it resembles intellectual work as humans perform it. But under the hood, every textual handoff introduces latency, token cost and information loss. The system is constantly moving between dense internal representations and verbose symbolic surfaces.

Latent-space collaboration strips away part of that roundtrip. Instead of decode, tokenize, re-embed, infer, and decode again, agents can pass refined hidden states directly. They can recursively transform ideas without pretending that every intermediate state needs to be a memo.
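A toy model can make the roundtrip loss visible. The sketch below is purely illustrative and is not the RecursiveMAS method: the two-dimensional `VOCAB` table, the nearest-word "decoder" and the function names are all hypothetical stand-ins for what a real model does with thousands of dimensions and a large vocabulary.

```python
import math

# Hypothetical 2-D "embedding table" for a four-word vocabulary.
VOCAB = {
    "plan": (1.0, 0.0),
    "critique": (0.0, 1.0),
    "search": (-1.0, 0.0),
    "answer": (0.0, -1.0),
}

def decode_to_token(hidden):
    """Pick the vocabulary word whose embedding is closest to the hidden state."""
    return min(VOCAB, key=lambda word: math.dist(VOCAB[word], hidden))

def text_handoff(hidden):
    """Decode to a word, then re-embed it: what the receiver sees via text."""
    return VOCAB[decode_to_token(hidden)]

def latent_handoff(hidden):
    """Pass the hidden state to the receiver unchanged."""
    return hidden

# Hypothetical sender state: mostly "plan", with a strong hint of "critique".
sender_state = (0.7, 0.6)

text_error = math.dist(sender_state, text_handoff(sender_state))
latent_error = math.dist(sender_state, latent_handoff(sender_state))

print("word sent:", decode_to_token(sender_state))
print("error via text:", round(text_error, 3))
print("error via latent:", latent_error)
```

In this toy, the text handoff collapses the state onto the single nearest word and the "critique" component is lost, while the latent handoff delivers the state intact. Real systems send many tokens rather than one, so the loss is less crude, but the direction of the effect is the same.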

This resembles a broader pattern in AI progress. Early machine-learning systems relied heavily on hand-engineered features. Then deep learning shifted the center of gravity toward learned representations. Chain-of-thought prompting made reasoning explicit, verbose and manipulable. Now we are beginning to see the opposite pressure: let models reason in compressed trajectories that need not always be verbalized.

That does not make explicit reasoning obsolete. It changes where it belongs.

Language should appear when humans need to participate. We need language at the prompt boundary, at the explanation boundary, at the audit boundary, at the debugging boundary. But inside the machine-to-machine path, natural language may often be a convenience layer that hardened into a habit.

The mature architecture is probably hybrid. Dense communication inside. Text at the edges. Something like gRPC with Protocol Buffers under the hood and JSON gateways where human developers need them. Or like a modern database engine: binary pages, indexes and execution plans internally; SQL and explain output externally.
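As a rough sketch of that hybrid shape, the pipeline below converts text to a dense state once on the way in, keeps every internal stage dense, and renders text once on the way out. Everything here is hypothetical: `encode_prompt`, `render_answer` and the two stages are trivial placeholders, not real model components.

```python
from typing import Callable, List

Vector = List[float]
Stage = Callable[[Vector], Vector]

def encode_prompt(prompt: str) -> Vector:
    """Input edge (placeholder encoder): one number per word, its length."""
    return [float(len(word)) for word in prompt.split()]

def render_answer(state: Vector) -> str:
    """Output edge (placeholder decoder): summarize the final state as text."""
    return f"total={sum(state):.2f}"

def double(state: Vector) -> Vector:
    """Internal stage: dense in, dense out."""
    return [2.0 * x for x in state]

def add_one(state: Vector) -> Vector:
    """Internal stage: dense in, dense out."""
    return [x + 1.0 for x in state]

def run_pipeline(prompt: str, stages: List[Stage]) -> str:
    state = encode_prompt(prompt)   # text -> dense, once
    for stage in stages:
        state = stage(state)        # dense -> dense, no prose in between
    return render_answer(state)     # dense -> text, once

print(run_pipeline("a bc", [double, add_one]))
```

The point of the shape is that adding more internal stages adds no text conversions: there are still exactly two, one at each edge.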

This distinction also clarifies the main objection. If agents stop producing human-readable intermediate messages, we lose visibility. That is not a minor concern. Text traces are imperfect, but they are operationally valuable. They help developers understand why a system failed. They help users build trust. They give safety teams something to inspect.

So the answer is not to abolish language. The answer is to stop treating language as mandatory for all computation.

A system can expose summaries, probes, explanations, traces and diagnostic views without forcing every internal operation to be prose. We should not confuse observability with verbosity. A readable transcript is one kind of instrumentation. It is not the only kind.
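One way to picture instrumentation that is not a transcript is a probe that records a few compact facts about each internal step. This is a minimal sketch under stated assumptions: `probed` is a hypothetical helper, and vector size and norm are arbitrary example metrics, not a recommendation for what a real system should log.

```python
import math

def probed(stage, name, trace):
    """Wrap a dense stage so each call appends a compact record to `trace`."""
    def wrapper(state):
        out = stage(state)
        trace.append({
            "stage": name,
            "dim": len(out),                       # size of the output state
            "norm": round(math.hypot(*out), 3),    # Euclidean length of the state
        })
        return out
    return wrapper

def double(state):
    """Hypothetical internal stage operating on a plain list of floats."""
    return [2.0 * x for x in state]

trace = []
step = probed(double, "double", trace)
result = step([3.0, 4.0])

print(result)
print(trace)
```

The stage still exchanges only dense state, yet a developer gets a structured record of what happened. Richer probes, summaries or on-demand explanations could hang off the same hook without turning every step into prose.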

This is where the phrase “language strives for simplicity” becomes interesting but incomplete. Human language often evolves toward economy: shorter expressions, shared assumptions, compressed references. But that simplicity is relative to human minds. For machines, natural language is not simple. It is ambiguous, redundant, culturally loaded and computationally expensive. It is a remarkable interface, but a poor default for high-frequency internal exchange.

The deeper principle is architectural Occam’s razor: do not add a human-shaped communication layer where the machine substrate does not need one.

RecursiveMAS matters because it makes this principle concrete. It suggests that multi-agent systems do not have to be little theater ensembles, with each agent stepping forward to deliver its monologue. They can become recursive computational structures, passing compact internal states and producing language only when language is useful.

The irony is that we have seen this movie before.

First we optimized for machines. Then we optimized for humans. Then, once systems became large enough, we rediscovered that machines still prefer machine-native representations on the hot path.

AI agents are now entering that same cycle. Natural language made them accessible. It made them debuggable. It made them imaginable.

But it may not be where their real collaboration belongs.
