An LLM Made of Redstone Bricks: What CraftGPT Really Teaches Us

A few times a decade, someone takes an idea that sounds like a joke and executes it with surgical patience. CraftGPT is one of those moments: a small language model that runs inside Minecraft, wired up in Redstone like a cathedral of logic gates. The project comes from sammyuri, who released the world and code on GitHub. It's trained on a tiny conversational dataset, runs inference in-game, and on ordinary, vanilla tick rates would need roughly a decade to answer you once. With MCHPRS, a high-performance Redstone server, the reply still takes hours. That's not a criticism; it's the point.

CraftGPT packs 5,087,280 parameters into a Redstone build measuring 1,020 × 260 × 1,656 blocks, about 439 million blocks in total. It runs with a 64-token context window, and the weights are heavily quantized (mostly 8-bit; embeddings 18-bit; LayerNorm 24-bit). It's a staggeringly physical way to render what we usually hide behind a neat pip install.
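
To make those figures concrete, here is a minimal sketch, my own illustration rather than CraftGPT's actual tooling, of what symmetric 8-bit weight quantization looks like in NumPy, followed by the arithmetic behind the 439-million-block figure.

```python
import numpy as np

def quantize_8bit(w):
    """Symmetric 8-bit quantization: int8 codes plus one float scale per tensor."""
    scale = np.abs(w).max() / 127.0
    codes = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights from the stored codes."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
codes, scale = quantize_8bit(w)
w_hat = dequantize(codes, scale)
print("max abs quantization error:", np.abs(w - w_hat).max())

# The build's geometry, for scale: the bounding box alone is ~439 million blocks.
volume = 1_020 * 260 * 1_656
print(f"bounding volume: {volume:,} blocks")
print(f"parameters:      {5_087_280:,}")
```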

Below are the lessons I think are worth keeping—beyond the immediate delight and the inevitable “someone actually did it!” grin.

Five takeaways from building a language model with pickaxes

  1. Compute is geometry. In textbooks, a transformer layer is a handful of matrix multiplies and non-linearities; in CraftGPT, it’s corridors, counters, and timed torches. When you spatialize linear algebra, you feel the cost of every multiply–accumulate. The result is a visceral demonstration of why scale matters, and why “constant factors” are not a footnote: Redstone’s update physics turn a single token into an expedition, even with MCHPRS accelerating the tick rate. 
  2. Inference is not training—and that's illuminating. CraftGPT's weights are trained in Python; inference is then enacted in Redstone. Seeing the pipeline split reminds us: no magic lives between tokenizer and logits; there's just deterministic plumbing and a bit of RNG, as the sketch after this list shows. (Yes, CraftGPT exposes an RNG seed input.) The romance of “emergence” is replaced by the humbler clarity of clocked circuits. 
  3. LLMs are powerful, not mystical. The README is pleasingly candid: the model often derails, the grammar’s wobbly, the context window is tiny. That frankness is an antidote to mythmaking. If you can walk through an attention head, you stop treating it like a shaman’s hut. (And if you want usable response times, you don’t run it on Redstone.) 
  4. Systems thinking beats slogans. CraftGPT needs memory (loading the world takes on the order of 32–64 GB of RAM), clock discipline (server tick tuning), and painstaking reset logic. Put differently: the surrounding engineering dominates the “model” conversation. That parallels real deployments, where orchestration, batching, caching, and observability move the needle more than another exotic activation function. 
  5. Pedagogy through ridiculousness works. Try explaining quantization, context limits, or attention wiring on a whiteboard. Now imagine you can physically follow a signal as it ripples through an 8-bit weight, or watch a progress counter step per token. Absurd? Yes—and strangely effective. 
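
To ground items 1 and 2, here is a minimal NumPy sketch, entirely my own toy and not sammyuri's code: one attention layer's worth of deterministic plumbing, a single seeded draw for the next token, and a tally of multiply–accumulates so the "constant factors" point has a number attached. The only details borrowed from CraftGPT are the 64-token context and the existence of an RNG seed input.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a (tokens, dim) activation matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return scores @ v

d, vocab, ctx = 64, 512, 64            # toy sizes; the 64-token context mirrors CraftGPT's
rng = np.random.default_rng(seed=42)   # analogue of CraftGPT's RNG seed input
Wq, Wk, Wv = (rng.normal(scale=0.02, size=(d, d)) for _ in range(3))
W_out = rng.normal(scale=0.02, size=(d, vocab))
x = rng.normal(size=(ctx, d))          # stand-in for embedded tokens

h = attention(x, Wq, Wk, Wv)                        # deterministic plumbing...
logits = h[-1] @ W_out                              # ...ending in next-token logits
next_token = rng.choice(vocab, p=softmax(logits))   # ...and one seeded random draw
print("sampled token id:", next_token)

# "Constant factors are not a footnote": multiply-accumulates in this one toy layer.
macs = 3 * ctx * d * d + 2 * ctx * ctx * d + d * vocab
print(f"multiply-accumulates: {macs:,}")
```

Every one of those multiply–accumulates is, in CraftGPT, a physical structure a signal has to travel through; that is the whole "compute is geometry" point.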

A short, opinionated history of Redstone computing

Redstone computers have been part of the culture almost since the ore first glittered. Early on, players built ALUs and memory out of torches and dust—Wired covered a 16-bit arithmetic unit back in 2010. Communities like Open Redstone Engineers turned this tinkering into curricula. And the projects kept escalating: sammyuri’s CHUNGUS II brought “Minecraft in Minecraft,” while other builders got DOOM running on in-game CPUs like IRIS. CraftGPT is the natural, slightly deranged next chapter: “okay, now do an LLM.” 

“When will the LLM Turing machine arrive?”

If by that you mean “when will a language model be a general computer?” then the honest answer is: it depends on your definitions and your patience.

  • Minecraft Redstone is widely regarded as Turing-complete—you can, in principle, build machines that compute anything computable. (You’ll run out of time and chunks first.) 
  • Neural networks: classical results show certain recurrent neural nets are Turing-complete under idealized conditions (unbounded precision/time). Whether transformers meet the bar depends on assumptions: some proofs achieve completeness with tweaks (hard attention, unbounded context, or other idealizations), while other work argues vanilla, fixed-precision transformers are not Turing-complete. The debate is lively and technical. 

So the “LLM Turing machine” is both here and not here: theoretically close under certain formalisms, practically bounded by precision, context, and compute. CraftGPT dramatizes those bounds in the most literal way possible—by making you walk past them at block scale. 
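
To put a rough number on "bounded by context," here is a back-of-the-envelope sketch. It uses a hypothetical vocabulary size, since the relevant CraftGPT figure isn't quoted here; the point is only that a fixed window admits finitely many distinct prompts, so the model's input–output behavior fits in a finite (if astronomically large) table, and anything Turing-machine-like has to come from idealizations such as unbounded precision or external memory.

```python
# Hypothetical vocabulary size, for illustration only; the context window of 64
# tokens is CraftGPT's. Counts every possible prompt of length 0..C.
V = 1_000
C = 64
distinct_prompts = sum(V**k for k in range(C + 1))
print(f"distinct prompts: about 10^{len(str(distinct_prompts)) - 1}")
```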

Why this matters (beyond the spectacle)

  • Interpretability by construction. When a weight is a physical structure, your intuitions sharpen. That doesn’t make modern frontier models simple—but it teaches the right habit: treat the network as a mechanism you can reason about, optimize, and constrain. 
  • Cost and latency are first-class citizens. After two hours per answer, “fast enough” will never sound like fluff in a design review again. 
  • Respect the substrate. GPUs, TPUs, FPGAs—or Redstone—aren’t interchangeable veneers. They shape what’s feasible, cheap, and observable. CraftGPT is a love letter to substrates. 

Closing note

Tom’s Hardware reported the headline facts—parameter counts, size, and the heroic time-to-first-token. Fair enough. But to me, CraftGPT isn’t a stunt so much as a thought experiment with a power cord: a reminder that language models are made of things—counters, adders, encoders—no matter how cleverly we package them. Human ingenuity, it turns out, scales from CUDA kernels to dust lines and torches; what doesn’t scale is our patience for slow clocks. 

Links & resources

PS: If anyone manages to make CraftGPT print its own blueprints faster than it can answer “hello,” buy them lunch.