Chromatron recompiled from WinXP and PowerPC binaries to Apple Silicon

Piotr Migdał

I recompiled an old game from WinXP and PowerPC binaries to Apple Silicon and WASM. And you can too!

You can download and play Chromatron here.

Let’s revive an old game!

A colleague used Ghidra to add infinite lives in River Raid, a classic Atari game.

I got inspired to give Ghidra (an open-source decompiler by the NSA) a try. Flipping one assembly instruction is one thing. But recovering an entire game is another. I had no prior experience with working with binary formats, let alone decompilation - I thought that xxd was a very smiley face.

I wanted to start with something small, so there would be a reasonable chance that it works. I went with Chromatron - a puzzle game about mirrors and lasers. I like its simplicity. No story, just puzzles. And it gets challenging fast!

Chromatron Level 3

I played it while doing an internship in quantum optics, and (along with The Incredible Machine) it was my motivation for the Quantum Game with Photons.

Ten years later, I wanted to replay it. Yet, it was only available for PowerPC, an architecture Apple dropped in 2006 when it moved to Intel. And now, we are already 6 years into Apple Silicon.

Here is my journey with some lessons learnt.

Claude Code and Opus 4.5

First, I started with Claude Code and Opus 4.5. I downloaded the binaries and connected Claude Code to Ghidra. Then I asked it to decompile the game and recompile it for Apple Silicon. I protested at suggestions that it would be easier to use an emulator.

First success with Ghidra MCP

I saw not only that it generated some assembly and C code, but also that Claude Code wrote something about game elements! For this attempt I used GhidrAssistMCP.

Something works. So, does it just see a few elements, or is it able to recreate the game? After a few back and forths, the first things started appearing. From the darkness, a few elements emerged. Even at this point, it required a lot of hand-holding. It took some iterations for each piece (board, toolbox, levels, etc.). It took time to actually decode assets.

Yet, it was far from perfect. Many details were way off, and Claude was stubborn about fixing them. Much worse - it often invented things, like different fonts and texts. When asked if they were in the decompiled code, it gleefully said they were not.

Even with beams, it was a bit of a step-by-step process.

Second approach

So, I took a different approach. This time I used the m2c decompiler to turn PowerPC machine code into C. Maybe this order would work better - first generate the code, then fix it.

With the second approach, it got the positions of things right on the first try. But it had trouble decoding assets.

I tried a few times, but ultimately abandoned this approach.

Cursor and GPT-5.2-Codex

I got excited by the news that GPT-5.2-Codex was able to create a browser. While there was some skepticism around that, I decided to give it a go. This time I went with Ghidra, but misconfigured something, so it actually started using the built-in headless mode - which worked better.

It needed minimal feedback during sessions lasting 1-2 hours. It took some time, but it didn’t pester me with questions constantly, and I couldn’t believe it when I saw it working.

Some things were broken (like the laser beam), but it didn’t take many prompts to fix them - only one or two, not an endless cycle like with the previous model. Wow, it was actually a playable game. Some rough edges here and there.

Pixelwise comparison with screenshots

I implemented a way to compare the rendered results with reference screenshots. Left to themselves, LLMs were very eager to say that two images were the same. Only with pixelwise comparison did they get real feedback.
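
A minimal sketch of what such a comparison can look like, in Python. This is not the code from the project: frames are assumed to be already decoded into rows of (R, G, B) tuples, and the names pixel_diff and tolerance are made up for illustration. The point is to hand the model numbers instead of an impression: how many pixels differ, and where.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]          # rows of (R, G, B) tuples; an assumed format
BBox = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def pixel_diff(expected: Frame, actual: Frame, tolerance: int = 0) -> Tuple[int, Optional[BBox]]:
    """Count mismatched pixels and return the bounding box around them.

    A pixel mismatches when any channel differs by more than `tolerance`.
    The bounding box is None when the frames match.
    """
    if len(expected) != len(actual) or any(
        len(e) != len(a) for e, a in zip(expected, actual)
    ):
        raise ValueError("frames must have the same dimensions")

    count = 0
    bbox: Optional[BBox] = None
    for y, (row_e, row_a) in enumerate(zip(expected, actual)):
        for x, (pe, pa) in enumerate(zip(row_e, row_a)):
            if max(abs(ce - ca) for ce, ca in zip(pe, pa)) > tolerance:
                count += 1
                if bbox is None:
                    bbox = (x, y, x, y)
                else:
                    bbox = (min(bbox[0], x), min(bbox[1], y),
                            max(bbox[2], x), max(bbox[3], y))
    return count, bbox


# Toy 4x3 frames: the candidate has two wrong pixels.
BLACK, RED = (0, 0, 0), (255, 0, 0)
reference = [[BLACK] * 4 for _ in range(3)]
candidate = [row[:] for row in reference]
candidate[1][2] = RED
candidate[2][3] = RED

print(pixel_diff(reference, candidate))  # (2, (2, 1, 3, 2))
```

A non-zero tolerance is handy when the reference screenshots went through lossy compression; for lossless captures, zero keeps the check strict.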

And, step by step, it worked. Using Gemini 3 Pro externally for consulting images helped, but was not enough.

A month break

I had almost a month break in the project - primarily for the writeup and charts for BinaryAudit, a recent project on using Ghidra for detecting backdoors in binaries.

A month in human time is ages for AI time. In the meantime, two models came out - Opus 4.6 and GPT-5.3-Codex. Both can be used with a lot of success. I first used GPT-5.3-Codex to slightly polish the C++ version.

Then I decided - why not start from scratch - this time with Opus 4.6. Since the previous attempt didn’t work, I wondered if this one would. I picked Rust this time.

Aaaand - this time even the first approach worked.

Maybe it helped having some experience with the tech stack. Or because I used Rust. Or because of PyGhidra. But most likely the primary reason is the models themselves.

Rough font edges

Reviving most of the game was simpler than expected. Yet, there is this uncanny valley of “almost done”.

A sane approach would be to stop here. Just some minor positioning and font selection differences are visible, and only to people who know the original version.

Anyway, very likely the font was different on the WinXP version (the one I know) and the PowerPC version (the one I didn’t). Using a standard font would be fair game… if it weren’t for my obsession.

Models (all of them) wanted to persuade me it is not worth fighting.


Turns out that it was VGASYS all along.

So, here we go:

Chromatron Apple Silicon

You can actually run it online.

Summary of approaches taken

Dates | Language | Rendering | Ghidra Access | AI Agent | AI Model | Result | Play
Jan 14–15 | C | Raylib | GhidrAssistMCP | Claude Code | Opus 4.5 | Almost works! | -
Jan 14–21 | C | SDL2 | Headless + m2c | Claude Code | Opus 4.5 | Did something, stuck. | -
Jan 15–Feb 16 | C++ | SDL2 | Headless CLI (custom Java scripts) | Cursor | GPT-5.2-Codex → GPT-5.3-Codex | First working result. | online
Feb 15–18 | Rust | SDL2 → winit + softbuffer | PyGhidra API | Claude Code | Opus 4.6 | Best overall. | online

Lessons learnt

New models, new possibilities. Sometimes new models don’t just make things slightly better, but open new routes.

Also, the stronger the model, the less hand-holding it needs. One model provides little help, another - a bit, yet another - basically does it end-to-end.

Confabulations are worse than a lack of an answer.

References are gold. Both for you (memory can be faulty) and for the AI.

Models excel at code. Model visual capabilities are lacking. If there are visible differences (e.g. an element is RED, but should be BLACK), a model will just as gleefully say that there are no differences as that there are.
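
One possible workaround, sketched in Python under loud assumptions: instead of asking a model to look at two images, reduce a region to a line of text it can read. The palette, the function names, and the idea of comparing mean colors per region are all illustrative, not what the project actually did.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]

# Tiny illustrative palette; a real one would come from the game's own colors.
PALETTE: Dict[str, Pixel] = {
    "BLACK": (0, 0, 0),
    "WHITE": (255, 255, 255),
    "RED": (255, 0, 0),
    "GREEN": (0, 255, 0),
    "BLUE": (0, 0, 255),
}


def nearest_name(pixel: Pixel) -> str:
    """Name of the palette color closest to `pixel` (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((c - p) ** 2 for c, p in zip(PALETTE[name], pixel)),
    )


def mean_color(pixels: List[Pixel]) -> Pixel:
    """Channel-wise integer average of a non-empty list of pixels."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) // n,
        sum(p[1] for p in pixels) // n,
        sum(p[2] for p in pixels) // n,
    )


def describe_mismatch(label: str, expected: List[Pixel], actual: List[Pixel]) -> str:
    """One line of text: what color a region is, and what it should be."""
    want = nearest_name(mean_color(expected))
    got = nearest_name(mean_color(actual))
    if want == got:
        return f"{label}: OK ({got})"
    return f"{label}: is {got}, but should be {want}"


# The same region sampled from the reference and from the port.
print(describe_mismatch("mirror", [(0, 0, 0)] * 4, [(250, 10, 5)] * 4))
# mirror: is RED, but should be BLACK
```

A sentence like this plays to the models' strength with text and sidesteps their weakness with images.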

SDL3 is the way to go. Or SDL2 if bindings for your language are less mature.

It matters less what the target language is. Decompiled code is pseudo-C, so one may think that C is the best target. For a human - likely. For a machine, just pick the best. My go-to here is Rust, as it is still low-level, but more concise and safer, for humans and machines alike. It also has good tooling for building for other systems.

The main question is defining your goals and non-goals. For example, it makes a difference if you want a pixel-perfect recompilation, a remaster, or a remake.

Some tricky parts go easily (e.g. assets in an obscure format).

Some small things are surprisingly hard. I spent most of the time trying to make the font match. I am not sure anymore if it is the f(r)ont I want to die on.

Don’t be afraid to start over. One of the common pitfalls is the sunk cost fallacy - you already picked a framework, started, so you want to polish. But with AI sometimes it is better to start over, knowing what you know. Think of that as a roguelike game, in which you die, then start over - but richer by knowledge.

One of the bad things about pixel-perfect clones is that you notice many things that could be improved - both things that were bugs in the original game and things that could simply be done better.

A word of caution

If you deal with decompilation, be aware that AI guardrails can sometimes get in the way. Passing disassembled code to an LLM might get your account flagged (as happened to a friend) or shadow-redirected. AI labs actively try to prevent their models from being used to crack software or hack accounts, and reverse-engineering tasks can occasionally trigger these safety filters.

Their precautions are understandable, but it is definitely something to keep in mind when you embark on your own reverse-engineering adventures.

Perspective

It is just 40kB. With bigger code it might be harder, but people do that. With even newer models, which are better at such tasks, it will be more reasonable.

For example, the original Supaplex is 287 kB, while a remaster on Steam is around 200 MB. For some reason, it feels… bad. I mean, we are all used to huge files, wrapped in Docker or Electron (FYI: I prefer smaller Tauri 2). But now, we can go beyond - and natively port to another system. It is no longer a many-month project for a seasoned reverse engineer.

A lot is happening here. Christopher Ehrlich ported SimCity from C into TypeScript with GPT-5.3-Codex. See also a wonderful writeup Resurrecting Crimsonland by banteg on decompiling a 2003 top-down shooter, with a very clear goal to be faithful:

the goal is a complete rewrite that matches the original windows binary behavior exactly. if the original has a bug, the rewrite has the same bug. if there’s a texture that’s one pixel too small (there is), i replicate that too. the executable is the spec, and we’re writing the spec back into source code.

Two weeks ago there was a thread Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering. Or see The Long Tail of LLM-Assisted Decompilation.

In the meantime, you can play my recompiled Chromatron online!

Which game (or maybe other piece of software) would you like to bring back to life?
