Starch Development

Rebuilding PotatoQuest with AI — what changed between 2018 and 2026

27 May 2026 · Harry Mitchell

PotatoQuest started in 2018 as a side project in Java + LibGDX. I shelved it. In 2026 I picked it up again — but instead of porting the old code I started from a blank Flutter project and rebuilt the entire thing using AI coding agents. Every line of Dart in the current build was written by an AI. Every accessory, every potato skin (except the default body), every background was generated by an AI image model and then cleaned up by hand.

This post is the side-by-side: what the game looked like in 2018, what it looks like now, and the workflow that closed the gap.

The backgrounds

Drag the slider. Left is the 2018 build, hand-drawn in a single afternoon. Right is the 2026 version — same composition, AI-refined, then layered into a three-plane parallax (back / middle / front) so the camera actually has depth. The "after" images here are the real composites: every parallax layer stacked exactly as the game renders them.

[Before/after slider: Default scene, 2018 vs 2026]

The starry-night mountains default. Same horizon, but the new one breathes — distant peaks on the back layer scroll slowly, a mid-range ridge in the middle, the nearest crags in front. None of that existed in the original; the 2018 build had one flat plane.
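As an illustration of the three-plane idea (a sketch in Python, not the game's Dart code), each layer can follow the camera by a different fraction of its movement. The layer names come from the description above; the speed factors and tile width are assumed values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallaxLayer:
    name: str
    speed_factor: float  # assumed: fraction of camera movement the layer follows
    tile_width: float    # assumed: width in pixels of the repeating image


# Hypothetical values: the back plane moves slowest, the front plane fastest.
LAYERS = [
    ParallaxLayer("back", 0.25, 1000.0),
    ParallaxLayer("middle", 0.5, 1000.0),
    ParallaxLayer("front", 1.0, 1000.0),
]


def layer_scroll(layer: ParallaxLayer, camera_x: float) -> float:
    """Distance the layer has scrolled, wrapped so the image can tile."""
    return (camera_x * layer.speed_factor) % layer.tile_width


if __name__ == "__main__":
    for layer in LAYERS:
        print(layer.name, layer_scroll(layer, 400.0))
```

Because the back layer scrolls a quarter as far as the front one for the same camera movement, distant peaks appear to drift slowly while the nearest crags sweep past.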

[Before/after slider: City scene, 2018 vs 2026]

City scene. The 2018 version was honest about its budget — flat blocks, two colours. The 2026 one was generated from a short prompt describing "the original PotatoQuest city skyline, refined, neon dusk" and then nudged through three rounds of img2img until it sat next to the originals without feeling out of place.

[Before/after slider: Desert scene, 2018 vs 2026]

Desert. The dune silhouettes survived; everything else got rewritten in light.

All of the code was written by AI

There is no hand-written Dart in this repo. Every screen, every Flame component, every Supabase service was produced by an AI coding agent — mostly Claude Code, with bursts of GPT for one-off scripts.

What that actually looked like day to day:

  • Specs first, code second. I'd write a tight description of the next feature ("Bracket Battle payout: coins equal to tier number, 1 at Spud Sprout through 7 at Spud Royale; sample the opponent ghost from recent runs in the same tier") and let the agent draft the change. Reviewing diffs took longer than describing them.
  • Tests as guardrails, not theatre. The agent was much better at writing the code than at writing meaningful tests. I leaned on hot-reload + manual play sessions to catch regressions, and reserved test cases for the genuinely tricky maths (replay determinism, rig coordinates, score-tier boundaries).
  • The agent owned the boring stuff. Refactors, file moves, renaming score_bracket → scoreBracket across 40 files, regenerating Supabase types: all of that was one-shot.
  • I owned the calls that matter. Architecture, data model, what the game should feel like — those stayed mine. The agent is a fast typist with good instincts, not a designer.

The total time from "blank flutter create" to "feature-complete versus modes" was about six weeks of nights and weekends. The 2018 build, written by hand, never got past a single-player prototype in roughly the same wall-clock effort.

Accessories: generated, then hand-cleaned

There are about 35 accessories in the current build — hats, glasses, helmets, crowns. None of them are hand-drawn. The pipeline was:

  1. Prompt against the body. I'd feed Gemini and ChatGPT the existing Body_Regular.png and ask for the exact same potato, in the exact same pose, but wearing a specific accessory. The instruction was always "match the reference body precisely — only add the new accessory."
  2. Pick the best of N. Five or six generations per accessory, picking the one whose proportions matched the rig most closely.
  3. Procreate cleanup. This was the slowest step. On the iPad I'd erase the potato, erase the background, and clean the alpha edge of the accessory until it was a clean transparent PNG. Roughly 10–20 minutes per accessory.
  4. Drop into assets/images/Acc_*.png and let the rig handle alignment.

The reason for matching the body first was practical: by generating the accessory worn on the canonical body, the perspective and scale came out right. Generating an accessory in isolation gave wonky proportions every time.

The potato skins

Same pipeline as accessories, but the goal was to replace the whole potato silhouette while keeping the rig anchors identical. About a dozen skins in the build — Disco, Zombie, Robo, Mash, Rotten, Gold, Cactus, and more — each generated as a full-body sprite matching the reference, then masked.

The default Body_Regular.png is the only sprite I drew myself. Everything else is downstream of it.

A new animation system that doesn't need re-animation

The 2018 version used pre-baked sprite sheets per skin: every animation frame for every variant, all painted by hand. That doesn't scale — adding a single new accessory meant redrawing every animation it would appear in.

The 2026 version does the opposite. There is exactly one animated reference: Body_Regular.png, rigged and animated in code. Every skin and every accessory rides on that rig at runtime. The body bends, the hat follows. The animation system never knows what's on top of it.
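One way to picture "the body bends, the hat follows" is as a transform composition: the rig exposes an anchor pose per frame, and the accessory stores only an offset relative to that anchor. The sketch below is in Python rather than Dart, and the function name, parameters, and degree-based rotation are assumptions for illustration, not the game's actual API.

```python
import math


def accessory_pose(anchor_x, anchor_y, anchor_deg, offset_x, offset_y, offset_deg):
    """Compose a body anchor pose with an accessory's local offset.

    The local offset is rotated by the anchor's rotation, then translated
    to the anchor position, so the accessory follows the body as it bends.
    """
    theta = math.radians(anchor_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    world_x = anchor_x + offset_x * cos_t - offset_y * sin_t
    world_y = anchor_y + offset_x * sin_t + offset_y * cos_t
    return world_x, world_y, anchor_deg + offset_deg


if __name__ == "__main__":
    # Same hat offset, two different body frames: only the anchor changes.
    print(accessory_pose(10.0, 20.0, 0.0, 5.0, 0.0, 0.0))
    print(accessory_pose(10.0, 20.0, 90.0, 5.0, 0.0, 0.0))
```

The animation code only ever updates the anchor; whatever sprite sits in the slot is positioned by the same composition.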

To make this work I needed a way to align each accessory to the body precisely — frame by frame — without doing it by eye in a paint program. That became the calibration tool.

It's a debug-only overlay shipped in the same binary. You launch the game with a flag, the calibration screen replaces the menu, and you can:

  • step through every animation frame of the reference body,
  • nudge each accessory slot's offset and rotation per frame with arrow keys,
  • preview the result composited live over the original sprite sheet,
  • export the resulting rig as JSON to the clipboard.

Paste it into assets/data/potato_rig.json, ship it, done. The whole animation pipeline for a new accessory is now: generate image → clean alpha → run calibration tool for a few minutes → commit.
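The nudge-and-export loop can be sketched roughly as follows. This is a Python illustration, not the Dart tool itself, and the JSON layout (a "slots" map of per-frame "dx", "dy", "rot" entries) is a hypothetical schema; the real potato_rig.json format is not shown in this post.

```python
import json


class CalibrationState:
    """Per-frame offset and rotation for each accessory slot (hypothetical schema)."""

    def __init__(self, frame_count: int):
        self.frame_count = frame_count
        self.slots = {}  # slot name -> list of per-frame adjustments

    def _entry(self, slot: str, frame: int) -> dict:
        if not 0 <= frame < self.frame_count:
            raise IndexError(f"frame {frame} out of range")
        frames = self.slots.setdefault(
            slot,
            [{"dx": 0, "dy": 0, "rot": 0} for _ in range(self.frame_count)],
        )
        return frames[frame]

    def nudge(self, slot: str, frame: int, dx: int = 0, dy: int = 0, rot: int = 0) -> None:
        """Apply one arrow-key style adjustment to a single frame of a slot."""
        entry = self._entry(slot, frame)
        entry["dx"] += dx
        entry["dy"] += dy
        entry["rot"] += rot

    def export_json(self) -> str:
        """Serialise the rig so it can be pasted into a data file."""
        payload = {"frame_count": self.frame_count, "slots": self.slots}
        return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    state = CalibrationState(frame_count=3)
    state.nudge("hat", 1, dx=1)
    state.nudge("hat", 1, dx=1, rot=-5)
    print(state.export_json())
```

Keeping the adjustments as plain data is what makes the export step trivial: the tool edits a structure in memory and the game reads the same structure back from a file.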

That single change — one body, programmatic accessories, calibration as a tool — is why the 2026 build can have 35 accessories × 12 skins instead of the original's three hand-drawn potato variants.

Multiplayer without live multiplayer

The 2018 game was single-player only. The core of the 2026 build is the same endless platformer — drag to aim, release to launch, don't fall off the back of the auto-scrolling screen — but it now has two versus modes layered on top: Bracket Battle and Daily Duel. Neither of them needs real-time networking. The trick is asynchronous ghosts.

Every match recorded in PotatoQuest is captured as a deterministic replay — a compact event log keyed to game ticks (jumps, swings, deaths, the lot). The replay isn't a video; it's a script the game can re-run frame-perfect, dropping a "ghost" version of the original player into a fresh match.
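A minimal sketch of the record-and-replay idea, in Python rather than Dart. The event type, the physics, and the constants are all invented for illustration; the only points taken from the description above are that inputs are logged against game ticks and that re-running the log reproduces the run. Integer arithmetic is used here as one simple way to avoid floating-point drift between runs.

```python
import json
from dataclasses import asdict, dataclass

GRAVITY = 1  # hypothetical constant, in units per tick squared


@dataclass(frozen=True)
class Launch:
    tick: int  # game tick on which the input happened
    vx: int
    vy: int


def simulate(events, total_ticks: int) -> int:
    """Run a toy fixed-timestep simulation and return the distance travelled."""
    launches = {e.tick: e for e in events}
    x = y = vx = vy = 0
    for tick in range(total_ticks):
        launch = launches.get(tick)
        if launch is not None and y == 0:  # launches only count from the ground
            vx, vy = launch.vx, launch.vy
        x += vx
        y += vy
        if y <= 0:
            y, vx, vy = 0, 0, 0  # landed: stop until the next launch
        else:
            vy -= GRAVITY
    return x


def to_json(events) -> str:
    return json.dumps([asdict(e) for e in events])


def from_json(text: str):
    return [Launch(**item) for item in json.loads(text)]


if __name__ == "__main__":
    log = [Launch(tick=0, vx=2, vy=3), Launch(tick=8, vx=1, vy=2)]
    ghost = from_json(to_json(log))  # what another player's device would load
    print(simulate(log, 30) == simulate(ghost, 30))
```

The stored replay is just the event list; the ghost is produced by feeding that list back through the same simulation.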

That single primitive — record once, replay anywhere — gives the game two modes for free:

  • Daily Duel. One course, the whole world, 24 hours. Everyone races the same level against a ghost of the current day's leader. You either out-score them and take the crown, or you don't — and the crown changes hands the moment someone posts a higher score. It runs on its own ladder and doesn't touch your all-time high score.
  • Bracket Battle. You're slotted into one of seven tiers by your high score — Spud Sprout at the bottom, Spud Royale at the top. Each match drops you against a single ghost sampled from recent runs in your tier. Beat their score and you bank coins equal to your tier number (1 at Spud Sprout, 7 at Spud Royale) to spend on accessories. These runs still count toward your high score, so winning can bump you up a bracket.
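The Bracket Battle rules above reduce to three small functions, sketched here in Python rather than Dart. The score thresholds are hypothetical (the post does not give the real tier boundaries), as are the function names and the shape of a run record; the payout rule (coins equal to tier number, paid only when you beat the ghost's score) follows the description.

```python
import bisect
import random

# Hypothetical minimum high score for tiers 1..7 (Spud Sprout .. Spud Royale).
TIER_FLOORS = [0, 100, 250, 500, 1000, 2000, 4000]


def tier_for_score(high_score: int) -> int:
    """Map a high score to a tier number from 1 to 7."""
    return max(1, bisect.bisect_right(TIER_FLOORS, high_score))


def pick_ghost(recent_runs, tier: int, rng: random.Random):
    """Sample one ghost from recent runs recorded in the given tier."""
    candidates = [run for run in recent_runs if run["tier"] == tier]
    return rng.choice(candidates) if candidates else None


def payout(tier: int, player_score: int, ghost_score: int) -> int:
    """Coins equal to the tier number, awarded only for beating the ghost."""
    return tier if player_score > ghost_score else 0


if __name__ == "__main__":
    runs = [
        {"player": "a", "tier": 2, "score": 180},
        {"player": "b", "tier": 2, "score": 140},
        {"player": "c", "tier": 5, "score": 1500},
    ]
    my_tier = tier_for_score(120)
    ghost = pick_ghost(runs, my_tier, random.Random(0))
    print(my_tier, ghost["player"], payout(my_tier, 200, ghost["score"]))
```

The boundary cases (a score exactly on a tier floor, a tie with the ghost) are the kind of thing worth pinning down with a few tests, since an off-by-one here changes what players are paid.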

The advantages over real-time multiplayer at this scale:

  • Latency is zero. The ghost is local data; the only network call is fetching it before the match.
  • The match always finishes. Nobody rage-quits a ghost. Nobody no-shows.
  • Scheduling solves itself. Asia plays Europe's ghosts. The "lobby" is the replay pool.
  • It's cheaper. A handful of Supabase rows per match versus a websocket backend that would dominate the budget for an indie game.

The tradeoff: ghosts can't react to you. They're playing their own match, not yours. For a versus game with no direct interaction (you're both attacking the same level, and the higher score wins), that's a feature, not a bug. I wouldn't try this in a fighting game, though.

What I'd tell 2018-me

Don't draw 80 sprite sheets by hand. Build a rig and a calibration tool first. Treat assets as data, not commitments.

And — when image models are good enough that you can hand them a reference and ask for "the same thing, wearing a fez" — take them up on it. The 2018 PotatoQuest had three potato variants because that's how many I could draw. The 2026 one has dozens because the bottleneck moved from drawing to deciding.

The game itself? Same idea, eight years later. Spuds, swings, small dramas. It just shipped this time.

PotatoQuest is in soft launch right now. If you want to try it, store links are on the product page.