If you write a program in English and AI translates it into Python, which one is the actual source code?
In the age of vibe coding1, prompts are becoming the human interface. This raises a new dilemma: should we store these prompts alongside the code they generate, or discard them as transient artifacts?
The community is divided. When Gergely Orosz polled developers about making prompts visible to code reviewers, opinions split: 49% loved it, while 24% hated the idea. Meanwhile, the industry is betting on a fundamental shift: Cursor acquiring Graphite, a startup that uses AI to review and debug code, and Meta creating internal tooling to publish prompts.
We are still figuring out the norms for this new reality.
Are prompts the new source code?
Traditionally, source code is what humans write, and machine code is what computers execute.

For the end user, the build is all that matters. They download binaries or open the website. They don’t care about the code, nor should they. Yet source code is what is needed for development — and sufficient to generate builds.
With vibe coding, we translate natural language into programming language:

If prompts are the real “source”, should we be committing them instead of the Python, TypeScript, or Rust they generate? It might be tempting to cut out the middleman and treat our instructions as source code. But it does not work that way.
Building code is deterministic, or close to it. Code that compiles only during a full moon is not good code. In 2026 we are well past the era of “works on my machine” and should never go back there.
Good repositories have a clear, reproducible way to build and run the project, so there is no guesswork about which commands to run or which package versions to use. Most modern languages have solid tooling for this: package managers with lockfiles, plus Dockerfiles, GitHub Actions, or similar.
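To make “deterministic” concrete, here is a minimal sketch in Python, assuming a hypothetical `make build OUT_DIR=...` target: a reproducible build yields byte-identical artifacts on every run over the same commit.

```python
# Minimal sketch, assuming a hypothetical `make build OUT_DIR=...` target:
# a build is reproducible if two runs over the same commit yield identical bytes.
import hashlib
import subprocess
from pathlib import Path

def build_and_hash(out_dir: str) -> str:
    subprocess.run(["make", "build", f"OUT_DIR={out_dir}"], check=True)
    digest = hashlib.sha256()
    for path in sorted(Path(out_dir).rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(out_dir).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

if build_and_hash("dist_a") == build_and_hash("dist_b"):
    print("Build is reproducible.")
else:
    print("Build differs between runs; investigate non-determinism.")
```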

At the same time, generating code from prompts is non-deterministic by nature, and hard to replicate.
- Probabilistic nature: We can try to set `temperature=0`, but it is neither supported by all APIs nor guaranteed to produce the best result (see this beautiful Transformer Explainer). Guaranteeing determinism is a research problem (a minimal sketch follows this list).
- Lack of long-term support: Models update silently or are deprecated. Unlike pinned package versions, we cannot rely on a specific model snapshot existing forever.
- Hard to capture context: LLMs work best with rich context beyond the prompt itself, including conversation history, memory, skills, screenshots, tool outputs, and MCP servers.
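As a minimal sketch of the first point, assuming the OpenAI Python SDK, an `OPENAI_API_KEY` in the environment, and an example model name: even pinning `temperature=0` and a `seed` does not guarantee identical outputs across runs.

```python
# Minimal sketch: run the same prompt twice with temperature=0.
# Model name and seed are illustrative; determinism is still not guaranteed.
from openai import OpenAI

client = OpenAI()
prompt = "Create an HTML file with a cute, interactive octopus."

def generate() -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model, swap for whatever you use
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=0,  # best-effort reproducibility hint, not a guarantee
    )
    return response.choices[0].message.content

print(generate() == generate())  # may print False even with identical settings
```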
Even in the simplest case, results differ.
Same prompt (“Create an HTML file with a cute, interactive octopus.”), same agent (Claude Code), same model (Opus 4.5), still — slightly different results. Click on images to play with them!
In larger projects, the same prompt might solve an issue once, fail another time, and introduce a new bug.
Even a prompt as explicit as “correct grammar” for a single blog post yields different outcomes.
I ran four instances of Gemini 3 Pro in parallel in Cursor, with the same prompt — “Correct grammar in this post”. Even for standard tasks, each worked differently and gave different results.
Where is the room for prompts?
Prompts are a kind of spec. They can be very vague, leaving a lot of room for interpretation.
Natural language does not compile — which is both a feature and a curse.
Even when they are precise, there is still space left2. Just because we gave a clear specification and asked someone (or something) to follow it doesn’t mean the result works yet. Current LLMs are far from perfect. Sometimes they fail at instructions that would be clear to an employee.
That’s why prompts are best treated as intentions and notes from the development process — useful context, not a reliable build input.
We should be able to (git) blame AI
I think all contributions from AI should be attributed as such (both code changes and commits). Not because they are worse (or better), but because attribution is an essential troubleshooting tool. More and more open source projects require clear disclosure of AI contributions3.
Among other things, it is crucial to know: what was intended, what was a conscious decision, and what just “happened”.
From stared/sc2-balance-timeline, my entirely vibe-coded side project, 15 Years of StarCraft II Balance Changes Visualized. Each commit is also Claude-generated, so I can compare the changes with their intention.
Tracking prompts helps us on a few levels:
- Learning: The AI world is moving so fast it is hard to catch up. Learning from peers is super valuable — even Andrej Karpathy mentioned he feels behind. Seeing how others prompt models helps us improve our own workflows.
- Intent verification: We can understand the intention behind a change by reading the prompt.
- Efficient reviewing: AI makes it easy to create a commit, but it may take more time to review. Knowing that code is AI-generated signals where to look closer. For example, UI code can be AI-generated, while we want human precision in auth logic.
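One lightweight way to do this today, shown below as a minimal sketch of my own convention (not an established standard), is to record the prompt and model as git trailers, so they show up in `git log` and next to `git blame`.

```python
# Minimal sketch of a personal convention (not an established standard):
# store the prompt and model as git trailers in the commit message.
import subprocess

def commit_ai_change(summary: str, prompt: str, model: str) -> None:
    message = (
        f"{summary}\n\n"
        f"Prompt: {prompt}\n"
        f"AI-Model: {model}\n"
        "Co-authored-by: Claude <noreply@anthropic.com>\n"
    )
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

commit_ai_change(
    summary="Add tooltips to the balance timeline",
    prompt="Show the patch notes in a tooltip when hovering over a balance change.",
    model="claude-opus-4.5",
)
```

Trailers keep the prompt searchable with plain `git log --grep`, without inventing a new file format.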
Reservations
One of the issues with saving prompts is the human factor. Tracking prompts is awkward due to creative flow, privacy, anger, and messiness:
- Dirty notebook: People often write prompts as a stream of consciousness, full of typos and idiosyncrasies.
- Privacy: Prompts might contain passwords, API keys, or personal data we don’t want to share publicly.
- Profanity: People behave less civilly towards AI than they would towards coworkers. Sometimes out of frustration, other times because it might actually work (see the famous leaked Windsurf prompt).
- Sense of pride: For many, coding is a craft that demonstrates high-value skills. Using an LLM can make the output feel less “earned”.
- Peer pressure: There is a huge amount of “AI Slop” and valid skepticism. Many communities or reviewers automatically reject AI-assisted submissions.
We need redaction capabilities. Just as we squash dirty commits before pushing to a public repository, we should be able to curate our prompts.
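A minimal sketch of what such redaction could look like; the patterns are illustrative and far from exhaustive.

```python
# Minimal sketch (patterns are illustrative, not exhaustive): redact obvious
# secrets from a prompt before it is stored alongside a commit.
import re

REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),       # OpenAI-style keys
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED_GITHUB_TOKEN]"),  # GitHub tokens
    (re.compile(r"(?i)password\s*[:=]\s*\S+"), "password: [REDACTED]"),
]

def redact(prompt: str) -> str:
    for pattern, replacement in REDACTIONS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

print(redact("Use password: hunter2 and key sk-abcdefghijklmnopqrstuvwxyz123456."))
```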
Conclusion
Code reviews are evolving, and controversy is inevitable.
We already have standards like MCP and SKILL.md — and we need one to share prompts alongside git commits. We are building an open-source tool to help with this — stay tuned!
In the meantime, start simple: if you use AI to write code, use AI to write the commit message.
It is frustrating to see dozens of AI-generated files committed with a lazy “fixed it”. If a tool allows vibe coding, it should also allow vibe committing.
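A minimal sketch of that idea, assuming the OpenAI Python SDK and an `OPENAI_API_KEY` in the environment (the model name is just an example): draft a commit message from the staged diff.

```python
# Minimal sketch: ask a model to draft a commit message for the staged changes.
import subprocess
from openai import OpenAI

diff = subprocess.run(
    ["git", "diff", "--cached"], capture_output=True, text=True, check=True
).stdout

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[
        {"role": "system", "content": "Write a concise, imperative git commit message for this diff."},
        {"role": "user", "content": diff},
    ],
)
print(response.choices[0].message.content)
```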
Footnotes

1. See our recent post How 2025 took AI from party tricks to production tools. Even the term “vibe coding” was coined in February, see Andrej Karpathy’s musings. Right now Claude Code is written in Claude Code.
2. Law is codified, yet requires courts for interpretation. Even mathematics, despite its precision, leaves room for underspecification, hence the need for proof checkers; see AI will make formal verification go mainstream.
3. Ghostty requires clear AI disclosure, Gentoo plans to ban AI contributions, and there is a lot of general discussion on good standards for peer-reviewing AI-assisted pull requests. People actually read Cloudflare’s Claude-generated code.