MISSION

The interesting part of an agent isn't the model.
It's where the model meets the world.

Every month the models get better. Every month the conversation about them gets a little less interesting. The sharp question moved a while ago. It isn't which model you use. It's what happens when that model decides to run a shell command, write to your disk, send an email, or move money. Most people don't have a good answer yet. That gap is the thing we work on.

The parts of the stack that matter over the long run are the slow parts. Policy. Approval. Audit. Memory with access control. Boring words, but they're the reason an agent is safe to leave running while you sleep. They're also what makes the next model, and the one after that, drop-in replaceable. You get to keep the guardrails and the logs when whatever's underneath changes in six months.

Boundary primitives.

Each one is useful on its own. Together they're how you run an agent in production without holding your breath. Three of them ship in Gatekeeper today. The fourth is emerging inside the stack we use ourselves.

● SHIPPING in Gatekeeper

Policy

Declarative rules the agent can't rewrite. Every tool call goes through a decision. If the agent gets prompt-injected tomorrow, the policy still says no.
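A minimal sketch of the idea in TypeScript. The types, tool names, and rule shape below are invented for illustration and are not Gatekeeper's actual schema or API. The point is only that the rules are plain data the agent never gets to edit, and that anything the rules don't mention is denied by default.

```typescript
// Hypothetical shapes for illustration; not Gatekeeper's real schema.
type Decision = "allow" | "deny" | "approve";

interface ToolCall {
  tool: string; // e.g. "shell.exec" (made-up tool name)
  args: Record<string, string>;
}

interface Rule {
  tool: string; // exact tool name this rule applies to
  argPattern?: { key: string; matches: RegExp }; // optional argument check
  decision: Decision;
}

// Rules are declared as data, outside anything the agent can write to.
const rules: readonly Rule[] = [
  { tool: "shell.exec", argPattern: { key: "command", matches: /\brm\s+-rf\b/ }, decision: "deny" },
  { tool: "shell.exec", decision: "approve" },
  { tool: "fs.read", decision: "allow" },
];

// First matching rule wins. A call that matches no rule is denied.
function decide(call: ToolCall, policy: readonly Rule[]): Decision {
  for (const rule of policy) {
    if (rule.tool !== call.tool) continue;
    if (rule.argPattern) {
      const value = call.args[rule.argPattern.key] ?? "";
      if (!rule.argPattern.matches.test(value)) continue;
    }
    return rule.decision;
  }
  return "deny";
}
```

With these example rules, a destructive shell command is denied, any other shell command waits for a human, file reads pass, and a tool nobody wrote a rule for is denied.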

● SHIPPING in Gatekeeper

Approval

A signed, single-use link that pings a human before a risky action runs. The agent pauses. You decide. The log remembers. Most tools give you two options, allow or block. This is the third.
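One way a signed, single-use link can work, sketched in TypeScript with Node's built-in crypto module. Everything here is an assumption made for illustration: the URL, the secret, the in-memory store, and the function names are hypothetical and do not describe how Gatekeeper implements approvals.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical sketch; not Gatekeeper's API. The secret stays on the server.
const SECRET = "replace-with-a-real-secret";
const pending = new Map<string, { action: string; expiresAt: number }>();

// HMAC over the id and expiry, so neither can be altered without detection.
function sign(id: string, expiresAt: number): string {
  return createHmac("sha256", SECRET).update(`${id}.${expiresAt}`).digest("hex");
}

// Called when the agent pauses: record the pending action and mint a link.
function createApprovalLink(action: string, ttlMs: number, now = Date.now()): string {
  const id = randomUUID();
  const expiresAt = now + ttlMs;
  pending.set(id, { action, expiresAt });
  // gatekeeper.example is a placeholder host.
  return `https://gatekeeper.example/approve?id=${id}&exp=${expiresAt}&sig=${sign(id, expiresAt)}`;
}

// Called when the human clicks: check signature and expiry, then consume the id.
// Returns the approved action, or null if the link is forged, expired, or reused.
function redeem(link: string, now = Date.now()): string | null {
  const url = new URL(link);
  const id = url.searchParams.get("id") ?? "";
  const exp = Number(url.searchParams.get("exp"));
  const sig = Buffer.from(url.searchParams.get("sig") ?? "", "hex");
  const expected = Buffer.from(sign(id, exp), "hex");
  if (sig.length !== expected.length || !timingSafeEqual(sig, expected)) return null;
  const entry = pending.get(id);
  if (!entry || entry.expiresAt !== exp || now > exp) return null;
  pending.delete(id); // single use: a second click finds nothing
  return entry.action;
}
```

The signature stops anyone from editing the link, the expiry stops stale approvals, and deleting the pending entry on first use is what makes the link single-use.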

● SHIPPING in Gatekeeper

Audit

An append-only record of every decision. What the agent tried, what the policy said, who approved, when. Something you can show a reviewer, or a customer, or yourself six months later when you're wondering what actually happened.
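A small TypeScript sketch of an append-only record. The entry fields mirror the sentence above (what was tried, what the policy said, who approved, when). The hash chain is our own illustrative choice for making edits detectable, not a description of Gatekeeper's storage format, and all names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape for illustration; not Gatekeeper's real format.
interface AuditFields {
  at: string; // ISO timestamp
  attempted: string; // what the agent tried
  decision: "allow" | "deny" | "approve"; // what the policy said
  approvedBy: string | null; // who approved, if anyone
}

interface AuditEntry extends AuditFields {
  prevHash: string; // hash of the previous entry
  hash: string; // hash of this entry, covering prevHash
}

const GENESIS = "genesis";

// Hash the previous hash together with this entry's fields.
function digest(prevHash: string, f: AuditFields): string {
  const payload = JSON.stringify([prevHash, f.at, f.attempted, f.decision, f.approvedBy]);
  return createHash("sha256").update(payload).digest("hex");
}

// Append returns a new array; existing entries are never modified.
function append(log: readonly AuditEntry[], fields: AuditFields): readonly AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const entry: AuditEntry = { ...fields, prevHash, hash: digest(prevHash, fields) };
  return [...log, Object.freeze(entry)];
}

// Recompute every hash in order. Any edited or reordered entry breaks the chain.
function verifyChain(log: readonly AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prev || entry.hash !== digest(prev, entry)) return false;
    prev = entry.hash;
  }
  return true;
}
```

Because each entry's hash covers the one before it, quietly changing an old "deny" to "allow" makes verification fail from that entry onward.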

○ EMERGING dogfood only

Memory with a boundary

Structured long-term state that lives on your machine. The agent can learn about your work across weeks without handing any of it to a third party to index. We're building this inside the stack we use ourselves, to see what holds up before it's a product.

How we sequence what gets built.

Infrastructure is a slow game. We're not building everything at once. The rule we follow: pick the thing that hurts the most right now, ship it as OSS, see if anyone actually uses it. If they do, and they have opinions, we build the next layer. If they don't, we stop. The roadmap doesn't have dates on it. It has evidence gates. Gatekeeper exists because we kept running agents that would have been a lot safer with it. The next primitive will exist for the same reason, or not at all.

Local-first, for a strategic reason.

One reason we build for self-hosting is principle. The bigger reason is strategic. Every tool call an agent makes is metadata about how a company actually works. The arguments, the paths, the URLs, which actions got approved, which got blocked. Whoever ends up holding all of that ends up being very important to everyone else. We'd rather not be that. We'd rather be the thing you run on your own hardware and forget about. If we ever do a hosted tier, it sits on top of the same OSS code, not instead of it. You can always take the stack and go home.

What we're not building.

Some things we get asked about often enough to be explicit:

Where we are.

We're small and early. Gatekeeper is on GitHub and on npm as @runestone-labs/gatekeeper-client. If you want to use it, break it, integrate it somewhere real, or argue about what we're getting wrong, the contact page has working addresses. We answer our own mail.

One side note. The same people behind this also run a couple of smaller consumer-facing AI projects, on separate brands with separate audiences. We keep them off this site on purpose. It'd muddy what Runestone Labs is for. If you're curious and you find them, that's fine. Mostly we want one sentence to set expectations here: this company is about the infrastructure, not the content layer.