Every month the models get better. Every month the conversation about them gets a little less
interesting. The sharp question moved a while ago. It isn't which model you use. It's what happens
when that model decides to run a shell command, write to your disk, send an email, or move money.
Most people don't have a good answer yet. That gap is the thing we work on.
The parts of the stack that matter over the long run are the slow parts. Policy. Approval. Audit.
Memory with access control. Boring words, but they're the reason an agent is safe to leave running
while you sleep. They're also what makes the next model, and the one after that, drop-in
replaceable. You get to keep the guardrails and the logs when whatever's underneath changes in
six months.
How we sequence what gets built.
Infrastructure is a slow game. We're not building everything at once. The rule we follow: pick
the thing that hurts the most right now, ship it as OSS, see if anyone actually uses it. If they
do, and they have opinions, we build the next layer. If they don't, we stop. The roadmap doesn't
have dates on it. It has evidence gates. Gatekeeper exists because we kept running agents that
would have been a lot safer with it. The next primitive will exist for the same reason, or not
at all.
Local-first, for a strategic reason.
One reason we're local-first is principle. The bigger reason is strategic. Every tool call an agent
makes is metadata about how a company actually works. The arguments, the paths, the URLs, which
actions got approved, which got blocked. Whoever ends up holding all of that becomes very
important to everyone else. We'd rather not be that. We'd rather be the thing you run on your
own hardware and forget about. If we ever offer a hosted tier, it will sit on top of the same OSS
code, not replace it. You can always take the stack and go home.
What we're not building.
We get asked about these often enough that it's worth being explicit:
- Not an agent framework. Gatekeeper doesn't care which framework you use.
Plain HTTP. Any language.
- Not a model. We use Claude. We'll use whatever's best next year. The model
isn't the product.
- Not a prompt-security wall. Those exist. They solve a different problem.
Prompt-injection defense and tool-call enforcement are not the same thing.
- Not an evals company. Important category. Not ours.
- Not a product that only works if you use our cloud. If we're not useful on
your laptop first, we're not useful.
Where we are.
We're small and early. Gatekeeper is on GitHub and
on npm as @runestone-labs/gatekeeper-client. If you want to use it, break it,
integrate it somewhere real, or argue about what we're getting wrong, the contact page has
working addresses. We answer our own mail.
One side note. The same people behind this also run a couple of smaller consumer-facing AI
projects, on separate brands with separate audiences. We keep them off this site on purpose.
It'd muddy what Runestone Labs is for. If you're curious and you find them, that's fine.
Mostly, we want one sentence here to set expectations: this company is about the infrastructure,
not the content layer.