Everything that keeps your agents alive, honest, and private
HawkeRun is the durable layer above Claude, Codex, and Antigravity: it self-heals, learns from your decisions, keeps your data on your machine, and governs autonomous work with policy, permissions, and a full audit trail, built in.
Self-healing
A frozen agent doesn't end your run. HawkeRun spots the stall, kills the dead worker and its child processes, requeues the task, and keeps going. Uncaught errors never take the daemon down.
Self-learning
Every decision and correction you make is remembered. Agents get your past redirects as context on the next run, so the same mistake doesn't happen twice.
Data stays private
Local-first. Your code, prompts, and API keys never leave your machine. HawkeRun drives the agents and subscriptions you already pay for. No proxy, no cloud middleman, no training on your data.
AI governance
Per-agent policy tiers (red / amber / green), approval gates, role-scoped permissions, and an audit trail of every action and decision. Autonomy you can actually prove.
Everything you need to trust your agents unattended
Reliability primitives borrowed from production job queues, applied to disposable AI workers.
Single owner of state
One supervisor process writes SQLite in WAL mode. No multi-writer races, no corruption from agents fighting over a file.
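As a rough illustration of the single-writer setup, here is a minimal Python sketch that opens a SQLite file in WAL mode. The `tasks` table and its `id` and `status` columns are assumptions made for the example, not HawkeRun's actual schema; only `lease_expires_at` is named on this page.

```python
import sqlite3


def open_state_db(path: str) -> sqlite3.Connection:
    """Open the state database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode that is now active for this file.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " lease_expires_at REAL)"
    )
    conn.commit()
    return conn
```

Only the supervisor would hold this connection; workers report back through the supervisor instead of opening the file themselves.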
Leases & heartbeats
Every claimed task has a lease_expires_at. No output for N minutes → hung → killed + requeued. Nothing is owned forever.
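The lease idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `LEASE_SECONDS`, the `Task` class, and the function names are hypothetical, and a real supervisor would also kill the worker's process tree before requeueing.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 300.0  # hypothetical stand-in for "N minutes"


@dataclass
class Task:
    id: str
    status: str = "queued"
    lease_expires_at: Optional[float] = None


def claim(task: Task, now: float) -> None:
    """A worker takes the task and receives a time-limited lease."""
    task.status = "claimed"
    task.lease_expires_at = now + LEASE_SECONDS


def heartbeat(task: Task, now: float) -> None:
    """Any worker output pushes the lease deadline forward."""
    task.lease_expires_at = now + LEASE_SECONDS


def reclaim_if_expired(task: Task, now: float) -> bool:
    """Requeue the task if its lease lapsed; return True when reclaimed."""
    expired = (
        task.status == "claimed"
        and task.lease_expires_at is not None
        and now >= task.lease_expires_at
    )
    if expired:
        # A real supervisor would kill the hung worker here.
        task.status = "queued"
        task.lease_expires_at = None
    return expired
```

Passing `now` explicitly keeps the logic deterministic and easy to test; a running supervisor would pass `time.time()`.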
Watchdog reclaim
Polling is the safety net; file watchers just make wakeups instant. Missed FS events never stall the system.
Crash recovery
On startup, anything left claimed or running is requeued automatically. Pick up exactly where you crashed.
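One plausible way to implement the startup sweep is a single SQL update. The sketch below assumes a hypothetical `tasks` table with `status` values `queued`, `claimed`, `running`, and `done`; the real status names may differ.

```python
import sqlite3


def requeue_orphans(conn: sqlite3.Connection) -> int:
    """Return every task left claimed or running to the queue.

    Meant to run once at startup, before any worker is spawned.
    Returns the number of tasks that were requeued.
    """
    cur = conn.execute(
        "UPDATE tasks"
        " SET status = 'queued', lease_expires_at = NULL"
        " WHERE status IN ('claimed', 'running')"
    )
    conn.commit()
    return cur.rowcount
```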
Whole-tree process kills
Tears down the entire process tree (taskkill /T /F on Windows, process-group signals on macOS and Linux), so orphaned npm / python / node children can't fake 'still alive.'
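A minimal Python sketch of the same teardown, assuming the worker is started in its own session so that one signal reaches the whole group. `spawn_worker` and `kill_tree` are hypothetical names; `taskkill /T /F` is the Windows command named above.

```python
import os
import signal
import subprocess
import sys


def spawn_worker(argv):
    """Start a worker so that its whole tree can be killed later."""
    if sys.platform == "win32":
        return subprocess.Popen(argv)
    # start_new_session=True runs setsid() in the child, making it the
    # leader of a fresh process group whose id equals its pid.
    return subprocess.Popen(argv, start_new_session=True)


def kill_tree(proc):
    """Kill the worker and every descendant it spawned."""
    if sys.platform == "win32":
        subprocess.run(
            ["taskkill", "/PID", str(proc.pid), "/T", "/F"],
            check=False,
            capture_output=True,
        )
    else:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
    proc.wait(timeout=10)  # reap the worker so it does not linger as a zombie
```

Signalling the group rather than the single pid is what stops a leftover child from keeping the task looking alive.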
Any AI worker
The shell adapter runs claude -p, codex, ollama run, anything. Add adapters per tool, all sharing the same lease/heartbeat contract.
Manager escalations
Workers can wait for approval. Approve, reject, or redirect from the control panel, or from your phone via the email + Cloudflare relay.
Journal retention
events.jsonl and per-task logs rotate to timestamped archives instead of truncating. Nothing important is silently lost.
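Rotation without truncation might look like the Python sketch below. The size threshold, the timestamp format, and the `rotate_if_large` name are assumptions for illustration; the page only states that logs rotate to timestamped archives.

```python
import time
from pathlib import Path
from typing import Optional


def rotate_if_large(path: Path, max_bytes: int, now: Optional[float] = None) -> Optional[Path]:
    """Move a full log aside under a timestamped name and start a fresh one.

    Returns the archive path, or None when no rotation was needed.
    """
    if not path.exists() or path.stat().st_size < max_bytes:
        return None
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    archive = path.with_name(f"{path.stem}.{stamp}{path.suffix}")
    n = 0
    while archive.exists():  # never overwrite an earlier archive
        n += 1
        archive = path.with_name(f"{path.stem}.{stamp}.{n}{path.suffix}")
    path.rename(archive)  # the old content is kept, not truncated
    path.touch()          # new, empty live file
    return archive
```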
Self-healing loop
A bug in one tick is logged and the loop continues. uncaughtException and unhandledRejection don't kill the daemon.
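The handlers named above are Node.js process events. The same principle, isolating each tick so one failure cannot end the loop, can be sketched in Python as an analogue; `run_loop` and its parameters are hypothetical.

```python
import logging
from typing import Callable

log = logging.getLogger("supervisor")


def run_loop(tick: Callable[[int], None], max_ticks: int) -> int:
    """Run `tick` repeatedly; log failures and keep going.

    Returns the number of ticks that raised.
    """
    failures = 0
    for i in range(max_ticks):
        try:
            tick(i)
        except Exception:
            failures += 1
            # log.exception records the traceback alongside the message.
            log.exception("tick %d failed; continuing", i)
    return failures
```

A real daemon would loop until asked to stop; `max_ticks` is there only to keep the sketch finite.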
The supervisor earns its keep while you're away
Three things that happen on their own. No prompts, no babysitting.
Self-healing
A worker froze. You didn't have to care.
Failover
One model ran out. The work didn't stop.
Decide from your phone
Only the serious calls reach you.
Approve the serious calls from your phone
Routine, reversible work runs on its own. When an agent hits a real decision (publishing, deleting, spending), HawkeRun pauses just that step and emails it to you. Approve, hold, or redirect, and work resumes the moment you tap.
- Red / amber / green tiers: only red needs you
- Every other agent keeps working meanwhile
- Reply by email: APPROVE · REJECT · REDIRECT
- The whole exchange lands in your audit trail
Publish v0.2.1 to production?
Signed installer built · all tests green · changelog ready.
Agents keep working while you decide.
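To make the reply-by-email flow concrete, here is a hedged Python sketch of how an inbound reply could be turned into a decision. The parsing rules (first non-quoted line, case-insensitive verb, optional free-text note) are assumptions; the page only specifies the three verbs.

```python
from typing import Optional, Tuple

VERBS = ("APPROVE", "REJECT", "REDIRECT")


def parse_reply(body: str) -> Optional[Tuple[str, str]]:
    """Return (verb, note) from the first meaningful line, or None."""
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(">"):
            continue  # skip blank lines and the quoted original message
        word, _, rest = stripped.partition(" ")
        verb = word.upper().rstrip(":.!")
        if verb in VERBS:
            return verb, rest.strip()
        return None  # the first real line is not a decision
    return None
```

Returning `None` for anything ambiguous leaves the step paused, which is the safer default for a red-tier decision.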
"Can't I just run the CLI myself?"
You can, right up until something stalls, rate-limits, or needs a decision while you're not looking.
| When… | Run it yourself | cron + scripts | HawkeRun |
|---|---|---|---|
| A frozen agent at 2am | Dead until you notice | Job hangs, blocks the queue | Detected, killed, requeued |
| Rate limit mid-task | Work stops | Script errors out | Another agent covers |
| A real decision | Must be at the keyboard | No concept of approval | Emailed to your phone |
| Agents reviewing each other | Manual | Not a thing | Built-in sign-off pipeline |
| Crash / reboot | Lose your place | Partial reruns | Requeues exactly where it stopped |
Let agents work autonomously, and still prove what happened
HawkeRun's governance layer makes hands-off agents safe for real work today: policy tiers, role-scoped permissions, approval gates, and a complete record you can audit.
- Per-agent policy tiers: red needs approval, amber warns, green runs free
- Role-scoped permissions and allow/deny tool lists
- Full audit trail: every action, decision, and hand-off recorded
- Sign-off gates so nothing ships on one agent's say-so
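The tier and tool-list rules above can be sketched as one small decision function. The `Policy` shape, the outcome strings, and the rule that an empty allow list permits every tool are assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Tier(Enum):
    RED = "red"
    AMBER = "amber"
    GREEN = "green"


@dataclass(frozen=True)
class Policy:
    tier: Tier
    allow: FrozenSet[str] = frozenset()  # empty means "no allow-list restriction"
    deny: FrozenSet[str] = frozenset()


def decide(policy: Policy, tool: str) -> str:
    """Return 'deny', 'needs_approval', 'warn', or 'run' for a tool call."""
    if tool in policy.deny:
        return "deny"
    if policy.allow and tool not in policy.allow:
        return "deny"
    if policy.tier is Tier.RED:
        return "needs_approval"
    if policy.tier is Tier.AMBER:
        return "warn"
    return "run"
```

Checking the deny list first means an explicit denial wins over any tier.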
Full LLM observability, in your own database
Trace every run, track every token and dollar, score every output. Cloud observability tools ship your prompts, code, and credentials to their servers. HawkeRun writes the same telemetry to infrastructure you control.
Trace every run
The full nested tree of prompt calls, tool-execution spans, responses, and hand-offs between agents.
Metrics that matter
Tokens, cost estimates, latency, and rate-limit events, per agent and per task.
Built-in evaluations
Human thumbs up/down and LLM-as-a-judge scoring on any output.
Prompt management
Store, version, and fetch the exact prompts your agents run.
Immutable audit trail
Every action, decision, and approval recorded for compliance.
vs. cloud observability tools: your data never touches our cloud, because there isn't one.
Pluggable local storage
Point HawkeRun's logs, metrics, and supervisor verdicts at the store you already run.
Inside your firewall
No raw code, prompts, or credentials leave your machine. Pipe the audit trail into Snowflake or BigQuery for cost attribution and compliance, on your terms, not a vendor's.
No agent marks its own work done
One agent does the work, another reviews it, a check verifies the result, and HawkeRun enforces the hand-off, whether that’s a pull request, a cleaned dataset, or a finished report. Nothing reaches done on a single agent’s say-so.
Implementer
Claude · Codex
Does the work (writes code, gathers data, drafts content, runs the job) and submits it for review. Never marks its own work done.
Reviewer
Codex · Claude
Checks the output, then signs off or sends it back with evidence, not vibes.
Verifier
Shell · test runner
Runs the check that proves it: tests pass, row counts match, the scrape returned the expected schema, the build is green. A sign-off only counts with a passing result attached.
Referee
HawkeRun
Enforces the protocol. A task can't reach done until every required sign-off passes.
implementing → worker_done → verifying → review_required
review_required → signed_off → done
review_required → changes_requested → implementing
Sign-offs are typed (implementation, test verification, review, security, release, or human approval) and stored as signed decisions in SQLite. The event log is the source of truth; the board is generated from it.
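The protocol can be read as a small state machine with a gate in front of `done`. The Python sketch below encodes the documented transitions; the `advance` function, the snake_case sign-off names, and the way required and passed sign-offs are supplied are assumptions, not HawkeRun's API.

```python
from typing import Dict, Set

# Allowed transitions, taken from the state flow described on this page.
TRANSITIONS: Dict[str, Set[str]] = {
    "implementing": {"worker_done"},
    "worker_done": {"verifying"},
    "verifying": {"review_required"},
    "review_required": {"signed_off", "changes_requested"},
    "signed_off": {"done"},
    "changes_requested": {"implementing"},
    "done": set(),
}

# Sign-off types listed on this page, written here as identifiers.
SIGNOFF_TYPES = {
    "implementation", "test_verification", "review",
    "security", "release", "human_approval",
}


def advance(state: str, target: str, required: Set[str], passed: Set[str]) -> str:
    """Move a task to `target`, refusing illegal moves and unsigned work."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    unknown = (required | passed) - SIGNOFF_TYPES
    if unknown:
        raise ValueError(f"unknown sign-off types: {sorted(unknown)}")
    if target == "done":
        missing = required - passed
        if missing:
            raise PermissionError(f"missing sign-offs: {sorted(missing)}")
    return target
```

Because the gate sits in the referee rather than in any agent, an implementer has no path from `implementing` straight to `done`.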
# TASK-104 · Cloudflare email relay
review required
impl · claude-worker-1 · implementation complete
verify · test-runner · npm test → pass
review · codex-reviewer-1 · changes requested
[x] Implementation submitted
[x] Tests passed
[ ] Reviewer sign-off
[ ] Done
> “Missing duplicate inbound email test.”
In a coding workflow, that looks like Claude implementing, Codex reviewing, and the test runner verifying. HawkeRun makes your agents communicate, review each other, and sign off, so nothing reaches done on one agent’s “trust me,” and you’re only pulled in for the truly red decisions.
Your code never leaves your machine
HawkeRun is local-first by design. Here's exactly what does and doesn't leave your laptop.
- Your source code stays on your machine. HawkeRun runs beside your projects, not in our cloud.
- License checks send only your license key, app version, and a machine hash.
- The managed relay stores only decision metadata, never your code or prompts.
- Bring your own model and API keys.
Keep your agents working while you’re away.
Let the supervisor feed them work, catch the stalls, requeue the dead, and stop them from wasting hours asking dumb questions.