HawkeRun
Features

Everything that keeps your agents alive, honest, and private

HawkeRun is the durable layer above Claude, Codex, and Antigravity: it self-heals, learns from your decisions, keeps your data on your machine, and governs autonomous work with policy, permissions, and a full audit trail, built in.

Self-healing

A frozen agent doesn't end your run. HawkeRun spots the stall, kills the dead worker and its child processes, requeues the task, and keeps going. Uncaught errors never take the daemon down.

Self-learning

Every decision and correction you make is remembered. Agents get your past redirects as context on the next run, so the same mistake doesn't happen twice.

Data stays private

Local-first. Your code, prompts, and API keys never leave your machine. HawkeRun drives the agents and subscriptions you already pay for. No proxy, no cloud middleman, no training on your data.

AI governance

Per-agent policy tiers (red / amber / green), approval gates, role-scoped permissions, and an audit trail of every action and decision. Autonomy you can actually prove.

Features

Everything you need to trust your agents unattended

Reliability primitives borrowed from production job queues, applied to disposable AI workers.

Single owner of state

One supervisor process writes SQLite in WAL mode. No multi-writer races, no corruption from agents fighting over a file.

Leases & heartbeats

Every claimed task has a lease_expires_at. No output for N minutes → hung → killed + requeued. Nothing is owned forever.

Watchdog reclaim

Polling is the safety net; file watchers just make wakeups instant. Missed FS events never stall the system.

Crash recovery

On startup, anything left claimed or running is requeued automatically. Pick up exactly where you crashed.

Whole-tree process kills

Tears down the entire process tree (taskkill /T /F on Windows, process-group signals on macOS and Linux), so orphaned npm / python / node children can't fake 'still alive.'

Any AI worker

The shell adapter runs claude -p, codex, ollama run, anything. Add adapters per tool, all sharing the same lease/heartbeat contract.

Manager escalations

Workers can wait for approval. Approve, reject, or redirect from the control panel, or from your phone via the email + Cloudflare relay.

Journal retention

events.jsonl and per-task logs rotate to timestamped archives instead of truncating. Nothing important is silently lost.

Self-healing loop

A bug in one tick is logged and the loop continues. uncaughtException and unhandledRejection don't kill the daemon.

See it work

The supervisor earns its keep while you're away

Three things that happen on their own. No prompts, no babysitting.

Self-healing

A worker froze. You didn't have to care.

running · nightly ETL load
no heartbeat for 6m, stalled
killed frozen worker + children
requeued · attempt 2 of 3
done · recovered automatically

Failover

One model ran out. The work didn't stop.

Claude · drafting release notes
rate limit hit, cooling down
routing to Codex
Codex covered, no interruption

Decide from your phone

Only the serious calls reach you.

🔴 publish v0.2.1 to production?
emailed to your phone
you tapped Approve
work resumed, you stayed asleep
Decide from anywhere

Approve the serious calls from your phone

Routine, reversible work runs on its own. When an agent hits a real decision (publishing, deleting, spending), HawkeRun pauses just that step and emails it to you. Approve, hold, or redirect, and work resumes the moment you tap. Try it →

  • Red / amber / green tiers: only red needs you
  • Every other agent keeps working meanwhile
  • Reply by email: APPROVE · REJECT · REDIRECT
  • The whole exchange lands in your audit trail
9:41
HawkeRunnow
RED DECISION
Claude-Release

Publish v0.2.1 to production?

Signed installer built · all tests green · changelog ready.

Agents keep working while you decide.

Why not DIY

"Can't I just run the CLI myself?"

You can, right up until something stalls, rate-limits, or needs a decision while you're not looking.

When…Run it yourselfcron + scriptsHawkeRun
A frozen agent at 2amDead until you noticeJob hangs, blocks the queueDetected, killed, requeued
Rate limit mid-taskWork stopsScript errors outAnother agent covers
A real decisionMust be at the keyboardNo concept of approvalEmailed to your phone
Agents reviewing each otherManualNot a thingBuilt-in sign-off pipeline
Crash / rebootLose your placePartial rerunsRequeues exactly where it stopped
AI governance · built in

Let agents work autonomously, and still prove what happened

HawkeRun's governance layer makes hands-off agents safe for real work today: policy tiers, role-scoped permissions, approval gates, and a complete record you can audit.

  • Per-agent policy tiers: red needs approval, amber warns, green runs free
  • Role-scoped permissions and allow/deny tool lists
  • Full audit trail: every action, decision, and hand-off recorded
  • Sign-off gates so nothing ships on one agent's say-so
Private observability

Full LLM observability, in your own database

Trace every run, track every token and dollar, score every output. Cloud observability tools ship your prompts, code, and credentials to their servers. HawkeRun writes the same telemetry to infrastructure you control.

Trace every run

The full nested tree of prompt calls, tool-execution spans, responses, and hand-offs between agents.

Metrics that matter

Tokens, cost estimates, latency, and rate-limit events, per agent and per task.

Built-in evaluations

Human thumbs up/down and LLM-as-a-judge scoring on any output.

Prompt management

Store, version, and fetch the exact prompts your agents run.

Immutable audit trail

Every action, decision, and approval recorded for compliance.

vs. cloud observability tools: your data never touches our cloud, because there isn't one.

Pluggable local storage

Point HawkeRun's logs, metrics, and supervisor verdicts at the store you already run:

SQLitePostgreSQLMySQLSnowflakeBigQuery

Inside your firewall

No raw code, prompts, or credentials leave your machine. Pipe the audit trail into Snowflake or BigQuery for cost attribution and compliance, on your terms, not a vendor's.

Agent sign-off

No agent marks its own work done

One agent does the work, another reviews it, a check verifies the result, and HawkeRun enforces the hand-off, whether that’s a pull request, a cleaned dataset, or a finished report. Nothing reaches done on a single agent’s say-so.

Implementer

Claude · Codex

Does the work (writes code, gathers data, drafts content, runs the job) and submits it for review. Never marks its own work done.

Reviewer

Codex · Claude

Checks the output, then signs off or sends it back with evidence, not vibes.

Verifier

Shell · test runner

Runs the check that proves it: tests pass, row counts match, the scrape returned the expected schema, the build is green. A sign-off only counts with a passing result attached.

Referee

HawkeRun

Enforces the protocol. A task can't reach done until every required sign-off passes.

enforced sign-off flow
implementing → worker_done → verifying → review_required
review_required → signed_off → done
review_required → changes_requested → implementing

Sign-offs are typed (implementation, test verification, review, security, release, or human approval) and stored as signed decisions in SQLite. The event log is the source of truth; the board is generated from it.

runtime/HANDOFF.md (generated)
# TASK-104 · Cloudflare email relay
review required

impl    claude-worker-1   implementation complete
verify  test-runner       npm test → pass
review  codex-reviewer-1  changes requested

[x] Implementation submitted
[x] Tests passed
[ ] Reviewer sign-off
[ ] Done

> “Missing duplicate inbound email test.”

In a coding workflow, that looks like Claude implementing, Codex reviewing, and the test runner verifying. HawkeRun makes your agents communicate, review each other, and sign off, so nothing reaches done on one agent’s “trust me,” and you’re only pulled in for the truly red decisions.

Local-first

Your code never leaves your machine

HawkeRun is local-first by design. Here's exactly what does and doesn't leave your laptop.

Keep your agents working while you’re away.

Let the supervisor feed them work, catch the stalls, and requeue the dead, and stop them from wasting hours asking dumb questions.