Design Principle: A human who has never seen the factory should be able to look at the dashboard and understand what is happening, what has happened, and what needs attention — within 30 seconds. Every view answers a specific human question. Every panel has a reason to exist. Every indicator maps to an action the human can take.
Build Implementation Map
Each governance dashboard has a dedicated page shell containing its full build specification. To build a dashboard, give Claude Code the instruction shown below.
| Dashboard View | Page Shell | Spec Reference | Claude Code Build Instruction |
| --- | --- | --- | --- |
| 1. Mission Control | factory-mission-control.html | S19 + View 1 | "Read the build spec on factory-mission-control.html and implement all panels" |
| 2. Build Activity | factory-build-activity.html | S14, S15, S18 + View 2 | "Read the build spec on factory-build-activity.html and implement all panels" |
| 3. Verification & Quality | factory-verification.html | S3, S6, S7, S15 + View 3 | "Read the build spec on factory-verification.html and implement all panels" |
| 4. Failure & Recovery | factory-failures.html | S4, S5, S9, S17 + View 4 | "Read the build spec on factory-failures.html and implement all panels" |
| 5. Budget & Resources | factory-budget.html | S16 + View 5 | "Read the build spec on factory-budget.html and implement all panels" |
| 6. Configuration & Progress | factory-configuration.html | S6, S7, S8, S12 + View 6 | "Read the build spec on factory-configuration.html and implement all panels" |
| 7. System Health & Validation | factory-health.html | S19 + View 7 | "Read the build spec on factory-health.html and implement all panels" |
Each page shell contains an amber BUILD STATUS banner and the full panel-by-panel specification including chart types, data sources, refresh rates, and interaction models. When a dashboard is built, the amber banner is removed and replaced with the actual dashboard components. The build spec remains in the page source as an HTML comment for reference.
View 1: Mission Control

- System Health Traffic Light: single GREEN / YELLOW / RED indicator — GREEN = all health checks passing + last canary passed within 24 hours; YELLOW = health checks passing but canary stale > 24 hours; RED = any health check failing or last canary FAILED or active HARD_HALT
- Build Pipeline Status Bar: horizontal flow showing packet counts at each stage — QUEUED → ASSIGNED → EXECUTING → VERIFYING → COMPLETED — with FAILED packets in red below the bar
- Active HARD_HALT indicator: if any system-wide halt is active, a prominent red banner with the halt reason and time since halt
- Last Canary Test result: PASS/FAIL with timestamp, latency, and cost — red highlight if > 24 hours since last run
- Phase Progress Bars: 11 horizontal bars (Phase 0–10) showing total packets, completed, in-progress, and failed per phase — active phase highlighted, future phases greyed, completed phases with green checkmark
- Today’s Key Metrics: packets completed, packets failed, API calls made, budget spent and remaining against the $5.00 daily cap, average build time, verification pass rate — each with a trend arrow (↑↓→) against its 7-day average
- Active Alerts feed: scrollable list of CRITICAL / WARNING / INFO alerts from stop conditions, canary failures, health check failures, budget warnings, and escalations — most recent first
- Next pending human action: if any escalation is waiting for human response, show it prominently with “Action Required” badge
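The traffic-light rules above can be sketched as a small pure function. This is a minimal sketch: the function and parameter names are illustrative, not part of the spec.

```python
from datetime import datetime, timedelta

def system_health_light(all_checks_passing: bool,
                        last_canary_passed: bool,
                        last_canary_at: datetime,
                        hard_halt_active: bool,
                        now: datetime) -> str:
    """Map factory state to the GREEN / YELLOW / RED indicator."""
    # RED: any health check failing, last canary FAILED, or active HARD_HALT
    if not all_checks_passing or not last_canary_passed or hard_halt_active:
        return "RED"
    # YELLOW: checks passing but the canary result is older than 24 hours
    if (now - last_canary_at) > timedelta(hours=24):
        return "YELLOW"
    # GREEN: all checks passing and a canary pass within 24 hours
    return "GREEN"
```

Note that RED conditions are evaluated first, so a fresh canary pass cannot mask a failing health check or an active halt.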
Spec references: S3, S9, S18, S19

View 2: Build Activity
- Agent Pool Grid: all Builder, Verifier, and Chief Engineer instances — each showing agent_id, type, status (IDLE = green, BUSY = blue, UNRESPONSIVE = red), current packet if busy, elapsed time, last heartbeat timestamp
- Agent Availability Summary: “X/Y Builders idle, X/Y Verifiers idle, Chief Engineer: idle/busy” — red warning if no idle agents and packets are queued
- Queue Depth Monitor: current queue depth with alert if > 20 packets waiting — shows average wait time and longest-waiting packet
- Live Packet Tracker table: all non-terminal packets with packet_id, capability_block name, phase, current status (colour-coded), assigned agent, elapsed time, retry count — sortable by any column, click for full execution history
- Execution Timeline: Gantt-style horizontal chart showing each packet’s lifecycle over the past 24 hours — colour segments for time in each status, red diamonds at failure points
- Concurrent Build Monitor: number of simultaneous builds running vs maximum (3), with API call rate and token consumption per minute
- Agent Assignment Log: recent assignments with packet → agent mapping, timestamp, and reason (priority order, phase order, dependency order)
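A minimal sketch of the Queue Depth Monitor check, assuming each queued packet record carries a `packet_id` and a `queued_at` timestamp (hypothetical field names; the spec does not fix a schema):

```python
from datetime import datetime

QUEUE_ALERT_THRESHOLD = 20  # packets waiting before the monitor raises an alert

def queue_depth_summary(queued: list[dict], now: datetime) -> dict:
    """Summarise the QUEUED stage: depth, alert flag, average wait,
    and the longest-waiting packet."""
    if not queued:
        return {"depth": 0, "alert": False, "avg_wait_s": 0.0, "longest": None}
    waits = [(now - p["queued_at"]).total_seconds() for p in queued]
    longest = max(queued, key=lambda p: now - p["queued_at"])
    return {
        "depth": len(queued),
        "alert": len(queued) > QUEUE_ALERT_THRESHOLD,
        "avg_wait_s": sum(waits) / len(waits),
        "longest": longest["packet_id"],
    }
```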
Spec references: S14, S15, S17, S18

View 3: Verification & Quality
- Verification Pass Rate: large percentage with trend sparkline over 30 days — target ≥ 95%
- Two-Agent Discipline Audit: “100% enforced (X checks, 0 violations)” — this must always read 100%; any deviation triggers a HARD_HALT — shows the last 20 independence checks with builder_id ≠ verifier_id confirmation
- Proof Object completeness: “X/Y components have complete proof chains” — any component without a full chain (source → requirement → block → packet → build → verification → proof → manifest → claim) is flagged
- Implementation Claim Ledger integrity: last integrity check timestamp, status (INTACT / TAMPERED), total claims filed
- Recent Verification Results table: packet_id, verifier_agent, result (PASS/FAIL), failing step if FAIL, duration, timestamp — click for full Verification Manifest and all Proof Objects
- Code Quality Gauges: 6 gauges, each with current value, threshold, and 30-day trend sparkline — Lint cleanliness (% clean, threshold: 100%); Dependency health (vulnerability count, threshold: 0 high/critical); Unused code (% dead code, threshold: 0%); Cyclomatic complexity (avg per function, threshold: ≤ 10); Security score (vulnerability count, threshold: 0 high/critical); Code duplication (% duplicated, threshold: ≤ 5%)
- Proof Object Registry: searchable table of all proof objects by type, associated packet, component, storage reference, and timestamp — filter by type (test log, compliance report, API log, etc.)
- Template Compliance Rate: percentage of builds passing all template verification_rules, with breakdown by template type
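The six gauge thresholds above can be expressed as a single table plus a pass/fail evaluator. The metric keys are illustrative names, not a schema the spec defines:

```python
# Gauge thresholds from the panel spec; "direction" says which side of the
# threshold is healthy ("max" = value must not exceed it, "min" = must not
# fall below it).
QUALITY_GAUGES = {
    "lint_clean_pct":      {"threshold": 100.0, "direction": "min"},
    "dep_vulns_high":      {"threshold": 0,     "direction": "max"},
    "dead_code_pct":       {"threshold": 0.0,   "direction": "max"},
    "avg_cyclomatic":      {"threshold": 10.0,  "direction": "max"},
    "security_vulns_high": {"threshold": 0,     "direction": "max"},
    "duplication_pct":     {"threshold": 5.0,   "direction": "max"},
}

def gauge_status(metrics: dict) -> dict:
    """Return PASS/FAIL per gauge for the current metric values."""
    out = {}
    for name, rule in QUALITY_GAUGES.items():
        value = metrics[name]
        if rule["direction"] == "max":
            ok = value <= rule["threshold"]
        else:
            ok = value >= rule["threshold"]
        out[name] = "PASS" if ok else "FAIL"
    return out
```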
Spec references: S3, S6, S7, S15

View 4: Failure & Recovery
- Active Failures count: large number with severity breakdown — X FAILED, Y REPAIRING, Z ESCALATED — ESCALATED packets highlighted prominently as they require human action
- Chief Engineer Resolution Rate: percentage of failures resolved without human escalation — trend over 30 days — declining rate indicates systemic issues
- Mean Time to Recovery: average time from failure detection to successful re-verification — by failure category
- Escalation Queue: all packets awaiting human resolution with escalation_id, packet_id, category, time waiting, recommended actions — each with action buttons: “Approve Repair”, “Modify Architecture”, “Override and Resume”, “Suspend Build”
- Chief Engineer Activity Log: all interventions with diagnosis_id, packet_id, failure classification, root cause, confidence score, outcome (REPAIR/ESCALATE), duration — click for full Diagnostic Report
- Failure Pattern Analysis: most common failure types (pie chart), failure rate by phase (bar chart), failure rate trend (line chart) — if same root cause appears 3+ times: highlighted as “Systemic Issue — architecture review recommended”
- Repair Instruction History: all repair instructions with repair_type (RETRY_SAME / RETRY_MODIFIED / REBUILD_DEPENDENCY / TEMPLATE_UPDATE), success rate per type, average retries to resolution
- Confidence Score Distribution: histogram of Chief Engineer root_cause_confidence scores — low average (< 0.5) indicates diagnostic context is insufficient
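The “same root cause appears 3+ times” rule might be implemented as a simple counter over Chief Engineer diagnoses (the `root_cause` field name is an assumption):

```python
from collections import Counter

SYSTEMIC_THRESHOLD = 3  # same root cause seen this many times => systemic issue

def systemic_issues(diagnoses: list[dict]) -> list[str]:
    """Return root causes that recur often enough to be flagged as
    'Systemic Issue -- architecture review recommended'."""
    counts = Counter(d["root_cause"] for d in diagnoses)
    return [cause for cause, n in counts.items() if n >= SYSTEMIC_THRESHOLD]
```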
Spec references: S4, S5, S9, S17

View 5: Budget & Resources
- Daily Budget Gauge: circular gauge showing $X.XX spent of $5.00 — green (< $3.50), yellow ($3.50–$4.50), red (> $4.50) — with priority tier indicator: “Normal” / “HIGH+ only” / “CRITICAL only”
- Projected End-of-Day Spend: based on current hourly rate — warning if projected to exceed budget
- Budget Alert Status: any active budget warnings (80% threshold reached, CRITICAL reserve activated, emergency override in effect)
- Canary Test Cost: today’s canary spend vs the $0.10 target — alert if the canary exceeds its expected cost
- Cost Breakdown: stacked bar chart — daily cost by agent type (Builder / Verifier / Chief Engineer / Canary) over past 7 days
- Per-Packet Cost Table: today’s most expensive packets with packet_id, model used (Sonnet/Opus), token count (in/out), cost, and API calls made
- Model Distribution: “Sonnet: XX% of calls ($X.XX) / Opus: XX% of calls ($X.XX)” — with cost-per-completion comparison
- 30-Day Spending History: line chart of daily spend with $5.00 target line — days exceeding $4.50 highlighted red
- Cost Attribution: per-phase cost breakdown — shows which phases are most expensive and why (complexity, retries, Opus usage)
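One way to compute the gauge colour, priority tier, and end-of-day projection. The pairing of colour bands with priority tiers is an assumption here, since the spec lists them side by side without an explicit mapping:

```python
DAILY_BUDGET = 5.00  # USD daily cap

def budget_gauge(spent: float) -> tuple[str, str]:
    """Map today's spend to the gauge colour and the priority tier indicator.

    Bands from the spec: green < $3.50, yellow $3.50-$4.50, red > $4.50.
    """
    if spent > 4.50:
        return "red", "CRITICAL only"
    if spent >= 3.50:
        return "yellow", "HIGH+ only"
    return "green", "Normal"

def projected_eod_spend(spent: float, hours_elapsed: float) -> float:
    """Extrapolate end-of-day spend from the current hourly burn rate."""
    if hours_elapsed <= 0:
        return spent
    return (spent / hours_elapsed) * 24
```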
Spec references: S14, S16, S19

View 6: Configuration & Progress
- Template Registry Status: X templates ACTIVE, Y DEPRECATED, Z DRAFT — alert if any required base template is not ACTIVE
- Prompt Version Status: current active prompt_version with last update date — yellow warning if any prompt references an outdated specification version
- Specification Version: current Hilbert Factory specification version with last modification date and section count
- Knowledge Graph Health: entity count, relationship count, last integrity check, query latency p95
- Capability Block Map Visualisation: the 7-layer map with per-block status — NOT_STARTED (grey), IN_PROGRESS (blue), COMPLETE (green), MODIFIED (orange) — click any block to see its packets and their statuses
- Phase Completion Matrix: grid of Phase 0–10 × status showing which phases are complete, active, or blocked — with blocking reason for blocked phases
- Template Usage Report: which templates are used most, which agents use them, recent template evolution proposals
- Build Velocity Forecast: at current completion rate, estimated date for each remaining phase — “Phase 3 estimated completion: April 12, 2026”
Spec references: S6, S7, S8, S12

View 7: System Health & Validation
- Component Health Grid: 10 components each with green/yellow/red indicator, response time (ms), uptime percentage over 24 hours — PostgreSQL, Knowledge Graph, Template Registry API, Prompt Registry, Artifact Storage, Claude API, Build Orchestrator, Prompt Execution Engine, Builder Agent Pool, Verifier Agent Pool
- System Uptime: overall factory uptime percentage over 30 days — target ≥ 99%
- Last Pre-Production Validation: date, result, categories passed/failed — alert if > 30 days since last full validation
- Factory Commissioning Status: COMMISSIONED (date) or NOT COMMISSIONED — with link to the Factory Commissioning Proof Object
- Canary Test History: table of recent runs with trigger (SCHEDULED/MANUAL), result (PASS/FAIL), failing step if FAIL, latency, cost, timestamp — most recent at top with large PASS/FAIL indicator
- Canary Pass Rate: percentage over the past 30 days, with trend chart — should be 100%; any deviation is a red flag
- Run Canary Test Button: prominent button with real-time progress display — Readiness ✓ → Queued ✓ → Building... → Verifying... → Complete ✓ or ✗ at failing step — 5-minute cooldown between manual runs
- Run Health Check Button: adjacent button — runs all 10 component checks immediately, results populate the Component Health Grid above
- Health Check History: per-component 24-hour uptime chart — click any component to see error log and response time distribution
Spec references: S9, S19
Human Action Map — Where to Find Answers
Every human question mapped to the dashboard and panel that answers it.
| Human Question | Dashboard | Panel |
| --- | --- | --- |
| “Is the factory working?” | 1. Mission Control | System Health Traffic Light |
| “What’s the overall progress?” | 1. Mission Control | Phase Progress Bars |
| “What needs my attention right now?” | 1. Mission Control | Active Alerts / Next Pending Action |
| “Where is my packet?” | 2. Build Activity | Live Packet Tracker |
| “Is anything stuck?” | 2. Build Activity | Queue Depth Monitor |
| “How long are builds taking?” | 2. Build Activity | Execution Timeline |
| “Is the code good quality?” | 3. Verification | Code Quality Gauges |
| “Can I trust the verification is independent?” | 3. Verification | Two-Agent Discipline Audit |
| “Where is the proof this component was built correctly?” | 3. Verification | Proof Object Registry |
| “What broke?” | 4. Failures | Active Failures |
| “Is this a recurring problem?” | 4. Failures | Failure Pattern Analysis |
| “What decision do you need from me?” | 4. Failures | Escalation Queue |
| “How much is this costing?” | 5. Budget | Daily Budget Gauge |
| “Where is the money going?” | 5. Budget | Cost Breakdown + Per-Packet Cost |
| “How much of the system has been built?” | 6. Configuration | Capability Block Map Visualisation |
| “What templates are we using?” | 6. Configuration | Template Registry Status |
| “When will this phase be done?” | 6. Configuration | Build Velocity Forecast |
| “Is the factory healthy right now?” | 7. Health | Component Health Grid |
| “Prove the factory works” | 7. Health | Run Canary Test Button |