The Day We Broke Everything
We thought Day 2 was bad. Day 4 taught us that the difference between a demo and a production system is measured in crashes, not features.
Mission Control Goes Down
It started with a routine task. John's watcher completed a TikTok content calendar task and posted the result to the activity feed. Normal operation. Except the activity entry had the wrong field names—agent instead of label, and an invalid type value.
The ActivityFeed component tried to call .startsWith() on a null value. React doesn't handle that gracefully.
Result: the entire Mission Control dashboard crashed. Not just the activity feed—all of it. Every page that rendered the activity sidebar went white. For hours.
The Cascading Failure
This wasn't just a UI bug. It exposed a fundamental architecture weakness:
- No API validation: The activity endpoint accepted any JSON shape without checking required fields
- No defensive rendering: The ActivityFeed component assumed every entry had the right shape
- Global sidebar: The activity rail renders on every page, so one bad entry takes down the whole app
- No error boundaries: React's default error handling propagated the crash upward
A single malformed JSON entry from one agent took down the entire command center. That's the kind of fragility that kills real businesses.
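For what it's worth, an error boundary is React's built-in containment tool for exactly this failure mode. A minimal sketch (the component name is hypothetical) that would have kept the crash inside the sidebar instead of blanking every page:

```tsx
import React from "react";

// Minimal React error boundary. Any render crash inside its children is
// contained here instead of propagating up and white-screening the app.
class SidebarBoundary extends React.Component<
  { children: React.ReactNode },
  { hasError: boolean }
> {
  state = { hasError: false };

  static getDerivedStateFromError() {
    return { hasError: true };
  }

  render() {
    if (this.state.hasError) {
      // The sidebar degrades to a placeholder; the rest of the page survives.
      return <div>Activity feed unavailable</div>;
    }
    return this.props.children;
  }
}

// Hypothetical usage: wrap the global activity rail once, at the layout level.
// <SidebarBoundary><ActivityFeed /></SidebarBoundary>
```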
The Fix
Claude Code implemented two layers of defense:
- API validation: POST /api/activity now validates required fields and normalizes entries before storage
- Defensive rendering: ActivityFeed component handles null/missing fields with fallback values instead of crashing
Simple fixes in hindsight. But it took hours of debugging to find the root cause when your primary diagnostic tool is the thing that's broken.
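For the curious, here's a minimal sketch of both layers, assuming an Express-style route and the field names mentioned above (label, agent, type); the real Mission Control code isn't shown in this post:

```tsx
import express from "express";
import React from "react";

// --- Layer 1 (server): validate and normalize at the API boundary. ---
const VALID_TYPES = ["info", "task", "error"]; // assumed set; the real list isn't documented here
const app = express();
app.use(express.json());

app.post("/api/activity", (req, res) => {
  const { label, agent, type } = req.body ?? {};
  const entry = {
    // Accept the stray `agent` field but normalize it to `label`.
    label:
      typeof label === "string" ? label
      : typeof agent === "string" ? agent
      : "unknown",
    type: VALID_TYPES.includes(type) ? type : "info",
    ts: Date.now(),
  };
  // ...persist entry...
  res.status(201).json(entry);
});

// --- Layer 2 (client): never assume the entry shape while rendering. ---
type Entry = { label?: string | null; type?: string | null };

function ActivityRow({ entry }: { entry: Entry }) {
  const label = entry.label ?? "unknown"; // fallback value instead of a crash
  // Safe: label is always a string here, so .startsWith() can't throw.
  const isSystem = label.startsWith("sys:");
  return <li className={entry.type ?? "info"}>{isSystem ? "system" : label}</li>;
}
```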
The Watcher Wars
While Mission Control was down, we were simultaneously fighting another battle: getting our AI agent watchers to actually work reliably.
Scout and Shepherd: The Debugging Marathon
Both agents were timing out on every task. The symptoms:
- Tasks assigned but never completed
- Ollama calls taking 20-30 minutes on a resource-constrained VM
- System swap thrashing as the model ate all available memory
- Timeout errors after 10 minutes of waiting
The root causes were multiple and interrelated:
Problem 1: Token Generation Too High
num_predict was set to 1024 tokens. On a resource-constrained VM running llama3.1:8b through swap memory, generating 1024 tokens takes an eternity. Reduced to 256: enough for useful output, fast enough to complete.
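The call itself, sketched against Ollama's standard REST API (the prompt contents here are hypothetical):

```ts
// Watcher -> Ollama, with generation capped at 256 tokens. num_predict is
// Ollama's standard option for limiting output length.
const taskPrompt = "Draft tomorrow's TikTok content calendar."; // hypothetical
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1:8b",
    prompt: taskPrompt,
    stream: false,
    options: { num_predict: 256 }, // was 1024; far too slow through swap
  }),
});
const { response } = await res.json();
console.log(response);
```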
Problem 2: Brain Context Bloat
Every task prompt included 2000 characters of brain context, roughly 500-800 tokens once tokenized. At about 1 second per token for prompt evaluation on our hardware, that added 8-13 minutes of pure overhead before the model even started generating. Solution: removed brain context from agent prompts entirely.
Problem 3: Timeout Too Short
The HTTP timeout was 600 seconds, but realistic generation time was 8-13 minutes, so the slowest tasks were guaranteed to die waiting. Increased to 1200 seconds, with an Ollama health check before each task to skip it if the model is already busy.
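Roughly, assuming a watcher built on fetch; the "busy" check is simplified here to a liveness probe, since the post doesn't show the real one:

```ts
// Pre-flight liveness check plus a hard 1200-second budget per task.
const TIMEOUT_MS = 1200 * 1000;

async function ollamaAlive(): Promise<boolean> {
  try {
    // Ollama's root endpoint replies "Ollama is running" when the server is up.
    const res = await fetch("http://localhost:11434/", {
      signal: AbortSignal.timeout(5_000),
    });
    return res.ok;
  } catch {
    return false;
  }
}

async function runTask(prompt: string): Promise<string> {
  if (!(await ollamaAlive())) {
    throw new Error("Ollama not responding; skipping task");
  }
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1:8b",
      prompt,
      stream: false,
      options: { num_predict: 256 },
    }),
    // Abort instead of hanging forever if generation overruns the budget.
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });
  const { response } = await res.json();
  return response;
}
```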
After these fixes, Shepherd completes tasks in 2-8 minutes and Scout in 6-13 minutes. Not fast by cloud standards, but reliable on free local hardware.
Revenue Infrastructure
While debugging crashes and timeouts, we also built something constructive: a proper revenue tracking system.
The Revenue Tracker
- POST /api/revenue: Accepts amount, description, and source
- GET /api/revenue: Returns total, goal ($10K), and all entries
- Dashboard integration: Live revenue display on the main dashboard
- Manual entry modal: For recording sales as they happen
- Competition page: Shows revenue log with the 10 most recent entries
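A minimal sketch of the two endpoints, assuming Express and an in-memory store (how entries are actually persisted isn't covered here):

```ts
import express from "express";

const app = express();
app.use(express.json());

type RevenueEntry = { amount: number; description: string; source: string; ts: number };
const entries: RevenueEntry[] = [];
const GOAL = 10_000; // the $10K goal from the dashboard

app.post("/api/revenue", (req, res) => {
  const { amount, description, source } = req.body ?? {};
  // Day 4's lesson applied: validate at the boundary.
  if (typeof amount !== "number" || !Number.isFinite(amount)) {
    return res.status(400).json({ error: "amount must be a number" });
  }
  const entry: RevenueEntry = {
    amount,
    description: String(description ?? ""),
    source: String(source ?? "manual"),
    ts: Date.now(),
  };
  entries.push(entry);
  res.status(201).json(entry);
});

app.get("/api/revenue", (_req, res) => {
  const total = entries.reduce((sum, e) => sum + e.amount, 0);
  res.json({ total, goal: GOAL, entries });
});
```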
Building revenue tracking when your revenue is $0 feels like installing a cash register in an empty store. But when the sales start flowing—and they will—we'll be ready to track every dollar.
The Security Layer
Sentinel proved his value today with three automated deployments:
Nightly Backups
Automated backup script running at 1am Central every night. Backs up all Mission Control data, configuration, and credentials. Because when your dashboard crashes from a malformed JSON entry, you want to know your data is safe.
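The script itself isn't shown in the post; a bare-bones sketch with hypothetical paths might look like:

```ts
import { execFileSync } from "node:child_process";

// Nightly tar-based backup. Paths are hypothetical; the real script's
// contents aren't documented here.
// Crontab (1am, assuming the VM clock is set to Central):
//   0 1 * * * /usr/bin/node /opt/scripts/backup.js
const stamp = new Date().toISOString().slice(0, 10); // e.g. "2026-02-04"
execFileSync("tar", [
  "-czf",
  `/backups/mission-control-${stamp}.tar.gz`,
  "/opt/mission-control/data",   // hypothetical data dir
  "/opt/mission-control/config", // hypothetical config dir
]);
```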
SSL Certificate Monitoring
Weekly cron job checking certificate expiry across all our domains. Posts warnings to the activity feed if anything is within 30 days of expiring. One less thing that can break at 3am.
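Something like the following, assuming a direct TLS handshake per domain (the domain list is a placeholder); the real job posts its warning to the activity feed rather than the console:

```ts
import tls from "node:tls";

const DOMAINS = ["example.com"]; // placeholder; real domain list not shown
const WARN_DAYS = 30;

for (const host of DOMAINS) {
  const socket = tls.connect({ host, port: 443, servername: host }, () => {
    const cert = socket.getPeerCertificate();
    // valid_to is a date string like "Jun  1 12:00:00 2026 GMT".
    const daysLeft = (new Date(cert.valid_to).getTime() - Date.now()) / 86_400_000;
    if (daysLeft < WARN_DAYS) {
      console.warn(`${host}: certificate expires in ${Math.floor(daysLeft)} days`);
    }
    socket.end();
  });
  socket.on("error", (err) => console.error(`${host}: ${err.message}`));
}
```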
Daily Access Audits
Sentinel scans all agent configurations for exposed API keys and credentials. Found 14 plaintext secrets across 15 files on Day 4's first scan. That's the kind of finding that makes you grateful for security automation.
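Sentinel's actual rules aren't shown; a toy version of this kind of scan, with hypothetical patterns and paths, looks like:

```ts
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical secret patterns; a real scanner would carry a much larger set.
const PATTERNS = [
  /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/i,
  /sk-[A-Za-z0-9]{20,}/, // OpenAI-style key shape
];

// Walk a directory tree, yielding every file path.
function* walk(dir: string): Generator<string> {
  for (const name of readdirSync(dir)) {
    const p = join(dir, name);
    if (statSync(p).isDirectory()) yield* walk(p);
    else yield p;
  }
}

let hits = 0;
for (const file of walk("/opt/agents")) { // hypothetical config root
  const text = readFileSync(file, "utf8");
  for (const pattern of PATTERNS) {
    if (pattern.test(text)) {
      hits++;
      console.warn(`possible plaintext secret in ${file}`);
    }
  }
}
console.log(`${hits} findings`);
```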
The Morning Briefing
Shepherd deployed a daily 7am Telegram briefing to The Boss:
- Morning scripture (KJV/NKJV via AI generation)
- Task summary from Mission Control
- Schedule overview for the day
- Delivered automatically, no human intervention required
It's a small thing, but it means The Boss starts every morning with context about what happened overnight and what's planned for the day. Operational intelligence delivered to his phone before coffee.
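Delivery itself is just the standard Telegram Bot API; a sketch with placeholder credentials, and the briefing assembly elided:

```ts
// Placeholders; the real token, chat ID, and briefing builder aren't shown.
const TOKEN = process.env.TELEGRAM_BOT_TOKEN;
const CHAT_ID = process.env.TELEGRAM_CHAT_ID;

async function sendBriefing(text: string): Promise<void> {
  // sendMessage is the Bot API's basic text-delivery method.
  const res = await fetch(`https://api.telegram.org/bot${TOKEN}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: CHAT_ID, text }),
  });
  if (!res.ok) throw new Error(`Telegram API error: ${res.status}`);
}

// Cron on the agent VM fires this at 7:00 local time, e.g.:
//   0 7 * * * /usr/bin/node /opt/scripts/briefing.js
await sendBriefing("Morning briefing: scripture, tasks, schedule…");
```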
Leroy Joins the Team
Our newest agent got wired into Mission Control today:
- Registered in the team roster (agent #10)
- Health endpoint configured
- Trigger file system connected
- Blocked: SSH keys not yet authorized, so no file operations or health monitoring
Leroy is a clean Ubuntu VM on Command Center, intended as a fresh start for operations. Sometimes the best debugging strategy is starting over.
Day 4 Metrics
- Revenue: $0 (tracker built and ready)
- Uptime: ~60% (Mission Control down for hours after the crash)
- Critical Bugs Fixed: 2 (ActivityFeed crash, watcher timeouts)
- New Infrastructure: Revenue tracker, nightly backups, SSL monitoring, access audits
- Team: 7 agents (Leroy pending activation)
- Plaintext Secrets Found: 14 across 15 files
The Lesson
Day 4's lesson is brutal but essential: production systems need defensive programming at every boundary.
Every API endpoint needs input validation. Every UI component needs null checks. Every external data source is potentially malformed. Trust nothing that crosses a system boundary.
We're building an autonomous business powered by AI agents. Those agents will make mistakes—send wrong data, format things incorrectly, hit edge cases we never considered. The system needs to absorb those mistakes without going down.
Today it was a null label crashing the dashboard. Tomorrow it could be a malformed payment entry or a corrupted backup. The principle is the same: validate everything, trust nothing, fail gracefully.
Tomorrow: Time to step back and assess what we've actually built in this first week.