Architecture — heartbeat-based health monitoring (shipped 2026-04-22)
Each scheduled task writes a JSON heartbeat file at the end of every run. A nightly synthesis step aggregates them into overall system health, surfaced in the morning brief and PWA.
Scheduled task runs (7 total) → writes heartbeat JSON
→ ~/Library/Application Support/SkyRun/health/YYYY-MM-DD_{task_id}_HHMM.json
Nightly-consolidation (11pm) → reads all recent heartbeats
→ computes per-task status + overall
→ writes "Systems status" into morning brief
→ if RED, surfaces as priority #1 in brief
PWA build_pwa.py → reads heartbeat files → renders 🫀 System Health collapsible
Files
- Schema doc: ~/Library/Application Support/SkyRun/health/_schema.md
- Heartbeat files: ~/Library/Application Support/SkyRun/health/YYYY-MM-DD_{task_id}_HHMM.json (30-day rolling window)
- Consumer: nightly-consolidation SKILL.md Section F
- Renderer: build_pwa.py → read_system_health() + render_system_health_section()
Heartbeat schema (minimum)
json
{
"task_id": "nightly-consolidation",
"started_at": "2026-04-22T23:00:00Z",
"completed_at": "2026-04-22T23:03:42Z",
"status": "ok" | "error" | "partial" | "skipped",
"summary": "<one line, <120 chars>",
"errors": [],
"warnings": [],
"metrics": { /* task-specific */ }
}
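A minimal sketch of how a scheduled task might emit a heartbeat matching this schema. The helper name `write_heartbeat` and its parameters are hypothetical, not taken from the actual tasks. The source does not say whether the HHMM filename stamp is the start or completion time, or which timezone it uses; this sketch assumes completion time in UTC.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

VALID_STATUSES = {"ok", "error", "partial", "skipped"}


def _utc_stamp(moment: datetime) -> str:
    """Format an aware datetime as an ISO-8601 UTC string ending in Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def write_heartbeat(health_dir, task_id, started_at, completed_at, status,
                    summary, errors=None, warnings=None, metrics=None):
    """Write one heartbeat file and return its path (hypothetical helper)."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if "\n" in summary or len(summary) >= 120:
        raise ValueError("summary must be one line under 120 chars")
    payload = {
        "task_id": task_id,
        "started_at": _utc_stamp(started_at),
        "completed_at": _utc_stamp(completed_at),
        "status": status,
        "summary": summary,
        "errors": list(errors or []),
        "warnings": list(warnings or []),
        "metrics": dict(metrics or {}),
    }
    # Assumption: the filename stamp is the completion time in UTC.
    done = completed_at.astimezone(timezone.utc)
    name = f"{done:%Y-%m-%d}_{task_id}_{done:%H%M}.json"
    path = Path(health_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

A task would call this once, as its last step, so that a crash before the call shows up as a missing heartbeat rather than a misleading one.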
Status semantics
- ok: task completed, primary mission accomplished, no warnings
- partial: completed with warnings (e.g. BV subscription exhausted, some leads skipped)
- error: could NOT complete primary mission (auth broken, critical file locked)
- skipped: ran but had nothing to do (e.g. no new transcripts)
Expected cadence + max acceptable gap
| Task | Cadence | Max gap before RED |
|---|---|---|
| daily-beenverified-enrichment | 6am daily | 36h |
| daily-data-quality-check | 7am daily | 36h |
| transcript-scan | 8am/12pm/4pm daily | 8h |
| gmail-deep-scan | 9am/1pm/5pm/10pm daily | 6h |
| grand-county-property-scout | Mon 6am weekly | 9d |
| nightly-consolidation | 11pm daily | 36h |
| historical-gmail-backfill | 3am daily (or disabled once complete) | 36h (NOT RED if complete: true) |
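The max-gap column can be read as a simple staleness check. The sketch below is one way to express it; `MAX_GAP` and `is_stale` are hypothetical names, and it assumes `complete` is a top-level boolean in the heartbeat, which the source does not specify. It only covers the case where the last heartbeat is still on disk.

```python
from datetime import datetime, timedelta, timezone

# Max gap before RED, copied from the table above.
MAX_GAP = {
    "daily-beenverified-enrichment": timedelta(hours=36),
    "daily-data-quality-check": timedelta(hours=36),
    "transcript-scan": timedelta(hours=8),
    "gmail-deep-scan": timedelta(hours=6),
    "grand-county-property-scout": timedelta(days=9),
    "nightly-consolidation": timedelta(hours=36),
    "historical-gmail-backfill": timedelta(hours=36),
}


def is_stale(task_id, last_heartbeat, now):
    """Return True if the task's newest heartbeat is older than its max gap.

    last_heartbeat is the parsed heartbeat dict, or None if none was found.
    """
    if last_heartbeat is None:
        return True
    # Exemption from the table: a finished backfill is never stale.
    # Assumption: "complete" is a top-level field of the heartbeat.
    if (task_id == "historical-gmail-backfill"
            and last_heartbeat.get("complete") is True):
        return False
    completed = datetime.strptime(
        last_heartbeat["completed_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return now - completed > MAX_GAP[task_id]
```

For example, evaluated at 23:00, a transcript-scan heartbeat completed at 16:00 the same day is 7h old and still inside its 8h gap.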
Overall system status
- 🔴 RED: any task has status error, exceeded max gap, or 3+ consecutive failures
- 🟡 YELLOW: any task has status partial, non-empty warnings, or 1-2 consecutive failures
- 🟢 GREEN: all tasks ok or skipped, on schedule, no warnings
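The three rules above amount to a worst-wins rollup. A minimal sketch, assuming each task has already been reduced to a small dict; the function name and the keys `stale` and `consecutive_failures` are hypothetical, and the source does not define exactly what counts as a "failure" for the consecutive count.

```python
def overall_status(tasks):
    """Roll per-task results up to "RED", "YELLOW" or "GREEN".

    Each item in tasks is a dict with hypothetical keys:
      status (str), warnings (list), stale (bool), consecutive_failures (int).
    """
    # RED is checked first so that it wins over any YELLOW condition.
    if any(
        t["status"] == "error" or t["stale"] or t["consecutive_failures"] >= 3
        for t in tasks
    ):
        return "RED"
    if any(
        t["status"] == "partial" or t["warnings"] or t["consecutive_failures"] >= 1
        for t in tasks
    ):
        return "YELLOW"
    return "GREEN"
```

Checking RED before YELLOW means a single broken task is never masked by other tasks that only carry warnings.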
Alerting — RED bubbles up
When overall status is RED, nightly-consolidation promotes the system-health issue to Today's 3 priorities in the morning brief — system health trumps BD priorities when something's actually broken. You see it before the Hadank update.
Quick-stats badge in PWA header
Top of every page load shows the overall color:
🟢 GREEN ✅ 5 pending 📋 Apr 22, 11:00 PM 🔥 1 active ⏱ 7 tasks
Any RED auto-expands the System Health collapsible below Approvals.
Known exceptions (NOT RED)
- historical-gmail-backfill after completion (complete: true means heartbeat absence is expected)
- daily-beenverified-enrichment during BV credit-cap (task correctly reports partial with warning, stays YELLOW)
Testing
Trigger a "Run now" on any task in the Scheduled sidebar and check ~/Library/Application Support/SkyRun/health/. A new heartbeat JSON should appear within ~5 minutes. The next build_pwa.py run will surface it.
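To check for the new file without browsing the folder by hand, a small read-only helper can pick out the latest heartbeat for a task. This is a sketch; `newest_heartbeat` is a hypothetical name and relies only on the filename pattern given under Files.

```python
from pathlib import Path


def newest_heartbeat(health_dir, task_id):
    """Return the most recent heartbeat path for task_id, or None."""
    # Filenames start with YYYY-MM-DD and end with HHMM, so for a single
    # task_id a plain sort by name is also chronological.
    matches = sorted(Path(health_dir).glob(f"*_{task_id}_*.json"))
    return matches[-1] if matches else None
```

For instance, `newest_heartbeat(Path.home() / "Library/Application Support/SkyRun/health", "transcript-scan")` returns the path to open after a "Run now", or None if nothing has been written yet.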
What the retention cleanup does
Nightly-consolidation Section F deletes heartbeats older than 30 days at the end of each run. So the health dir stays bounded around 7 tasks × ~4 runs/day × 30 days ≈ 840 files worst case. Small.
How to apply
- If you see a RED on the brief or PWA, open System Health → identify which task → read the heartbeat's errors[] array for details
- For new scheduled tasks added later: add them to the expected-cadence table in Section F AND in build_pwa.py's read_system_health() expected list
- Never manually write to the health dir; only scheduled tasks should
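For reference, the 30-day retention rule described under "What the retention cleanup does" can be sketched as a pure selection step. `expired_heartbeats` is a hypothetical name, and the sketch assumes age is judged from the date in the filename rather than file modification time, which the source does not specify. It only lists candidates and deletes nothing, in keeping with the rule that only scheduled tasks touch the health dir.

```python
from datetime import date, timedelta
from pathlib import Path


def expired_heartbeats(health_dir, today, keep_days=30):
    """List heartbeat files whose filename date is more than keep_days old."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for path in Path(health_dir).glob("*.json"):
        try:
            stamp = date.fromisoformat(path.name[:10])
        except ValueError:
            continue  # not a YYYY-MM-DD_... heartbeat file; leave it alone
        if stamp < cutoff:
            expired.append(path)
    return sorted(expired)
```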