
reference system monitoring

memory · reference_system_monitoring.md

Architecture — heartbeat-based health monitoring (shipped 2026-04-22)

Each scheduled task writes a JSON heartbeat file at the end of every run. A nightly synthesis step aggregates them into overall system health, surfaced in the morning brief and PWA.


Scheduled task runs (7 total) → writes heartbeat JSON
                                → ~/Library/Application Support/SkyRun/health/YYYY-MM-DD_{task_id}_HHMM.json

Nightly-consolidation (11pm) → reads all recent heartbeats
                             → computes per-task status + overall
                             → writes "Systems status" into morning brief
                             → if RED, surfaces as priority #1 in brief

PWA build_pwa.py → reads heartbeat files → renders 🫀 System Health collapsible
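
The read side of this flow could look roughly like the sketch below. It is not the actual build_pwa.py or nightly-consolidation code: the names `parse_heartbeat_name` and `latest_per_task` are hypothetical, and it assumes the date and HHMM parts of the filename sort correctly as plain strings.

```python
import re
from pathlib import Path

# Filename pattern taken from the flow above: YYYY-MM-DD_{task_id}_HHMM.json
HEARTBEAT_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})_(.+)_(\d{4})\.json$")


def parse_heartbeat_name(name):
    """Return (date, task_id, hhmm), or None if the name does not match."""
    m = HEARTBEAT_NAME.match(name)
    if m is None:
        return None
    return m.groups()


def latest_per_task(names):
    """Map each task_id to the filename of its most recent heartbeat."""
    latest = {}
    for name in names:
        parsed = parse_heartbeat_name(name)
        if parsed is None:
            continue  # ignore anything that is not a heartbeat file
        date, task_id, hhmm = parsed
        key = (date, hhmm)
        if task_id not in latest or key > latest[task_id][0]:
            latest[task_id] = (key, name)
    return {task: name for task, (_, name) in latest.items()}


if __name__ == "__main__":
    health_dir = Path.home() / "Library" / "Application Support" / "SkyRun" / "health"
    for task, name in sorted(latest_per_task(p.name for p in health_dir.glob("*.json")).items()):
        print(task, name)
```

Parsing the filename rather than opening every file keeps the scan cheap; the JSON body only needs to be read for the newest heartbeat of each task.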

Files

Heartbeat schema (minimum)

json
{
  "task_id": "nightly-consolidation",
  "started_at": "2026-04-22T23:00:00Z",
  "completed_at": "2026-04-22T23:03:42Z",
  "status": "ok" | "error" | "partial" | "skipped",
  "summary": "<one line, <120 chars>",
  "errors": [],
  "warnings": [],
  "metrics": { /* task-specific */ }
}
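
A task could emit a heartbeat in this shape with something like the sketch below. The helper names `build_heartbeat` and `write_heartbeat` are hypothetical, and two details are assumptions rather than facts from this note: that timestamps are written in UTC with a trailing `Z` (as in the example values above), and that the filename's date and HHMM come from the run's start time.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

VALID_STATUSES = {"ok", "error", "partial", "skipped"}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_heartbeat(task_id, started_at, completed_at, status, summary,
                    errors=None, warnings=None, metrics=None):
    """Build a dict with the minimum schema fields. Datetimes must be timezone-aware."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if len(summary) >= 120:
        raise ValueError("summary must be one line under 120 characters")
    return {
        "task_id": task_id,
        "started_at": started_at.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
        "completed_at": completed_at.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
        "status": status,
        "summary": summary,
        "errors": list(errors or []),
        "warnings": list(warnings or []),
        "metrics": dict(metrics or {}),
    }


def write_heartbeat(health_dir, heartbeat, run_time):
    """Write the heartbeat as YYYY-MM-DD_{task_id}_HHMM.json and return its path."""
    health_dir = Path(health_dir)
    health_dir.mkdir(parents=True, exist_ok=True)
    name = f"{run_time:%Y-%m-%d}_{heartbeat['task_id']}_{run_time:%H%M}.json"
    path = health_dir / name
    path.write_text(json.dumps(heartbeat, indent=2))
    return path
```

Validating `status` and the summary length at write time means a malformed heartbeat fails in the task that produced it, not later in the nightly synthesis.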

Status semantics

Expected cadence + max acceptable gap

| Task | Cadence | Max gap before RED |
| --- | --- | --- |
| daily-beenverified-enrichment | 6am daily | 36h |
| daily-data-quality-check | 7am daily | 36h |
| transcript-scan | 8am/12pm/4pm daily | 8h |
| gmail-deep-scan | 9am/1pm/5pm/10pm daily | 6h |
| grand-county-property-scout | Mon 6am weekly | 9d |
| nightly-consolidation | 11pm daily | 36h |
| historical-gmail-backfill | 3am daily (or disabled once complete) | 36h (NOT RED if complete: true) |
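
The gap check implied by these limits can be sketched as below. The function name `is_stale` is hypothetical, the `complete` flag is passed in as a parameter because this note does not say where `complete: true` is stored, and treating a task with no heartbeat at all as stale is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Max acceptable gap per task, copied from the cadence list above.
MAX_GAP = {
    "daily-beenverified-enrichment": timedelta(hours=36),
    "daily-data-quality-check": timedelta(hours=36),
    "transcript-scan": timedelta(hours=8),
    "gmail-deep-scan": timedelta(hours=6),
    "grand-county-property-scout": timedelta(days=9),
    "nightly-consolidation": timedelta(hours=36),
    "historical-gmail-backfill": timedelta(hours=36),
}


def is_stale(task_id, last_completed_at, now, complete=False):
    """Return True when the time since the last heartbeat exceeds the task's limit."""
    if task_id == "historical-gmail-backfill" and complete:
        return False  # a finished backfill is expected to stop reporting
    if last_completed_at is None:
        return True  # assumption: a task that never reported counts as stale
    return now - last_completed_at > MAX_GAP[task_id]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale("gmail-deep-scan", now - timedelta(hours=7), now))
```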

Overall system status

Alerting — RED bubbles up

When overall status is RED, nightly-consolidation promotes the system-health issue to Today's 3 priorities in the morning brief — system health trumps BD priorities when something's actually broken. You see it before the Hadank update.
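
As a rough illustration of the promotion rule, assuming the brief's priorities are an ordered list of strings (the function name `promote_system_health` and the data shape are hypothetical, not taken from the nightly-consolidation code):

```python
def promote_system_health(overall_status, priorities, health_item, limit=3):
    """Put the system-health item first when overall status is RED.

    The list is capped at `limit` so the brief still shows three priorities;
    the lowest-ranked existing item is the one that drops off.
    """
    if overall_status != "RED":
        return list(priorities)
    remaining = [p for p in priorities if p != health_item]
    return ([health_item] + remaining)[:limit]


if __name__ == "__main__":
    todays = ["priority A", "priority B", "priority C"]
    print(promote_system_health("RED", todays, "Fix stale heartbeat"))
```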

Quick-stats badge in PWA header

Top of every page load shows the overall color:


🟢 GREEN ✅ 5 pending 📋 Apr 22, 11:00 PM 🔥 1 active ⏱ 7 tasks

Any RED auto-expands the System Health collapsible below Approvals.

Known exceptions (NOT RED)

Testing

Trigger a "Run now" on any task in the Scheduled sidebar and check ~/Library/Application Support/SkyRun/health/ — a new heartbeat JSON should appear within ~5 minutes. The next build_pwa.py run will surface it.

What the retention cleanup does

Nightly-consolidation Section F deletes heartbeats older than 30 days at the end of each run. So the health dir stays bounded around 7 tasks × ~4 runs/day × 30 days ≈ 840 files worst case. Small.
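
A minimal sketch of such a cleanup, assuming file age is judged by the date prefix in the filename and not by modification time (the note does not say which Section F uses; `prune_heartbeats` is a hypothetical name):

```python
import re
from datetime import date, timedelta
from pathlib import Path

DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})_")


def prune_heartbeats(health_dir, today, keep_days=30):
    """Delete heartbeat files dated more than keep_days before today.

    Returns the names of the deleted files. Files without a valid
    YYYY-MM-DD_ prefix are left untouched.
    """
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(Path(health_dir).glob("*.json")):
        m = DATE_PREFIX.match(path.name)
        if m is None:
            continue
        try:
            file_date = date(*map(int, m.groups()))
        except ValueError:
            continue  # prefix looks like a date but is not a real one
        if file_date < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Using the filename date keeps the result independent of filesystem timestamps, which can change when the directory is copied or restored from backup.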

How to apply