Filename — REQUIRED format
~/Library/Application Support/SkyRun/health/<YYYY-MM-DD>_<task-id>_<HHMM>.json
YYYY-MM-DD— UTC date the run started<task-id>— exact taskId frommcp__scheduled-tasks__list_scheduled_tasks(e.g.nightly-consolidation,daily-beenverified-enrichment)HHMM— UTC time the run started (4 digits, no colon)
Why the format matters: system_hygiene.sh uses a defensive double-glob (_<task>_.json AND <task>_*.json) to detect overdue tasks. Filenames that don't match either pattern are invisible — the watchdog will queue false-positive overdue alerts every cycle.
Required JSON fields — NEVER null
json
{
"task_id": "<exact-task-id>",
"status": "ok | partial | skipped | error",
"started_at": "2026-05-02T01:55:00Z",
"completed_at": "2026-05-02T02:07:02Z",
"summary": "<one-line plain-English summary, < 200 chars, of what this run did>",
"warnings": [],
"errors": [],
"metrics": { "<task-specific KPIs>": "<...>" }
}
Field rules:
task_id— MUST be the exact taskId. Never null. Never the description. Never the prompt name.status— pick one of the 4 canonical values. Do NOT useGREEN/YELLOW/RED(those are F-section classifications). See "Status field — CANONICAL VALUES" section below for full discipline.started_at,completed_at— ISO-8601 UTC timestamps. NEVER null. If the run aborts mid-flight,completed_atshould still be the abort time, witherrorspopulated.summary— one human-readable sentence describing what happened. NEVER null. The PWA dashboard, system hygiene, and morning brief all consume this. If a run did "nothing notable," summary should still say so explicitly: e.g. "Clean run — no new bounces in the last 24h."warnings— list of strings (zero or more). Things the agent noticed but didn't fail on. Surfaces in PWA + watchdog escalation.errors— list of strings or objects (zero or more). Things that failed. Triggers RED status.metrics— task-specific JSON object. Counts, durations, IDs touched. Used bybuild_pwa.pyfor dashboard cards.
Optional but recommended
chrome_bridge_status— for tasks that use Chrome:ok | session_lost | auth_expired | tab_not_found | timeout. Watchdog uses this to detect silent-failure mode.run_id— unique identifier for this specific run (helpful for cross-reference between heartbeat + audit summary).next_action_for_joseph— string, surfaces in morning brief if non-empty.
Status field — CANONICAL VALUES (hardwired 2026-05-02 PM, stay-green sweep)
Status MUST be exactly one of:
ok— task completed, primary mission accomplished, no operator action needed. Health classifies GREEN.partial— task completed BUT operator action is required (real new drift, hard bounces, unmatched data, stage mismatches, auth issues). Health classifies YELLOW.skipped— task ran but had nothing to do (no new data, preconditions not met, monthly budget hit). Health classifies GREEN.error— task could NOT complete primary mission (auth broken, file locked, helper crashed). Errors array non-empty. Health classifies RED.
⛔ DEPRECATED — do NOT write these as status values
GREEN / YELLOW / RED are health classifications computed by nightly-consolidation Section F. They are NOT valid heartbeat status values. Writing them in the status field breaks downstream parsers that expect canonical values.
Forensic precedent: smartlead-bounce-handler previously wrote "status": "YELLOW" for soft-only-bounce runs. This caused two issues: (a) status field non-conforming to schema, (b) yellow-flagging non-actionable conditions (soft bounces are normal background noise — mail will retry). Stay-green sweep 2026-05-02 PM updated the skill to classify soft-only as ok.
Discipline: don't yellow-flag noise
partial is the operator's signal that something needs attention. Reserve it for real signal:
- ✅
partial: hard bounces, new HS-stage drift, unmatched leads, missing rosters, auth failures - ❌ NOT
partial: soft bounces, fallback paths working, acknowledged duplicates, off-cadence re-fires, informational notes
See memory/feedback_stay_green_discipline.md for the full discipline.
Status mapping (legacy reading-only equivalence)
When you encounter a heartbeat written before the 2026-05-02 PM sweep with non-canonical status:
| Legacy value | Treat as |
|---|---|
GREEN | ok |
YELLOW | partial |
RED | error |
Anti-patterns — these caused real bugs
❌ Null required fields
json
{"task_id": null, "started_at": null, "completed_at": null, "summary": null, "status": "GREEN"}
Why bad: build_pwa.py:328 keys on task_id for dashboard rows. Null task_id = task vanishes from PWA. Null timestamps break heartbeat-age math (results in either 999h-old display OR exception swallowed silently).
Real incidents (2026-05-01 fleet-fire audit):
close-to-onboarding-handoff— wrote 4 null fieldsdeal-postmortem-capture— wrote 4 null fieldscommitment-tracker— wrote 4 null fieldssmartlead-bounce-handler— wrote 4 null fieldsdaily-beenverified-enrichment— wrote 3 null fields (despite Step 9b spec being added 4/30 — agent wasn't following the inline spec, hence promotion to this canonical reference doc)
❌ Bare-name filename without date prefix
daily-beenverified-enrichment_2026-05-02T130640Z.json
daily-beenverified-enrichment_latest.json
Why bad: the primary system_hygiene.sh glob is _<task>_.json which requires a leading underscore-separated prefix. Bare-name filenames miss the primary glob. The defensive double-glob catches them as a fallback (added 2026-04-30) but that's a fallback, not the source of truth.
The <task>_latest.json convenience pointer is OK (it's a duplicate write of the latest heartbeat for fast dashboard reads), but the canonical YYYY-MM-DD_<task>_HHMM.json MUST also be written for every run.
How to write a heartbeat (Python example)
python
import json
from datetime import datetime, timezone
from pathlib import Path
started = datetime.now(timezone.utc)
... task logic ...
completed = datetime.now(timezone.utc)
heartbeat = {
"task_id": "my-task-id", # exact match to mcp__scheduled-tasks taskId
"status": "ok", # or 'partial'/'skipped'/'error'
"started_at": started.isoformat().replace('+00:00','Z'),
"completed_at": completed.isoformat().replace('+00:00','Z'),
"summary": "Processed N items; X new, Y updated, Z skipped.",
"warnings": [],
"errors": [],
"metrics": {"items_processed": 42, "items_new": 5}
}
filename = f"{started.strftime('%Y-%m-%d')}_my-task-id_{started.strftime('%H%M')}.json"
path = Path.home() / "Library/Application Support/SkyRun/health" / filename
path.write_text(json.dumps(heartbeat, indent=2))
For agent-prompt skills (where the runtime is Claude executing instructions), the prompt should explicitly cite this file:
> "Write a heartbeat per memory/reference_heartbeat_schema.md — required fields task_id, status, started_at, completed_at, summary must NEVER be null."
Cross-references
~/Library/Application Support/SkyRun/system_hygiene.sh— the consumer that uses heartbeats to detect overdue tasks~/Library/Application Support/SkyRun/build_pwa.py— the consumer that surfaces heartbeats in the PWA dashboardfeedback_no_fabrication_personal.md— overarching no-null-fields disciplinedaily-beenverified-enrichment/SKILL.mdStep 9b — the original inline spec that didn't propagate to other skills, hence this canonical reference doc