Rule
The heartbeat status field is one of ok | partial | skipped | error, per reference_heartbeat_schema.md. Never use YELLOW, RED, GREEN, or any other value: those are health-classification colors assigned by nightly-consolidation Section F, not heartbeat statuses.
Why: Joseph hardwired the stay-green discipline on 2026-05-02 (PM). Non-actionable warnings were turning the system yellow: soft bounces (informational; mail will retry), connector-permission gaps when fallback paths work, off-cadence re-fires, and known, already-acknowledged duplicate contacts. The system was YELLOW because of noise rather than real signal. Stay-green policy: flag partial only when there is something the operator should actually act on.
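A minimal sketch of enforcing this rule at write time, in Python. It models the heartbeat as a plain dict with a status key, and validate_status is a hypothetical helper; reference_heartbeat_schema.md remains the authority on the real schema.

```python
# Sketch only: the heartbeat dict shape ({"status": ...}) is an assumption.
ALLOWED_STATUSES = frozenset({"ok", "partial", "skipped", "error"})
HEALTH_COLORS = frozenset({"GREEN", "YELLOW", "RED"})  # Section F output, never a status


def validate_status(heartbeat: dict) -> str:
    """Return the heartbeat status if it is allowed, otherwise raise ValueError."""
    status = heartbeat.get("status")
    if status in HEALTH_COLORS:
        # Catch the specific mistake this rule exists to prevent.
        raise ValueError(f"{status!r} is a Section F health color, not a heartbeat status")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"invalid heartbeat status: {status!r}")
    return status
```

Rejecting the color names with their own message makes the mix-up obvious in logs instead of surfacing as a generic "invalid value".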
How to apply (decision logic per skill)
Apply in order — first match wins:
1. error: the skill couldn't complete its primary mission (auth broken, file missing, helper crashed). The errors array is non-empty. Health classifies as RED.
2. skipped: nothing to do (no new data, preconditions not met, monthly budget hit). Health classifies as GREEN.
3. partial: completed BUT operator action is required. Reserve for:
- Real deliverability issues (hard_count > 0)
- Unmatched data needing manual reconciliation
- New unflagged drift (NOT already-acknowledged drift in a known-issues ledger)
- Stage-drift warnings that need operator review
- Failed remediation that shouldn't auto-retry
4. ok: everything else. Health classifies as GREEN.
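The first-match-wins order can be sketched as a single function. RunOutcome and its fields (errors, nothing_to_do, action_items) are hypothetical stand-ins for whatever each skill actually tracks; the point is only that the checks run in rule order and return on the first hit.

```python
from dataclasses import dataclass, field


@dataclass
class RunOutcome:
    # Hypothetical summary of one skill run; field names are illustrative only.
    errors: list = field(default_factory=list)
    nothing_to_do: bool = False
    action_items: list = field(default_factory=list)  # operator-action items only


def classify_status(run: RunOutcome) -> str:
    """Apply the four rules in order; the first match wins."""
    if run.errors:  # 1. primary mission failed
        return "error"
    if run.nothing_to_do:  # 2. no new data, preconditions unmet, budget hit
        return "skipped"
    if run.action_items:  # 3. completed, but the operator must act
        return "partial"
    return "ok"  # 4. everything else
```

Keeping the checks as early returns makes the precedence explicit: a run that both errored and has action items reports error, never partial.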
What NOT to flag as partial
- Soft bounces alone (4xx codes) — mail will retry; no action needed
- Internal-typo filtered events (@skyrun.com self-sends per bounce-handler Step 4a)
- Connector permission gaps when fallback paths exist, e.g., Gmail label creation fails but the local processed-threads tracker works (per flag #67)
- Off-cadence re-fires — annotate in summary, don't auto-mark partial
- Acknowledged duplicates: items in known_hs_duplicates.json go to metrics.acknowledged_*, NOT warnings
- Informational notes ("trusted morning run" or "BV budget cap reached") → metrics, not warnings
- Stale-pending queue items — surface via Section F2, don't yellow-flag the source skill
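For the bounce-handler case, the soft/hard split might look like the sketch below. It assumes each bounce arrives as a three-digit SMTP reply code, where 4xx means a transient (soft) failure and 5xx a permanent (hard) one. triage_bounces and the soft_count metric name are hypothetical; hard_count mirrors the partial rule above.

```python
def triage_bounces(smtp_codes: list[int]) -> tuple[str, dict, list]:
    """Return (status, metrics, warnings) for one hypothetical bounce-handler run.

    Assumes each bounce is reported as a three-digit SMTP reply code:
    4xx = transient (soft), 5xx = permanent (hard).
    """
    if not smtp_codes:
        # No new bounces to process: nothing to do.
        return "skipped", {}, []
    soft_count = sum(1 for code in smtp_codes if 400 <= code < 500)
    hard_count = sum(1 for code in smtp_codes if 500 <= code < 600)
    metrics = {"soft_count": soft_count, "hard_count": hard_count}
    if hard_count > 0:
        # Real deliverability issue: the operator has to act.
        return "partial", metrics, [f"{hard_count} hard bounce(s) need review"]
    # Soft-only run: mail will retry on its own, so counts go to metrics only.
    return "ok", metrics, []
```

A soft-only run such as the "2 soft, 0 hard" case still records its counts, but as metrics, so it stays green.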
What goes in warnings array
Only items that genuinely warrant operator attention. Specifically:
- Real new drift not yet acknowledged
- Recoverable failures that need investigation
- Auth-class issues that need re-login
- New stage mismatches between memory and live HS
If the warnings array is non-empty, Section F classifies the run as YELLOW, so be disciplined about what goes in it.
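A sketch of how Section F might map a heartbeat to a color, assuming it looks only at the status field and the errors and warnings arrays. section_f_color is a hypothetical name, and the real nightly-consolidation logic may check more than this.

```python
def section_f_color(heartbeat: dict) -> str:
    """Map one heartbeat to a health color (sketch; dict keys are assumed)."""
    if heartbeat.get("status") == "error" or heartbeat.get("errors"):
        return "RED"  # something is broken
    if heartbeat.get("warnings"):
        return "YELLOW"  # operator action genuinely required
    return "GREEN"  # ok, skipped, or metrics-only runs
```

A run carrying only metrics (for example 30 acknowledged duplicates) still comes out GREEN, which is why informational counts belong in metrics rather than warnings.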
What goes in metrics (numbers, not warnings)
- acknowledged_duplicates: N (from known_hs_duplicates.json)
- internal_typo_filtered: N
- chrome_bridge_status: ok | offline | auth_error | not_used | tab_not_found | timeout
- gmail_source: gmail_mcp | chrome_bridge | unavailable
- Any other counts that document what happened without raising urgency
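The acknowledged-vs-new duplicate split could be sketched as follows. The path is the one the DQ check reads after the stay-green sweep; treating the file as a JSON list of ID strings is an assumption, as are the helper names load_known_duplicates and split_duplicates.

```python
import json
from pathlib import Path

# Path from the DQ notes; the file format (a JSON list of ID strings) is assumed.
KNOWN_DUPES_PATH = (
    Path.home() / "Library" / "Application Support" / "SkyRun" / "known_hs_duplicates.json"
)


def load_known_duplicates(path: Path = KNOWN_DUPES_PATH) -> set[str]:
    """Load acknowledged duplicate IDs; a missing ledger acknowledges nothing."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def split_duplicates(found: list[str], known: set[str]) -> tuple[dict, list[str]]:
    """Acknowledged duplicates become a metric; only new ones become warnings."""
    acknowledged = [dup for dup in found if dup in known]
    new = [dup for dup in found if dup not in known]
    metrics = {"acknowledged_duplicates": len(acknowledged)}
    warnings = [f"new unflagged duplicate: {dup}" for dup in new]
    return metrics, warnings
```

Because a missing ledger acknowledges nothing, every duplicate then surfaces as a warning instead of being silently suppressed.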
Forensic precedent (2026-05-02 PM stay-green sweep)
Before the sweep, GC was YELLOW because:
- smartlead-bounce-handler flagged YELLOW status for soft-only bounce runs (2 soft, 0 hard)
- transcript-scan flagged YELLOW because of a chrome_bridge desync warning, even though Gmail MCP was the default working path
- daily-data-quality-check yellow-flagged 30 known scout-double-push duplicates every run (no progress = repeat-flag)
- gate-proof-runner reported partial 45/48 because of 3 stale checks the operator had already addressed
After the sweep:
- Bounce-handler classifies soft-only runs as ok; flag #67 local-tracker fallback eliminates the "Gmail label couldn't create" yellow
- Transcript-scan defaults to Gmail MCP; chrome_bridge demoted to fallback
- DQ reads ~/Library/Application Support/SkyRun/known_hs_duplicates.json; acknowledged dups go to metrics.acknowledged_duplicates (informational), only NEW unflagged dups trigger partial
- DQ regex hardened to /ID:\s*[RM]\d+(?:-\d+)?/gi; fixes the 87.5% false-positive rate on missing-in-HS flagging
- gate-proof-runner re-fired with all 3 conditions actually clean → 48/48
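In Python, the hardened DQ pattern translates roughly as follows. re.IGNORECASE stands in for the i flag and re.findall for g; the capture group around the ID itself is an addition for this sketch, not part of the original regex, and extract_ids is a hypothetical helper.

```python
import re

# Python rendering of /ID:\s*[RM]\d+(?:-\d+)?/gi with a capture group added
# around the ID; re.IGNORECASE plays the role of the "i" flag.
HS_ID_PATTERN = re.compile(r"ID:\s*([RM]\d+(?:-\d+)?)", re.IGNORECASE)


def extract_ids(text: str) -> list[str]:
    """Return every ID found in text (findall is the "g" flag's equivalent)."""
    # With one capture group, findall returns just the captured ID strings.
    return [match.upper() for match in HS_ID_PATTERN.findall(text)]
```

For example, extract_ids("Guest ID: R1234, owner id:M55-2; ref ID:X9") returns ["R1234", "M55-2"]: the X-prefixed token is ignored, and the case-insensitive flag also accepts lowercase "id:", exactly as /i does in the original.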
System target: GREEN baseline; YELLOW only when operator action is genuinely required; RED only when something is broken. Don't dilute the signal.
When operator says "stay green"
Don't gaslight by hiding real problems. Genuine RED conditions stay RED. Genuine YELLOW (operator-action-needed) stays YELLOW. The discipline is:
- DON'T yellow-flag noise (soft bounces, fallback paths, acknowledged drift)
- DO yellow-flag real signal (new hard bounces, stage drift, auth failures)
- DO RED-flag breakage (errors, skill crashes, file-system regressions)
If you're tempted to suppress a real warning to "get to green", stop. Surface it explicitly to the operator and let them choose whether to acknowledge it via a known-issues ledger or fix it.