memory · reference_freshness_fleet.md

Always-live data fleet

Joseph 2026-05-09: "We need to figure out how we keep all of this live — all the time, including local data sources."

The fleet now has three stacked layers. Each layer runs independently as bash and Python scripts under launchd (zero Claude tokens), so if any one layer fails, the others compensate.

Layer 1 · Cron floor — 15-min extractor cadence

Each chrome_bridge extractor fires every 15 minutes via launchd StartInterval=900. This is the passive freshness floor: it needs no signal and runs unconditionally. The keep-alive and PWA refresh jobs run on their own cadences.

| Job | Cadence | What it does |
|---|---|---|
| com.skyrun.track-metrics | every 15 min | extract_track_metrics.py → pwa/track_metrics.json |
| com.skyrun.keydata-metrics | every 15 min | extract_keydata_metrics.py → pwa/keydata_metrics.json |
| com.skyrun.smartlead-metrics | every 15 min | extract_smartlead_metrics.py → pwa/outbound.json |
| com.skyrun.session-keep-alive | every 10 min | session_keep_alive.py → pwa/source_health.json |
| com.skyrun.pwa-periodic-refresh | every 30 min | build_pwa.py + deploy_pwa.sh |

(Was 3-8 fires/day at clustered cron times. Migrated 2026-05-09 to a flat 15-min interval; .bak.20260509T112728 kept for rollback.)
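
As an illustration of the StartInterval=900 setup, here is a minimal sketch that builds a launchd plist with Python's standard plistlib. The interpreter path and script path are placeholders, and the real plists may carry additional keys (logging paths, environment) that this note does not list.

```python
import plistlib


def build_launchd_plist(label: str, script_path: str, interval_s: int = 900) -> bytes:
    """Return plist XML for a launchd job that fires every interval_s seconds."""
    job = {
        "Label": label,
        # Assumption: the interpreter path is a placeholder, not the fleet's actual one.
        "ProgramArguments": ["/usr/bin/python3", script_path],
        # launchd starts the job every N seconds; 900 s = 15 min.
        "StartInterval": interval_s,
    }
    return plistlib.dumps(job)


# Hypothetical script location, for illustration only.
xml = build_launchd_plist(
    "com.skyrun.track-metrics", "/path/to/extract_track_metrics.py"
)
print(xml.decode())
```

Writing the result to ~/Library/LaunchAgents/com.skyrun.track-metrics.plist and loading it with launchctl would be the usual next step.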

Layer 2 · Watchdog enforcer — 3-min reactive check

freshness_watchdog.py runs every 3 min via com.skyrun.freshness-watchdog. For every source, it:

1. Reads pwa/source_health.json for probe state
2. Reads each metrics file for captured_at
3. Computes age. If older than per-source threshold:
- chrome_bridge source + probe says alive → run extractor (with WrongAccountError retry)
- chrome_bridge source + probe says expired/no_tab → skip + push ntfy on first detection
- local file → run build_pwa.py + deploy_pwa.sh
4. Pushes an ntfy notification after 3 consecutive failures
5. Writes heartbeat to health/freshness_watchdog_*.json
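
The branching in step 3 can be sketched as a small pure function. This is an illustration only, not the code in freshness_watchdog.py: the function name, the "kind" values, and the returned action strings are assumptions; only the probe states (alive, expired, no_tab) come from the list above.

```python
def decide_action(kind: str, probe_state: str, age_min: float, max_age_min: float) -> str:
    """Pick the watchdog action for one source, following the rules listed above.

    kind is "chrome_bridge" or "local" (hypothetical labels).
    """
    if age_min <= max_age_min:
        return "fresh"  # within threshold, nothing to do
    if kind == "local":
        return "rebuild_and_deploy"  # build_pwa.py + deploy_pwa.sh
    if probe_state == "alive":
        return "run_extractor"  # the retry on WrongAccountError happens inside the run
    if probe_state in ("expired", "no_tab"):
        return "skip_and_notify"  # ntfy push on first detection
    # Assumption: any other probe state is surfaced rather than guessed at.
    return "unknown_probe_state"


print(decide_action("chrome_bridge", "alive", age_min=25, max_age_min=20))
print(decide_action("chrome_bridge", "expired", age_min=25, max_age_min=20))
print(decide_action("local", "", age_min=90, max_age_min=60))
```

Keeping the decision separate from the side effects makes each branch easy to test without touching a browser session.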

Per-source max_age_min thresholds are set in freshness_watchdog.py.

Effective ceiling on staleness = max_age_min + 3 min (the watchdog's polling interval).
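
A worked example of the age and ceiling arithmetic, as a sketch. It assumes captured_at is an ISO 8601 string and that naive timestamps are UTC; the 20-minute threshold is a made-up value, not one taken from freshness_watchdog.py.

```python
from datetime import datetime, timezone

POLL_INTERVAL_MIN = 3  # watchdog cadence, from the note above


def age_minutes(captured_at: str, now: datetime) -> float:
    """Minutes elapsed since captured_at (assumed ISO 8601)."""
    ts = datetime.fromisoformat(captured_at)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    return (now - ts).total_seconds() / 60


def staleness_ceiling_min(max_age_min: float) -> float:
    """Worst-case age before the watchdog notices: threshold plus one poll."""
    return max_age_min + POLL_INTERVAL_MIN


now = datetime(2026, 5, 9, 12, 20, tzinfo=timezone.utc)
print(age_minutes("2026-05-09T12:00:00+00:00", now))  # 20.0
print(staleness_ceiling_min(20))  # 23 (hypothetical 20-min threshold)
```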

Layer 3 · On-demand triggers — instant refresh

extract_trigger_runner.py runs every 30s, polling Cloudflare KV for "Trigger now" requests posted from the PWA Sources page (P5 freshness pill button). Joseph clicks "Trigger now" → trigger lands in KV → runner picks it up within 30s → extractor fires → redeploy → PWA shows fresh state within ~60s end-to-end.
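
The polling flow can be sketched as below. The KV read is deliberately left as an injected callable (fetch_pending) because this note does not show how the runner talks to Cloudflare; the function names and the shape of the extractors mapping are likewise assumptions, not the contents of extract_trigger_runner.py.

```python
import time
from typing import Callable, Iterable


def poll_once(
    fetch_pending: Callable[[], Iterable[str]],
    extractors: dict[str, Callable[[], None]],
    redeploy: Callable[[], None],
) -> list[str]:
    """Run the extractor for each pending trigger, then redeploy once if any fired."""
    fired = []
    for source in fetch_pending():
        handler = extractors.get(source)
        if handler is None:
            continue  # unknown source name: ignore rather than crash the runner
        handler()
        fired.append(source)
    if fired:
        redeploy()
    return fired


def run_forever(fetch_pending, extractors, redeploy, interval_s: float = 30.0) -> None:
    """Poll every interval_s seconds (30 s per the note above)."""
    while True:
        poll_once(fetch_pending, extractors, redeploy)
        time.sleep(interval_s)


# Usage with in-memory stand-ins for KV and the extractors.
log = []
fired = poll_once(
    fetch_pending=lambda: ["track", "not_registered"],
    extractors={"track": lambda: log.append("extract track")},
    redeploy=lambda: log.append("redeploy"),
)
print(fired, log)
```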

Failure modes + recovery

| Failure | Detection | Recovery |
|---|---|---|
| Browser session expired | probe flips to expired | ntfy push w/ click-to-Sources URL; watchdog skips trigger; user re-auths |
| Extractor exits 1 (WrongAccountError) | watchdog _run_extractor reads stderr | retries 2x with 30s pause; if still failing → ntfy on 3rd consecutive |
| Extractor exits 0 but writes auth_expired metrics | watchdog reads chrome_bridge_status from output file | counts as failure for freshness purposes; ntfy on first detection (deduped 60 min) |
| build_pwa fails | watchdog captures non-zero rc | adds to errors[]; partial heartbeat; ntfy via dq-realtime-monitor on next tick |
| Local file >threshold + no producer running | watchdog detects stale mtime | kicks build_pwa + deploy |
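
The retry and notification rules in the table can be sketched as follows. This is a simplified model: the real watchdog detects WrongAccountError by reading the extractor's stderr, whereas this sketch uses a Python exception as a stand-in. All names and the seconds-based timestamps are assumptions.

```python
import time
from typing import Callable, Optional


class WrongAccountError(Exception):
    """Stand-in for the extractor's wrong-account failure."""


def run_with_retry(
    run: Callable[[], None],
    retries: int = 2,
    pause_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry up to `retries` times with a pause between attempts."""
    for attempt in range(retries + 1):
        try:
            run()
            return True
        except WrongAccountError:
            if attempt < retries:
                sleep(pause_s)
    return False


def notify_on_streak(consecutive_failures: int, threshold: int = 3) -> bool:
    """True exactly when the failure streak reaches the threshold."""
    return consecutive_failures == threshold


def dedupe_allows(last_sent_s: Optional[float], now_s: float, window_min: int = 60) -> bool:
    """True if no notification was sent within the dedupe window."""
    return last_sent_s is None or (now_s - last_sent_s) >= window_min * 60


# Usage: an extractor stand-in that fails twice, then succeeds.
attempts = {"n": 0}
pauses = []


def flaky() -> None:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise WrongAccountError("wrong account")


print(run_with_retry(flaky, sleep=pauses.append), pauses)
```

Injecting the sleep function keeps the 30s pause out of tests while leaving the default behavior unchanged.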

What's NOT in the fleet (and why)

How to add a new source

1. If chrome_bridge: write extract_{source}.py, add launchd plist with StartInterval=900, register in EXTRACTORS dict in extract_trigger_runner.py and in CHROME_SOURCES in freshness_watchdog.py, add probe handler in session_keep_alive.py
2. If local: write the producer skill, ensure it writes a captured_at timestamp, add entry to LOCAL_FILES in freshness_watchdog.py with appropriate max_age_min
3. Add the source to the PWA Sources page registry (pwa/preview-sources.html)
4. Update deploy_pwa.sh to copy the new file (cherry-pick gotcha — see ~/Documents/SkyRun_Remediation_2026-05-08/HANDOFF_TO_GC_SYSTEM.md Section 3)
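
For step 2, a producer might stamp captured_at and write its file atomically, so the watchdog never reads a half-written JSON. This is a sketch under assumptions: the helper name, the UTC ISO 8601 timestamp format, and the example file name are illustrative, not the fleet's actual conventions.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_metrics(path: Path, metrics: dict) -> dict:
    """Write metrics plus a captured_at timestamp via a temp file and atomic rename."""
    payload = dict(metrics)
    # Assumption: UTC ISO 8601 is the timestamp format the watchdog expects.
    payload["captured_at"] = datetime.now(timezone.utc).isoformat()
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem
    return payload


# Usage against a throwaway directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "example_metrics.json"
    write_metrics(out, {"rows": 3})
    written = json.loads(out.read_text())
    print(sorted(written))
```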

Verification commands

```bash
# Watchdog last run
cat "$(ls -t ~/Library/Application\ Support/SkyRun/health/freshness_watchdog_*.json | head -1)"

# Live cron schedule
for p in com.skyrun.{track-metrics,keydata-metrics,smartlead-metrics,freshness-watchdog,session-keep-alive}; do
  echo "=== $p ==="
  grep -A1 "StartInterval" ~/Library/LaunchAgents/$p.plist
done

# Per-source freshness (live)
cd ~/Library/Application\ Support/SkyRun/pwa
for f in track_metrics.json keydata_metrics.json outbound.json; do
  echo "=== $f ==="
  python3 -c "import json; d=json.load(open('$f')); print('captured_at:', d.get('captured_at'), '| chrome_bridge_status:', d.get('chrome_bridge_status'))"
done
```

Why this is safe (token-wise)

Every layer is bash + Python via launchd, with zero Claude tokens. The Cowork Claude scheduled tasks (sot-reconciler, engagement-reconciler, qb, etc.) were the token-burners and were slowed on 2026-05-09 (see reference_token_optimization_2026-05-09.md). The freshness fleet runs on the deterministic plumbing layer, not the AI reasoning layer.