Always-live data fleet
Joseph 2026-05-09: "We need to figure out how we keep all of this live — all the time, including local data sources."
The fleet now consists of three stacked layers. Each layer is independent and bash-only (zero Claude tokens); if any one layer fails, the others compensate.
Layer 1 · Cron floor — 15-min extractor cadence
Every chrome_bridge extractor fires every 15 minutes via launchd StartInterval=900. This is the passive freshness floor — no signal needed, just runs.
| Job | Cadence | What it does |
|---|---|---|
| com.skyrun.track-metrics | every 15 min | extract_track_metrics.py → pwa/track_metrics.json |
| com.skyrun.keydata-metrics | every 15 min | extract_keydata_metrics.py → pwa/keydata_metrics.json |
| com.skyrun.smartlead-metrics | every 15 min | extract_smartlead_metrics.py → pwa/outbound.json |
| com.skyrun.session-keep-alive | every 10 min | session_keep_alive.py → pwa/source_health.json |
| com.skyrun.pwa-periodic-refresh | every 30 min | build_pwa.py + deploy_pwa.sh |
(Previous plist versions are kept with a .bak.20260509T112728 suffix for rollback.)
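A minimal sketch of what one of these launchd agents looks like, assuming a standard StartInterval plist shape — the script path is a placeholder, not the real install location:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.skyrun.track-metrics</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/extract_track_metrics.py</string> <!-- placeholder path -->
    </array>
    <key>StartInterval</key>
    <integer>900</integer> <!-- 900 s = 15 min passive floor -->
</dict>
</plist>
```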
Layer 2 · Watchdog enforcer — 3-min reactive check
freshness_watchdog.py runs every 3 min via com.skyrun.freshness-watchdog. For every source, it:
1. Reads pwa/source_health.json for probe state
2. Reads each metrics file for captured_at
3. Computes age. If older than per-source threshold:
- chrome_bridge source + probe says alive → run extractor (with WrongAccountError retry)
- chrome_bridge source + probe says expired/no_tab → skip + push ntfy on first detection
- local file → run build_pwa.py + deploy_pwa.sh
4. Pushes an ntfy alert after 3 consecutive failures
5. Writes heartbeat to health/freshness_watchdog_*.json
Per-source max_age_min thresholds (in freshness_watchdog.py):
- track: 15 min · keydata: 30 min · smartlead: 15 min
- deals/commitments/fleet: 60 min
- market: 90 min · leads: 120 min · lead_coords: 240 min · lint_sweep: 240 min
Effective ceiling on staleness = max_age_min + 3 min (the watchdog's polling interval).
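The per-source staleness check above can be sketched as follows — the function and dict names are illustrative, not copied from freshness_watchdog.py:

```python
# Hypothetical sketch of the Layer-2 freshness check.
from datetime import datetime, timezone

# Per-source thresholds from the doc above
MAX_AGE_MIN = {
    "track": 15, "keydata": 30, "smartlead": 15,
    "deals": 60, "commitments": 60, "fleet": 60,
    "market": 90, "leads": 120, "lead_coords": 240, "lint_sweep": 240,
}
WATCHDOG_INTERVAL_MIN = 3  # polling cadence; worst-case staleness adds this

def is_stale(source, captured_at_iso, now=None):
    """Return True when a source's captured_at exceeds its max_age_min."""
    now = now or datetime.now(timezone.utc)
    captured = datetime.fromisoformat(captured_at_iso)
    age_min = (now - captured).total_seconds() / 60
    return age_min > MAX_AGE_MIN[source]
```

A stale source then routes to either an extractor re-run (chrome_bridge + live probe) or a rebuild (local file), per the decision tree above.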
Layer 3 · On-demand triggers — instant refresh
extract_trigger_runner.py polls Cloudflare KV every 30s for "Trigger now" requests posted from the PWA Sources page (P5 freshness pill button). Joseph clicks "Trigger now" → the trigger lands in KV → the runner picks it up within 30s → the extractor fires → redeploy → the PWA shows fresh state within ~60s end-to-end.
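The poll-and-dispatch step can be sketched as below; the KV key layout, registry shape, and function names are assumptions, not the real extract_trigger_runner.py internals:

```python
# Hypothetical sketch of one 30 s polling tick of the trigger runner.
EXTRACTORS = {
    "track": "extract_track_metrics.py",  # illustrative registry entry
}

def poll_once(read_kv, run_extractor):
    """Drain pending 'Trigger now' requests from KV and fire extractors.

    read_kv: callable returning the list of triggered source names
    run_extractor: callable that executes the given extractor script
    """
    fired = []
    for source in read_kv():
        script = EXTRACTORS.get(source)
        if script:                 # unknown sources are ignored
            run_extractor(script)
            fired.append(source)
    return fired
# The real runner presumably wraps this in a loop with time.sleep(30).
```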
Failure modes + recovery
| Failure | Detection | Recovery |
|---|---|---|
| Browser session expired | probe flips to expired | ntfy push w/ click-to-Sources URL; watchdog skips trigger; user re-auths |
| Extractor exits 1 (WrongAccountError) | watchdog _run_extractor reads stderr | retries 2x with 30s pause; if still failing → ntfy on 3rd consecutive |
| Extractor exits 0 but writes auth_expired metrics | watchdog reads chrome_bridge_status from output file | counts as failure for freshness purposes; ntfy on first detection (deduped 60 min) |
| build_pwa fails | watchdog captures non-zero rc | adds to errors[]; partial heartbeat; ntfy via dq-realtime-monitor on next tick |
| Local file >threshold + no producer running | watchdog detects stale mtime | kicks build_pwa + deploy |
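The WrongAccountError retry policy in the table (2 retries with a 30 s pause) could be implemented roughly as follows — a hedged sketch, with an injected runner so the policy is testable, not the watchdog's actual _run_extractor:

```python
# Illustrative retry wrapper matching the "retries 2x with 30s pause" policy.
import time

def run_with_retries(run, retries=2, pause_s=30, sleep=time.sleep):
    """Run an extractor, retrying on non-zero exit; return the final returncode.

    run: callable returning the extractor's exit code
    sleep: injectable for testing (defaults to a real 30 s pause)
    """
    rc = run()
    for _ in range(retries):
        if rc == 0:
            return 0
        sleep(pause_s)
        rc = run()
    return rc
```

A still-failing run after the retries would then feed the consecutive-failure counter that gates the ntfy push.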
What's NOT in the fleet (and why)
- HubSpot — no extractor; data pulled live by skills as needed (engagement-reconciler, sot-reconciler)
- Gmail — same reason; live MCP pulls per skill
- BeenVerified — daily scheduled task only (12/day Premium budget); freshness is by design slow
How to add a new source
1. If chrome_bridge: write extract_{source}.py, add launchd plist with StartInterval=900, register in EXTRACTORS dict in extract_trigger_runner.py and in CHROME_SOURCES in freshness_watchdog.py, add probe handler in session_keep_alive.py
2. If local: write the producer skill, ensure it writes a captured_at timestamp, add entry to LOCAL_FILES in freshness_watchdog.py with appropriate max_age_min
3. Add the source to the PWA Sources page registry (pwa/preview-sources.html)
4. Update deploy_pwa.sh to copy the new file (cherry-pick gotcha — see ~/Documents/SkyRun_Remediation_2026-05-08/HANDOFF_TO_GC_SYSTEM.md Section 3)
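To make step 1 concrete, the registry entries for a new chrome_bridge source might look like this — the dict shapes and the "billing" source are hypothetical, shown only to illustrate where each registration lands:

```python
# Illustrative registrations for a hypothetical new source "billing".

# extract_trigger_runner.py — maps source name to extractor script
EXTRACTORS = {
    "billing": "extract_billing_metrics.py",
}

# freshness_watchdog.py — probe-gated chrome_bridge sources
CHROME_SOURCES = {"billing"}

# freshness_watchdog.py — local-file sources instead get a LOCAL_FILES entry
LOCAL_FILES = {
    "lint_sweep": {"path": "pwa/lint_sweep.json", "max_age_min": 240},
}
```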
Verification commands
```bash
# Watchdog last run
cat "$(ls -t ~/Library/Application\ Support/SkyRun/health/freshness_watchdog_*.json | head -1)"

# Live cron schedule
for p in com.skyrun.{track-metrics,keydata-metrics,smartlead-metrics,freshness-watchdog,session-keep-alive}; do
  echo "=== $p ==="
  grep -A1 "StartInterval" ~/Library/LaunchAgents/$p.plist
done

# Per-source freshness (live)
cd ~/Library/Application\ Support/SkyRun/pwa
for f in track_metrics.json keydata_metrics.json outbound.json; do
  echo "=== $f ==="
  python3 -c "import json; d=json.load(open('$f')); print('captured_at:', d.get('captured_at'), '| chrome_bridge_status:', d.get('chrome_bridge_status'))"
done
```
Why this is safe (token-wise)
Every layer is bash + Python via launchd (zero Claude tokens). The Cowork Claude scheduled tasks (sot-reconciler, engagement-reconciler, qb, etc.) were the token-burners and were slowed on 5/9 (see reference_token_optimization_2026-05-09.md). The freshness fleet runs on the deterministic plumbing layer, not the AI reasoning layer.