← Back to brief

reference sot canonical architecture

memory · reference_sot_canonical_architecture.md

Architecture


                  ┌─────────────────────────────────────┐
                  │     SoT (CANONICAL — read-only       │
                  │     to reconciler; only operator     │
                  │     + owning skills can write)        │
                  │                                       │
                  │  Grand_County_STR_Engine_v3.5_       │
                  │  filtered.xlsx · Lead Details tab    │
                  │  866 leads · 48 cols including       │
                  │  EMAIL_ALL/PHONE_ALL ledgers          │
                  └────────────────┬────────────────────┘
                                   │
                       sot_reconciler.py
                       (every 15 min, idempotent)
                                   │
                                   ▼
                  ┌─────────────────────────────────────┐
                  │   HubSpot — DERIVATIVE                │
                  │   Joseph-owned ONLY                  │
                  │   (hubspot_owner_id = 88361194)      │
                  │   ~931 contacts                       │
                  │   Catches up to SoT every 15 min     │
                  └─────────────────────────────────────┘
                                   ▲
                                   │ BCC alias auto-log
                                   │ (R-01 scope: Joseph aliases)
                                   │
                  ┌─────────────────────────────────────┐
                  │   SmartLead — DERIVATIVE              │
                  │   Enrolled leads ⊆ SoT                │
                  │   (Step 7b auto-load from BV)         │
                  └─────────────────────────────────────┘

Why this is the architecture

SoT was always meant to be canonical

The xlsx file is literally named "Source of Truth" via the SoT acronym. Every CLAUDE.md mention treats it as the master. The DESIGN was always:

1. New leads enter SoT (scout, BV, manual)
2. SoT writes propagate to HS + SL
3. Bounces / replies / engagement events flow back to SoT
4. SoT is the only store that's edited by humans

The implementation didn't enforce it

Multiple write paths could touch HS or SL independently:

Each write path could fail or partially-complete. Drift accumulated. My audits caught drift point-in-time + applied fixes; drift returned because the writers weren't hardened.

The structural fix: continuous reconciliation

sot_reconciler.py runs every 15 minutes. Each fire:

1. Loads SoT (read-only via openpyxl read_only=True — physically can't modify)
2. Loads ALL Joseph-owned HS contacts (per RULES.md R-01 — never touches non-Joseph contacts)
3. Surveys unowned-with-LID contacts (read-only flag, NEVER auto-claims)
4. Reconciles 7 auto-fix drift classes:
- D2: HS contact has email matching SoT but missing ID:Rxxxxx token → add token
- D3: HS city corruption (street suffix in city field) → patch to SoT canonical
- D4: HS primary email differs from SoT EMAIL_ALL ledger primary → sync
- D5: HS hs_additional_emails missing entries from SoT EMAIL_ALL → sync
- D6: HS phone differs from SoT primary phone → sync
- D7: SoT lead has no HS contact → CREATE (with hubspot_owner_id=Joseph from start)
- D8: HS commission attribution differs from SoT-derived expected → sync
5. Flags 3 operator-review classes (NEVER auto-fix — would destroy data without judgment):
- F1: Cross-contamination (HS email NOT in SoT ledger at all)
- F2: Multiple HS contacts share same ID:Rxxxxx token
- F3: Unowned HS contacts with our LID tokens — operator decides whether to claim
6. Writes heartbeat per canonical schema; pushes ntfy on partial/error

Why this is permanent (vs prior bandaids)

Prior bandaid patternThis architecture
Run audit, find drift, apply fix, drift returns next dayReconciler runs every 15 min — max drift age = 15 min
Each writer hardened individually; new writers re-introduce driftReconciler reads live HS state every fire — new writers automatically reconciled
Joseph manually checks dashboardsReconciler ntfy-pushes on partial/error — operator only sees real issues
"I'll fix it next time"Idempotent — re-running is safe + no-op on already-reconciled state
Drift fixed only when noticedDrift fixed continuously regardless of whether anyone is watching

RULES.md M-01 anti-bypass: rules cannot be reasoned around

The reconciler enforces R-01 via two layers:
1. Search filter: hubspot_owner_id EQ 88361194 — only Joseph-owned contacts loaded
2. Post-load assertion: every loaded contact's hubspot_owner_id verified == "88361194"; mismatch raises RuntimeError and aborts before any PATCH

If either layer is silently weakened, PROOF 16 in gate-proof catches it (greps for the filter + assertion presence in the script source) and fails the gate-proof run. ntfy fires high-priority. Operator surfaces the regression before it can do harm.

How to add new HS-writing code

DO: import canonical patterns + add to PROOF 16 allow-list

python
JOSEPH_OWNER_ID = "88361194" # R-01

Search filter MUST include hubspot_owner_id

filterGroups: [{filters: [ {propertyName: "trackingsource", operator: "EQ", value: lead_id}, {propertyName: "hubspot_owner_id", operator: "EQ", value: JOSEPH_OWNER_ID}, ]}]

Pre-PATCH assertion (defense in depth)

def patch_hs_contact(cb, hs_tab, contact_id, props): # Re-fetch + verify owner before write current = hs_get_contact(contact_id) if current.get("hubspot_owner_id") != JOSEPH_OWNER_ID: raise RuntimeError(f"R-01 ABORT: contact {contact_id} owner = {current.get('hubspot_owner_id')!r}, not Joseph") # ... proceed with PATCH

Add the new file to PROOF 16's HS-writers allow-list in gate_proof_runner.sh:

bash
hs_writers = [
'$SR_LIB/sot_reconciler.py',
'$SR_LIB/bv_hs_sync.py',
'$SR_LIB/fix_hs_city_drift.py',
'$SR_LIB/create_missing_hs_contacts.py',
'$SR_LIB/your_new_writer.py', # ADD HERE
]

PROOF 16 verifies every listed file references 88361194. Adding a writer without the reference fails the gate.

DON'T: write HS via fetch() in skill markdown without going through the canonical helpers. Skill markdown CAN show example HS API code (PROOF 16 doesn't gate skill files because they're documentation, not executable code), but actual writes must originate from a Python helper that PROOF 16 can verify.

Files

State as of 2026-05-04 PM

Joseph's bar (the standard the architecture must meet)

> "100% pristine data integrity across the entire system 100% of the time. Nothing should get lost or dropped or missed, EVER."

The reconciler is the mechanism that delivers this for the SoT↔HS axis. SmartLead axis enforcement is partial (Step 7b auto-load + bounce-handler rotation) — true 2-way SL reconciliation requires SL API access we don't have at our tier. The Step 7b CSV import path remains the bridge.