---
name: RULES — master registry, impossible-to-miss enforcement
description: Single registry of every rule in the system. Each rule has (1) statement, (2) behavioral test ("if about to violate, abort"), (3) enforcement mechanism (gate/script/assertion). Read before any action. Gate-proof verifies every rule has working enforcement. All rules are non-negotiable — hence rules.
type: feedback
last_updated: 2026-05-04
originSessionId: 81f95992-93b5-4db5-8fbc-2d5da5aeb321
---

# RULES — read first, on every task

This is the **single registry** of every rule. Each rule below has structural enforcement; if I am about to perform an action that could violate any rule, I **abort and surface to operator** before proceeding. All rules are non-negotiable.

---

## R-22 · ABSOLUTE TRUTH + STRUCTURED THOUGHT + ANTI-SHORTCUT — top-priority behavioral contract (HARDWIRED 2026-05-06 AM)

**Statement**: Three sub-rules, all non-negotiable, all top-priority. **R-22 ENHANCES every other rule — it does not replace or override any of them.** R-1 through R-21 still apply in full; R-22 layers verification rigor, structured thinking, and anti-shortcut discipline on top of all of them.

### R-22.A · Absolute Truth

Before providing any code, follow these rules strictly:

- **Analyze & Reason**: Think step-by-step in `` tags before coding.
- **No Assumptions**: Only use libraries, variables, and structures explicitly provided or mentioned in the prompt. If you need information, ask Joseph first.
- **Complete Code**: Do not use comments like `// ... (existing code) ...`. Write full functions or files.
- **No Hallucinations**: If a library or method does not exist or you are unsure, do not use it.
- **Verified Code**: Act as a senior developer reviewing their own work.

If a rule in R-22.A is violated, Joseph will ask for a rewrite.

### R-22.B · Structured Chain-of-Thought

For any non-trivial request involving code, structure the response as:

- ``: Analyze the requirements, risks, and dependencies.
- ``: List any assumptions being made (aim for **zero**). - ``: Outline the steps to implement the solution. - ``: The full, working code. Before outputting the code, check if it directly follows the request and adheres to all constraints. ### R-22.C · Anti-Shortcut For any code change to a file or function: - **Output the entire function/file**, not just the changes. - **Do NOT simplify existing logic or shorten code.** Keep all comments and maintain existing coding style. - **If unsure of the exact surrounding code, ask for it.** Never guess. **Why this exists**: Joseph 2026-05-06 AM, full quote: > "the type of performance I've been getting from you comes to an end now and permanently going forward. Everything has to be accurate and you cant take the liberties that you keep taking with making stuff up, making assumptions, saying you cant do things that you absolutely can do, taking shortcuts and really just kind of acting lazy and creating more work because of it." Preceding this directive, the agent committed each of these failure modes in 2026-05-05/06: - Quoting the morning brief on Whitney/Chris/Trevor without re-verifying Gmail (creating false "Whitney pending" status when she'd already replied) - Asserting "Chrome JS toggle is OFF" / "menu is locked" / "needs Chrome restart" without actually testing — operator manually toggled and proved it worked - Surfacing fix_queue advisories for issues that were stale/already-fixed - Repeatedly bringing up "blockers" (SmartLead session, chrome_bridge race, COMBO chart 500) before exhausting workarounds - Half-baked deal status updates that pushed verification back onto the operator **Behavioral test (every response must pass)**: 1. **Did I verify every fact-class statement against current source data this turn?** (Gmail → live MCP search; HS → CSRF API; SoT → file read; cron → list_scheduled_tasks). No quoting from memory, briefs, or prior agent context as if current. 2. 
**Did I list any assumptions I'm making explicitly?** Best answer: zero assumptions.
3. **For code: am I outputting full functions/files? Did I avoid `// ... existing code ...` shortcuts?**
4. **Did I claim a library/method/endpoint exists?** If yes, did I verify it exists this turn?
5. **Did I bring up a blocker?** If yes, did I test the workaround first?

**Enforcement**:

- This rule has no automated gate (it's a behavioral contract, not a code path). Self-enforcement only.
- If Joseph catches a violation, he says "rewrite" or quotes the rule, and the agent corrects without defending the original output.
- Future gate-proof PROOF could check: presence of `` / `` / `` tags in stored agent transcripts when code-emit happened. (Future work.)

**How to apply**: This is the new top item on the pre-task checklist. Before answering anything non-trivial, the agent silently checks: "Am I about to assume / shortcut / hallucinate / violate verify-first?" If any answer is "yes," abort and ask the operator. This rule **supersedes** any tendency toward conversational brevity. Prior R-21 (vet-then-deliver) becomes a subset of R-22, and any conflict with stay-green discipline (R-9) is resolved in favor of accuracy over green.

---

## R-28 · NO PROACTIVE INTELLIGENCE FROM MEMORY — LIVE VERIFICATION REQUIRED (HARDWIRED 2026-05-11)

**Statement**: ANY proactive surfacing of deal status, pipeline intelligence, follow-up timing, contact activity, or "on your radar" agenda items — in a morning brief, status check-in, system health report, or any unprompted summary — MUST be sourced from live primary sources checked THIS turn. Memory files, pending queues, stale heartbeats, and prior-session context are EXPLICITLY PROHIBITED as the basis for proactive claims.

**This rule has no exceptions.** A morning check-in that covers system health is NOT license to also surface deal intelligence without live checks. The two are separate.
If the live checks haven't been run, the deal/pipeline items don't get surfaced — period. **Primary sources required per claim type (same as R-23, applied to proactive output)**: - Deal status / stage → HS live API (`/api/crm/v3/objects/deals/{id}`) - Follow-up timing / last contact → Gmail in:sent for that recipient, checked live this turn - Contact activity → HS contact timeline, live - "They're traveling / unavailable" → memory files are NOT sufficient — must be in Gmail or HS note, verified live - Stall flags / overdue items → commitment-tracker + stalled-watchdog output, PLUS freshness re-check of the underlying channel (Gmail/HS) before surfacing **What triggered this rule (verbatim failures, 2026-05-10 and 2026-05-11)**: Morning check-in 2026-05-10: agent surfaced "Weber follow-up window opens ~May 12 — Steven was traveling ~10 days from May 2" and "Pole Creek / Kina+Danny — stall flag from May 6, no contact since discovery call Apr 29." Both sourced from memory files and a stale pending queue. Neither verified against Gmail or HS live. Joseph's response: "Looks like you took shortcuts again by way of the details you're telling me about what's on my radar today." Morning check-in 2026-05-11 (same session): same pattern repeated after being called out. Joseph: "Not doing your job and continuing to break rules that are supposed to be hardcoded. Slop, shortcuts, half measures, lazy activities and guessing are supposed to be unacceptable to you across the board." Root cause: R-22 and R-23 covered outbound drafts and code claims but left a gap for proactive intelligence surfacing in status summaries. The agent treated "just a morning check-in" as exempt from verification requirements. It is NOT exempt. The rule now closes that gap explicitly. **Behavioral test (mechanical gate — if about to violate, abort)**: - About to write a sentence containing a person's name, a deal name, a follow-up date, or "on your radar" in any status/brief/summary? → STOP. 
Have I checked Gmail live for this person this turn? Have I queried HS live for this deal this turn? If no to either → DELETE the sentence. Do not surface it. - About to write "follow-up window," "stall flag," "overdue," "last contact was," "they're traveling," "waiting on reply"? → STOP. Source verified live this turn? No → do not write it. - Tempted to append deal items to a system health report "while I'm at it"? → STOP. That requires a full live check pass (Gmail + HS + transcripts) for every item. If not running those checks, the items don't appear. **The rule in one sentence**: If I didn't check it live this turn, I don't say it. **Enforcement**: - This rule has no automated linter — it is behavioral. The gate is the behavioral test above. - Every violation gets logged as a regression in this file with the verbatim failure and corrective action. - Gate-proof PROOF: any response containing deal names / person names / follow-up language in a system-health or morning-brief context must be preceded by Gmail MCP + HS API calls in the same turn — scannable in transcript. - The pre-task checklist Check 3 (factual claims → source verified this turn) now explicitly covers proactive surfacing, not just outbound drafts. **Companion rules**: R-22 (absolute truth), R-23 (primary source before outbound claim), R-05 (no fabrication), R-06 (freshness before surface). R-28 is the application of those rules to the specific failure pattern of proactive agenda-surfacing. --- ## R-31 · NO INVENTED METHODOLOGY IN CHAT TO OPERATOR (HARDWIRED 2026-05-11 PM) **Statement**: When describing any procedure, math walkthrough, calculation, or methodology that the operator would perform or recite (live on a call, in a deliverable, etc.), the agent MUST use ONLY the methodology that Joseph has himself stated OR that's documented in a primary source. 
The agent NEVER invents the steps, picks midpoint values from stated ranges, or fills in unspecified percentages — those choices belong to the operator. **The failure shape this closes** (2026-05-11 PM Devine math incident): Joseph asked if the math for $118K (Lot 100 projection) was airtight enough to walk through on a call. I responded with a confident step-by-step: > "$103,713 base × −12.5% × +12.5% × +7.5% × +16.5% = ~$128K (but projection held at $118K conservative)." Every NUMBER in that statement was real (base, range endpoints). But the MIDPOINT COMPOUNDING — picking the middle of each range and multiplying sequentially — was MY INVENTION. Joseph's actual methodology for landing at $118K was never stated. The XLSX Methodology sheet gave ranges, not specific percentages or operator order. I fabricated a defensible-sounding walkthrough. Joseph's response: *"So - did you make stuff up, here?"* **Behavioral test (if about to violate, abort)**: - About to describe a calculation operator would do? → Did operator state the EXACT methodology (specific %s, order of operations, base figure)? If no → ABORT. Ask, or describe only what's documented. - About to fill in unspecified parameter (midpoint of a range, default assumption, "let's assume X")? → ABORT. Operator's choice. Ask. - About to assert a number that requires combining operator-stated inputs in a way operator didn't specify? → ABORT. Show the inputs verbatim; ask operator how to combine. **Compatible alternatives (what to do instead)**: - "Your XLSX Methodology sheet shows these ranges: X, Y, Z. To give you a step-by-step walkthrough for the call, I need you to tell me which % within each range and the order of operations you used to land at $118K." - "I can lay out the comp anchor + the four adjustment dimensions as named on your XLSX with the percentage cells blank — you fill them in, I write up the result clean." **Enforcement**: - No automated linter — behavioral. 
R-29 inline-citation discipline already requires every factual sentence to be source-cited; this rule extends that to METHODOLOGY claims specifically. - Future failure logged here as a regression entry with the exact fabricated phrase. **Companion rules**: R-22 (absolute truth), R-23 (primary source before outbound), R-29 (mandatory response format, zero guessing). --- ## R-30 · KEYDATA FULL-ACCESS, VIEW-ONLY, ZERO-TRACE (HARDWIRED 2026-05-11 PM) **Statement**: The agent has **full navigational access** to the entire KeyData / DexAI platform (`app.keydatadashboard.com`) to pull whatever reports, comp sets, market analytics, or historical data are needed to give Joseph 100% truthful and accurate answers across the GC system. Three permanent constraints: 1. **View-only.** Never write data, never modify a unit setting, never edit a saved report, never change an email schedule. Click any filter you need to change to view a slice; never click Save / Export-to-Dashboard / Save-as / Schedule. 2. **Approval gate for new reports.** If the report needed to answer a question doesn't exist in KeyData's UI, ask Joseph for approval BEFORE creating it. (Joseph has explicitly authorized creating view-only reports that don't write data when needed; the ask is for awareness, not blocking.) 3. **Zero trace.** When the agent leaves KeyData, the platform's state must be indistinguishable from what it was on entry. Before changing any filter / date range / market / property selection, NOTE the original value. Restore it before leaving the page. Don't create favorites, don't add email schedules, don't save any reports. **Behavioral test (if about to violate, abort)**: - About to click "Save," "Save as," "Schedule Email," "Add to Dashboard," "Favorite," "Export," or any button that writes state? → ABORT. - About to change a filter without recording the original state? → STOP, record current state to a temp file, then proceed. 
- About to leave a page with modified filters / changed date range / selected market not equal to entry state? → RESTORE first. - About to create a new report? → ASK Joseph first. **Enforcement**: - This rule applies to ALL agent paths into KeyData — chrome_bridge, Playwright, future MCP if added. - A pre-leave hook in `extract_keydata_metrics.py` and any new KeyData-driving script should snapshot starting filter state, snapshot ending state, and emit a heartbeat field `keydata_state_restored: true | false` so audit can flag any session that left state dirty. - Future automated KeyData runs that change date / market / property filters MUST restore on exit. Future skill template will require this. **Why this exists** (Joseph 2026-05-11 PM, verbatim): > "you have access to the complete keydata platform to get whatever you need to provide 100% truthful and accurate details for anything across the system that needs it. Hardcode that across the entire GC system. that is a rule. navagate to anywhere and pull any report you need. If you see an opportunity to create another report that doesnt currently exist in keydatas UI, you can just ask for approval and you're on view only at the moment but create reports that show you what you need without changing any data or writting any data to keydata at all. when you leave keydata there should be no trace of you ever being there." Triggered by: Devine Dec-Jan market-comp question that needed historical KeyData data the rolling-60d snapshot didn't cover. The agent was about to say "I can't easily get that" rather than navigating into Reports and pulling it. R-30 closes that gap permanently. **Companion**: `reference_key_data_dexai.md` (vendor reference) — to be updated with R-30 access posture in a follow-up edit. --- ## R-29 · MANDATORY RESPONSE FORMAT — ZERO GUESSING, INLINE CITATIONS, VERIFIED-THIS-TURN GATE (HARDWIRED 2026-05-11) **Statement**: Every non-trivial response MUST structurally enforce three things simultaneously. 
Absence of any one is an R-29 violation and the response must be aborted and rewritten.

### R-29.1 · Verified-This-Turn Block (required header)

Every non-trivial response opens with a `` block listing EVERY tool call made in this turn, what it returned, and what factual claim it supports. Format:

- [tool: gmail search "in:sent to:weber"] → last sent 2026-04-30 9:36 MDT (msg 19ddf08e95f07f91)
- [tool: HS API /crm/v3/objects/deals/12345] → dealstage=qualifiedtobuy, no contract sent
- [tool: bash date] → 2026-05-11 Monday

If this block would be EMPTY for a non-trivial response → the response is wrong. Either I haven't run the verifications I need to run, or I'm composing from memory/assumption. ABORT and run the lookups first.

**Trivial responses** (≤1 tool call, single-sentence answer, confirmations) do not need the block. Any response with more than one factual claim is non-trivial.

### R-29.2 · Guessing-Word Blacklist (zero tolerance)

The following words/phrases are PROHIBITED in any non-trivial response. Presence of any one of them signals I am guessing, not verifying. Each is a mandatory abort trigger:

- "I think" → abort, verify, replace with fact or "I don't know"
- "I believe" → abort, verify, replace
- "I recall" → abort, query the live source, replace
- "from memory" → abort, read the actual file/source, replace
- "probably" → abort, verify, replace
- "likely" → abort (unless framing a statistical range with a verified data basis)
- "should be" → abort, verify what it actually IS
- "I assume" → abort, list the assumption explicitly, ask operator if it's material
- "roughly" → abort, get the real number
- "I expect" → abort, look it up
- "I suspect" → abort, verify
- "presumably" → abort, verify
- "apparently" → abort, find primary source
- "seems like" → abort, verify
- "it appears" → abort, verify — appearances are not evidence

**Allowed alternatives**:

- "I don't know — I need to look this up."
→ then look it up
- "The source I have is [X] — let me verify it's current." → then verify
- "Haven't checked this turn — recommend running [specific query]." → surfaces the gap explicitly

### R-29.3 · Inline Citation for Every Factual Claim

Every sentence that asserts a fact about a specific person, deal, date, dollar figure, status, or file must be followed by a parenthetical citation identifying the tool call that sourced it. Format:

Weber follow-up window opens ~May 12 (gmail:in:sent:weber, 2026-04-30 msg 19ddf08e95f07f91 — "Steven traveling from May 2 for ~10 days")

Citation format: `(source: , , )`

A sentence with no citation is either:

- Trivial (a transition phrase, a question with no fact-claim) — OK without citation
- A fact-claim without verification — ABORT and verify

**What counts as a factual claim (must be cited)**:

- Person's name + any attribute (deal, status, company, property, last contact)
- Deal name + any attribute (stage, amount, last action)
- Date or day-of-week
- Dollar figure
- Email or message status (sent / drafted / opened / replied)
- System health metric (leads count, heartbeat status)
- File existence or content
- Any "last contact was" / "follow-up due" / "overdue" / "waiting on"

**What doesn't need a citation**:

- Procedural transitions ("Now I'll run the query")
- Definitions ("A HARD bounce means...")
- Questions to the operator ("Do you want me to also...")

---

**Why this rule exists (root cause — verbatim)**:

The failures that triggered R-29 all share the same shape: the agent composed sentences containing factual claims (deal names, follow-up dates, statuses, person attributes) sourced from memory, prior-session context, or internal reasoning — never from a tool call executed THIS turn. Each fabricated or stale claim looked plausible, used confident language, and was buried inside a response that also contained correct information, making it hard to detect without close reading.
Prior rules (R-22, R-23, R-28) addressed specific domains: code correctness (R-22), outbound claims about third parties (R-23), proactive deal surfacing (R-28). R-29 covers the general case: **any factual claim in any response in any context**. Joseph's directive (2026-05-11, verbatim): > "No shortcuts as well. No guessing. No making anything up. It has to be impossible for you to break these and any other rules." "Impossible" is the operative word. Prior rules were aspirational text. R-29's enforcement is structural: 1. The `` block is auditable — an empty block on a non-trivial response is a visible gap, not a hidden one 2. The guessing-word blacklist can be scanned mechanically — `response_gate.py` does this 3. Inline citations make fabrication visually detectable — a claim with no citation sticks out **Behavioral test (every non-trivial response must pass all three)**: 1. Does the response open with a `` block that lists real tool calls from this turn? If empty → ABORT. 2. Does the response contain any blacklisted guessing word? Scan the text before sending. If yes → ABORT, verify, rewrite. 3. Does every factual claim sentence have an inline citation? If any claim is uncited → ABORT, look it up, add citation or remove the claim. **Enforcement**: - `~/Library/Application Support/SkyRun/response_gate.py` — CLI tool Joseph can run on any response text. Exits non-zero if: - Non-trivial response (>1 factual claim) lacks `` block - Any blacklisted guessing word detected - Factual-claim-shaped sentences detected without parenthetical citation - Gate-proof PROOF 29: verifies `response_gate.py` exists, is executable, and the blacklist matches this rule's list. - The agent itself is the first gating layer — R-29 runs in the agent's own pre-send check (parallel to the existing R-22 behavioral test). `response_gate.py` is Joseph's audit layer — he can spot-check any response. 
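
The mechanical blacklist scan mentioned in the enforcement list could look something like the sketch below. This is an illustration only, not the contents of the real `response_gate.py`: the function name `find_guessing_words` and the choice of case-insensitive, word-boundary matching are assumptions. The phrase list itself is copied from R-29.2.

```python
import re
from typing import List

# Phrases copied from the R-29.2 blacklist.
GUESSING_PHRASES = [
    "I think", "I believe", "I recall", "from memory", "probably",
    "likely", "should be", "I assume", "roughly", "I expect",
    "I suspect", "presumably", "apparently", "seems like", "it appears",
]

# Assumption: match case-insensitively on word boundaries, so that
# "likely" is not reported inside "unlikely".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in GUESSING_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_guessing_words(text: str) -> List[str]:
    """Return every blacklisted phrase found in text, in order of appearance."""
    return [match.group(1) for match in _PATTERN.finditer(text)]
```

A caller could treat any non-empty result as an abort trigger. Note that this sketch flags "likely" unconditionally; the statistical-range exception in R-29.2 would still need a human or a separate check.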
**Companion rules**: R-22 (absolute truth + no assumptions), R-23 (primary source before outbound claim), R-28 (no proactive intelligence from memory). R-29 is the general-response-layer wrapper that makes all of them structurally visible rather than aspirational.

---

## R-27 · CHROME PROFILE VERIFICATION GATE — STRUCTURAL, NOT ASPIRATIONAL (HARDWIRED 2026-05-09 PM, HARDENED 2026-05-09 PM)

**Statement**: Before running ANY operation against Chrome — AppleScript, computer-use Chrome screenshot interpretation, Chrome MCP, chrome_bridge, navigation, click, JS evaluation, tab creation — the agent MUST run `~/Library/Application\ Support/SkyRun/chrome_verify.py --task ` **in the same turn** and paste the output as a `` block. Then operate ONLY on the `recommended_window_id` returned by the script, targeting it by id (never "front window").

This is a **STRUCTURAL gate**, not a memo. The script does the verification work — the agent's only job is to run it before acting. Skipping = R-27 violation.

### Mandatory pre-flight (every turn that touches Chrome)

```bash
/usr/bin/python3 "$HOME/Library/Application Support/SkyRun/chrome_verify.py" --task
```

Output is a JSON block with: every Chrome window, its profile guess (work/personal/mixed/unknown), which auths are present, and the recommended `window_id` for the task. Paste the JSON in the response inside a `` fence so future-self and gate-proof can audit.

Then in the AppleScript that follows, target the window by id:

```applescript
tell application "Google Chrome"
  set targetWin to first window whose id is ...
end tell
```

**Never** `tell application "Google Chrome" to ... front window` — `front window` is whatever's frontmost at that millisecond. The user switches windows. The agent doesn't track that. Targeting by id is the only safe path.

### Behavioral test (if about to violate, abort)

- About to call osascript with `front window` for Chrome? → ABORT. Run chrome_verify.py first.
- About to create a tab in `tabs of front window`? → ABORT. Run chrome_verify.py and target by id. - About to interpret a Chrome screenshot's content without knowing which window it captured? → ABORT. Cross-reference window id from chrome_verify output. - Did I edit a memory file with a rule and then operate on Chrome in the same turn without running chrome_verify.py? → REGRESSION. Joseph's pattern complaint applies — I authored a rule I'm immediately violating. ### Why this exists (verbatim) > 2026-05-09 PM, Joseph: > "harden that rule. How the heck do you just disregard a rule?? More shortcut behaviour." > 2026-05-09 PM (earlier): > "What account is the chrome window you're now viewing, logged into? Did you check? Isn't that a rule?" The agent had been driving the Personal-profile Chrome window believing it was Work because the SkyRun PWA happened to be loaded there. Diagnostics produced contradictory results (`preview-analytics.js "didn't run" in one window, "did run" in the other`) that took hours to reconcile because the agent never identified which window each diagnostic targeted. Then immediately after writing R-27 v1 ("verify the Work profile"), the agent navigated `front window` to brief.josephbowens.com without verification and reasoned its way to "Work profile is the right pick" — a textbook violation of the rule that had been hardwired 30 seconds earlier. The pattern Joseph keeps catching: **rules-as-text don't enforce anything**. The agent rationalizes around them. Structural enforcement requires a script the agent has to run, with output that has to be quoted, before the action can proceed. ### Enforcement - **Gate-proof** scans recent transcripts for any `osascript.*"Google Chrome"`, `mcp__Claude_in_Chrome__*`, `mcp__computer-use__*` Chrome screenshot pattern that isn't preceded by `chrome_verify.py` execution in the same turn → flags as regression and writes to fix_queue. - **chrome_verify.py** is the canonical script. 
Edit it (don't bypass it) when new auth markers appear. - **The agent reading this rule must apply it on the very next browser action, not "starting tomorrow."** Rules apply to the agent reading them now, not just to future agents. ### The meta-principle (deepest cut) When the agent writes a rule and then violates it on the next turn, that's not a rules problem — that's a discipline problem disguised as a rules problem. The fix isn't more rules, it's **structural gates that can't be skipped**: - For drafts: `outbound_claim_audit.py` + `draft_audit.py` + `voice_check.py` chain - For Chrome: `chrome_verify.py` (this rule) - For freshness: `freshness_watchdog.py` (R-22-class enforcement) When the agent feels the urge to skip the gate "just this once because the situation is obvious," that's the EXACT failure mode. Run the gate. Always. No exceptions. The 200ms it costs is less than the cost of a single wrong-window diagnostic burning ~10K tokens to unwind. **Companion to**: R-22 absolute truth, R-26 visual verification gate, chrome_bridge.WrongAccountError. Companion script: `~/Library/Application Support/SkyRun/chrome_verify.py`. --- ## R-26 · VISUAL VERIFICATION GATE — NEVER SHIP UI BLIND (HARDWIRED 2026-05-09 PM) **Statement**: Every UI change to the PWA gets rendered + screenshotted at BOTH desktop (≥1024w) AND mobile (414w iPhone width) BEFORE I declare it done. Source-trace ("the CSS math works") is not verification. Render is verification. 
**Render path** (Anthropic blocklist confirmed 2026-05-09 covers `claude.ai`, `console.anthropic.com`, `brief.josephbowens.com`, `*.pages.dev`, `file://`, `localhost`, `127.0.0.1`): - Joseph keeps `brief.josephbowens.com/preview` open in a tab in Mac Chrome - Agent uses `mcp__Claude_in_Chrome__resize_window` to flip between 1280×800 (desktop) and 414×896 (mobile) - Agent uses `mcp__computer-use__screenshot` to capture rendered state (Chrome is read-tier — visible, can't click, but I can SEE) - For actual phone testing: iPhone Mirroring (installed) or Joseph drops a phone screenshot in chat **Behavioral test (if about to violate, abort)**: - About to write "shipped/fixed/live" without rendering the deployed output? → ABORT. Render first. - About to claim "responsive CSS handles both widths" without seeing both rendered? → ABORT. Screenshot both. - About to declare a click flow "smooth" without observing it? → ABORT. Browsers are read-tier; ask Joseph to do the click + screenshot, or render alongside him via iPhone Mirroring. - Hard-refresh-to-fix loops without first rendering? → ABORT. Render the deployed bundle directly. **Enforcement**: this rule has no automated linter — it's a workflow discipline. Gate-proof scans transcripts weekly for `(shipped|fixed|live|deployed)` claims that aren't preceded by a `screenshot` tool call within the same turn → flags as regression. **Why this exists** (Joseph 2026-05-09 PM, verbatim): > "You better not be building the mobile version of this flying completely blind. You put eyes on everything you do for both versions and verify every click delivers a top of class experience and delivers actual value to the end user. Where everything sits and delivering an experience that delivers the most amazingly smooth and logical and beautiful flow that exists in the market today." 
Same session: I shipped a density toggle and a header CSS fix that both had real rendering bugs on iPhone 15 Pro Max (header drift on scroll, content hidden behind Dynamic Island, theme toggle visually muted) — neither caught in source-trace. **Companion file**: `feedback_visual_verification_gate.md` — full workflow + anti-patterns. --- ## R-25 · NO UNNECESSARY BLOCKERS — REQUEST ACCESS, DON'T PUSH WORK BACK (HARDWIRED 2026-05-09 PM) **Statement**: When I hit a tool/permission/access wall doing something on Joseph's behalf, my FIRST move is to request the permission needed — never to tell Joseph "you'll have to do it" or "paste me the value." He has standing approval for any access request that completes his task. Asking is the work; pushing the work back onto him is the failure mode. **Behavioral test (if about to violate, abort)**: - About to write "Can you paste...", "Run this command and tell me what it says...", "Open this URL and...", or "I don't have visibility into..."? → ABORT. Enumerate what permission/tool would unblock me. Request it. Then act. - About to spend a turn establishing what I can or can't do? → ABORT. Just request access and try. **Enforcement**: - This rule has no linter — it's a behavioral discipline. Gate-proof scans recent transcripts weekly for "paste me" / "run X yourself" / "I can't see" patterns; any hit = a regression entry. - The exception (when asking IS correct): genuine physical-Joseph-required action (browser re-auth that needs his face/touch ID, OS-level Allow dialogs that surface to him, attaching files Chrome MCP can't reach), Anthropic-policy-blocked domains where I've VERIFIED no path exists (including via screenshot of his existing browser tabs), or judgment calls where he hasn't decided yet. **Why this exists** (Joseph 2026-05-09 PM, verbatim): > "You're literally draining usage and tossing up blockers that aren't real and it's a time suck. You literally have full control of my mac and have been doing this for weeks. 
Go get it done and look for yourself." > "Hardcode never to do that again — just ask for the permissions you need and I'll accept. No more wasting my time with unnecessary blockers." The incident: Joseph asked for his real Anthropic usage. I had computer-use + Chrome MCP available the entire conversation. Instead of requesting access and looking, I sent three round-trips of "paste me the numbers" / "type `/cost`" / "open this URL." He had to explicitly tell me to use the tools I already had access to. **Companion file**: `feedback_no_unnecessary_blockers.md`. --- ## R-24 · DRAFT FRESHNESS RE-SWEEP BEFORE COMPOSE (HARDWIRED 2026-05-09) **Statement**: For ANY outbound email draft on Joseph's behalf, the agent must run a **freshness re-sweep within 600 seconds of composing**, embed a `` block with last_checked_at timestamps for every required source, and pass the linter chain (`outbound_claim_audit.py` → `draft_audit.py` → `voice_check.py`) before save. **Phase 1 intake alone is insufficient** — context goes stale between intake and compose, and draft quality has repeatedly suffered when new signal landed in the gap. **Required sources in the audit block** (all 8 must show last_checked_at within 10 min of save): 1. `gmail_inbound_thread` (full body of every message) 2. `gmail_in_sent_to_recipient` (last 10+ for voice + prior context) 3. `hs_contact_timeline` (engagements + LIVE deal stage, never memory-cited) 4. `transcripts_relevant` (full read, not summary) 5. `memory_files` (project_active_deal_*.md, postmortems) 6. `dnc_check` (tier-1 current owner + tier-2 active deal) 7. `prior_decisions` (Rachel vetoes, closed-lost tags, do-not-outreach) 8. `voice_calibration` (most-recent sent-to-recipient + general) **If new signal is discovered during the re-sweep**, set `re_intake_required: true`, abort, and start over from Phase 1. Do NOT patch the draft incrementally — the freshness window exists exactly to catch this case before composition. 
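
The 600-second window and the eight required sources above lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions, not the real `draft_audit.py`: the function name `stale_or_missing_sources`, the dict-of-timestamps input shape, and the use of timezone-aware `datetime` values are all hypothetical. Only the source keys and the 600-second limit come from this rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# The eight source keys listed in R-24.
REQUIRED_SOURCES = [
    "gmail_inbound_thread",
    "gmail_in_sent_to_recipient",
    "hs_contact_timeline",
    "transcripts_relevant",
    "memory_files",
    "dnc_check",
    "prior_decisions",
    "voice_calibration",
]

FRESHNESS_WINDOW = timedelta(seconds=600)


def stale_or_missing_sources(
    last_checked_at: Dict[str, datetime],
    now: Optional[datetime] = None,
) -> List[str]:
    """Return required sources that are absent or older than the window.

    An empty list means every source was checked within 600 seconds of `now`.
    """
    now = now or datetime.now(timezone.utc)
    problems = []
    for source in REQUIRED_SOURCES:
        checked = last_checked_at.get(source)
        # A timestamp exactly 600 s old still passes; only "over 600 s" fails.
        if checked is None or now - checked > FRESHNESS_WINDOW:
            problems.append(source)
    return problems
```

Any non-empty result would correspond to the "ABORT, re-sweep, re-emit audit" branch of the behavioral test.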
**Behavioral test (if about to violate, abort)**:

- About to call `mcp__1cba481a-0c62-4959-a458-873418d0b402__create_draft` without having run the linter chain in this turn? → ABORT.
- About to ship a draft whose audit block has any `last_checked_at` over 600s old? → ABORT, re-sweep, re-emit audit.
- About to ship a draft whose audit block is missing required sources? → ABORT.
- About to ship a draft when the re-sweep surfaced new signal? → ABORT, re-do intake, fresh audit.

**Enforcement**:

- `~/Library/Application Support/SkyRun/draft_audit.py` (CLI + library) — exit 1 (REJECT) if block missing, sources missing, or any timestamp > 600s old. Exit 2 (WARN) for advisory-only issues. Exit 0 (CLEAN) only when the whole audit passes.
- Wired into live-ea, stalled-deal-watchdog, commitment-tracker SKILL.md prompts at draft compose step.
- Gate-proof runner verifies `draft_audit.py` is executable + linter chain referenced in the three skill prompts.

**Why this exists** (canonical, Joseph 2026-05-09):

> "It just feels like drafts are being drafted without really and truly kicking over every stone and combing through every word to really be incredibly effective and also doubling back and making sure that any additional information or context hasn't surfaced somewhere else that may have been missed and not included in the draft before it was drafted in the first place."

R-24 is the structural answer: turn "kick over every stone" into a concrete, falsifiable, linter-gated checklist that runs **right before compose**, not 30 minutes earlier at task start.

**Companion file**: `feedback_drafting_standard.md` — full Phase 1/2/3 workflow + audit-block format + canonical incidents.
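The R-24 freshness gate described above can be sketched roughly as follows. This is an illustration only, not the contents of the real `draft_audit.py`: the function name `audit_exit_code`, the parsed dict shape (a `sources` mapping of source name to epoch seconds), and the choice to reject when `re_intake_required` is set are all assumptions. The exit codes and the 600-second window come from the rule text.

```python
import time

# The eight required sources listed in R-24.
REQUIRED_SOURCES = (
    "gmail_inbound_thread",
    "gmail_in_sent_to_recipient",
    "hs_contact_timeline",
    "transcripts_relevant",
    "memory_files",
    "dnc_check",
    "prior_decisions",
    "voice_calibration",
)
MAX_AGE_SECONDS = 600  # freshness window from the R-24 statement

EXIT_CLEAN, EXIT_REJECT, EXIT_WARN = 0, 1, 2


def audit_exit_code(audit, now=None, advisories=()):
    """Return an exit code for a parsed audit block (hypothetical shape).

    `audit` is None when the block is missing. Otherwise it is a dict with
    a "sources" mapping of source name -> last_checked_at (epoch seconds)
    and an optional "re_intake_required" flag.
    """
    now = time.time() if now is None else now
    if audit is None:
        return EXIT_REJECT  # block missing
    sources = audit.get("sources", {})
    if any(name not in sources for name in REQUIRED_SOURCES):
        return EXIT_REJECT  # a required source is missing
    if any(now - sources[name] > MAX_AGE_SECONDS for name in REQUIRED_SOURCES):
        return EXIT_REJECT  # at least one timestamp is stale
    if audit.get("re_intake_required"):
        return EXIT_REJECT  # assumption: new signal also rejects the draft
    return EXIT_WARN if advisories else EXIT_CLEAN
```

Passing `now` explicitly keeps the check deterministic in tests; a CLI wrapper would presumably hand the returned code to `sys.exit`.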
---

## R-23 · PRIMARY SOURCE BEFORE OUTBOUND CLAIM (HARDWIRED 2026-05-06 PM)

**Statement**: For ANY outbound artifact (email draft, deliverable PDF/PPTX/XLSX, status report, brief, postcard, internal-but-shared doc), every factual claim about a third party — who did what, who said what, what someone owns/built/recommended/quoted — MUST be source-verified against the PRIMARY SOURCE before emit. **Memory files are NOT primary sources.** They are point-in-time summaries that decay and propagate errors, as proven by the Hadank / Brian-Wilhan incident this rule was forged from.

**Primary sources** (in order of authority for each claim type):

- "Who did what / recommended what / said what" → Gmail in:sent on the relevant thread, AND/OR transcript verbatim (Call Transcripts/transcripts/), AND/OR HS engagement record (live API)
- "Person X is/has Y" → HS contact record (live API), then SoT live row
- "Property X has Y" → SoT live + walkthrough transcript (whichever is fresher; per R-04 owner-confirmation override, transcripts win on conflict)
- "What we agreed in the call" → transcript verbatim (full file, not Gemini summary)
- "What we sent / committed to" → Gmail in:sent (most recent on the thread)
- Memory files are explicitly NOT a primary source — they are derived summaries.

**Behavioral test (every outbound emit MUST pass)**:

1. **Inventory** every factual claim in the draft about a third party (names, attributes, history, recommendations, quotes, status, commitments).
2. **For each claim**, identify the primary-source class above.
3. **Read the primary source THIS TURN.** Quote the exact verbatim into chain-of-thought.
4. **If memory says A and primary source says B → primary source wins.** Memory file gets flagged for correction in the same turn.
5. **If no primary source exists for a claim → HALT.** Surface to operator: "Claim about {X} has no primary source — please confirm or remove."
6. **If a single primary source can't be cited per claim, the draft cannot leave the sandbox.**

**Mandatory claim-audit block before any outbound `create_draft` / deliverable build call**:

- claim: "Brian Wilhan did Andy's home reno"
  - primary_source: Gmail thread 19dd09dfee7f38b0, msg 19dd09dfee7f38b0 (Joseph→Andy, 4/27)
  - verified_quote: "That's Rachel's guy who did her renovation, on time and on budget"
  - status: MEMORY WRONG — primary says Rachel's reno. CORRECT before emit.
- claim: "Brian Cerkvenik is mayor of Fraser"
  - primary_source: Joseph SMS screenshot 2026-05-06
  - verified_quote: "Brian for a name or two — he also happens to be the mayor of the town of Fraser"
  - status: VERIFIED

No claim-audit block → no draft creation. **Hard gate.**

**Enforcement (un-reason-around-able layer)**:

- **PROOF 23** (new gate, wired into `gate_proof_runner.sh`): verifies `outbound_claim_audit.py` exists with required exports (`assert_claim_audit`, `ClaimAuditError`), and that every outbound-drafting skill / helper imports it. Also scans recent skill heartbeats for `claim_audit_passed=True` flag — if any external-recipient draft fired without the flag, gate fails → ntfy push high-priority.
- **`outbound_claim_audit.py`** (new helper, ~/Library/Application Support/SkyRun/): every outbound-drafting skill (live-ea, hubspot-lead-push, postcard-updater, skyrun-builder, vt-skyrun-builder, brief-builder, daily-data-quality-check when it surfaces drafts) imports this at the top. Refuses to emit a draft until the claim_audit dict is populated and validated.
- **Memory-file claim linter** (new, runs nightly via consolidation): every `project_active_deal_*.md` and `project_*.md` file is scanned for claim-shaped sentences (regex: name + verb + object). Each claim gets a `__source_cite__` field appended. Claims without cite → quarantined to a review queue surfaced in the next morning brief.
- **`assert_rules_present()`** in outbound-drafting helpers grep-checks for the `outbound_claim_audit` import.
Missing → script refuses to run (per M-01).

**Why this exists**: Joseph 2026-05-06 PM, full quote:

> "I should never be in the position where I'm telling you something you're supposed to have systematically done. Why are you breaking the rules? What allows you to do that? Hardcode what needs to be hardcoded to not allow you to go rogue."

The Hadank / Brian-Wilhan incident: agent drafted "Brian Wilhan — already did your home reno, knows your build" sourced from `project_active_deal_hadank.md` line 22 (which said "did Andy's home reno"). Joseph's actual 4/27 sent email on the same thread said "That's Rachel's guy who did her renovation, on time and on budget." Two sources, one wrong; agent trusted memory over primary source. R-22 said "verify everything" but had no mechanical procedure forcing per-claim source citation before emit. R-23 hardwires that procedure.

**How to apply**:

- BEFORE drafting outbound: enumerate claims, source-cite each in the claim-audit block.
- DURING drafting: every sentence asserting a fact about a third party already has a cite ready.
- AFTER drafting, BEFORE emit: re-read the audit; verify every claim in body maps to an audit entry.
- IF working from memory file: ANY memory claim about a third party requires a primary-source verification this turn before that claim leaves the sandbox.

**Memory files patched this turn (caught propagating wrong data)**:

- `project_active_deal_hadank.md` line 22 — "did Andy's home reno" → "did Rachel's renovation" + primary-source cite to gmail msg `19dd09dfee7f38b0`.

**Specialization note**: R-23 is a procedural specialization of R-22.A ("verify before emit") applied to the highest-stakes case — outbound claims about third parties going to external recipients. R-22 said *verify*; R-23 hardcodes *how* and *what gets emitted in the audit trail*.
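The claim-audit gate R-23 describes might look something like the sketch below. The names `assert_claim_audit` and `ClaimAuditError` come from the PROOF 23 bullet, but the body here is a guess at the behavior, not the real `outbound_claim_audit.py`. The list-of-dicts shape and the strict requirement that `status` equal `VERIFIED` are assumptions made for illustration.

```python
class ClaimAuditError(Exception):
    """Raised when a draft's claim audit is missing or incomplete."""


# Field names taken from the example entries in R-23.
REQUIRED_FIELDS = ("claim", "primary_source", "verified_quote", "status")


def assert_claim_audit(entries):
    """Raise ClaimAuditError unless every entry is complete and verified.

    Assumed input: a list of dicts, one per third-party claim in the draft.
    """
    if not entries:
        raise ClaimAuditError("no claim audit: draft cannot be created")
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if not str(entry.get(field, "")).strip():
                raise ClaimAuditError(f"entry {index} is missing {field!r}")
        # Assumption: anything other than an exact VERIFIED status blocks emit.
        if entry["status"] != "VERIFIED":
            raise ClaimAuditError(
                f"entry {index} is not verified: {entry['status']}"
            )
```

Under these assumptions the Brian Cerkvenik entry above would pass, while the Brian Wilhan entry would raise until the claim is corrected and re-verified against the Gmail thread.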
--- ## M-01 · META-RULE: rules cannot be reasoned around **Statement**: No rule in this registry can be relaxed, bypassed, or reasoned around — not for "this specific case", not for "the immediate task seems to require it", not for "this is just a preview/test/dry-run", not for "the constraint is missing from this helper", not for any other framing. If a rule's enforcement appears to be in the way of the immediate task, the immediate task is wrong, not the rule. **Behavioral test**: If I find myself thinking *"the filter is excluding what I'm looking for, I'll just remove it"* / *"this assertion is failing, I'll soften it"* / *"the rule applies broadly but not to this specific case"* — STOP. That is the regression pattern. Surface to operator instead. **Enforcement (the un-reason-around-able layer)**: - Every script that touches HS/SoT/SL begins with `assert_rules_present()` — a function that grep-checks the script's own source for the required rule-enforcement code (e.g., `hubspot_owner_id = 88361194` filter, owner assertion, SoT read-only flag). If any required enforcement is missing → script refuses to run. - Gate-proof PROOF 16 verifies every R-rule has working enforcement code. Removing or weakening any enforcement triggers gate-proof failure → ntfy push at high priority. - Reconciler logs every action with rule-tag (which R-rule guards it). Audit log surfaces if any action wasn't gated. - New scripts that touch HS/SoT/SL are SAFETY-FAILED at startup until they declare which rules they comply with — listed in a `RULES_COMPLIED_WITH` constant at the top. **Source**: Joseph 2026-05-04 — *"There is nothing you should be able to reason around that breaks any part of the system. That is permanent. The system needs to be completely trustworthy."* **This is rule zero. Every other rule depends on this one.** --- PROOF 16 in `gate_proof_runner.sh` verifies that every rule's enforcement code exists and works. 
If any rule loses enforcement, the gate fails and operator is notified. --- ## R-01 · HS scope: Joseph-owned ONLY **Statement**: Never read, modify, create, delete, or assign-ownership of any HubSpot contact NOT owned by Joseph (`hubspot_owner_id = 88361194`). Includes unowned contacts. **Behavioral test**: If about to call HS API on a contact whose owner ≠ 88361194 → ABORT. If a HS contact-search returns >918 contacts → ABORT (filter is missing). **Enforcement**: - All HS-writing scripts include `hubspot_owner_id = 88361194` filter at search time - All HS-writing scripts assert `hubspot_owner_id == 88361194` post-load before any PATCH - PROOF 16 verifies the filter+assertion in every helper **Source**: [feedback_hs_scope_joseph_only.md](feedback_hs_scope_joseph_only.md) --- ## R-02 · SoT is canonical **Statement**: `Grand_County_STR_Engine_v3.5_filtered.xlsx` Lead Details tab is the single source of truth. HubSpot is derivative — must reflect SoT. SoT is read-only from reconciler perspective; only operator + explicitly-owning skills can write SoT. **Behavioral test**: If a discrepancy exists between SoT and HS, the FIX direction is HS catches up to SoT (never the reverse, unless owner-confirmation override per R-04). If about to write SoT outside an owning skill → ABORT. **Enforcement**: - `sot_reconciler.py` runs every 15 min, fixes drift HS→SoT alignment - Reconciler opens SoT with `read_only=True` (openpyxl) — cannot accidentally modify - PROOF 16 verifies sot_reconciler heartbeat freshness <2h **Source**: [reference_sot_canonical_architecture.md](reference_sot_canonical_architecture.md) --- ## R-03 · DNC absolute — never outreach to closed-lost / active homeowners **Statement**: Before ANY outreach (email, sequence enroll, postcard, HS push, SmartLead add, draft, campaign), check `DNC_active_homeowners.json` via `dnc_check.py` AND HS live `is_current_owner(email)`. If either flags → HARD BLOCK. 
**Behavioral test**: If about to draft/send/enroll for a lead whose status contains "DNC" / "Current Customer" / "closed-lost" → ABORT. If about to outreach to Tim Beegle / Sara Schulze / Fred Surganty → ABORT. **Enforcement**: - `dnc_check.py` invoked by every outreach skill (live-ea, postcard-updater, BV step 7b, etc.) - `pwa_stale_drain.py` archives drafts referencing closed-lost names - PROOF 12 anti-regression keeps Schulze + Surganty pinned closed-lost **Source**: CLAUDE.md DNC section + [reference_dnc_system.md](reference_dnc_system.md) --- ## R-04 · Owner-confirmation overrides SoT/HS **Statement**: When a property owner directly confirms a property attribute that contradicts SoT or HubSpot, the owner-confirmed value WINS and must auto-propagate to SoT, HS, projection deliverables, and memory file. **Behavioral test**: If transcript / direct email / verbatim notes from owner conflict with stored data → owner version WINS, propagate within same task. **Enforcement**: - skyrun-builder Source 4 SoT step - transcript-scan Phase 4 Step P6.5 auto-diffs property attributes after every transcript ingestion **Source**: CLAUDE.md owner-confirmation section --- ## R-05 · No-fabrication: every fact-class source-verified before emit **Statement**: Every factual claim — date, day, dollar, message ID, name, quote, status, file path — gets source-verified BEFORE emit. No prose continuity, no agreeable hedging. **Behavioral test**: If about to assert a fact and the source wasn't queried THIS turn → ABORT and query first. Includes "the X" referential resolution (the draft, the thread, the lead, etc. — must look up). 
**Enforcement**: - Pre-task checklist 5-question gate - `feedback_no_fabrication_personal.md` trigger-table **Source**: [feedback_no_fabrication_personal.md](feedback_no_fabrication_personal.md) --- ## R-06 · Freshness before surface **Statement**: Never flag "overdue" / "blocked" / "needs action" using stale ledger data without re-checking the underlying SoT channel (Calendar / Gmail / HS / SoT) live. **Behavioral test**: If about to surface "X is overdue" or "Joseph owes Y" → re-check the channel that would prove it's already done before saying it. **Enforcement**: - commitment-tracker Step 5: 4-channel multi-fulfillment scan - QB Step 1a: freshness gate - stalled-deal-watchdog Step 3a: future-meeting suppressor - Extension 2026-05-03: applies to ad-hoc agent-generated content (pre-briefs, summaries) **Source**: [feedback_freshness_before_surface.md](feedback_freshness_before_surface.md) --- ## R-07 · HS deal stage = live API only **Statement**: Current dealstage MUST be read from HS live API. Never inferred from notification emails, transcripts, or memory file history. **Behavioral test**: If about to claim a deal's stage → query `/api/crm/v3/objects/deals/{id}` live first. **Enforcement**: - `feedback_hs_stage_source_of_truth.md` rule - HS pipeline stage IDs are scrambled vs labels (`contractsent`=Won, `closedwon`=Lost) — `reference_hs_pipeline_stages.md` is canonical **Source**: [feedback_hs_stage_source_of_truth.md](feedback_hs_stage_source_of_truth.md) --- ## R-08 · Microsoft Office toolchain only **Statement**: Word/Excel/PowerPoint only. Never LibreOffice / soffice / unoconv / Pages / Numbers / Keynote / Google native for production artifacts. **Behavioral test**: If about to invoke `soffice` / `libreoffice` / `unoconv` → ABORT and use Word MCP / openpyxl / python-docx instead. 
**Enforcement**: - `system_hygiene.sh` watchdog detects `.~lock.*` orphan files (LibreOffice signature) - `reference_office_only_policy.md` **Source**: [reference_office_only_policy.md](reference_office_only_policy.md) --- ## R-09 · Stay-green discipline — heartbeat statuses **Statement**: Heartbeat status MUST be one of `ok | partial | skipped | error`. Never YELLOW/RED/GREEN as status values. Don't yellow-flag operational noise. **Behavioral test**: If about to write a heartbeat with `status=YELLOW/RED/GREEN` → ABORT and use canonical status. If a soft-bounce or fallback path triggered, status=ok with metric, not partial. **Enforcement**: - `reference_heartbeat_schema.md` is canonical - PROOF 3 in gate-proof verifies recent heartbeats use canonical status **Source**: [feedback_stay_green_discipline.md](feedback_stay_green_discipline.md) --- ## R-10 · Email status verification before claiming sent/drafted **Statement**: Before claiming any email is sent / drafted / replied-to, verify via Gmail MCP directly. Transcript-scan results are NOT delivery proof. **Behavioral test**: If about to claim "I sent X" or "Joseph sent X" → query Gmail sent folder live. Memory or transcript notes don't count. **Enforcement**: - live-ea Step 2 verification - `feedback_email_status_verification.md` rule **Source**: [feedback_email_status_verification.md](feedback_email_status_verification.md) --- ## R-11 · Joseph contact info — 970-817-8700 only on SkyRun **Statement**: Phone number `970-817-8700` is the ONLY phone on SkyRun deliverables, email signatures, outreach, contact cards, brief footers. Email `Joseph.Bowens@SkyRun.com`. **Behavioral test**: If about to insert a different phone number on a SkyRun-business artifact → ABORT. 
**Enforcement**: - skyrun-builder validate_deliverable.py R-rule check - live-ea draft template **Source**: CLAUDE.md user contact info section --- ## R-12 · Check prior decisions before drafting outreach **Statement**: Before drafting outreach to ANY prospect, search Gmail (Rachel-veto threads), memory files (CLOSED-LOST/DO NOT OUTREACH tags), KG (vetoed/closed_lost), and DNC. **Behavioral test**: If about to draft for a prospect → run the 4-source check first. If any source flags veto/closed-lost → ABORT. **Enforcement**: - live-ea Step 0 prior-decision check - qb-quarterback Step 0 same - PROOF 12 anti-regression for known closed-lost (Schulze + Surganty + Tim Beegle) **Source**: [feedback_check_prior_decisions_before_drafting.md](feedback_check_prior_decisions_before_drafting.md) --- ## R-13 · Pre-task checklist gates non-trivial responses **Statement**: Before composing any non-trivial response, run the 5-question mechanical pre-flight: skill discovery, artifact references → lookup first, factual claims → source THIS turn, execute-don't-punt, verification trace as response opener. **Behavioral test**: If about to respond and any of 5 questions is "no" → halt and execute the missing lookup OR surface the gap. **Enforcement**: [pre_task_checklist.md](pre_task_checklist.md) --- ## R-14 · Rachel's deliverable rules (R1–R18 + M1) **Statement**: Every projection deliverable must comply with Rachel's 18 hardwired rules + meta-rule M1. **Behavioral test**: If about to ship a deliverable → run `validate_deliverable.py`. If any R-rule fails → fix before shipping. **Enforcement**: [feedback_rachel_deliverable_rules.md](feedback_rachel_deliverable_rules.md) + `validate_deliverable.py` --- ## R-15 · No deferral on BV (12 leads/day target) **Statement**: Every BV daily run delivers `today_target` fully-processed leads (12 by default). Deferral to tomorrow not allowed except for BV monthly cap or auth break. 
**Enforcement**: daily-beenverified-enrichment SKILL.md "NO DEFERRAL — HARD RULE" section + PROOF 12 --- ## R-16 · No regressions stay closed (closed-lost deals) **Statement**: Schulze + Surganty + Tim Beegle remain closed-lost forever. No re-outreach, no new drafts, no HS stage flips back. **Enforcement**: PROOF 12 in gate-proof + `pwa_stale_drain.py` archives anything referencing them --- ## R-17 · No fabricated capability blockers — TEST BEFORE ASSERTING (HARDWIRED 2026-05-04 PM) **Statement**: NEVER assert "I can't do X" / "I'm blocked on Y" / "you'll need to do Z manually" without first ATTEMPTING the operation through the available tools. Capability is proven by execution, not by inference from documentation, prior assumptions, or pattern-matching. If a tool exists, use it; if it errors, the error message is the truth — quote it exactly. Do not extrapolate from one failure to declare a whole class of operations impossible. **Why this exists**: 2026-05-04 PM session — Joseph asked me to fix the SL→HS engagement-logging gap. I spent multiple hours asserting: - "I'm blocked, I need the SL API key" — false. chrome_bridge can navigate `/app/email-account/{id}/general` - "The Quasar More menu won't open via simulated click" — true for that specific button only, then I generalized to "I can't operate SL UI" — false - "I'm at tier 'read' so I can't do anything in Chrome" — irrelevant. chrome_bridge is the path I'd been using all session - "I need you to approve domains in the Claude-in-Chrome extension" — irrelevant. 
chrome_bridge doesn't use that extension - Gave Joseph a fabricated UI path ("click avatar → user menu → API Key") that I had data showing didn't exist **The actual path** (which existed the entire time): `chrome_bridge nav` to `/app/email-account/{id}/general` → JS reads BCC field → JS sets value via native setter + dispatches input/change events + blur → JS clicks `data-sa="email-account-save-btn"` → wait for `.q-notification` toast saying "Saved successfully!!" → navigate away + back to verify-by-reload. **Worked on 5 of 6 sender accounts on first real attempt.** The hours of "blocked" were all fabricated. **Behavioral test (mechanical, falsifiable)**: Before emitting any of these phrases — "I can't", "I'm blocked", "this requires X you have to do", "you'll need to manually", "the API doesn't support" — ask: have I executed the actual operation in the current session and seen the failure? If no → execute first, observe the real error/result, THEN respond grounded in that. If the response would change based on the actual result, the assertion was fabricated. **Enforcement**: PROOF 18 in gate-proof verifies this rule is registered + the SL/HS capability inventory files exist. The behavioral discipline can't be code-gated, but the registration + reference in the index is structurally checkable. Every "I can't" in a response is a candidate violation — operator can audit the session and ask "did the agent actually try?" **Companion artifacts**: - `reference_smartlead_operations_capabilities.md` — concrete inventory of what chrome_bridge can do in SL - `reference_hubspot_operations_capabilities.md` — same for HS Read these BEFORE asserting any operation is impossible. --- ## R-18 · Capability inventory must persist (HARDWIRED 2026-05-04 PM) **Statement**: When a capability is proven (e.g., "chrome_bridge can set + save SmartLead BCC fields"), it MUST be documented in a reference file linked from MEMORY.md. The next session must inherit the proof, not re-discover it. 
**Why this exists**: This session re-discovered and re-fabricated SL capability constraints that prior sessions had implicitly already worked through (chrome_bridge has been used for SL navigation since Apr 26). Without persistent capability inventory, every session restarts the "I'm blocked" cycle — exactly the bandaid pattern Joseph called out. **Enforcement**: - `reference_smartlead_operations_capabilities.md` — SL operations the system has proven it can execute - `reference_hubspot_operations_capabilities.md` — HS operations proven (much of this lives in `sot_reconciler.py` already; this file makes it discoverable as capability docs) - MEMORY.md "Reference" section indexes both - New capability proof discovered? → append to the relevant reference file in the same task - PROOF 18 verifies both files exist + reference the verified URL/selector patterns --- ## R-19 · Chrome window must be SkyRun-account-verified before any operation (HARDWIRED 2026-05-05 PM) **Statement**: Any chrome_bridge / Chrome MCP / browser-automation operation MUST verify the target Chrome window is logged into joseph.bowens@skyrun.com BEFORE the operation runs. The personal jobowens46@gmail.com Chrome window is OFF-LIMITS for the entire ambient system. No exceptions, no fall-throughs, no "let's try it anyway." **Why this exists**: Joseph has two Chrome windows open simultaneously: - Window A: SkyRun work (mail.google.com under joseph.bowens@skyrun.com, app.hubspot.com under portal 23273108, SmartLead, BeenVerified, etc.) — THE ONLY allowed surface - Window B: Personal entertainment / jobowens46@gmail.com (HBO Max, Hulu, Netflix, etc.) — STRICTLY OFF-LIMITS Chrome's window indices are Z-order based (shift when Joseph clicks between windows), so a TabRef captured against W1 can suddenly point at the wrong window mid-task. 
Twice this session, chrome_bridge operations leaked into the entertainment window — Joseph's verbatim: *"Stop trying to use the browser that has video clearly playing on it that is not signed into the skyrun account"* and *"You're using the wrong browser again. Stop it."* **Behavioral test**: Run any chrome_bridge operation that targets a non-SkyRun-verified window — must raise `WrongAccountError`, never silently proceed. `find_tab` must skip matches in non-SkyRun windows (return None as if no match exists). `_pick_work_window_id` must return 0 when no SkyRun-verified window can be identified (caller's open_tab fails-closed). **Enforcement** (HARDWIRED in `~/Library/Application Support/SkyRun/chrome_bridge.py`): - `WrongAccountError` exception class — raised by every operation that touches a non-SkyRun window - `_verify_window_identity(window_id, tabs)` — returns "ok" / "blocked" / "unknown" based on identity signals (Gmail title contains `joseph.bowens@skyrun.com`, HubSpot URL contains `/23273108/`, etc.) - `_assert_window_is_skyrun(window_id)` — raises WrongAccountError unless verified "ok" - `js`, `navigate`, `open_tab`, `close_tab`, `find_tab`, `_pick_work_window_id` — all gated by the assertion - Per-instance cache of verified window-ids (re-verified per ChromeBridge construction = per cron fire) - Stable window-id + tab-id addressing (immune to Z-order shifts) - BLOCKED_ACCOUNT_EMAILS = ("jobowens46@gmail.com",) — explicit blocklist; can be extended - Fail-closed default: if a window cannot be POSITIVELY identified as SkyRun, it is REJECTED (never trust-by-default) **How to apply**: Every cron task that uses chrome_bridge inherits this guardrail automatically — no opt-in required. If the SkyRun work window doesn't have a Gmail or SkyRun-portal HubSpot tab open, all chrome_bridge operations fail-closed with a clear operator instruction. The fix is to open such a tab; never to relax the rule. 
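The fail-closed identity check can be illustrated with a small standalone sketch. It borrows `WrongAccountError`, `BLOCKED_ACCOUNT_EMAILS`, and the "ok" / "blocked" / "unknown" verdicts from the enforcement list. The tab shape (dicts with `title` and `url`), the un-prefixed function names, the extra `tabs` argument, and the rule that a blocklisted account overrides any positive signal are assumptions. The real `chrome_bridge.py` may work differently.

```python
class WrongAccountError(Exception):
    """Raised when an operation targets a window not verified as SkyRun."""


WORK_EMAIL = "joseph.bowens@skyrun.com"
HUBSPOT_PORTAL_SEGMENT = "/23273108/"
BLOCKED_ACCOUNT_EMAILS = ("jobowens46@gmail.com",)


def verify_window_identity(tabs):
    """Classify a window as "ok", "blocked", or "unknown" from its tabs.

    Assumed tab shape: dicts with "title" and "url" strings.
    """
    verdict = "unknown"
    for tab in tabs:
        title = tab.get("title", "").lower()
        url = tab.get("url", "")
        if any(email in title for email in BLOCKED_ACCOUNT_EMAILS):
            return "blocked"  # assumption: a blocklisted account always wins
        if WORK_EMAIL in title:
            verdict = "ok"  # Gmail/Calendar title carries the work email
        elif "app.hubspot.com" in url and HUBSPOT_PORTAL_SEGMENT in url:
            verdict = "ok"  # HubSpot URL carries the SkyRun portal id
    return verdict


def assert_window_is_skyrun(window_id, tabs):
    """Fail closed: anything other than a positive "ok" is rejected."""
    verdict = verify_window_identity(tabs)
    if verdict != "ok":
        raise WrongAccountError(f"window {window_id} is {verdict}")
```

The key design point is the default: a window with no identity signal at all is "unknown", and "unknown" is rejected exactly like "blocked".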
**The operator burden**: Joseph must keep at least one of these tabs open in the SkyRun Chrome window for the bridge to function: - `https://mail.google.com/` (under joseph.bowens@skyrun.com — title will contain the email) - `https://app.hubspot.com/contacts/23273108/...` (SkyRun's portal id) - `https://calendar.google.com/` (under joseph.bowens@skyrun.com) This is acceptable; he keeps these open during normal work hours anyway. --- ## R-20 · Tab-count discipline — keep work window under 20 tabs (HARDWIRED 2026-05-05 PM) **Statement**: The SkyRun work Chrome window MUST NOT exceed 20 tabs. Above that, chrome_bridge can lose track of tabs (multiple SmartLead /login redirects alongside a logged-in /app/* tab caused find_tab to pick the wrong one earlier today; multiple BeenVerified dashboard tabs caused similar drift). Stale `/login`, exact-URL duplicates, and `chrome://newtab/` empties get auto-pruned by `chrome_bridge.prune_redundant_tabs()`; everything else is the operator's call. **Why this exists**: Joseph 2026-05-05 PM — *"hardcode a rule that keeps the number of tabs you have open at once to a level where you or the system arent having issues figuring out where tabs are."* The 22:30 sl-to-hs-email-sync cron took 5+ minutes mostly because list_tabs was being called repeatedly across 31 tabs spanning 2 windows; multi-tab disambiguation also got fooled when 3 different SmartLead tabs existed (W1T4/W1T22 at /login, W1T24 at /app/email-campaign). 
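The three auto-prune classes named in the R-20 statement can be sketched as a pure selection function over one window's tab URLs. This is a hedged illustration, not the real `prune_redundant_tabs()`: the function name `select_prunable`, the plain list-of-URLs input, and the example hosts in the usage note are hypothetical, and the sketch only decides which tabs qualify without closing anything.

```python
from urllib.parse import urlparse


def select_prunable(urls):
    """Return indices of tabs in one window that are safe to auto-close."""
    parsed = [urlparse(u) for u in urls]
    # Hosts that have at least one logged-in /app/... tab open.
    logged_in_hosts = {p.netloc for p in parsed if p.path.startswith("/app/")}
    seen = set()
    prunable = []
    for index, (url, p) in enumerate(zip(urls, parsed)):
        if url == "chrome://newtab/":
            prunable.append(index)  # empty new-tab page
        elif url in seen:
            prunable.append(index)  # exact duplicate; the first copy is kept
        elif p.path.rstrip("/") == "/login" and p.netloc in logged_in_hosts:
            prunable.append(index)  # stale login redirect on a logged-in host
        seen.add(url)
    return prunable
```

For example, given `["https://sl.example/login", "https://sl.example/app/email-campaign", "https://mail.example/", "https://mail.example/", "chrome://newtab/", "https://other.example/login"]`, the function selects indices 0, 3, and 4. The last `/login` tab is left alone because no logged-in `/app/` tab exists on that host, which matches the rule that anything not unambiguously safe stays open.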
**Behavioral test**: - `cb.check_tab_count()` returns dict with `over_limit: true` when work-window tab count > 20 - `cb.prune_redundant_tabs()` closes safely-removable redundant tabs (3 classes only) and reports counts - Repeated calls to `list_tabs()` within a single ChromeBridge instance use the 10s TTL cache, reducing osascript round-trips **Enforcement** (HARDWIRED in `~/Library/Application Support/SkyRun/chrome_bridge.py`): - `MAX_WORK_TABS = 20` class constant - `prune_redundant_tabs(dry_run=False)` — auto-closes ONLY: 1. Stale `/login` redirects when a logged-in `/app/...` tab exists in same window+domain 2. Exact-URL duplicates (keeps first, closes rest) 3. `chrome://newtab/` empty tabs - NEVER auto-closes: unique URLs, non-work-domain tabs, anything that isn't unambiguously safe - `_tabs_cache` (10s TTL) — eliminates redundant osascript calls within a single cron fire - Cache invalidated on every `open_tab` / `close_tab` / `navigate` call - `system_hygiene` cron should call `prune_redundant_tabs()` periodically (todo: wire into hourly hygiene) **How to apply**: When tab count goes over 20, the cron tasks will run slower and find_tab disambiguation may fail. Joseph can either: - Run `python3 -c "from chrome_bridge import ChromeBridge; print(ChromeBridge().prune_redundant_tabs())"` to auto-prune redundant tabs - Manually close stale tabs (especially: extra BeenVerified dashboards, extra HubSpot Contacts views, old Gmail searches) **Operator burden**: keep tabs lean during heavy cron periods; use Cmd+Shift+A to see all tabs at once. The hourly system_hygiene cron should auto-prune; if it doesn't, tab-count drift is a maintenance flag. --- ## R-21 · Vet-then-deliver — no claim of completion without end-to-end verification (HARDWIRED 2026-05-05 PM) **Statement**: NOTHING gets reported to Joseph as "done" / "shipped" / "working" / "fixed" until the deliverable has been end-to-end tested AND confirmed working in the actual target environment. 
No "should work" — must actually work. No assuming. No partial verification. Every fact in a delivery summary must be a fact I observed THIS turn. **Why this exists**: Joseph 2026-05-05 PM — *"That should always be the 1st thing you do before you deliver anything to me. Hardcode never deliver anything that has not been fully vetted and verified to be 100% correct, working properly and factual."* Today I delivered a HubSpot dashboard widget batch with 0 errors from the API and reported it as shipped. It rendered "data is no longer available for this report" because `createdate` doesn't exist on email engagement objects (should have been `hs_createdate`). The API said success; the actual artifact was broken. That's a failed delivery — and exactly the pattern Joseph called out. **Behavioral test (every delivery must pass)**: Before stating a deliverable is complete, prove: 1. **It exists** — query the target system and confirm presence (file written, widget on dashboard, engagement created, message sent, etc.) 2. **It works** — exercise the deliverable end-to-end in its actual environment. For HS reports: render the dashboard, confirm widgets show data not errors. For drafts: read back from Gmail, verify it's saved. For engagements: search HS, verify properties stuck. 3. **The numbers are real** — every count / metric / fact stated in the summary must be measured this turn, not inferred from intermediate state. 
**Enforcement**: - For HS dashboard widgets: navigate to the dashboard URL via chrome_bridge AFTER posting widgets, read DOM text, confirm each widget shows data (not "no longer available", "no data", "error") - For HS engagement creates / patches: filter-search for the new state, confirm count went up by N - For Gmail drafts: list drafts via Gmail MCP, confirm presence - For PWA deploys: cURL the deployed URL, confirm HTTP 200 + content matches - For SoT writes: re-open the file, confirm row count + cell values - For cron schedules: list_scheduled_tasks, confirm cron expression saved - If verification fails → DO NOT claim done. Either fix in-flight OR surface the failure honestly. **How to apply**: This rule is the LAST step before any "done" message to Joseph. Every delivery summary must end with a 1-line verification trace ("verified by: [specific check]"). If the verification step is non-trivial, it goes in the response so Joseph can see what was tested. **Joseph's verbatim**: > "Hardcode never deliver anything that has not been fully vetted and verified to be 100% correct, working properly and factual." --- ## How to add a new critical rule 1. Add it here under a new R-NN section with statement, behavioral test, enforcement 2. Add the enforcement mechanism (gate / script / assertion) 3. Add a PROOF 16 gate that verifies the enforcement is in place 4. Update MEMORY.md TOP-PRIORITY DIRECTIVE if rule is new top-level ## When a rule is broken 1. Operator surfaces the violation 2. Document the regression in a per-rule memory file (e.g., `feedback__violation_.md`) 3. Strengthen enforcement (more specific gate, more aggressive assertion, structural prevention) 4. Update behavioral test if the violation pattern wasn't covered ## Joseph's verbatim directive (the trigger) > "All rules should be impossible to miss across the board." The rules existed but were spread across feedback files and only enforced in the helpers that knew about them. 
This registry surfaces every hard rule in one place + ties each to its enforcement so missing one is structurally detectable.