SQL Tool — Design Doc

Read-only, tenant-isolated SQL access for AgentCore agents. Lives at app/api/agent/ — this doc is the entry point.

Status	Shipped on PR #8322 (ONRAMP-4761)
Audience	Engineers reading, modifying, or reviewing SQL-tool code
What it is	`read_via_sql` + `describe_schema` tools wired into portal + onramp_chat agents
Tenant model	Vendor + customer-account, with portal vs. vendor-user split
Threat-model primer	`~/.claude/plans/sql-tool-tenant-isolation-primer.md`

TL;DR

The SQL tool lets agents answer open-ended data questions (e.g. "how many tasks did I complete last week?") by writing SQL against the OnRamp database. Cross-tenant reads are physically prevented by four independent defenses; any single layer would suffice on its own.

mermaid

flowchart LR
    A[Agent writes SQL] --> B[1. Validators<br/>reject dangerous shapes]
    B --> C[2. Parse + rewrite<br/>inject vendor/customer scope]
    C --> D[3. Postgres role<br/>SELECT-only on allowlisted tables]
    D --> E[4. RLS policies<br/>scope rows by session GUC]
    E --> F[(Result)]
    style B fill:#fff4e6,color:#000
    style C fill:#e6f3ff,color:#000
    style D fill:#e6ffe6,color:#000
    style E fill:#ffe6f3,color:#000

Locality wins: exposure decisions live on the column line via typed markers (Public[T], Confidential[T], Internal[T], VendorOwner[T], CustomerOwner[T]). PR diff = security-policy diff.

Default-deny enforced by CI: untagged column on an exposed table fails test_every_column_on_exposed_table_is_classified — unmergeable without a marker, an info-dict, or an explicit signoff entry.

How a query flows end-to-end

mermaid

sequenceDiagram
    autonumber
    participant Agent as Agent (portal/copilot)
    participant Tool as read_via_sql tool
    participant Flask as Flask /api/internal/agent/sql-read
    participant Val as validators.py
    participant Rew as parser+scope+cap injectors
    participant DB as Postgres (or_readonly_user)

    Agent->>Tool: SQL string
    Tool->>Flask: POST {vendor_id, user_id, is_customer_user, sql}
    Flask->>Flask: Cross-check User row<br/>(vendor, is_customer_user, mapping)
    Flask->>Val: validate_sql(sql)
    Val-->>Flask: accept | reject(reason)
    Flask->>Rew: inject_scope + inject_caps
    Rew-->>Flask: rewritten SQL
    Flask->>DB: SET LOCAL app.vendor_id=N<br/>SET LOCAL app.user_id=N<br/>...
    Flask->>DB: execute(rewritten SQL)
    Note over DB: RLS restrictive policies<br/>filter rows by GUC
    DB-->>Flask: rows (≤1000, ≤1MB, ≤5s)
    Flask-->>Tool: {scope_status, data, meta}
    Tool-->>Agent: result dict
    Note over Flask: emit_audit() on every path,<br/>including rejections

What each step physically prevents (click to expand)

#	Step	Blocks
3	`_resolve_caller_context` cross-checks `User.vendor` + `is_customer_user` against payload, plus `or_customer_user_account_mapping` for portal users	Agent passing a foreign `vendor_id` / `user_id`
4	`validate_sql` — regex + AST	`INSERT/UPDATE/DELETE/DDL`, multi-statement, `pg_` / `lo_` functions, `SELECT `, recursive CTE depth >3, polymorphic `assoc_object_` columns
5	`inject_scope` (sqlglot `build_scope` walk)	Cross-vendor reads via subquery, CTE, UNION arm, EXISTS, derived table — every scope gets `vendor = caller_vendor_id` injected
5	`inject_caps`	Result blow-up: forces `LIMIT ≤1000`. Also decodes hashid string literals and rejects cross-vendor matches
7	`or_readonly_user` Postgres role + `statement_timeout=5s`	Any write attempt, `pg_`/`lo_` calls that slipped past validators
9	RLS restrictive policies referencing `app.vendor_id` GUC	Anything else — last-resort row-level filter at DB layer
10	`emit_audit(record)`	Silent failures: every accept and every reject is logged (`sql_tool.query`)

Security model — the four layers, in detail

mermaid

flowchart TB
    subgraph L1[Layer 1 — Identity]
        I1[POST payload<br/>vendor_id, user_id, is_customer_user]
        I2[DB cross-check<br/>User.vendor, User.is_customer_user]
        I3[Mapping cross-check<br/>or_customer_user_account_mapping]
        I1 --> I2 --> I3
    end

    subgraph L2[Layer 2 — Parse + Rewrite]
        P1[validators.py<br/>shape rejection]
        P2[parser.py<br/>sqlglot AST]
        P3[scope_injector.py<br/>vendor/customer/role/soft-delete WHERE]
        P4[cap_injector.py<br/>LIMIT 1000 + hashid check]
        P1 --> P2 --> P3 --> P4
    end

    subgraph L3[Layer 3 — DB role]
        D1[or_readonly_user<br/>LOGIN, NOCREATEDB, NOCREATEROLE]
        D2[GRANT SELECT only<br/>on allowlisted tables]
        D3[statement_timeout = 5000ms]
        D1 --> D2 --> D3
    end

    subgraph L4[Layer 4 — RLS]
        R1[ALTER TABLE ... ENABLE ROW LEVEL SECURITY]
        R2[Permissive FOR ALL TO public<br/>web app unaffected]
        R3[Restrictive FOR SELECT TO or_readonly<br/>vendor = current_setting'app.vendor_id']
        R1 --> R2 --> R3
    end

    L1 --> L2 --> L3 --> L4

Layer 1 — Caller identity (no HMAC; here is why that's OK)

The endpoint sits at /api/internal/agent/sql-read and skips HMAC. The PR description records that decision; it relies on:

VPC isolation — /api/internal/* not externally exposed.
Cross-checking the User row in _resolve_caller_context (sql_read_controller.py:49–125):
- User.vendor == payload.vendor_id or PermissionError
- User.is_customer_user == payload.is_customer_user or PermissionError
Customer-mapping cross-check for portal users — or_customer_user_account_mapping row must exist and be active.
Layers 2–4 are independent of any caller assertion.

Even if an attacker reaches the Flask network and forges identity, the rewriter and RLS still enforce row scope from the same authenticated payload.

Layer 2 — Parse + rewrite (sqlglot, fail-closed)

The rewriter chain is three files:

File	Job	Fail-closed behavior
`validators.py`	Reject shapes that should never reach the AST: multi-statement, `INSERT/UPDATE/DELETE/DDL`, `pg_` / `lo_` calls, `SELECT *`, polymorphic columns, `RECURSIVE` CTE depth > 3	Returns `ValidationResult.reject(reason)` — caller emits audit + 4xx
`scope_injector.py`	For every `exp.Table` source in every sqlglot scope (CTE, UNION arm, derived-table, EXISTS body): inject `vendor = :caller_vendor_id`, `deleted_at IS NULL`, `customer_id = :caller_customer_id` (portal only), self-only / project-membership filters	Raises `ScopeInjectionError` if the table has no resolvable vendor column and is not tagged `global_lookup` or `rls_scoped`
`cap_injector.py`	Inject `LIMIT 1000`. Scan original SQL for hashid-shaped string literals; if any decode to an id whose vendor row ≠ caller, reject	Raises `CapInjectionError` on `hashid_cross_vendor`

The scope walk is built on sqlglot.optimizer.scope.build_scope so subqueries, CTEs, UNION arms, and EXISTS bodies are all walked by sqlglot — not by hand. Every real exp.Table source in every scope gets predicates appended; CTE / derived-table aliases appear as Scope objects in scope.sources and are skipped (their inner scope handles it).

python

# Example: agent submits this
SELECT id, project_name FROM or_projects WHERE health = 'red'

# Rewriter emits this
SELECT id, project_name
FROM or_projects
WHERE (health = 'red')
  AND "or_projects"."vendor" = 17
  AND "or_projects"."deleted_at" IS NULL
LIMIT 1000

The original WHERE is wrapped in () to preserve OR precedence — see _inject_where_predicates line 233 (CRITICAL #1 paren wrap).

Layer 3 — Postgres read-only role

Two principals provisioned by migration b6519fcbc296:

or_readonly — group role, NOLOGIN, NOCREATEDB, NOCREATEROLE. Holds GRANT CONNECT + GRANT USAGE ON SCHEMA public + table-level GRANT SELECT on each allowlisted table. Carries statement_timeout = 5000ms at role level.
or_readonly_user — login user, LOGIN, NOCREATEDB, NOCREATEROLE. Member of or_readonly, inherits all grants. Authenticates via ONRAMP_READONLY_DATABASE_PASSWORD env var.

Provisioning per tenant: bun devtools/set-readonly-db-env-vars.ts writes the password into Duplo for each environment.

Even a perfect rewriter bug that emits DROP TABLE would fail at this layer: or_readonly_user cannot execute write SQL.

Layer 4 — Row-Level Security

Migration e09d5464d3fe applies two policies per table from app/api/agent/rls_policies/:

sql

-- Permissive: web-app role unaffected (postgres bypasses RLS by default,
-- but this is belt-and-suspenders for managed-Postgres surprises).
CREATE POLICY "<table>_app_full" ON "<table>"
  FOR ALL TO public USING (true) WITH CHECK (true);

-- Restrictive: SQL-tool role sees only rows matching the GUC.
CREATE POLICY "<table>_readonly_scope" ON "<table>"
  AS RESTRICTIVE FOR SELECT TO or_readonly
  USING (vendor = current_setting('app.vendor_id', true)::int);

The , true in current_setting returns NULL when the GUC is unset. vendor = NULL is FALSE in SQL, so an unset GUC means zero rows — fail-closed.

SqlReadService._execute_query runs SET LOCAL app.vendor_id = :v (and app.user_id, app.is_customer_user, app.customer_id) inside the transaction before execute(sql_rewritten). SET LOCAL is transaction-scoped, so values can't leak across pool checkouts.

Tables currently RLS-protected (app/api/agent/rls_policies/__init__.py):

or_comment, or_modules, or_object_status, or_projects, or_resources,
or_tasks, or_vendors, or_project_user_mappings, or_customer_user_account_mapping, user

Drift gate: test_rls_policy_drift.py reads pg_policies and asserts equality with the registry.

ORM-declarative allowlist (the column markers)

Schema inventory is the ORM (alembic-gated to the DB). Exposure policy lives on the column line:

python

class ORProject(ORStandardModel, PortalAccessible):
    project_name: Mapped[Public[str]] = mapped_column(String, nullable=False)
    health_score: Mapped[Internal[float]] = mapped_column(Float)         # explicit deny
    internal_notes: Mapped[Confidential[str]] = mapped_column(Text)               # vendor-only, no portal
    vendor: Mapped[VendorOwner[int]] = mapped_column(ForeignKey(...))    # override-only
    account: Mapped[CustomerOwner[int]] = mapped_column(ForeignKey(...)) # override-only

    __sql_tool_policy__ = {
        "role_filters": {
            "CONTRIBUTOR": "project_member_only",
            "API_USER": "denied",
        },
    }

Marker	Use	Visible to vendor user	Visible to portal user
`Public[T]`	Most exposed columns	✅	✅
`Confidential[T]`	Sensitive but business-relevant	✅	❌
`Internal[T]`	Passwords, secrets, internal hashes	❌	❌
`VendorOwner[T]`	Override when FK detection ambiguous	(tenancy column)	(tenancy column)
`CustomerOwner[T]`	Override when FK detection ambiguous	(tenancy column)	(tenancy column)

Auto-classified (no marker required): id, uuid, created_at, modified_at, archived_at, deleted_at, created_by, modified_by, archived_by, deleted_by — see AUTO_EXPOSED_COLUMN_NAMES.

Migration form (legacy mapped_column without Mapped[T] annotations):

python

some_col = mapped_column(Text, info={"sql_tool": {"exposed": True, "pii": True}})

Both forms feed the same registry — the loader merges them (orm_allowlist._collect_exposed_columns).

How the boot loader builds the registry

mermaid

flowchart TB
    Start([Flask boot]) --> Walk[_all_orm_subclasses<br/>walk ORBaseModel + ORStandardModel]
    Walk --> Each{For each class}
    Each --> Hints[_resolve_class_hints<br/>walk MRO with per-class tolerance]
    Hints --> Auto[Auto-classify by name<br/>id, uuid, created_at, ...]
    Auto --> Markers[Extract _SqlMeta from<br/>Mapped/Annotated/PEP-695 alias]
    Markers --> Info[Merge column.info'sql_tool' overrides]
    Info --> OptIn{Has explicit marker<br/>OR PortalAccessible<br/>OR __sql_tool_policy__?}
    OptIn -->|No| Skip[Skip — not in registry]
    OptIn -->|Yes| Resolve[resolve_vendor_owner<br/>resolve_customer_owner]
    Resolve --> Fail{Resolution failed?}
    Fail -->|Yes & not global/rls_scoped| Boom[RuntimeError at boot<br/>fail-safe Edge 2/3]
    Fail -->|No| Entry[Add to registry]
    Entry --> Each

The registry is @lru_cache(maxsize=1) — built once per worker. Tests clear via collect_allowlist.cache_clear().

Why MRO walked manually (orm_allowlist.py:65–102): a single get_type_hints(cls) call evaluates every annotation on every ancestor in one ForwardRef pass. One unresolvable name on an unrelated parent class drops every hint — including the marker annotations we depend on. The MRO walk tolerates per-class failures and falls back to per-annotation eval.

Owner resolution order

Vendor owner (owner_resolution.resolve_vendor_owner):

Explicit Mapped[VendorOwner[T]] marker — single match wins, multi-marker raises.
__sql_tool_policy__["owner_via"] override.
FK to or_vendors.id — single match wins, multi-FK raises (Edge 1).
Naming convention fallback: vendor_id, vendor, tenant_id.
None — caller (loader) decides whether to fail.

Customer owner (resolve_customer_owner):

Explicit Mapped[CustomerOwner[T]] marker.
__sql_tool_policy__["customer_owner_via"].
FK to account.id.
Naming convention fallback: customer_id, customer, account.
Bridge resolution — walk relationships annotated with info={"sql_tool": {"resolves_customer_owner_via": "<col>"}} (e.g. User → CustomerUserAccountMapping → account).
None.

Five fail-safe edges (boot will refuse to start if any fire)

mermaid

flowchart LR
    E1[Edge 1<br/>multi-FK ambiguity] --> Boot([RuntimeError at Flask boot])
    E2[Edge 2<br/>Public column on table<br/>without resolvable vendor owner] --> Boot
    E3[Edge 3<br/>PortalAccessible without<br/>resolvable customer owner] --> Boot
    E4[Edge 4<br/>untagged column on<br/>exposed table — CI gate] --> CI([pytest fails])
    E5[Edge 5<br/>RLS drift between<br/>registry and pg_policies] --> CI
    style Boot fill:#ffe6e6,color:#000
    style CI fill:#fff4e6,color:#000

Edges 1–3 raise at boot. Edges 4–5 fail in CI. None can be silently bypassed.

Footguns + areas of improvement

Honest list. Some are documented trade-offs; others are real gaps that should land as follow-ups.

🔴 Gap — `AUTO_DENIED_COLUMN_PATTERNS` is defined but never enforced

sql_tool_conventions.py:41 defines patterns for columns that look sensitive (*password*, *token*, *ssn*, etc.) and exports a matches_auto_deny_pattern helper. Nothing in the rewriter or the CI gate calls it. Tests cover the pattern matcher in isolation, but no production code path inspects column names against it.

A new column named payment_token marked Mapped[Public[str]] would pass the classification gate today.

Fix: wire matches_auto_deny_pattern into test_every_column_on_exposed_table_is_classified to require an entry in tests/snapshots/sql_tool_sensitive_overrides.json whenever a Public column matches a deny pattern. The signoff JSON file already exists for that purpose.

🔴 Gap — RLS-registry drift not enforced by structure

RLS_POLICIES in app/api/agent/rls_policies/__init__.py is the source of truth; migration e09d5464d3fe reads it at upgrade time. Changing a policy's predicate in the registry alone is invisible to already-migrated databases — old migrations don't replay. test_rls_policy_drift.py catches drift on a fresh DB but not on long-running ones.

Convention (currently undocumented in CONTRIBUTING): adding or changing a policy requires a new alembic migration calling apply_policy() for the changed entry.

Fix: add a version int to each PolicyDefinition. Drift test asserts pg_policies matches the current version. Adding to the registry without bumping version + writing a migration fails CI. Manageable now (10 tables); dangerous past ~25.

🔴 Gap — semantic ambiguity in column names

Cross-tenant leaks are physically prevented by the four layers. The actual long-term failure mode is answers that are technically scoped but semantically wrong — agent picks task_is_internal thinking it means "ops-internal" when it gates customer visibility, etc.

Three levers under-used today:

agent_description per column via info={"sql_tool": {"agent_description": "..."}} — flows into describe_schema. Used on exactly one column (milestone_name in ORModule).
example_queries per table — supported in allowlist.describe but empty everywhere.
Semantic views. A vw_customer_visible_tasks that codifies the joint meaning of task_is_internal + is_customer_visible once. Agents stop reasoning about flags. Separate plan.

Adding more columns without addressing this scales rope, not power.

🔴 Gap — critical mapping tables not yet exposed

or_task_mappings (module ↔ task ordering), or_module_mappings (playbook ↔ module ordering), and or_task_dependencies (task dependency graph) are not in the allowlist or RLS_POLICIES. Agent can read or_tasks but cannot reconstruct task order or module composition — most "what's the next task" / "what's blocked by what" questions unanswerable via SQL. All three have direct vendor FK; RLS template identical to or_tasks.

🔴 Gap — `account` table has no RLS policy

account is in the _ALLOWLISTED_TABLES SELECT-grant list (migration b6519fcbc296) and the agent can SELECT from it, but it is not in RLS_POLICIES in app/api/agent/rls_policies/__init__.py. The rewriter is the only defense for cross-vendor account reads.

If a future SQL shape slips past the scope injector, RLS would catch it on every other RLS-protected table — but not on account.

Fix: add app/api/agent/rls_policies/account.py and register it.

🟡 Footgun — over-permissive table flags have no signoff gate

__sql_tool_policy__["global_lookup"] = True and "rls_scoped": True both short-circuit predicate injection in scope_injector.py:144–146. A developer marking a table global_lookup=True by mistake silently drops scope. Default-deny only catches missing markers, not over-permissive ones.

Fix: add a snapshot test listing tables with global_lookup or rls_scoped set, requiring reviewer signoff to add new entries (mirrors the sql_tool_sensitive_overrides.json pattern).

🟡 Footgun — hashid validator only probes two tables

cap_injector._reject_if_cross_vendor_hashid decodes hashid-shaped string literals and looks them up in or_projects then or_tasks. A cross-vendor hashid for or_resources, or_modules, etc. is not caught at this layer. (The scope injector still rewrites the WHERE so cross-vendor data wouldn't be returned — but the early hashid check is incomplete.)

Fix: extend the lookup to every table with a vendor FK, or — better — drop the hashid pre-check entirely and rely on scope injection + RLS. The pre-check is best-effort already.

🟡 Footgun — hashid string-literal scan misses pg-string forms

_validate_hashids regex is r"'([^']*)'". It misses:

E-strings (E'...')
Dollar-quoted strings ($$...$$)
Escaped quotes ('O''Brien')
Comments (/* ... */)

Same caveat — scope injection is the real defense, this is opportunistic. Worth either tightening or removing.

🟡 Footgun — `_lookup_customer_id` returns `LIMIT 1` from mapping table

sql_read_controller._lookup_customer_id falls back to the lowest-id mapping row when customer_id isn't passed. A portal user mapped to multiple accounts gets a stable but possibly wrong default. Fine in steady state (most portal users map to one account), but a quiet failure mode if that ever changes.

Fix: require customer_id in the payload for portal users — never silently pick.

🟡 Footgun — `column.info["sql_tool"]` shallow-merges over markers

orm_allowlist._collect_exposed_columns lines 170–171 dict-merges info_override on top of marker-derived flags. info={"sql_tool": {"exposed": False}} on a column also marked Public[T] would silently disable exposure. Probably intentional as an escape hatch — but no warning logged.

Fix: log a warning when info-dict overrides a marker flag with a different value.

🟢 Dead code — `rejection_snippet` is unused

agent_surfaces.rejection_snippet is exported but never called. The rewriter raises ScopeInjectionError with raw message strings instead. Either wire it into the error path so the agent gets a uniform refusal, or delete it.

🟢 Limitation — polymorphic columns blanket-rejected

validators._POLYMORPHIC_COLUMNS = {"assoc_object_type_id", "assoc_object_id"} rejects any query touching them. Documented as v1; v2 would resolve the type id at parse time and rewrite the join. Affects or_comment and similar polymorphic tables. Documented in app/api/agent/rls_policies/or_comment.py.

🟢 Operational — registry cache lives until worker restart

collect_allowlist and load_registry are @lru_cache. Adding a new ORM model needs a worker restart to land in the SQL-tool registry. Fine in production; surprising in dev. make dev already restarts on Python changes.

🟢 Operational — no HMAC on the internal endpoint

Documented decision in PR description. The four defense layers below it still hold; HMAC would only add a "trusted-caller" signal. Future hardening idea: HKDF-derived per-endpoint keys from a single master secret (single env var, cryptographic isolation per endpoint) — currently tracked as TBD in the PR body.

Adding a new column or table — the playbook

mermaid

flowchart TD
    Start([New column on a SQL-tool-exposed table]) --> Q1{Is it sensitive?}
    Q1 -->|Yes — secrets, passwords| Internal[Mapped Internal'T']
    Q1 -->|Yes — vendor-internal| Confidential[Mapped Confidential'T']
    Q1 -->|No — agent should query| Public[Mapped Public'T']
    Q1 -->|It's a tenancy FK| OwnerCheck{FK auto-detect ambiguous?}
    OwnerCheck -->|Yes| Override[Mapped VendorOwner/CustomerOwner'T']
    OwnerCheck -->|No| Skip[No marker — auto-detected]

    Internal --> Done([CI passes])
    Confidential --> Done
    Public --> Done
    Override --> Done
    Skip --> Done

    Start --> Q2{Auto-classified name?<br/>id/uuid/created_at/...}
    Q2 -->|Yes| Done

    Start --> Q3{New table?}
    Q3 -->|Yes — portal-readable| Mixin[Add PortalAccessible mixin]
    Q3 -->|Yes — vendor-only| Policy[Add __sql_tool_policy__ dict]
    Mixin --> RLS{Tenant join through bridge?}
    Policy --> RLS
    RLS -->|Yes| RLSAdd[Add app/api/agent/rls_policies/your_table.py]
    RLS -->|No| Done
    RLSAdd --> Done

Quick checklist for a new exposed table:

Inherit ORStandardModel (or ORBaseModel) — gives you the auto-classified columns for free.
Add PortalAccessible mixin if portal callers should see it.
Add __sql_tool_policy__ = {...} for role filters, customer scoping, project membership.
Tag every non-auto-classified column with a marker (or info={"sql_tool": {...}}).
If the table has no direct vendor FK and isn't global_lookup, add an RLS policy in app/api/agent/rls_policies/.
Run pytest app/api/agent/tests -q --tb=short --no-header — the classification gate, snapshot test, and RLS drift test will tell you what's missing.
Snapshot bump (tests/snapshots/sql_tool_allowlist.json) — regenerate, review the diff.

Test surfaces

Test	What it locks
`test_sql_tool_classification_gate.py`	Default-deny. Every column on every exposed table must be classified or signoff-listed
`test_sql_tool_skeleton.py`	Marker extraction, both forms (typed + info-dict), auto-deny patterns, conventions
`test_sql_rewriter.py`	50+ cases against scope_injector, validators, cap_injector — cross-vendor isolation, CTE/UNION/EXISTS coverage
`test_sql_read_service.py`	End-to-end orchestration including audit emission on every path
`test_sql_read_controller.py`	Identity cross-check, customer mapping, role resolution
`test_rls_policy_drift.py`	`pg_policies` matches `RLS_POLICIES` registry (CI-skip without docker)
`test_rls_enforcement.py`	RLS actually rejects cross-vendor reads at the DB layer (CI-skip without docker)

CloudWatch Insights queries for on-call live in sql_audit.py docstring.

File map (one-glance)

app/api/agent/
├── sql_tool_types.py         # 5 markers (Public/Confidential/Internal/VendorOwner/CustomerOwner)
├── sql_tool_mixins.py        # PortalAccessible
├── sql_tool_conventions.py   # AUTO_EXPOSED_COLUMN_NAMES + AUTO_DENIED_COLUMN_PATTERNS + SIGNOFF_FILE
│
├── controllers/
│   └── sql_read_controller.py   # POST endpoint + identity cross-check
├── services/
│   ├── sql_read_service.py      # 3-layer orchestration + audit
│   ├── sql_audit.py             # AuditRecord + emit_audit
│   └── sql_rewriter/
│       ├── orm_allowlist.py        # ORM walk → registry (boot loader)
│       ├── owner_resolution.py     # FK auto-detection + bridge walk
│       ├── allowlist.py            # registry → AllowlistRegistry per caller type
│       ├── agent_surfaces.py       # describe_schema + orientation card
│       ├── parser.py               # sqlglot wrapper
│       ├── validators.py           # shape rejection (regex + AST)
│       ├── scope_injector.py       # vendor/customer/role/soft-delete WHERE injection
│       └── cap_injector.py         # LIMIT 1000 + hashid cross-vendor check
├── rls_policies/                  # one file per RLS-protected table + helpers + registry
└── tests/
    ├── test_sql_tool_classification_gate.py
    ├── test_sql_tool_skeleton.py
    ├── test_sql_rewriter.py
    ├── test_sql_read_service.py
    ├── test_sql_read_controller.py
    ├── test_rls_policy_drift.py
    └── test_rls_enforcement.py

migrations/versions/
├── 2026_04_29_b6519fcbc296_add_or_readonly_role.py
└── 2026_04_29_e09d5464d3fe_add_sql_tool_rls_policies.py

onramp-agents/src/shared/tools/
└── sql_read.py                # read_via_sql + describe_schema Strands tool factories

Operational env vars

ONRAMP_READONLY_DATABASE_PASSWORD   # required — login user password
ONRAMP_READONLY_DATABASE_HOST       # optional — falls back to ONRAMP_DATABASE_HOST
ONRAMP_DATABASE_PORT                # shared with primary
ONRAMP_DATABASE_NAME                # shared with primary

Per-tenant provisioning: bun devtools/set-readonly-db-env-vars.ts.

Where this plugs into agents

Agent	Tool factory called from
Portal copilot	`onramp-agents/src/agents/portal/...` builds tools from `PortalAgentState` (carries `is_customer_user=True`, `customer_id`)
OnRamp chat / vendor copilot	builds from `ChatAgentState` (carries `is_customer_user=False`)

Both call the same create_read_via_sql_tool(state) and create_describe_schema_tool(state) factories in shared/tools/sql_read.py. Identity comes from trusted state — never from agent input.

~/.claude/plans/sql-tool-tenant-isolation-primer.md — threat model, invariants, OnRamp-specific pitfalls
~/.claude/plans/sql-tool-orm-refactor-chunk1.md — design that landed: markers, mixin, conventions, fail-safe edges
~/.claude/plans/convert-it-now-piped-catmull.md — chunk 1.5: convert info-dict form to typed markers
~/.claude/plans/sql-tool-security-review-fixes.md — chunk 2 security findings
~/.claude/plans/sql-tool-fixup-rls.md — RLS migration consolidation
~/.claude/plans/sql-tool-drop-hmac.md — explicit decision record on no-HMAC posture

SQL Tool — Design Doc ​

TL;DR ​

How a query flows end-to-end ​

Security model — the four layers, in detail ​

ORM-declarative allowlist (the column markers) ​

Five fail-safe edges (boot will refuse to start if any fire) ​

Footguns + areas of improvement ​

🔴 Gap — AUTO_DENIED_COLUMN_PATTERNS is defined but never enforced ​

🔴 Gap — RLS-registry drift not enforced by structure ​

🔴 Gap — semantic ambiguity in column names ​

🔴 Gap — critical mapping tables not yet exposed ​

🔴 Gap — account table has no RLS policy ​

🟡 Footgun — over-permissive table flags have no signoff gate ​

🟡 Footgun — hashid validator only probes two tables ​

🟡 Footgun — hashid string-literal scan misses pg-string forms ​

🟡 Footgun — _lookup_customer_id returns LIMIT 1 from mapping table ​

🟡 Footgun — column.info["sql_tool"] shallow-merges over markers ​

🟢 Dead code — rejection_snippet is unused ​

🟢 Limitation — polymorphic columns blanket-rejected ​

🟢 Operational — registry cache lives until worker restart ​

🟢 Operational — no HMAC on the internal endpoint ​

Adding a new column or table — the playbook ​

Test surfaces ​

File map (one-glance) ​

Operational env vars ​

Where this plugs into agents ​

Related plans (reading order) ​

SQL Tool — Design Doc

TL;DR

How a query flows end-to-end

Security model — the four layers, in detail

ORM-declarative allowlist (the column markers)

Five fail-safe edges (boot will refuse to start if any fire)

Footguns + areas of improvement

🔴 Gap — `AUTO_DENIED_COLUMN_PATTERNS` is defined but never enforced

🔴 Gap — RLS-registry drift not enforced by structure

🔴 Gap — semantic ambiguity in column names

🔴 Gap — critical mapping tables not yet exposed

🔴 Gap — `account` table has no RLS policy

🟡 Footgun — over-permissive table flags have no signoff gate

🟡 Footgun — hashid validator only probes two tables

🟡 Footgun — hashid string-literal scan misses pg-string forms

🟡 Footgun — `_lookup_customer_id` returns `LIMIT 1` from mapping table

🟡 Footgun — `column.info["sql_tool"]` shallow-merges over markers

🟢 Dead code — `rejection_snippet` is unused

🟢 Limitation — polymorphic columns blanket-rejected

🟢 Operational — registry cache lives until worker restart

🟢 Operational — no HMAC on the internal endpoint

Adding a new column or table — the playbook

Test surfaces

File map (one-glance)

Operational env vars

Where this plugs into agents

Related plans (reading order)