Skip to content

SQL Tool — Design Doc

Read-only, tenant-isolated SQL access for AgentCore agents. Lives at app/api/agent/ — this doc is the entry point.

StatusShipped on PR #8322 (ONRAMP-4761)
AudienceEngineers reading, modifying, or reviewing SQL-tool code
What it isread_via_sql + describe_schema tools wired into portal + onramp_chat agents
Tenant modelVendor + customer-account, with portal vs. vendor-user split
Threat-model primer~/.claude/plans/sql-tool-tenant-isolation-primer.md

TL;DR

The SQL tool lets agents answer open-ended data questions (e.g. "how many tasks did I complete last week?") by writing SQL against the OnRamp database. Cross-tenant reads are physically prevented by four independent defenses; any single layer would suffice on its own.

mermaid
flowchart LR
    A[Agent writes SQL] --> B[1. Validators<br/>reject dangerous shapes]
    B --> C[2. Parse + rewrite<br/>inject vendor/customer scope]
    C --> D[3. Postgres role<br/>SELECT-only on allowlisted tables]
    D --> E[4. RLS policies<br/>scope rows by session GUC]
    E --> F[(Result)]
    style B fill:#fff4e6,color:#000
    style C fill:#e6f3ff,color:#000
    style D fill:#e6ffe6,color:#000
    style E fill:#ffe6f3,color:#000

Locality wins: exposure decisions live on the column line via typed markers (Public[T], Confidential[T], Internal[T], VendorOwner[T], CustomerOwner[T]). PR diff = security-policy diff.

Default-deny enforced by CI: untagged column on an exposed table fails test_every_column_on_exposed_table_is_classified — unmergeable without a marker, an info-dict, or an explicit signoff entry.


How a query flows end-to-end

mermaid
sequenceDiagram
    autonumber
    participant Agent as Agent (portal/copilot)
    participant Tool as read_via_sql tool
    participant Flask as Flask /api/internal/agent/sql-read
    participant Val as validators.py
    participant Rew as parser+scope+cap injectors
    participant DB as Postgres (or_readonly_user)

    Agent->>Tool: SQL string
    Tool->>Flask: POST {vendor_id, user_id, is_customer_user, sql}
    Flask->>Flask: Cross-check User row<br/>(vendor, is_customer_user, mapping)
    Flask->>Val: validate_sql(sql)
    Val-->>Flask: accept | reject(reason)
    Flask->>Rew: inject_scope + inject_caps
    Rew-->>Flask: rewritten SQL
    Flask->>DB: SET LOCAL app.vendor_id=N<br/>SET LOCAL app.user_id=N<br/>...
    Flask->>DB: execute(rewritten SQL)
    Note over DB: RLS restrictive policies<br/>filter rows by GUC
    DB-->>Flask: rows (≤1000, ≤1MB, ≤5s)
    Flask-->>Tool: {scope_status, data, meta}
    Tool-->>Agent: result dict
    Note over Flask: emit_audit() on every path,<br/>including rejections
What each step physically prevents (click to expand)
#StepBlocks
3_resolve_caller_context cross-checks User.vendor + is_customer_user against payload, plus or_customer_user_account_mapping for portal usersAgent passing a foreign vendor_id / user_id
4validate_sql — regex + ASTINSERT/UPDATE/DELETE/DDL, multi-statement, pg_* / lo_* functions, SELECT *, recursive CTE depth >3, polymorphic assoc_object_* columns
5inject_scope (sqlglot build_scope walk)Cross-vendor reads via subquery, CTE, UNION arm, EXISTS, derived table — every scope gets vendor = caller_vendor_id injected
5inject_capsResult blow-up: forces LIMIT ≤1000. Also decodes hashid string literals and rejects cross-vendor matches
7or_readonly_user Postgres role + statement_timeout=5sAny write attempt, pg_*/lo_* calls that slipped past validators
9RLS restrictive policies referencing app.vendor_id GUCAnything else — last-resort row-level filter at DB layer
10emit_audit(record)Silent failures: every accept and every reject is logged (sql_tool.query)

Security model — the four layers, in detail

mermaid
flowchart TB
    subgraph L1[Layer 1 — Identity]
        I1[POST payload<br/>vendor_id, user_id, is_customer_user]
        I2[DB cross-check<br/>User.vendor, User.is_customer_user]
        I3[Mapping cross-check<br/>or_customer_user_account_mapping]
        I1 --> I2 --> I3
    end

    subgraph L2[Layer 2 — Parse + Rewrite]
        P1[validators.py<br/>shape rejection]
        P2[parser.py<br/>sqlglot AST]
        P3[scope_injector.py<br/>vendor/customer/role/soft-delete WHERE]
        P4[cap_injector.py<br/>LIMIT 1000 + hashid check]
        P1 --> P2 --> P3 --> P4
    end

    subgraph L3[Layer 3 — DB role]
        D1[or_readonly_user<br/>LOGIN, NOCREATEDB, NOCREATEROLE]
        D2[GRANT SELECT only<br/>on allowlisted tables]
        D3[statement_timeout = 5000ms]
        D1 --> D2 --> D3
    end

    subgraph L4[Layer 4 — RLS]
        R1[ALTER TABLE ... ENABLE ROW LEVEL SECURITY]
        R2[Permissive FOR ALL TO public<br/>web app unaffected]
        R3[Restrictive FOR SELECT TO or_readonly<br/>vendor = current_setting'app.vendor_id']
        R1 --> R2 --> R3
    end

    L1 --> L2 --> L3 --> L4
Layer 1 — Caller identity (no HMAC; here is why that's OK)

The endpoint sits at /api/internal/agent/sql-read and skips HMAC. The PR description records that decision; it relies on:

  1. VPC isolation/api/internal/* not externally exposed.
  2. Cross-checking the User row in _resolve_caller_context (sql_read_controller.py:49–125):
    • User.vendor == payload.vendor_id or PermissionError
    • User.is_customer_user == payload.is_customer_user or PermissionError
  3. Customer-mapping cross-check for portal users — or_customer_user_account_mapping row must exist and be active.
  4. Layers 2–4 are independent of any caller assertion.

Even if an attacker reaches the Flask network and forges identity, the rewriter and RLS still enforce row scope from the same authenticated payload.

Layer 2 — Parse + rewrite (sqlglot, fail-closed)

The rewriter chain is three files:

FileJobFail-closed behavior
validators.pyReject shapes that should never reach the AST: multi-statement, INSERT/UPDATE/DELETE/DDL, pg_* / lo_* calls, SELECT *, polymorphic columns, RECURSIVE CTE depth > 3Returns ValidationResult.reject(reason) — caller emits audit + 4xx
scope_injector.pyFor every exp.Table source in every sqlglot scope (CTE, UNION arm, derived-table, EXISTS body): inject vendor = :caller_vendor_id, deleted_at IS NULL, customer_id = :caller_customer_id (portal only), self-only / project-membership filtersRaises ScopeInjectionError if the table has no resolvable vendor column and is not tagged global_lookup or rls_scoped
cap_injector.pyInject LIMIT 1000. Scan original SQL for hashid-shaped string literals; if any decode to an id whose vendor row ≠ caller, rejectRaises CapInjectionError on hashid_cross_vendor

The scope walk is built on sqlglot.optimizer.scope.build_scope so subqueries, CTEs, UNION arms, and EXISTS bodies are all walked by sqlglot — not by hand. Every real exp.Table source in every scope gets predicates appended; CTE / derived-table aliases appear as Scope objects in scope.sources and are skipped (their inner scope handles it).

python
# Example: agent submits this
SELECT id, project_name FROM or_projects WHERE health = 'red'

# Rewriter emits this
SELECT id, project_name
FROM or_projects
WHERE (health = 'red')
  AND "or_projects"."vendor" = 17
  AND "or_projects"."deleted_at" IS NULL
LIMIT 1000

The original WHERE is wrapped in () to preserve OR precedence — see _inject_where_predicates line 233 (CRITICAL #1 paren wrap).

Layer 3 — Postgres read-only role

Two principals provisioned by migration b6519fcbc296:

  • or_readonly — group role, NOLOGIN, NOCREATEDB, NOCREATEROLE. Holds GRANT CONNECT + GRANT USAGE ON SCHEMA public + table-level GRANT SELECT on each allowlisted table. Carries statement_timeout = 5000ms at role level.
  • or_readonly_user — login user, LOGIN, NOCREATEDB, NOCREATEROLE. Member of or_readonly, inherits all grants. Authenticates via ONRAMP_READONLY_DATABASE_PASSWORD env var.

Provisioning per tenant: bun devtools/set-readonly-db-env-vars.ts writes the password into Duplo for each environment.

Even a perfect rewriter bug that emits DROP TABLE would fail at this layer: or_readonly_user cannot execute write SQL.

Layer 4 — Row-Level Security

Migration e09d5464d3fe applies two policies per table from app/api/agent/rls_policies/:

sql
-- Permissive: web-app role unaffected (postgres bypasses RLS by default,
-- but this is belt-and-suspenders for managed-Postgres surprises).
CREATE POLICY "<table>_app_full" ON "<table>"
  FOR ALL TO public USING (true) WITH CHECK (true);

-- Restrictive: SQL-tool role sees only rows matching the GUC.
CREATE POLICY "<table>_readonly_scope" ON "<table>"
  AS RESTRICTIVE FOR SELECT TO or_readonly
  USING (vendor = current_setting('app.vendor_id', true)::int);

The , true in current_setting returns NULL when the GUC is unset. vendor = NULL is FALSE in SQL, so an unset GUC means zero rows — fail-closed.

SqlReadService._execute_query runs SET LOCAL app.vendor_id = :v (and app.user_id, app.is_customer_user, app.customer_id) inside the transaction before execute(sql_rewritten). SET LOCAL is transaction-scoped, so values can't leak across pool checkouts.

Tables currently RLS-protected (app/api/agent/rls_policies/__init__.py):

or_comment, or_modules, or_object_status, or_projects, or_resources,
or_tasks, or_vendors, or_project_user_mappings, or_customer_user_account_mapping, user

Drift gate: test_rls_policy_drift.py reads pg_policies and asserts equality with the registry.


ORM-declarative allowlist (the column markers)

Schema inventory is the ORM (alembic-gated to the DB). Exposure policy lives on the column line:

python
class ORProject(ORStandardModel, PortalAccessible):
    project_name: Mapped[Public[str]] = mapped_column(String, nullable=False)
    health_score: Mapped[Internal[float]] = mapped_column(Float)         # explicit deny
    internal_notes: Mapped[Confidential[str]] = mapped_column(Text)               # vendor-only, no portal
    vendor: Mapped[VendorOwner[int]] = mapped_column(ForeignKey(...))    # override-only
    account: Mapped[CustomerOwner[int]] = mapped_column(ForeignKey(...)) # override-only

    __sql_tool_policy__ = {
        "role_filters": {
            "CONTRIBUTOR": "project_member_only",
            "API_USER": "denied",
        },
    }
MarkerUseVisible to vendor userVisible to portal user
Public[T]Most exposed columns
Confidential[T]Sensitive but business-relevant
Internal[T]Passwords, secrets, internal hashes
VendorOwner[T]Override when FK detection ambiguous(tenancy column)(tenancy column)
CustomerOwner[T]Override when FK detection ambiguous(tenancy column)(tenancy column)

Auto-classified (no marker required): id, uuid, created_at, modified_at, archived_at, deleted_at, created_by, modified_by, archived_by, deleted_by — see AUTO_EXPOSED_COLUMN_NAMES.

Migration form (legacy mapped_column without Mapped[T] annotations):

python
some_col = mapped_column(Text, info={"sql_tool": {"exposed": True, "pii": True}})

Both forms feed the same registry — the loader merges them (orm_allowlist._collect_exposed_columns).

How the boot loader builds the registry
mermaid
flowchart TB
    Start([Flask boot]) --> Walk[_all_orm_subclasses<br/>walk ORBaseModel + ORStandardModel]
    Walk --> Each{For each class}
    Each --> Hints[_resolve_class_hints<br/>walk MRO with per-class tolerance]
    Hints --> Auto[Auto-classify by name<br/>id, uuid, created_at, ...]
    Auto --> Markers[Extract _SqlMeta from<br/>Mapped/Annotated/PEP-695 alias]
    Markers --> Info[Merge column.info'sql_tool' overrides]
    Info --> OptIn{Has explicit marker<br/>OR PortalAccessible<br/>OR __sql_tool_policy__?}
    OptIn -->|No| Skip[Skip — not in registry]
    OptIn -->|Yes| Resolve[resolve_vendor_owner<br/>resolve_customer_owner]
    Resolve --> Fail{Resolution failed?}
    Fail -->|Yes & not global/rls_scoped| Boom[RuntimeError at boot<br/>fail-safe Edge 2/3]
    Fail -->|No| Entry[Add to registry]
    Entry --> Each

The registry is @lru_cache(maxsize=1) — built once per worker. Tests clear via collect_allowlist.cache_clear().

Why MRO walked manually (orm_allowlist.py:65–102): a single get_type_hints(cls) call evaluates every annotation on every ancestor in one ForwardRef pass. One unresolvable name on an unrelated parent class drops every hint — including the marker annotations we depend on. The MRO walk tolerates per-class failures and falls back to per-annotation eval.

Owner resolution order

Vendor owner (owner_resolution.resolve_vendor_owner):

  1. Explicit Mapped[VendorOwner[T]] marker — single match wins, multi-marker raises.
  2. __sql_tool_policy__["owner_via"] override.
  3. FK to or_vendors.id — single match wins, multi-FK raises (Edge 1).
  4. Naming convention fallback: vendor_id, vendor, tenant_id.
  5. None — caller (loader) decides whether to fail.

Customer owner (resolve_customer_owner):

  1. Explicit Mapped[CustomerOwner[T]] marker.
  2. __sql_tool_policy__["customer_owner_via"].
  3. FK to account.id.
  4. Naming convention fallback: customer_id, customer, account.
  5. Bridge resolution — walk relationships annotated with info={"sql_tool": {"resolves_customer_owner_via": "<col>"}} (e.g. User → CustomerUserAccountMapping → account).
  6. None.

Five fail-safe edges (boot will refuse to start if any fire)

mermaid
flowchart LR
    E1[Edge 1<br/>multi-FK ambiguity] --> Boot([RuntimeError at Flask boot])
    E2[Edge 2<br/>Public column on table<br/>without resolvable vendor owner] --> Boot
    E3[Edge 3<br/>PortalAccessible without<br/>resolvable customer owner] --> Boot
    E4[Edge 4<br/>untagged column on<br/>exposed table — CI gate] --> CI([pytest fails])
    E5[Edge 5<br/>RLS drift between<br/>registry and pg_policies] --> CI
    style Boot fill:#ffe6e6,color:#000
    style CI fill:#fff4e6,color:#000

Edges 1–3 raise at boot. Edges 4–5 fail in CI. None can be silently bypassed.


Footguns + areas of improvement

Honest list. Some are documented trade-offs; others are real gaps that should land as follow-ups.

🔴 Gap — AUTO_DENIED_COLUMN_PATTERNS is defined but never enforced

sql_tool_conventions.py:41 defines patterns for columns that look sensitive (*password*, *token*, *ssn*, etc.) and exports a matches_auto_deny_pattern helper. Nothing in the rewriter or the CI gate calls it. Tests cover the pattern matcher in isolation, but no production code path inspects column names against it.

A new column named payment_token marked Mapped[Public[str]] would pass the classification gate today.

Fix: wire matches_auto_deny_pattern into test_every_column_on_exposed_table_is_classified to require an entry in tests/snapshots/sql_tool_sensitive_overrides.json whenever a Public column matches a deny pattern. The signoff JSON file already exists for that purpose.

🔴 Gap — RLS-registry drift not enforced by structure

RLS_POLICIES in app/api/agent/rls_policies/__init__.py is the source of truth; migration e09d5464d3fe reads it at upgrade time. Changing a policy's predicate in the registry alone is invisible to already-migrated databases — old migrations don't replay. test_rls_policy_drift.py catches drift on a fresh DB but not on long-running ones.

Convention (currently undocumented in CONTRIBUTING): adding or changing a policy requires a new alembic migration calling apply_policy() for the changed entry.

Fix: add a version int to each PolicyDefinition. Drift test asserts pg_policies matches the current version. Adding to the registry without bumping version + writing a migration fails CI. Manageable now (10 tables); dangerous past ~25.

🔴 Gap — semantic ambiguity in column names

Cross-tenant leaks are physically prevented by the four layers. The actual long-term failure mode is answers that are technically scoped but semantically wrong — agent picks task_is_internal thinking it means "ops-internal" when it gates customer visibility, etc.

Three levers under-used today:

  1. agent_description per column via info={"sql_tool": {"agent_description": "..."}} — flows into describe_schema. Used on exactly one column (milestone_name in ORModule).
  2. example_queries per table — supported in allowlist.describe but empty everywhere.
  3. Semantic views. A vw_customer_visible_tasks that codifies the joint meaning of task_is_internal + is_customer_visible once. Agents stop reasoning about flags. Separate plan.

Adding more columns without addressing this scales rope, not power.

🔴 Gap — critical mapping tables not yet exposed

or_task_mappings (module ↔ task ordering), or_module_mappings (playbook ↔ module ordering), and or_task_dependencies (task dependency graph) are not in the allowlist or RLS_POLICIES. Agent can read or_tasks but cannot reconstruct task order or module composition — most "what's the next task" / "what's blocked by what" questions unanswerable via SQL. All three have direct vendor FK; RLS template identical to or_tasks.

🔴 Gap — account table has no RLS policy

account is in the _ALLOWLISTED_TABLES SELECT-grant list (migration b6519fcbc296) and the agent can SELECT from it, but it is not in RLS_POLICIES in app/api/agent/rls_policies/__init__.py. The rewriter is the only defense for cross-vendor account reads.

If a future SQL shape slips past the scope injector, RLS would catch it on every other RLS-protected table — but not on account.

Fix: add app/api/agent/rls_policies/account.py and register it.

🟡 Footgun — over-permissive table flags have no signoff gate

__sql_tool_policy__["global_lookup"] = True and "rls_scoped": True both short-circuit predicate injection in scope_injector.py:144–146. A developer marking a table global_lookup=True by mistake silently drops scope. Default-deny only catches missing markers, not over-permissive ones.

Fix: add a snapshot test listing tables with global_lookup or rls_scoped set, requiring reviewer signoff to add new entries (mirrors the sql_tool_sensitive_overrides.json pattern).

🟡 Footgun — hashid validator only probes two tables

cap_injector._reject_if_cross_vendor_hashid decodes hashid-shaped string literals and looks them up in or_projects then or_tasks. A cross-vendor hashid for or_resources, or_modules, etc. is not caught at this layer. (The scope injector still rewrites the WHERE so cross-vendor data wouldn't be returned — but the early hashid check is incomplete.)

Fix: extend the lookup to every table with a vendor FK, or — better — drop the hashid pre-check entirely and rely on scope injection + RLS. The pre-check is best-effort already.

🟡 Footgun — hashid string-literal scan misses pg-string forms

_validate_hashids regex is r"'([^']*)'". It misses:

  • E-strings (E'...')
  • Dollar-quoted strings ($$...$$)
  • Escaped quotes ('O''Brien')
  • Comments (/* ... */)

Same caveat — scope injection is the real defense, this is opportunistic. Worth either tightening or removing.

🟡 Footgun — _lookup_customer_id returns LIMIT 1 from mapping table

sql_read_controller._lookup_customer_id falls back to the lowest-id mapping row when customer_id isn't passed. A portal user mapped to multiple accounts gets a stable but possibly wrong default. Fine in steady state (most portal users map to one account), but a quiet failure mode if that ever changes.

Fix: require customer_id in the payload for portal users — never silently pick.

🟡 Footgun — column.info["sql_tool"] shallow-merges over markers

orm_allowlist._collect_exposed_columns lines 170–171 dict-merges info_override on top of marker-derived flags. info={"sql_tool": {"exposed": False}} on a column also marked Public[T] would silently disable exposure. Probably intentional as an escape hatch — but no warning logged.

Fix: log a warning when info-dict overrides a marker flag with a different value.

🟢 Dead code — rejection_snippet is unused

agent_surfaces.rejection_snippet is exported but never called. The rewriter raises ScopeInjectionError with raw message strings instead. Either wire it into the error path so the agent gets a uniform refusal, or delete it.

🟢 Limitation — polymorphic columns blanket-rejected

validators._POLYMORPHIC_COLUMNS = {"assoc_object_type_id", "assoc_object_id"} rejects any query touching them. Documented as v1; v2 would resolve the type id at parse time and rewrite the join. Affects or_comment and similar polymorphic tables. Documented in app/api/agent/rls_policies/or_comment.py.

🟢 Operational — registry cache lives until worker restart

collect_allowlist and load_registry are @lru_cache. Adding a new ORM model needs a worker restart to land in the SQL-tool registry. Fine in production; surprising in dev. make dev already restarts on Python changes.

🟢 Operational — no HMAC on the internal endpoint

Documented decision in PR description. The four defense layers below it still hold; HMAC would only add a "trusted-caller" signal. Future hardening idea: HKDF-derived per-endpoint keys from a single master secret (single env var, cryptographic isolation per endpoint) — currently tracked as TBD in the PR body.


Adding a new column or table — the playbook

mermaid
flowchart TD
    Start([New column on a SQL-tool-exposed table]) --> Q1{Is it sensitive?}
    Q1 -->|Yes — secrets, passwords| Internal[Mapped Internal'T']
    Q1 -->|Yes — vendor-internal| Confidential[Mapped Confidential'T']
    Q1 -->|No — agent should query| Public[Mapped Public'T']
    Q1 -->|It's a tenancy FK| OwnerCheck{FK auto-detect ambiguous?}
    OwnerCheck -->|Yes| Override[Mapped VendorOwner/CustomerOwner'T']
    OwnerCheck -->|No| Skip[No marker — auto-detected]

    Internal --> Done([CI passes])
    Confidential --> Done
    Public --> Done
    Override --> Done
    Skip --> Done

    Start --> Q2{Auto-classified name?<br/>id/uuid/created_at/...}
    Q2 -->|Yes| Done

    Start --> Q3{New table?}
    Q3 -->|Yes — portal-readable| Mixin[Add PortalAccessible mixin]
    Q3 -->|Yes — vendor-only| Policy[Add __sql_tool_policy__ dict]
    Mixin --> RLS{Tenant join through bridge?}
    Policy --> RLS
    RLS -->|Yes| RLSAdd[Add app/api/agent/rls_policies/your_table.py]
    RLS -->|No| Done
    RLSAdd --> Done

Quick checklist for a new exposed table:

  1. Inherit ORStandardModel (or ORBaseModel) — gives you the auto-classified columns for free.
  2. Add PortalAccessible mixin if portal callers should see it.
  3. Add __sql_tool_policy__ = {...} for role filters, customer scoping, project membership.
  4. Tag every non-auto-classified column with a marker (or info={"sql_tool": {...}}).
  5. If the table has no direct vendor FK and isn't global_lookup, add an RLS policy in app/api/agent/rls_policies/.
  6. Run pytest app/api/agent/tests -q --tb=short --no-header — the classification gate, snapshot test, and RLS drift test will tell you what's missing.
  7. Snapshot bump (tests/snapshots/sql_tool_allowlist.json) — regenerate, review the diff.

Test surfaces

TestWhat it locks
test_sql_tool_classification_gate.pyDefault-deny. Every column on every exposed table must be classified or signoff-listed
test_sql_tool_skeleton.pyMarker extraction, both forms (typed + info-dict), auto-deny patterns, conventions
test_sql_rewriter.py50+ cases against scope_injector, validators, cap_injector — cross-vendor isolation, CTE/UNION/EXISTS coverage
test_sql_read_service.pyEnd-to-end orchestration including audit emission on every path
test_sql_read_controller.pyIdentity cross-check, customer mapping, role resolution
test_rls_policy_drift.pypg_policies matches RLS_POLICIES registry (CI-skip without docker)
test_rls_enforcement.pyRLS actually rejects cross-vendor reads at the DB layer (CI-skip without docker)

CloudWatch Insights queries for on-call live in sql_audit.py docstring.


File map (one-glance)

app/api/agent/
├── sql_tool_types.py         # 5 markers (Public/Confidential/Internal/VendorOwner/CustomerOwner)
├── sql_tool_mixins.py        # PortalAccessible
├── sql_tool_conventions.py   # AUTO_EXPOSED_COLUMN_NAMES + AUTO_DENIED_COLUMN_PATTERNS + SIGNOFF_FILE

├── controllers/
│   └── sql_read_controller.py   # POST endpoint + identity cross-check
├── services/
│   ├── sql_read_service.py      # 3-layer orchestration + audit
│   ├── sql_audit.py             # AuditRecord + emit_audit
│   └── sql_rewriter/
│       ├── orm_allowlist.py        # ORM walk → registry (boot loader)
│       ├── owner_resolution.py     # FK auto-detection + bridge walk
│       ├── allowlist.py            # registry → AllowlistRegistry per caller type
│       ├── agent_surfaces.py       # describe_schema + orientation card
│       ├── parser.py               # sqlglot wrapper
│       ├── validators.py           # shape rejection (regex + AST)
│       ├── scope_injector.py       # vendor/customer/role/soft-delete WHERE injection
│       └── cap_injector.py         # LIMIT 1000 + hashid cross-vendor check
├── rls_policies/                  # one file per RLS-protected table + helpers + registry
└── tests/
    ├── test_sql_tool_classification_gate.py
    ├── test_sql_tool_skeleton.py
    ├── test_sql_rewriter.py
    ├── test_sql_read_service.py
    ├── test_sql_read_controller.py
    ├── test_rls_policy_drift.py
    └── test_rls_enforcement.py

migrations/versions/
├── 2026_04_29_b6519fcbc296_add_or_readonly_role.py
└── 2026_04_29_e09d5464d3fe_add_sql_tool_rls_policies.py

onramp-agents/src/shared/tools/
└── sql_read.py                # read_via_sql + describe_schema Strands tool factories

Operational env vars

ONRAMP_READONLY_DATABASE_PASSWORD   # required — login user password
ONRAMP_READONLY_DATABASE_HOST       # optional — falls back to ONRAMP_DATABASE_HOST
ONRAMP_DATABASE_PORT                # shared with primary
ONRAMP_DATABASE_NAME                # shared with primary

Per-tenant provisioning: bun devtools/set-readonly-db-env-vars.ts.


Where this plugs into agents

AgentTool factory called from
Portal copilotonramp-agents/src/agents/portal/... builds tools from PortalAgentState (carries is_customer_user=True, customer_id)
OnRamp chat / vendor copilotbuilds from ChatAgentState (carries is_customer_user=False)

Both call the same create_read_via_sql_tool(state) and create_describe_schema_tool(state) factories in shared/tools/sql_read.py. Identity comes from trusted state — never from agent input.


  1. ~/.claude/plans/sql-tool-tenant-isolation-primer.md — threat model, invariants, OnRamp-specific pitfalls
  2. ~/.claude/plans/sql-tool-orm-refactor-chunk1.md — design that landed: markers, mixin, conventions, fail-safe edges
  3. ~/.claude/plans/convert-it-now-piped-catmull.md — chunk 1.5: convert info-dict form to typed markers
  4. ~/.claude/plans/sql-tool-security-review-fixes.md — chunk 2 security findings
  5. ~/.claude/plans/sql-tool-fixup-rls.md — RLS migration consolidation
  6. ~/.claude/plans/sql-tool-drop-hmac.md — explicit decision record on no-HMAC posture

Internal documentation — gated behind Cloudflare Access.