Prismatic Decoupling Plan

OnRamp's CRM integration today routes nearly every CRM read, write, and webhook through Prismatic. This couples our throughput to Prismatic's rate limits, makes our caching strategy opaque, and forces us to debug across a third-party runtime when things go wrong. This document proposes a phased decoupling that reduces Prismatic to a lightweight OAuth token broker — we use it only because it already holds long-lived refresh tokens for our customers' CRM instances.

Proposal · Not yet scheduled · Owner: Ross Rasmussen · Last updated: 2026-05-22

Webhook hops: 4 → 3

Phases (0–6)

Flow Prismatic keeps

Customer-side migrations*

* Through Phase 5. Phase 6 (SF symmetry) is the only phase requiring customer-side action.

TL;DR

Outbound

Vue → Flask → CRMs direct

Prismatic mints the access token; we cache it and call HubSpot / Salesforce APIs ourselves. Schema cache lives in Postgres.

Inbound

Acker + SQS + processor

Existing path stays. Drop the async-automation-handler tail; processor invokes the Python workflow executor directly.

Salesforce

Stays asymmetric (initially)

SF Flow pre-filter remains for the initial refactor. Symmetry with HubSpot is a future-state goal — not a prerequisite.

Highest-leverage move — land the OAuth broker first.

Everything downstream depends on it. Without local access tokens, you can't migrate UI off Prismatic, can't replace the processor's fetch path, can't simplify anything else. Single highest-ROI step in the whole plan.

Why bother

High · Rate limiting

Prismatic API call budget is shared across all our customers; noisy ones throttle the rest.

Med · Opaque caching

Can't tune TTLs, invalidate on-demand, or see hit rates.

High · Debugging across boundary

Workflow failures surface in Prismatic logs, requiring their UI to investigate. No structured access from our observability stack.

Med · Execution visibility

Their dashboard is the source of truth; integrating into incident response is awkward.

High · Cost scaling

Prismatic pricing scales per customer, so our bill grows with every customer we add and the unit economics worsen as we scale.

Running incident log: .agents/skills/prismatic-scaling/SKILL.md

Architecture

Two distinct flows share infrastructure but follow separate paths:

Flow	Trigger	Latency budget	Audience	Failure mode
Creation	User clicks in UI	Sub-second perceived	Synchronous, user-facing	Show error, retry
Trigger	CRM webhook event	Seconds to minutes	Asynchronous, no UI	Retry from SQS, alert on DLQ

Workflow Creation Flow

UI-driven, outbound, synchronous. User builds a workflow trigger config in the UI. UI lists CRM objects, fields, picklist values, and owners to populate the form. Every render that needs CRM metadata is a round-trip.

CREATION — CURRENT

mermaid

flowchart LR
    UI[Vue UI<br/>Workflow Builder]
    F[Flask CRM proxy]
    P[Prismatic<br/>outbound flows]
    HS[HubSpot]
    SF[Salesforce]

    UI -->|fetch objects/fields| F
    F --> P
    P --> HS
    P --> SF

    classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
    class P prism

CREATION — TARGET

mermaid

flowchart LR
    UI[Vue UI<br/>Workflow Builder]
    F[Flask CRM client]
    SC[(or_crm_cache)]
    TB[Token broker]
    TC[(or_crm_tokens)]
    P[Prismatic mint-token]
    HS[HubSpot]
    SF[Salesforce]

    UI -->|fetch objects/fields| F
    F --> SC
    F --> TB
    TB --> TC
    TB -.->|miss/expiry| P
    F -->|cache miss| HS
    F -->|cache miss| SF

    classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
    classDef cache fill:#d1fae5,stroke:#065f46,color:#064e3b
    class P prism
    class TC,SC cache

Current pain

Every UI render that needs CRM metadata round-trips through Prismatic
Prismatic's API budget is shared across all customers, so one noisy customer throttles the rest
No insight into cache hit rates; no way to invalidate on demand

Target wins

Prismatic touched only on token mint (cached 2–6h per vendor)
Schema cache absorbs most page renders (4–5 calls → 0 CRM hits when warm)
Cache TTLs, invalidation, hit ratios all observable + tunable

Workflow Trigger Flow

Backend-only, inbound, asynchronous. CRM fires a webhook → OnRamp evaluates the trigger config → workflow executes. No UI involvement. Reliability comes from SQS redrive + idempotency.

TRIGGER — CURRENT

mermaid

flowchart LR
    HS[HubSpot]
    SF[Salesforce]
    ACK[hubspot-acker]
    Q1[(hs-inbound SQS)]
    PROC[hubspot-processor]
    API[prismatic-api-handler]
    PI[Prismatic<br/>inbound flow]
    Q2[(async SQS)]
    AH[async-automation-handler]
    F[Flask /process]

    HS -->|fire-all| ACK
    ACK --> Q1
    Q1 --> PROC
    PROC -.->|advanced filter| PI
    PI --> HS

    SF -->|SF Flow filtered| API
    API -.->|blind proxy| PI
    PI --> SF

    PI -->|callback| API
    API --> Q2
    Q2 --> AH
    AH --> F

    classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
    classDef queue fill:#fef3c7,stroke:#92400e,color:#78350f
    class PI prism
    class Q1,Q2 queue

TRIGGER — TARGET

mermaid

flowchart LR
    HS[HubSpot]
    SF[Salesforce]
    HACK[hubspot-acker]
    HQ[(hs-inbound)]
    HPROC[hubspot-processor]
    SACK[salesforce-acker<br/>Phase 6]
    SQ[(sf-inbound)]
    SPROC[salesforce-processor<br/>Phase 6]
    EXEC[Workflow executor]
    TB[Token broker]

    HS -->|fire-all| HACK
    HACK --> HQ
    HQ --> HPROC
    HPROC -.->|token| TB
    HPROC -->|filter+fetch| HS
    HPROC --> EXEC

    SF -->|Phase 6| SACK
    SACK --> SQ
    SQ --> SPROC
    SPROC -.->|token| TB
    SPROC -->|fetch| SF
    SPROC --> EXEC

    classDef queue fill:#d1fae5,stroke:#065f46,color:#064e3b
    classDef phase6 fill:#fef3c7,stroke:#b45309,color:#78350f,stroke-dasharray: 5 3
    class HQ,SQ queue
    class SACK,SPROC phase6

Current pain

Many hops, two ingress shapes — HS via acker, SF via api-handler proxy
Prismatic always in path, plus api-handler always called back
Two SQS queues, two consumers — async-handler is dead weight
api-handler does double duty — coupling worth untangling

Target wins

3 hops: acker → SQS → processor → executor (no Prismatic in path)
Per-CRM ackers + queues = independent scaling, monitoring, blast radius
Same flow shape for both CRMs (SF mirrors HS in Phase 6)

SF rollout note: The salesforce-acker / salesforce-processor / sf-inbound SQS are Phase 6 deliverables (dashed nodes in the target diagram). Through Phases 1–5, Salesforce continues to flow through SF Flows → Prismatic inbound unchanged.

The phased plan

PHASE 0 · Low risk · Doc-only

Inventory

Confirm Prismatic surface area is exactly what we think. Pick canary vendor.

Deliverables:

Inventory doc enumerating every Prismatic-hosted flow and its consumer
Confirmed canary vendor ID + feature flag plumbing
HubSpot webhook signature gap closed in hubspot-acker (or scheduled for Phase 3)

Notes: This is the lowest-risk phase — it's purely documentation and canary plumbing. No live traffic changes. The canary vendor selected here must remain consistent across all subsequent phases; different canary populations break attribution.

PHASE 1 · Med risk · Critical path · Per-vendor canary

OAuth Broker Shim

Python-side get_access_token(vendor_id, crm_type). Calls Prismatic only when cached token is missing or expiring.

Component flow:

mermaid

flowchart LR
    Caller[Flask service code] --> Broker[TokenBroker]
    Broker --> Cache[(or_crm_tokens)]
    Broker -->|on miss/expiry| Prism[Prismatic mint-token flow]
    Prism -->|access token| Broker
    Broker -->|write-through| Cache
    Broker --> Caller

DB schema:

sql

CREATE TABLE or_crm_tokens (
    id BIGSERIAL PRIMARY KEY,
    vendor_id INTEGER NOT NULL,
    crm_type VARCHAR(32) NOT NULL,
    access_token TEXT NOT NULL,         -- encrypted at rest
    expires_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (vendor_id, crm_type)
);
CREATE INDEX or_crm_tokens_expires_at_idx ON or_crm_tokens (expires_at);

Single-flight required. Multiple concurrent callers after token expiry must resolve to a single Prismatic call (Redis SETNX or pg advisory lock). Without it, you'll thundering-herd Prismatic on every refresh cycle.
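The single-flight requirement can be sketched as follows. This is a minimal in-process illustration, not the proposed implementation: it uses a per-key `threading.Lock` as a stand-in for the Redis SETNX or pg advisory lock that a multi-process deployment would need, a dict as a stand-in for or_crm_tokens, and a hypothetical `mint` callable in place of the Prismatic mint-token flow. The `skew_seconds` parameter is also an assumption.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]  # (vendor_id, crm_type)


@dataclass
class Token:
    access_token: str
    expires_at: float  # epoch seconds


class TokenBroker:
    """In-process single-flight sketch; real deployments need a shared lock."""

    def __init__(self, mint: Callable[[int, str], Token], skew_seconds: float = 60.0):
        self._mint = mint          # hypothetical wrapper around the mint-token flow
        self._skew = skew_seconds  # treat tokens as expired slightly early
        self._cache: Dict[Key, Token] = {}  # stand-in for or_crm_tokens
        self._locks: Dict[Key, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock registry itself

    def _fresh(self, key: Key) -> Optional[Token]:
        token = self._cache.get(key)
        if token is not None and token.expires_at - self._skew > time.time():
            return token
        return None

    def get_access_token(self, vendor_id: int, crm_type: str) -> str:
        key = (vendor_id, crm_type)
        token = self._fresh(key)
        if token is not None:
            return token.access_token
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another caller may have minted while we waited.
            token = self._fresh(key)
            if token is None:
                token = self._mint(vendor_id, crm_type)
                self._cache[key] = token  # write-through
            return token.access_token
```

The second freshness check inside the lock is what collapses a burst of callers into one mint: callers that lose the race find the token already written and return it without calling Prismatic.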

PHASE 2 · Med risk · Critical path · Per-vendor canary

UI Outbound Migration

Replace UI-driven Prismatic flows with Flask CRM endpoints. Mirror their APIs 1:1 to ease QA diff.

Endpoint surface:

http

GET  /api/crm/{vendor_id}/objects
GET  /api/crm/{vendor_id}/objects/{type}/fields
GET  /api/crm/{vendor_id}/objects/{type}/picklist/{field}
GET  /api/crm/{vendor_id}/owners
POST /api/crm/{vendor_id}/objects/{type}/search
GET  /api/crm/{vendor_id}/objects/{type}/{id}
POST /api/crm/{vendor_id}/objects/{type}/{id}
POST /api/crm/{vendor_id}/cache/invalidate

Cache TTL matrix:

Resource	TTL	Cache key
Object types	1h	`(vendor_id, "objects")`
Field definitions	1h	`(vendor_id, "fields", object_type)`
Picklist values	1h	`(vendor_id, "picklist", object_type, field_name)`
Owners / users	6h	`(vendor_id, "owners")`
Record search	0s	not cached
Single-record fetch	0s	not cached
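A minimal sketch of how the TTL matrix could drive a cache, assuming the cache key tuples shown in the table with the resource kind in position 1. It is in-memory only, as a stand-in for the proposed Postgres-backed or_crm_cache; the class name, the `clock` parameter, and the hit/miss counters are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs in seconds, taken from the matrix above. Kinds not listed are not cached.
TTL_SECONDS = {
    "objects": 3600,
    "fields": 3600,
    "picklist": 3600,
    "owners": 6 * 3600,
}


class SchemaCache:
    """In-memory stand-in for or_crm_cache, keyed like (vendor_id, kind, ...)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._store: Dict[Tuple, Tuple[float, Any]] = {}
        self.hits = 0    # exposed so a hit ratio can be reported
        self.misses = 0

    def get(self, key: Tuple) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, key: Tuple, value: Any) -> None:
        ttl = TTL_SECONDS.get(key[1], 0)  # key[1] is the resource kind
        if ttl > 0:
            self._store[key] = (self._clock() + ttl, value)

    def invalidate(self, vendor_id: int) -> None:
        # Backs an on-demand invalidation endpoint: drop everything for one vendor.
        for key in [k for k in self._store if k[0] == vendor_id]:
            del self._store[key]
```

Injecting the clock keeps expiry testable without sleeping, and keeping the counters on the cache itself makes the hit ratio cheap to export.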

Client wrapper pattern:

python

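# app/services/crm/hubspot_client.py
from hubspot import HubSpot

class HubSpotClient:
    """Wrapper that owns caching and normalization; SDK types never leave this class."""

    def __init__(self, vendor_id, token_broker, cache):
        self._vendor_id = vendor_id
        self._broker = token_broker
        self._cache = cache

    def list_object_fields(self, object_type):
        # Assumes get_fields returns None on a miss.
        cached = self._cache.get_fields(self._vendor_id, object_type)
        # Compare against None so an object with zero fields still counts as a hit.
        if cached is not None:
            return cached
        token = self._broker.get_access_token(self._vendor_id, "hubspot")
        client = HubSpot(access_token=token)
        resp = client.crm.properties.core_api.get_all(object_type=object_type)
        # _to_field (not shown) maps an SDK property to our own field type.
        fields = [self._to_field(p) for p in resp.results]
        # 1h TTL, per the cache TTL matrix.
        self._cache.put_fields(self._vendor_id, object_type, fields, ttl=3600)
        return fields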

SDK + wrapper: HubSpot hubspot-api-client, Salesforce simple-salesforce. Wrap aggressively; never leak SDK types past the wrapper. SDK absorbs API drift; wrapper owns caching + retry + normalization.

PHASE 3 · Med risk · Critical path · Feature-flagged on processor

Webhook Pipeline Rehaul

Remove async-automation-handler + second SQS queue. Processor writes to DB and invokes executor directly.

Before vs after:

BEFORE

mermaid

flowchart LR
    A[acker] --> Q1[(SQS inbound)]
    Q1 --> P[processor]
    P --> Q2[(SQS async)]
    Q2 --> AAH[async-automation-handler]
    AAH --> F[Flask /process]

AFTER

mermaid

flowchart LR
    A[acker] --> Q1[(SQS inbound)]
    Q1 --> P[processor]
    P -->|direct invoke| F[Workflow executor]

Idempotency audit:

Step	Idempotent?	Mitigation
DB upsert by event UUID	Yes	None
HS object fetch	Read-only	None
CRM field write (HS/SF)	If same value	None
Slack / email notification	No	Dedup table `(event_id, action) UNIQUE`
Project / task creation	If keyed on UUID	Enforce in service
eSign trigger	No	Dedup key in DB
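The dedup-table mitigation in the audit can be sketched like this. It is an illustration under assumptions: SQLite stands in for Postgres so the example is self-contained, and the table name `action_dedup` and the helper `run_once` are hypothetical.

```python
import sqlite3
from typing import Callable


def init_dedup(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS action_dedup ("
        " event_id TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " UNIQUE (event_id, action))"
    )


def run_once(
    conn: sqlite3.Connection,
    event_id: str,
    action: str,
    side_effect: Callable[[], None],
) -> bool:
    """Run side_effect at most once per (event_id, action); True if it ran."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "INSERT INTO action_dedup (event_id, action) VALUES (?, ?)",
                (event_id, action),
            )
            side_effect()
    except sqlite3.IntegrityError:
        # The UNIQUE constraint fired: an earlier delivery already did this.
        return False
    return True
```

Claiming the key and running the side effect in one transaction means a failed side effect releases the claim, so an SQS redrive can retry it. The trade-off is that a side effect which partly succeeds before raising can still repeat, which is why the audit treats notifications and eSign as needing their own keys.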

Durability decision: SQS + DLQ + idempotency is sufficient. Step Functions deferred until workflow exceeds 15-min Lambda limit, pause-for-callback semantics needed, or compliance audit demands step-level replay. See Open Questions for revisit triggers.

PHASE 4 · Low risk · Positioning

FastAPI Strangler — Greenfield

Mount Flask under FastAPI to unblock ASGI-native endpoints (SSE, AG-UI). Existing routes unchanged.

Pattern:

python

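# main.py
from fastapi import FastAPI
from fastapi.middleware.wsgi import WSGIMiddleware
from fastapi.responses import StreamingResponse
from app import app as flask_app

api = FastAPI()

@api.post("/api/agent")
async def agent_stream():
    async def events():
        # Placeholder body; the real endpoint streams agent output.
        yield "data: ready\n\n"
    return StreamingResponse(events(), media_type="text/event-stream")

# Mount Flask last so the FastAPI routes declared above match first.
api.mount("/", WSGIMiddleware(flask_app))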

Run with uvicorn main:api --workers 4. Flask routes are unchanged; WSGIMiddleware runs them in a threadpool.

Hard scope rule: FastAPI is for greenfield ASGI-native endpoints only. Do not opportunistically migrate existing Flask routes. Flask-SQLAlchemy ties scoped sessions to request context; FastAPI uses DI. Sharing models works, sharing session-bound services breaks.

Auth bridge: Decode Flask's signed session cookie with the same SECRET_KEY via itsdangerous.URLSafeTimedSerializer. Works only for default signed-cookie sessions — confirm before relying on it.

PHASE 5 · High risk · Phased per workflow type

Workflow Executor Port

Extract workflow logic into a context-free Python module. Importable by Flask routes AND FastAPI Lambda.

Decoupling rules:

Must not reference flask.g, flask.current_app, flask.request, or blueprints
Accepts DB session as argument (not Flask-SQLAlchemy scoped session)
Accepts caller identity (user_id, vendor_id) as arguments
Returns DTOs, not Flask Response objects
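The decoupling rules above could translate into a service function shaped roughly like this. All names here (`Session`, `Caller`, `WorkflowResult`, `run_workflow`, the `workflow_runs` table) are hypothetical placeholders, and the body is reduced to a single write so the shape stays visible.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol


class Session(Protocol):
    """Whatever DB session the caller owns; only the method used here."""

    def execute(self, statement: str, params: Dict[str, Any]) -> Any: ...


@dataclass(frozen=True)
class Caller:
    user_id: int
    vendor_id: int


@dataclass(frozen=True)
class WorkflowResult:
    """Plain DTO: each runtime turns it into its own response type."""

    workflow_id: int
    status: str
    event_id: str


def run_workflow(
    session: Session, caller: Caller, workflow_id: int, event: Dict[str, Any]
) -> WorkflowResult:
    # No flask.g, current_app, or request: everything arrives as an argument.
    session.execute(
        "INSERT INTO workflow_runs (workflow_id, vendor_id, user_id, event_id)"
        " VALUES (:workflow_id, :vendor_id, :user_id, :event_id)",
        {
            "workflow_id": workflow_id,
            "vendor_id": caller.vendor_id,
            "user_id": caller.user_id,
            "event_id": event["event_id"],
        },
    )
    return WorkflowResult(workflow_id, "completed", event["event_id"])
```

A Flask route would pass its scoped session and the logged-in user, while the Lambda entry would pass a session it created itself. Neither needs the other's framework imported.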

Migration path:

mermaid

flowchart TB
    subgraph BEFORE["Before"]
        F1[Flask route] -->|uses g, current_app| W1[Workflow logic]
    end
    subgraph AFTER["After"]
        F2[Flask route] --> S[Workflow service module<br/>context-free]
        L[FastAPI Lambda] --> S
        TS[TS processor] -->|HTTP invoke| L
    end
    BEFORE -.->|step-by-step extract| AFTER

Lambda gotchas

RDS Proxy mandatory
psycopg3 sync + async
Cold start ~1–2s with full ORM
Don't import Flask transitively

Big wins

Eliminates HTTP hop processor→Flask
Same code, two runtimes
Foundation for full FastAPI migration
Serverless cost model

PHASE 6 · High risk · Long-term · Dual-run required

Salesforce Acker Symmetry

Salesforce gets its own acker + SQS + processor. SF Flows decommissioned per-customer.

Two ackers, two queues — one per CRM. Independent scaling, monitoring, blast radius. HS rate-limit backpressure doesn't starve SF processing.

Acker DOES

Verify webhook signature
Dedup by event ID
Group by tenant (portal/org)
Enqueue to SQS FIFO
Return 200 fast

Acker DOESN'T

Evaluate trigger config
Fetch from CRM
Touch DB beyond dedup
Filter on content
Any CRM business logic

Shared code, per-CRM lambdas. Acker core lives in aws-helpers-js exposing interface AckerAdapter { verifySignature, extractGroupId, deriveDedupId }. Two thin Lambda entries (hubspot-acker, salesforce-acker) each instantiate the shared core with their CRM's adapter.
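The adapter split can be sketched as below. The real core is planned for aws-helpers-js; Python is used here only to stay consistent with the other sketches, so the snake_case method names are a transliteration of the interface above. The in-memory `seen` set and the `enqueue` callable are stand-ins for the dedup store and the SQS FIFO send.

```python
import json
from typing import Callable, Dict, Protocol, Set


class AckerAdapter(Protocol):
    def verify_signature(self, headers: Dict[str, str], body: bytes) -> bool: ...
    def extract_group_id(self, payload: dict) -> str: ...
    def derive_dedup_id(self, payload: dict) -> str: ...


def make_acker(
    adapter: AckerAdapter,
    enqueue: Callable[..., None],
    seen: Set[str],
) -> Callable[[Dict[str, str], bytes], int]:
    """Build a CRM-agnostic handler; all CRM knowledge lives in the adapter."""

    def handle(headers: Dict[str, str], body: bytes) -> int:
        if not adapter.verify_signature(headers, body):
            return 401
        payload = json.loads(body)
        dedup_id = adapter.derive_dedup_id(payload)
        if dedup_id not in seen:
            seen.add(dedup_id)
            enqueue(
                group_id=adapter.extract_group_id(payload),
                dedup_id=dedup_id,
                body=body,
            )
        # Duplicates still get 200 so the CRM does not retry them.
        return 200

    return handle
```

Each Lambda entry would then be a few lines: construct its adapter, call `make_acker`, and export the handler. The core never inspects payload content, which keeps it within the "Acker DOESN'T" list.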

Why not earlier:

Volume — SF pre-filter today suppresses noise inside Salesforce. Killing it shifts volume to OnRamp infrastructure.
Limits — SF org-level limits (~15k SOQL/24h Pro). Fire-all could exhaust without aggressive caching.
Coord — Migration is destructive on customer side. Coordinated rollout, not a backend-only change.

Deliverables:

Acker core in aws-helpers-js — shared dedup, group-by-tenant, enqueue logic
salesforce-acker lambda — thin entry implementing AckerAdapter for SF
hubspot-acker refactored — same adapter pattern, no behavior change
New salesforce-inbound SQS FIFO — MessageGroupId keyed on SF orgId
salesforce-processor lambda — drains SF queue, fetches via token broker, invokes executor
Customer-side migration runbook — per-customer steps to delete SF Flows and subscribe to webhook firehose

Dual-run plan:

mermaid

flowchart TB
    subgraph DUAL["Dual-run window (N weeks)"]
        SF1[SF Flow pre-filtered] --> OLD[Existing inbound path]
        SF2[SF webhook subscription fire-all] --> NEW[New acker + SQS + processor]
        OLD --> CMP[Comparison logger]
        NEW --> CMP
        CMP --> ALERT[Alert on divergence]
    end

Confirm zero missed events. Cut over per-customer. Decommission SF Flows on customer-side only after confidence is high.
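The "zero missed events" check reduces to a set comparison over event IDs. A minimal sketch, assuming both paths log a stable event ID per workflow execution and that the new path's IDs are collected after the processor's trigger filter has run, so the two sides are comparable. The names `Divergence` and `compare` are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Divergence:
    missed_by_new: FrozenSet[str]  # old path fired, new path did not
    extra_in_new: FrozenSet[str]   # new path fired, old path did not

    @property
    def blocks_cutover(self) -> bool:
        # Only missed events block cutover; extras need review, not a rollback.
        return bool(self.missed_by_new)


def compare(old_ids: Iterable[str], new_ids: Iterable[str]) -> Divergence:
    old, new = frozenset(old_ids), frozenset(new_ids)
    return Divergence(missed_by_new=old - new, extra_in_new=new - old)
```

Separating the two directions matters because they mean different things: a missed event is a correctness bug in the new path, while an extra event usually points at a filter mismatch between the SF Flow and the processor.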

Cross-cutting concerns

Apply across every phase:

Canary deployment

Every phase touching live traffic uses vendor-scoped feature flags. Same canary set across all phases — different populations break attribution.

Rate-limit governance

Cache is load-bearing, not an optimization. Each workflow builder render makes 4–5 CRM calls if uncached; a busy day exhausts the HS daily limit.

Observability

Baseline metrics required per component: cache hit ratio, p50/p95/p99 latencies, retry counts, SQS lag, DLQ depth. Surface in existing dashboards.

Security

Tokens encrypted at rest. HubSpot webhook signature verification (X-HubSpot-Signature-V3) on acker. OAuth scope mismatch → graceful "reconnect" UX, not 500.

CRM rate limits (defaults):

CRM	Limit	Period
HubSpot	250k (Pro) / 500k (Enterprise)	Daily / portal
HubSpot	100 (Pro) / 150 (Enterprise)	Per 10 seconds
Salesforce	~15k SOQL (Pro)	24 hours
Salesforce	100k+ API requests	24 hours (edition-dependent)

Risks

Each mitigation must be in place before the phase it covers ships.

High

Idempotency assumption wrong in workflow executor

SQS redrive duplicates side effects (double-email, double-eSign). Mitigation: audit table + dedup-key DB constraint per non-idempotent action.

High

Stacking Phase 1 + 3 + 5 simultaneously bricks a customer

Mitigation: strict phase ordering; canary scope identical across phases.

Med

Cache staleness frustrates UX

"I added a field in HS and OnRamp doesn't see it." Mitigation: short TTL + manual refresh button + HS schema-change webhooks where available.

Med

Single-flight bug → thundering herd to Prismatic on token expiry

Mitigation: Redis SETNX or pg advisory lock + load test before canary expansion.

Med

FastAPI Lambda cold start slow under bursty load

Mitigation: provisioned concurrency on executor Lambda + RDS Proxy + load test.

Med

SF dual-run divergence missed during cutover

Mitigation: comparison logger + alert on any divergence, N-week manual review before cutover.

Low

Prismatic mint-token flow itself rate-limits us

Mitigation: cache aggressively (tokens live 2–6h); single-flight ensures one mint per (vendor, crm) per refresh cycle.

Low

Two CRM client codebases (Python + TS) drift

Mitigation: defer until Phase 5 unifies; document boundary contracts in the interim.

Open questions

Durability: when to revisit Step Functions · Deferred

SQS + idempotency is the chosen strategy. Revisit Step Functions if any of the following becomes true:

Workflow executions exceed 15-min Lambda timeout
Pause-for-callback semantics required (eSign completion, customer approval mid-workflow)
Compliance audit requires step-level replay history
Non-idempotent step count exceeds maintainable dedup-key surface

CRM client language unification · Phase 5 territory

Today: Python CRM client (new, Flask) + TS CRM client (existing, lambda processor). Drift potential.

Options:

Port processor to Python (Phase 5 territory)
Keep TS processor but route CRM calls through Python via HTTP (adds latency)
Accept drift, document the boundary

Decision deferred until Phase 5 scope is clear.

Cache backing store · Postgres default

Postgres JSONB is the proposed default. Revisit Redis if:

Cache table size exceeds ~10GB and bloats VACUUM
p99 cache read latency exceeds 50ms under load
Need pub/sub for cache invalidation across pods

Envelope pattern for inbound messages · Picking up next week

Today SQS messages between acker → processor carry raw CRM payloads; processor branches on origin. Proposal: wrap raw payload in a normalized envelope that the per-CRM acker adapter populates.

Sketch:

json

{
  "envelope": {
    "version": "1",
    "event_id": "evt_xyz",
    "received_at": "2026-05-22T18:42:00Z",
    "tenant": {
      "crm": "hubspot",
      "tenant_id": "portal_12345",
      "vendor_id": 42
    },
    "object": {
      "type": "deal",
      "id": "67890",
      "action": "property_change",
      "changed_fields": ["stage", "amount"]
    },
    "trace_id": "...",
    "signature_verified": true
  },
  "raw": {}
}

Why bother:

Acker adapter does sig-verify + tenant resolution + dedup + envelope build
Processor reads envelope for routing/filtering — drops into raw only for CRM-specific business logic
CRM-agnostic metrics — single counter events_processed{crm=..., object_type=..., action=...}
Replay/audit — envelope alone tells you what fired without re-parsing raw
Adding new CRM (Pipedrive, MS Dynamics) becomes "implement adapter," not "fork everything"
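On the processor side, reading the envelope might look like the sketch below. It assumes the field layout from the JSON sketch above; the `Envelope` dataclass, `parse_message`, and `metric_labels` are hypothetical names, and rejecting unknown versions is an assumed policy.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple

SUPPORTED_VERSION = "1"


@dataclass(frozen=True)
class Envelope:
    event_id: str
    crm: str
    tenant_id: str
    vendor_id: int
    object_type: str
    object_id: str
    action: str
    changed_fields: Tuple[str, ...]
    signature_verified: bool


def parse_message(body: str) -> Tuple[Envelope, Dict[str, Any]]:
    """Split an SQS message body into the normalized envelope and the raw payload."""
    message = json.loads(body)
    env = message["envelope"]
    if env["version"] != SUPPORTED_VERSION:
        raise ValueError("unsupported envelope version: %r" % env["version"])
    tenant, obj = env["tenant"], env["object"]
    envelope = Envelope(
        event_id=env["event_id"],
        crm=tenant["crm"],
        tenant_id=tenant["tenant_id"],
        vendor_id=tenant["vendor_id"],
        object_type=obj["type"],
        object_id=obj["id"],
        action=obj["action"],
        changed_fields=tuple(obj.get("changed_fields", [])),
        signature_verified=bool(env["signature_verified"]),
    )
    return envelope, message.get("raw", {})


def metric_labels(envelope: Envelope) -> Dict[str, str]:
    # Labels for the single CRM-agnostic events_processed counter.
    return {
        "crm": envelope.crm,
        "object_type": envelope.object_type,
        "action": envelope.action,
    }
```

Routing, filtering, and metrics read only the `Envelope`; the raw dict is handed to CRM-specific code when it is actually needed.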

Shared processor — yes, with caveats. Plausible if envelope is the only contract upstream consumers see. Compromise: shared processor core library in Python, thin per-CRM Lambda entries (mirror the acker adapter pattern). Keeps independent scaling per CRM, independent blast radius, and shared filter-eval + workflow-invocation logic.

Full single-Lambda unification is not worth it: the code would still branch on envelope.tenant.crm, and merging the Lambdas collapses the per-CRM blast radius for no real gain.

When to decide: Phase 6 timeframe. Acker adapter design ships either way; envelope-vs-CRM-native is the message-contract decision between adapter output and processor input. Cheap to defer; expensive to retrofit if processor branches on raw schema everywhere.

Webhook signature verification · Security gap

HubSpot acker today does not verify X-HubSpot-Signature-V3. Tracked as follow-up in the acker source comment. Close as part of Phase 0 or Phase 3, not later.
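For orientation, a sketch of what the missing check involves. It assumes HubSpot's v3 scheme as commonly documented: HMAC-SHA256 over method + URI + body + timestamp, keyed with the app's client secret, base64-encoded, with the timestamp taken from the X-HubSpot-Request-Timestamp header and rejected if more than five minutes old. Confirm these details, including any URI decoding rules, against HubSpot's current documentation before relying on them. The acker itself is JavaScript; Python is used here only to match the other sketches, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_MS = 5 * 60 * 1000  # assumed replay window


def verify_hubspot_v3(
    client_secret: str,
    method: str,
    uri: str,
    body: bytes,
    timestamp_ms: str,
    signature: str,
    now_ms: Optional[int] = None,
) -> bool:
    now = int(time.time() * 1000) if now_ms is None else now_ms
    try:
        sent_at = int(timestamp_ms)
    except (TypeError, ValueError):
        return False
    if abs(now - sent_at) > MAX_AGE_MS:
        return False  # stale or future-dated: possible replay
    message = method.encode() + uri.encode() + body + timestamp_ms.encode()
    digest = hmac.new(client_secret.encode(), message, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # Constant-time comparison on bytes avoids leaking how much matched.
    return hmac.compare_digest(expected, signature.encode())
```

The body must be the exact bytes received, before any JSON parsing or re-serialization, or the digest will not match.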

References

.agents/skills/prismatic-scaling/SKILL.md — running incident log
docs/workflow-execution-retry.md
docs/interactive-agent-service-architecture.md
…/lambdas/hubspot-acker/handler.js
…/lambdas/hubspot-processor/handler.js
prismatic/.docs/hubspot-workflows-prefilter-plan.md (scrapped)
