Prismatic Decoupling Plan
OnRamp's CRM integration today routes nearly every CRM read, write, and webhook through Prismatic. This couples our throughput to Prismatic's rate limits, makes our caching strategy opaque, and forces us to debug across a third-party runtime when things go wrong. This document proposes a phased decoupling that reduces Prismatic to a lightweight OAuth token broker. We keep it in that role only because it already holds long-lived refresh tokens for our customers' CRM instances.
* No customer-side action is required through Phase 5. Phase 6 (SF symmetry) is the only phase that requires customer-side action.
TL;DR
- Vue → Flask → CRMs direct: Prismatic mints the access token; we cache it and call HubSpot / Salesforce APIs ourselves. The schema cache lives in Postgres.
- Acker + SQS + processor: the existing path stays. Drop the async-automation-handler tail; the processor invokes the Python workflow executor directly.
- Salesforce stays asymmetric (initially): the SF Flow pre-filter remains for the initial refactor. Symmetry with HubSpot is a future-state goal, not a prerequisite.
Highest-leverage move: land the OAuth broker first.
Everything downstream depends on it. Without local access tokens, you can't migrate the UI off Prismatic, replace the processor's fetch path, or simplify anything else. It is the single highest-ROI step in the plan.
Why bother
- Rate limits: Prismatic's API call budget is shared across all our customers; noisy ones throttle the rest.
- Caching: we can't tune TTLs, invalidate on demand, or see hit rates.
- Debugging: workflow failures surface in Prismatic logs, so investigating them requires Prismatic's UI. Our observability stack has no structured access to them.
- Incident response: their dashboard is the source of truth, which makes integrating it into incident response awkward.
- Cost: Prismatic pricing scales per customer, and our customer count is growing faster than that pricing model works for us.
Running incident log: .agents/skills/prismatic-scaling/SKILL.md
Architecture
Two distinct flows share infrastructure but follow separate paths:
| Flow | Trigger | Latency budget | Audience | Failure mode |
|---|---|---|---|---|
| Creation | User clicks in UI | Sub-second perceived | Synchronous, user-facing | Show error, retry |
| Trigger | CRM webhook event | Seconds to minutes | Asynchronous, no UI | Retry from SQS, alert on DLQ |
Workflow Creation Flow
UI-driven, outbound, synchronous. User builds a workflow trigger config in the UI. UI lists CRM objects, fields, picklist values, and owners to populate the form. Every render that needs CRM metadata is a round-trip.
Today:

```mermaid
flowchart LR
UI[Vue UI<br/>Workflow Builder]
F[Flask CRM proxy]
P[Prismatic<br/>outbound flows]
HS[HubSpot]
SF[Salesforce]
UI -->|fetch objects/fields| F
F --> P
P --> HS
P --> SF
classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
class P prism
```

Target:

```mermaid
flowchart LR
UI[Vue UI<br/>Workflow Builder]
F[Flask CRM client]
SC[(or_crm_cache)]
TB[Token broker]
TC[(or_crm_tokens)]
P[Prismatic mint-token]
HS[HubSpot]
SF[Salesforce]
UI -->|fetch objects/fields| F
F --> SC
F --> TB
TB --> TC
TB -.->|miss/expiry| P
F -->|cache miss| HS
F -->|cache miss| SF
classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
classDef cache fill:#d1fae5,stroke:#065f46,color:#064e3b
class P prism
class TC,SC cache
```

Current pain
- Every UI render that needs CRM metadata round-trips through Prismatic
- Prismatic's API budget is per Prismatic customer (us), so it is shared across all of our customers
- No insight into cache hit rates; no way to invalidate on demand
Target wins
- Prismatic touched only on token mint (cached 2–6h per vendor)
- Schema cache absorbs most page renders (4–5 calls → 0 CRM hits when warm)
- Cache TTLs, invalidation, hit ratios all observable + tunable
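The "observable + tunable" target can be sketched in a few lines. This is a minimal in-process stand-in for the or_crm_cache table. The class name SchemaCache, the (vendor_id, crm_type, object_type) key, and the 15-minute default TTL are assumptions for illustration. The `fetch` callable stands in for a direct HubSpot or Salesforce metadata call on a cache miss.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[int, str, str]  # hypothetical key: (vendor_id, crm_type, object_type)


@dataclass
class SchemaCache:
    """In-process stand-in for or_crm_cache: tunable TTL, on-demand invalidation, hit ratio."""

    ttl_seconds: float = 900.0  # assumed default; tune per object type
    clock: Callable[[], float] = time.monotonic
    hits: int = field(default=0, init=False)
    misses: int = field(default=0, init=False)
    _entries: Dict[CacheKey, Tuple[float, Any]] = field(default_factory=dict, init=False, repr=False)

    def get(self, key: CacheKey, fetch: Callable[[], Any]) -> Any:
        """Return the cached schema if still fresh; otherwise call fetch() and store the result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()  # cache miss: one direct CRM metadata call
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key: CacheKey) -> None:
        """Drop one entry, e.g. from a manual refresh button or a schema-change webhook."""
        self._entries.pop(key, None)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The `hits` and `misses` counters are what a metrics exporter would publish as the cache hit ratio. An injectable `clock` keeps TTL behavior testable without sleeping.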
Workflow Trigger Flow
Backend-only, inbound, asynchronous. CRM fires a webhook → OnRamp evaluates the trigger config → workflow executes. No UI involvement. Reliability comes from SQS redrive + idempotency.
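SQS delivers at least once, so a redriven message can reach the processor twice. The dedup-key idea below is the same mitigation listed under Risks. The sketch records one key per non-idempotent side effect under a unique constraint and skips the action if that key already exists. The table name workflow_action_dedup and the `event_id:action` key format are hypothetical. SQLite stands in for Postgres so the sketch runs on its own; in Postgres the insert would be `INSERT ... ON CONFLICT (dedup_key) DO NOTHING`.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    """SQLite stand-in for a Postgres dedup table (hypothetical name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workflow_action_dedup ("
        " dedup_key TEXT PRIMARY KEY,"  # the dedup-key constraint
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def run_once(conn: sqlite3.Connection, event_id: str, action: str,
             side_effect: Callable[[], None]) -> bool:
    """Run side_effect at most once per (event_id, action). Returns True if it ran."""
    dedup_key = f"{event_id}:{action}"
    with conn:  # commits on success; rolls back (freeing the key) if side_effect raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO workflow_action_dedup (dedup_key) VALUES (?)",
            (dedup_key,),
        )
        if cur.rowcount == 0:
            return False  # key already recorded: a redelivered message, skip the side effect
        side_effect()
    return True
```

Because the key insert and the side effect share one transaction, a failed action releases its key and can be retried on redrive. A crash after an external call succeeds but before the commit can still duplicate that call. This pattern narrows that window but does not close it.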
Today:

```mermaid
flowchart LR
HS[HubSpot]
SF[Salesforce]
ACK[hubspot-acker]
Q1[(hs-inbound SQS)]
PROC[hubspot-processor]
API[prismatic-api-handler]
PI[Prismatic<br/>inbound flow]
Q2[(async SQS)]
AH[async-automation-handler]
F[Flask /process]
HS -->|fire-all| ACK
ACK --> Q1
Q1 --> PROC
PROC -.->|advanced filter| PI
PI --> HS
SF -->|SF Flow filtered| API
API -.->|blind proxy| PI
PI --> SF
PI -->|callback| API
API --> Q2
Q2 --> AH
AH --> F
classDef prism fill:#fecdd3,stroke:#9f1239,color:#881337
classDef queue fill:#fef3c7,stroke:#92400e,color:#78350f
class PI prism
class Q1,Q2 queue
```

Target:

```mermaid
flowchart LR
HS[HubSpot]
SF[Salesforce]
HACK[hubspot-acker]
HQ[(hs-inbound)]
HPROC[hubspot-processor]
SACK[salesforce-acker<br/>Phase 6]
SQ[(sf-inbound)]
SPROC[salesforce-processor<br/>Phase 6]
EXEC[Workflow executor]
TB[Token broker]
HS -->|fire-all| HACK
HACK --> HQ
HQ --> HPROC
HPROC -.->|token| TB
HPROC -->|filter+fetch| HS
HPROC --> EXEC
SF -->|Phase 6| SACK
SACK --> SQ
SQ --> SPROC
SPROC -.->|token| TB
SPROC -->|fetch| SF
SPROC --> EXEC
classDef queue fill:#d1fae5,stroke:#065f46,color:#064e3b
classDef phase6 fill:#fef3c7,stroke:#b45309,color:#78350f,stroke-dasharray: 5 3
class HQ,SQ queue
class SACK,SPROC phase6
```

Current pain
- Many hops, two ingress shapes — HS via acker, SF via api-handler proxy
- Prismatic always in path, plus api-handler always called back
- Two SQS queues, two consumers — async-handler is dead weight
- api-handler does double duty — coupling worth untangling
Target wins
- 3 hops: acker → SQS → processor → executor (no Prismatic in path)
- Per-CRM ackers + queues = independent scaling, monitoring, blast radius
- Same flow shape for both CRMs (SF mirrors HS in Phase 6)
SF rollout note: The salesforce-acker / salesforce-processor / sf-inbound SQS are Phase 6 deliverables (dashed nodes in the target diagram). Through Phases 1–5, Salesforce continues to flow through SF Flows → Prismatic inbound unchanged.
The phased plan
OAuth Broker Shim
A Python-side get_access_token(vendor_id, crm_type) that calls Prismatic only when the cached token is missing or about to expire.
Component flow:
```mermaid
flowchart LR
Caller[Flask service code] --> Broker[TokenBroker]
Broker --> Cache[(or_crm_tokens)]
Broker -->|on miss/expiry| Prism[Prismatic mint-token flow]
Prism -->|access token| Broker
Broker -->|write-through| Cache
Broker --> Caller
```

DB schema:
```sql
CREATE TABLE or_crm_tokens (
id BIGSERIAL PRIMARY KEY,
vendor_id INTEGER NOT NULL,
crm_type VARCHAR(32) NOT NULL,
access_token TEXT NOT NULL, -- encrypted at rest
expires_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (vendor_id, crm_type)
);
CREATE INDEX or_crm_tokens_expires_at_idx ON or_crm_tokens (expires_at);
```

Single-flight required. Multiple concurrent callers after token expiry must resolve to a single Prismatic call (Redis SETNX or pg advisory lock). Without it, you'll thundering-herd Prismatic on every refresh cycle.
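Below is a minimal sketch of the broker's get_access_token path with single-flight minting. Two in-process pieces stand in for production ones, so it only coordinates threads within one process. A dict stands in for or_crm_tokens. A per-key threading.Lock stands in for the cross-process lock (Redis SETNX or a pg advisory lock). `mint_token` is a hypothetical callable wrapping the Prismatic mint-token flow. The 5-minute early-refresh skew is an assumed value.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]  # (vendor_id, crm_type), mirrors UNIQUE (vendor_id, crm_type)


@dataclass(frozen=True)
class Token:
    access_token: str
    expires_at: float  # epoch seconds, like or_crm_tokens.expires_at


class TokenBroker:
    """get_access_token with write-through caching and per-key single-flight minting."""

    def __init__(self, mint_token: Callable[[int, str], Token],
                 refresh_skew: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._mint = mint_token    # hypothetical wrapper around the Prismatic mint-token flow
        self._skew = refresh_skew  # treat tokens as expired this many seconds early
        self._clock = clock
        self._cache: Dict[Key, Token] = {}           # stand-in for or_crm_tokens
        self._locks: Dict[Key, threading.Lock] = {}  # stand-in for SETNX / advisory lock
        self._locks_guard = threading.Lock()

    def _fresh(self, key: Key) -> Optional[Token]:
        tok = self._cache.get(key)
        if tok is not None and tok.expires_at - self._skew > self._clock():
            return tok
        return None

    def _lock_for(self, key: Key) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_access_token(self, vendor_id: int, crm_type: str) -> str:
        key = (vendor_id, crm_type)
        tok = self._fresh(key)
        if tok is not None:
            return tok.access_token  # fast path: cache hit, no lock taken
        with self._lock_for(key):  # single-flight: one mint per key at a time
            tok = self._fresh(key)  # re-check: a concurrent caller may have minted already
            if tok is None:
                tok = self._mint(vendor_id, crm_type)
                self._cache[key] = tok  # write-through
            return tok.access_token
```

The re-check inside the lock is what makes this single-flight rather than merely serialized. Callers that queued behind the minter find a fresh token and return without calling Prismatic. The same double-check shape applies when the lock is an advisory lock and the cache is the Postgres table.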
Cross-cutting concerns
Apply across every phase:
Canary deployment
Every phase touching live traffic uses vendor-scoped feature flags. Use the same canary set across all phases; different populations make it impossible to attribute a regression to a specific phase.
Rate-limit governance
Cache is load-bearing, not an optimization. An uncached workflow-builder render makes 4–5 CRM calls; on a busy day that is enough to exhaust the HubSpot daily limit.
Observability
Baseline metrics required per component: cache hit ratio, p50/p95/p99 latencies, retry counts, SQS lag, DLQ depth. Surface in existing dashboards.
Security
Tokens encrypted at rest. HubSpot webhook signature verification (X-HubSpot-Signature-V3) on acker. OAuth scope mismatch → graceful "reconnect" UX, not 500.
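The shared-canary rule can be enforced by construction. In the sketch below, every phase flag reads the same vendor set, so the only way to widen a canary is to change that one set. This is a minimal sketch. CANARY_VENDORS, PHASE_STAGE, the phase keys, and is_enabled are all hypothetical; a real deployment would read them from the existing feature-flag system.

```python
from typing import FrozenSet, Mapping

# Hypothetical: the single canary population shared by every phase flag.
CANARY_VENDORS: FrozenSet[int] = frozenset({42, 101, 977})

# Hypothetical per-phase rollout stage: "off", "canary", or "all".
PHASE_STAGE: Mapping[str, str] = {
    "phase_1": "canary",
    "phase_3": "off",
}


def is_enabled(phase: str, vendor_id: int) -> bool:
    """Vendor-scoped flag check; every phase in 'canary' reads the same vendor set."""
    stage = PHASE_STAGE.get(phase, "off")  # unknown phases default to off
    if stage == "all":
        return True
    if stage == "canary":
        return vendor_id in CANARY_VENDORS
    return False
```

The design choice is the absence of a per-phase vendor list. Per-phase lists are exactly what lets canary populations drift apart.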
CRM rate limits (defaults):
| CRM | Limit | Period |
|---|---|---|
| HubSpot | 250k (Pro) / 500k (Enterprise) | Daily / portal |
| HubSpot | 100 (Pro) / 150 (Enterprise) | Per 10 seconds |
| Salesforce | ~15k SOQL (Pro) | 24 hours |
| Salesforce | 100k+ API requests | 24 hours (edition-dependent) |
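The arithmetic shows why the cache is load-bearing. At 4–5 uncached calls per workflow-builder render, the 250k HubSpot Pro daily limit covers roughly 50k–62.5k uncached renders per portal per day (250,000 / 5 and 250,000 / 4), before any other OnRamp traffic to that portal. The per-10-second limit needs client-side pacing. Below is a minimal sliding-window limiter sketch for it. The class name and the choice to block rather than fail fast are assumptions. It coordinates callers within one process only, so each portal would need its own instance, or a shared store across pods.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (e.g. 100 per 10 s)."""

    def __init__(self, limit: int = 100, window: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of calls inside the window
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit; otherwise return False."""
        with self._lock:
            now = self._clock()
            while self._calls and now - self._calls[0] >= self.window:
                self._calls.popleft()  # drop calls that have left the window
            if len(self._calls) < self.limit:
                self._calls.append(now)
                return True
            return False

    def acquire(self) -> None:
        """Block until a slot frees up, sleeping until the oldest call leaves the window."""
        while not self.try_acquire():
            with self._lock:
                oldest = self._calls[0] if self._calls else self._clock()
                wait = self.window - (self._clock() - oldest)
            self._sleep(max(wait, 0.01))
```

Injecting `clock` and `sleep` keeps the limiter testable. A fail-fast variant would call `try_acquire()` and surface a retryable error instead of blocking a request thread.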
Risks
Each mitigation must be in place before the phase it protects ships.
Idempotency assumption wrong in workflow executor
SQS redrive duplicates side effects (double-email, double-eSign). Mitigation: audit table + dedup-key DB constraint per non-idempotent action.
Stacking Phase 1 + 3 + 5 simultaneously bricks a customer
Mitigation: strict phase ordering; canary scope identical across phases.
Cache staleness frustrates UX
"I added a field in HS and OnRamp doesn't see it." Mitigation: short TTL + manual refresh button + HS schema-change webhooks where available.
Single-flight bug → thundering herd to Prismatic on token expiry
Mitigation: Redis SETNX or pg advisory lock + load test before canary expansion.
FastAPI Lambda cold start slow under bursty load
Mitigation: provisioned concurrency on executor Lambda + RDS Proxy + load test.
SF dual-run divergence missed during cutover
Mitigation: comparison logger + alert on any divergence, N-week manual review before cutover.
Prismatic mint-token flow itself rate-limits us
Mitigation: cache aggressively (tokens live 2–6h); single-flight ensures one mint per (vendor, crm) per refresh cycle.
Two CRM client codebases (Python + TS) drift
Mitigation: defer until Phase 5 unifies; document boundary contracts in the interim.
Open questions
Durability: when to revisit Step Functions (Deferred)
SQS + idempotency is the chosen strategy. Revisit Step Functions if any become true:
- Workflow executions exceed 15-min Lambda timeout
- Pause-for-callback semantics required (eSign completion, customer approval mid-workflow)
- Compliance audit requires step-level replay history
- Non-idempotent step count exceeds maintainable dedup-key surface
CRM client language unification (Phase 5 territory)
Today: Python CRM client (new, Flask) + TS CRM client (existing, lambda processor). Drift potential.
Options:
- Port processor to Python (Phase 5 territory)
- Keep TS processor but route CRM calls through Python via HTTP (adds latency)
- Accept drift, document the boundary
Decision deferred until Phase 5 scope is clear.
Cache backing store (Postgres default)
Postgres JSONB is the proposed default. Revisit Redis if:
- Cache table size exceeds ~10GB and bloats VACUUM
- p99 cache read latency exceeds 50ms under load
- Need pub/sub for cache invalidation across pods
Envelope pattern for inbound messages (picking up next week)
Today, SQS messages from acker to processor carry raw CRM payloads, and the processor branches on their origin. Proposal: wrap the raw payload in a normalized envelope that the per-CRM acker adapter populates.
Sketch:
```json
{
  "envelope": {
    "version": "1",
    "event_id": "evt_xyz",
    "received_at": "2026-05-22T18:42:00Z",
    "tenant": {
      "crm": "hubspot",
      "tenant_id": "portal_12345",
      "vendor_id": 42
    },
    "object": {
      "type": "deal",
      "id": "67890",
      "action": "property_change",
      "changed_fields": ["stage", "amount"]
    },
    "trace_id": "...",
    "signature_verified": true
  },
  "raw": {}
}
```

Why bother:
- Acker adapter does sig-verify + tenant resolution + dedup + envelope build
- Processor reads the envelope for routing/filtering; it drops into raw only for CRM-specific business logic
- CRM-agnostic metrics: a single counter, events_processed{crm=..., object_type=..., action=...}
- Replay/audit: the envelope alone tells you what fired without re-parsing raw
- Adding a new CRM (Pipedrive, MS Dynamics) becomes "implement an adapter," not "fork everything"
Shared processor: yes, with caveats. It is plausible if the envelope is the only upstream contract the processor sees. Compromise: a shared processor core library in Python with thin per-CRM Lambda entries (mirroring the acker adapter pattern). This keeps independent scaling and blast radius per CRM while sharing filter-eval and workflow-invocation logic.
Full single-Lambda unification is not worth it: it would branch on envelope.tenant.crm anyway, and it collapses the blast radius for no real gain.
Normalization vocab: object.type → contact | deal | company | account | opportunity | task | custom_<name>. object.action → created | updated | deleted | property_change | association_change. Note: HS property_change doesn't cleanly map to all SF event types (PushTopic, Platform Event, CDC, Change Events). Start narrow, version aggressively.
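Here is a minimal sketch of the acker-adapter side: normalizing one HubSpot webhook event into the envelope shape above. It assumes HubSpot's webhook event fields: `eventId`, `portalId`, `objectId`, `subscriptionType` (for example `"deal.propertyChange"`), and `propertyName`. The `evt_` prefix and the `resolve_vendor_id` lookup are hypothetical. Only the vocabulary above is mapped. Anything else raises rather than being guessed, in the spirit of "start narrow."

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional

# HubSpot subscriptionType suffix -> envelope object.action (vocabulary from the plan).
HS_ACTIONS = {
    "creation": "created",
    "deletion": "deleted",
    "propertyChange": "property_change",
    "associationChange": "association_change",
}
HS_OBJECTS = {"contact", "deal", "company"}  # start narrow


def hubspot_to_envelope(event: Dict[str, Any],
                        resolve_vendor_id: Callable[[str], int],
                        signature_verified: bool,
                        received_at: Optional[datetime] = None) -> Dict[str, Any]:
    """Build an envelope from one HubSpot webhook event; reject anything unmapped."""
    obj_type, _, hs_action = event["subscriptionType"].partition(".")
    if obj_type not in HS_OBJECTS or hs_action not in HS_ACTIONS:
        raise ValueError(f"unmapped subscriptionType: {event['subscriptionType']}")
    action = HS_ACTIONS[hs_action]
    tenant_id = f"portal_{event['portalId']}"
    ts = (received_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "envelope": {
            "version": "1",
            "event_id": f"evt_{event['eventId']}",
            "received_at": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "tenant": {
                "crm": "hubspot",
                "tenant_id": tenant_id,
                "vendor_id": resolve_vendor_id(tenant_id),  # hypothetical tenant lookup
            },
            "object": {
                "type": obj_type,
                "id": str(event["objectId"]),
                "action": action,
                # assumes one HubSpot event per changed property
                "changed_fields": [event["propertyName"]] if action == "property_change" else [],
            },
            "trace_id": uuid.uuid4().hex,
            "signature_verified": signature_verified,
        },
        "raw": event,
    }
```

Raising on unmapped types forces a deliberate vocabulary and version bump when a new event kind appears. That is cheaper than the processor silently mis-routing it.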
When to decide: Phase 6 timeframe. Acker adapter design ships either way; envelope-vs-CRM-native is the message-contract decision between adapter output and processor input. Cheap to defer; expensive to retrofit if processor branches on raw schema everywhere.
Webhook signature verification (security gap)
The HubSpot acker does not currently verify X-HubSpot-Signature-V3. This is tracked as a follow-up in a comment in the acker source. Close it as part of Phase 0 or Phase 3, not later.
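Below is a minimal Python sketch of the v3 check as HubSpot documents it. Reject requests whose X-HubSpot-Request-Timestamp is more than 5 minutes old. Compute an HMAC-SHA256 over method + URI + body + timestamp, keyed with the app's client secret. Base64-encode the result and compare it to the header in constant time. The acker itself is JavaScript (handler.js), so treat this as a reference to port with Node's crypto module. The function name and parameters are hypothetical. The sketch assumes the caller passes the full request URI exactly as HubSpot specifies, with certain percent-encoded characters decoded; confirm that list against HubSpot's request-validation docs.

```python
import base64
import hashlib
import hmac
import time
from typing import Callable, Optional

MAX_SKEW_MS = 5 * 60 * 1000  # HubSpot's documented 5-minute freshness window


def verify_hubspot_v3(client_secret: str, method: str, uri: str, body: str,
                      timestamp_header: Optional[str], signature_header: str,
                      now_ms: Optional[Callable[[], int]] = None) -> bool:
    """Return True if the v3 signature matches and the timestamp is fresh."""
    now = (now_ms or (lambda: int(time.time() * 1000)))()
    try:
        ts = int(timestamp_header)  # milliseconds since epoch
    except (TypeError, ValueError):
        return False  # missing or malformed timestamp header
    if abs(now - ts) > MAX_SKEW_MS:
        return False  # too old (or implausibly far in the future): treat as a replay
    source = f"{method}{uri}{body}{timestamp_header}".encode("utf-8")
    digest = hmac.new(client_secret.encode("utf-8"), source, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected, signature_header)  # constant-time comparison
```

Rejecting future-dated timestamps as well as stale ones is a slightly stricter choice than the documented "older than 5 minutes" rule. It costs nothing for well-behaved clocks.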
References
- .agents/skills/prismatic-scaling/SKILL.md: running incident log
- docs/workflow-execution-retry.md
- docs/interactive-agent-service-architecture.md
- …/lambdas/hubspot-acker/handler.js
- …/lambdas/hubspot-processor/handler.js
- prismatic/.docs/hubspot-workflows-prefilter-plan.md (scrapped)