Defend every AI decision before the regulator asks.

Proofpane is the evidence layer for AI in regulated teams. Three contracts:
(1) every call passes a policy gate and lands in a tamper-evident audit log.
(2) every production default (which prompt ships, which model, which memory strategy) comes from a statistically significant experiment with an inter-rater reliability floor, not a hunch.
(3) the whole record exports as a signed Evidence Pack your auditor verifies offline.

For CISOs, CAEs, and Risk Officers. Not a policy-doc GRC checklist; not a log aggregator; not a prompt-injection scanner — see how we’re different.

One decision, end to end: production default → frozen verdict → source experiment → reliability gate → signed Evidence Pack

One-click demo. No signup. No card. Populated org with real frozen verdicts.

Use any AI tool. Govern all of them.

Proofpane is a thin governance layer beneath the tools your team already loves — Claude Desktop, Cursor, Continue, your in-house MCP servers, your own agent code. Members keep their workflows; you keep the hash-chained audit, the cost cap, the policy gate, the Evidence Pack. No wrappers. No vendor lock-in. No "you must use our IDE."

Available reach: 10,000+ apps and 40,000+ actions. Every tool above speaks MCP — wire Proofpane in once, govern any of them. One integration, the whole ecosystem.

The six regulator questions Proofpane answers

“Why is this your production default?” significance + IRR floor

Every production default — which prompt variant ships, which memory strategy lives, which provider is the baseline — passes (a) a statistical significance gate over a content-hashed fixture, then (b) an inter-rater reliability floor (Krippendorff α with bootstrap CI — the same measure clinical-trial reviewers use to prove humans agree above chance). The verdict, the confidence interval, the fixture hash, the DLP rule-set fingerprint that scrubbed it, the approving operator — all frozen on the audit row and shipped in the Evidence Pack. Your auditor reconstructs why this is the current default from the bundle alone. No meeting required. No engineer dragged in. Six years from now, same answer, same hash.
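As a rough illustration of the two gates described above, the sketch below computes Krippendorff's α for nominal labels, puts a percentile bootstrap confidence interval around it, and only passes a candidate when both the significance test and the reliability floor hold. The function names, the 0.667 floor, the 0.05 threshold, and the choice to skip degenerate resamples are assumptions made for this example, not Proofpane's actual implementation.

```python
import math
import random
from collections import Counter


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of per-item rating lists, e.g. [["a", "a"], ["a", "b"]].
    Returns NaN when alpha is undefined (too few ratings or only one label).
    """
    units = [u for u in units if len(u) >= 2]  # single ratings form no pairs
    n = sum(len(u) for u in units)
    if n < 2:
        return math.nan
    totals = Counter()
    observed = 0.0
    for ratings in units:
        counts = Counter(ratings)
        totals.update(counts)
        m = len(ratings)
        # Ordered pairs of differing labels in this item, weighted by 1/(m-1).
        observed += (m * m - sum(c * c for c in counts.values())) / (m - 1)
    d_o = observed / n
    d_e = (n * n - sum(c * c for c in totals.values())) / (n * (n - 1))
    return math.nan if d_e == 0 else 1.0 - d_o / d_e


def bootstrap_ci(units, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for alpha, resampling items with replacement."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        resample = [rng.choice(units) for _ in units]
        value = krippendorff_alpha_nominal(resample)
        if not math.isnan(value):  # assumption: skip undefined resamples
            samples.append(value)
    if not samples:
        return math.nan, math.nan
    samples.sort()
    tail = (1.0 - level) / 2.0
    lo = samples[int(tail * len(samples))]
    hi = samples[min(len(samples) - 1, int((1.0 - tail) * len(samples)))]
    return lo, hi


def passes_gates(p_value, units, alpha_floor=0.667, p_threshold=0.05):
    """Hypothetical gate: significant result AND CI lower bound above the floor."""
    lo, _ = bootstrap_ci(units)
    return p_value < p_threshold and lo >= alpha_floor
```

Gating on the lower bound of the interval rather than the point estimate is one conservative design choice: a small fixture with a lucky α does not clear the floor.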

“Show me everything that happened, signed.” tamper-evident audit

Every AI decision your team makes — every prompt, every multi-agent run, every Cursor session — lands in a cryptographically chained log scoped per tenant, so cross-tenant tampering is structurally detectable. Export as a signed Evidence Pack — a standalone offline verifier ships in the bundle so your auditor reads it without backend access, without a Proofpane account, six years from now.
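To make "cryptographically chained" concrete, here is a minimal sketch of a per-tenant hash chain using only the Python standard library. Each row's hash covers the tenant, the previous row's hash, and the payload, so editing any earlier row changes every hash after it. The row layout and function names are assumptions for illustration; the real log format and the signing step for the Evidence Pack are not shown here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first row of a chain


def _digest(tenant, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the row contents."""
    body = json.dumps(
        {"tenant": tenant, "prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_row(chain, tenant, payload):
    """Append a row that commits to the hash of the row before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    row = {
        "tenant": tenant,
        "prev": prev_hash,
        "payload": payload,
        "hash": _digest(tenant, prev_hash, payload),
    }
    chain.append(row)
    return row


def verify_chain(chain, tenant):
    """Return the index of the first broken row, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, row in enumerate(chain):
        expected = _digest(row["tenant"], row["prev"], row["payload"])
        if row["tenant"] != tenant or row["prev"] != prev_hash or row["hash"] != expected:
            return index
        prev_hash = row["hash"]
    return None
```

Because the verifier needs only the rows themselves and a hash function, a check of this kind can run offline, which is the property the bundled verifier relies on.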

Audit timeline showing hash-chained events — every AI decision logged with cryptographic linkage between rows.

“Map this to my framework controls.” NIST · ISO 42001 · EU AI Act · GDPR · SOC 2

Control library aligned with NIST AI RMF, ISO/IEC 42001, and EU AI Act evidence expectations — pre-mapped per skill, with per-org overrides. A closed-set guard cross-checks every cited control ID against a curated truth set so fabricated references can't pass. Proofpane supports operational evidence; it does not replace legal, regulatory, or certification assessment.
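A closed-set guard of the kind described above can be sketched in a few lines: every control ID cited in a model output is normalized and looked up in a curated set, and anything outside the set is reported as fabricated. The tiny `TRUTH_SET` below holds a few NIST AI RMF style identifiers as placeholders, and the function names are hypothetical; the real curated set and matching rules are not described in this page.

```python
# Illustrative placeholder entries only; a real truth set would be curated
# per framework and would be far larger.
TRUTH_SET = frozenset({"GOVERN 1.1", "MAP 1.1", "MEASURE 2.1", "MANAGE 1.1"})


def normalize(control_id):
    """Upper-case and collapse whitespace so 'govern  1.1' matches 'GOVERN 1.1'."""
    return " ".join(control_id.upper().split())


def check_citations(cited, truth_set=TRUTH_SET):
    """Split cited control IDs into (known, fabricated), preserving input order."""
    known, fabricated = [], []
    for raw in cited:
        if normalize(raw) in truth_set:
            known.append(raw)
        else:
            fabricated.append(raw)
    return known, fabricated
```

For example, `check_citations(["govern 1.1", "MAP 9.9"])` keeps the first ID and flags the second, since `MAP 9.9` is not in the placeholder set.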

Compliance dashboard: NIST AI RMF · ISO 42001 · EU AI Act · GDPR · SOC 2 framework coverage with per-control mapping to skills + per-org overrides.

“How do you stop costs running away — and catch quality drops?” cost-aware by design · 5 detection layers

Token budget control is the spine of the architecture, not a dashboard pasted on top. Every call records tokens, latency, and cost into the chain, and five layers catch cost explosions before they become invoices:
(1) a pre-call gate refuses LLM calls over the per-org cap (the refusal is audited);
(2) threshold alerts (50% / 80% / 100%) push to Slack + email before you hit the cap;
(3) a per-call anomaly flag fires on any call > N× the recent baseline;
(4) a month-end forecast projects current burn against the cap, so a 2-week overspend is visible 2 weeks early;
(5) provider price-drift detection: a plausibility band catches silent per-token bumps from Anthropic / OpenAI.
Quality runs the same way on a parallel track: a closed-set hallucination guard against 259+ control IDs from NIST AI RMF / ISO 42001 / EU AI Act / GDPR / SOC 2, judge-grounded scoring, cross-vendor disagreement (3 providers vote), and drift alerts on pass-rate drops. The /cost and /quality dashboards are the views; the design is the contract.
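The five cost layers can be pictured as five small checks around each call. The sketch below is a simplified illustration under stated assumptions: the function names, the default multiplier of 5 for N, the linear month-end projection, and the 2% plausibility band are all invented for the example and are not Proofpane's actual parameters.

```python
from statistics import mean

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # fractions of the cap, per the 50/80/100% alerts


def pre_call_allowed(spent_usd, estimated_call_usd, cap_usd):
    """Layer 1: refuse a call that would push spend over the per-org cap."""
    return spent_usd + estimated_call_usd <= cap_usd


def thresholds_crossed(prev_spent_usd, new_spent_usd, cap_usd):
    """Layer 2: thresholds crossed by the latest charge, each reported once."""
    return [t for t in ALERT_THRESHOLDS if prev_spent_usd < t * cap_usd <= new_spent_usd]


def is_anomalous(call_usd, recent_call_usd, n_times=5.0):
    """Layer 3: flag a call costing more than N times the recent mean."""
    if not recent_call_usd:
        return False  # no baseline yet, so nothing to compare against
    return call_usd > n_times * mean(recent_call_usd)


def month_end_forecast(spent_usd, day_of_month, days_in_month):
    """Layer 4: naive linear projection of the current burn rate."""
    return spent_usd / day_of_month * days_in_month


def price_drifted(observed_per_token, expected_per_token, band=0.02):
    """Layer 5: observed per-token price falls outside the plausibility band."""
    return abs(observed_per_token - expected_per_token) > band * expected_per_token
```

With a 1,000 USD cap, `month_end_forecast(500, 10, 30)` projects 1,500 USD on day 10, which is the kind of early warning layer (4) describes.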

Live: agent caught in a tool loop · auto-paused mid-stream · operator redirects it + resumes · all on the audit chain

Want the full walkthrough? Watch the 1-min Slack + 3-min Salesforce demos →

Quality dashboard: pass rate, hallucination rate, fabricated-ref count, by-skill / by-model / by-provider breakdowns + low-score triage queue.
/quality · pass rate · halluc rate · triage
Cost dashboard: monthly USD spend vs cap, 30-day sparkline, top spenders by skill, anomaly table.
/cost · spend vs cap · top spenders

“How do you improve safely?” human approves every change

Two reflection loops, same approval contract. The first watches the audit log for drift, hallucination, and low-score signals, and proposes prompt edits against the org's own failure cases. The second tracks curated AI-research feeds and auto-sandboxes proposed updates against production behaviour. In both cases only the changes a human approves ever go live.

Internal reflection queue: proposed prompt edits awaiting human approval — every change goes through review before going live.
Internal reflection · audit-log driven
External reflection: research-feed scout that promotes high-relevance items into approval-gated sandbox sessions.
External reflection · research-driven

“Show me the process, not just the outputs.” visual workflows · multi-agent · scheduled

Compose governance tasks, multi-agent primitives (consensus and adversarial review), and scheduled triggers on a visual canvas. An AI builder edits the graph for you. Every node execution writes a row into the same audit chain — the canvas is the planning view, the chain is the proof.
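One way to read the consensus primitive is as a quorum vote over independent agent outputs. The sketch below is an assumption about how such a node might behave, with hypothetical names and a default quorum of 2 out of 3: it returns the agreed label, or no label when the quorum is not met, together with the dissenting agents so the disagreement itself can be recorded.

```python
from collections import Counter


def consensus(votes, quorum=2):
    """Quorum vote over a non-empty mapping of agent name -> label.

    Returns the winning label only if at least `quorum` agents chose it;
    otherwise the label is None and every agent is listed as a dissenter.
    """
    label, count = Counter(votes.values()).most_common(1)[0]
    agreed = count >= quorum
    if agreed:
        dissenters = sorted(agent for agent, vote in votes.items() if vote != label)
    else:
        dissenters = sorted(votes)
    return {"label": label if agreed else None, "agreed": agreed, "dissenters": dissenters}
```

The returned dictionary is shaped so it could serve as the payload of an audit row for the node execution, keeping the dissent visible rather than discarding it.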

Multi-agent workflow canvas: Classify AI System → Impact Assessment 3-way consensus → Map Obligations, with live execution status and skill palette.

How we’re different

What we are, what we are not, and the layer you’re missing.

Policy-doc GRC tools

Vanta, Drata, Secureframe

Certify that you have a control. Auto-collect SOC 2 / ISO evidence about your infrastructure. Excellent for the certification audit. Gap: Don’t see inside the AI call. Can’t prove the model picked a defensible answer.

Log aggregators

CloudTrail, Datadog, Splunk, ELK

Record what happened across infrastructure. Powerful for incident reconstruction. Gap: Plain logs; not hash-chained, not signed, not scored. An auditor still has to take your word that the row wasn’t edited.

Proofpane

Evidence layer for AI in regulated teams

Hash-chained audit + significance-gated production defaults + inter-rater reliability floor + signed offline-verifiable Evidence Pack. When the regulator asks why this is your default — six months from now or six years — the answer is one URL. Same hash. Same row.

Complementary, not competitive: most Proofpane customers keep their GRC tool for SOC 2 + their log aggregator for SRE. Proofpane is the missing third layer — the one your auditor opens when they ask about a specific AI decision.

Works wherever your team uses AI