Ventrin API
A proxy for LLM chat completions. Requests are evaluated against policy packs, then forwarded to OpenAI, Anthropic, or Gemini using credentials stored in your workspace.
Overview
The API accepts chat completion requests, runs them through a deterministic policy engine, and proxies them to a provider. Each response includes a status field describing what the gateway did: allowed, sanitised, warn, blocked, or error.
Provider credentials (OpenAI, Anthropic, Gemini) are stored encrypted in the workspace and used to make the upstream call. Inference is not run by Ventrin.
Base URL
https://www.ventrin.com
Requests are handled in europe-west1. Provider calls originate from the same region.
Authentication
Every request requires a workspace API key, sent in either of the following headers:
Authorization: Bearer vtk_...
# or
X-Ventrin-Api-Key: vtk_...
Keys are prefixed vtk_. Only the SHA-256 of the key is persisted; the plaintext is shown once at creation. Revoke in API Access → API Keys; revocation is immediate.
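The hash-only storage described above can be illustrated with a minimal sketch. It assumes UTF-8 encoding and a hex digest, and the names `hash_key` and `verify_key` are hypothetical; the gateway's actual encoding and lookup are not documented here.

```python
import hashlib
import hmac


def hash_key(plaintext_key: str) -> str:
    """Return the SHA-256 hex digest of an API key (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(plaintext_key.encode("utf-8")).hexdigest()


def verify_key(presented_key: str, stored_digest: str) -> bool:
    """Compare a presented key against a stored digest in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_digest)
```

Because only the digest is stored, a lost key cannot be recovered from the workspace; it can only be revoked and replaced.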
Key-level controls:
- packIds: policy packs applied to requests from this key. Empty uses the workspace default.
- allowedProviders: restrict to a subset of openai, anthropic, gemini. Empty allows all configured providers.
- rpmLimit: per-key requests per minute. null uses the workspace default.
POST /v1/chat/completions
| Method | POST |
| Path | /v1/chat/completions |
| Content-Type | application/json |
| Max user-message length | 60000 characters |
| Timeout | 540 seconds |
Request body
{
"model": "gpt-4o-mini",
"messages": [
{ "role": "system", "content": "You are a helpful legal assistant." },
{ "role": "user", "content": "Please draft a reply to the claimant." }
],
"temperature": 0.2,
"max_tokens": 800,
"top_p": 1,
"stream": false,
"metadata": {
"service": "internal_legal_tool",
"request_id": "req_019f"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Provider-native model id. Determines the provider by prefix. |
| messages | array | yes | Chat messages. Roles: system, user, assistant. Must include at least one user message. |
| temperature | number | no | Passed through. |
| max_tokens | integer | no | Passed through. Required by Anthropic; defaults to 1024 if omitted for that provider. |
| top_p | number | no | Passed through. |
| stream | boolean | no | When true, response is text/event-stream. See Streaming. |
| metadata.service | string | no | Free-text tag. Indexed on the log row. |
| metadata.request_id | string | no | Echoed on the response and the log row. |
| metadata.provider_key_id | string | no | Pins this request to a specific stored provider key. See Provider routing. |
Unrecognised fields (tools, tool_choice, response_format, others) are dropped before forwarding.
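A client can apply the same constraints before sending, which avoids a round trip for requests that would fail validation. The sketch below is a hypothetical client-side helper (`prepare_request` is not part of the API). It assumes message content is a plain string and that the 60000-character limit applies to every user message, which is a conservative reading of the table above.

```python
MAX_USER_MESSAGE_CHARS = 60000
FORWARDED_FIELDS = {"model", "messages", "temperature", "max_tokens",
                    "top_p", "stream", "metadata"}
ALLOWED_ROLES = {"system", "user", "assistant"}


def prepare_request(body: dict) -> dict:
    """Validate a request body and keep only the fields the gateway forwards."""
    if not body.get("model"):
        raise ValueError("model is required")
    messages = body.get("messages")
    if not messages:
        raise ValueError("messages is required")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    user_messages = [m for m in messages if m["role"] == "user"]
    if not user_messages:
        raise ValueError("messages must include at least one user message")
    # Assumption: the limit is checked against each user message.
    if any(len(m.get("content", "")) > MAX_USER_MESSAGE_CHARS for m in user_messages):
        raise ValueError("user message exceeds 60000 characters")
    # Fields such as tools or response_format would be dropped by the gateway anyway.
    return {k: v for k, v in body.items() if k in FORWARDED_FIELDS}
```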
Response body
All non-error responses return HTTP 200. The shape is determined by status.
allowed
{
"status": "allowed",
"request_id": "req_019f",
"decision": {
"status": "allowed",
"score": 0,
"thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
"categories": [],
"hardBlock": false,
"hardBlockReasons": [],
"reason": "No sensitive signals detected.",
"packIds": ["general"]
},
"provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 820 },
"provider_response": { "id": "chatcmpl-...", "choices": [] }
}
sanitised
The latest user message was rewritten. provider_response answers the rewritten prompt, not the original.
{
"status": "sanitised",
"decision": {
"status": "sanitised",
"score": 72,
"categories": ["Email address", "Phone number", "UK postcode"],
"reason": "Rewritten to remove sensitive detail.",
"hardBlock": false,
"thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
"packIds": ["general", "legal"]
},
"sanitised_prompt": "Please email the client to confirm the meeting.",
"transformations": [
{ "original": "john.smith@example.com", "replacement": "the client", "category": "pii", "strategy": "abstract" },
{ "original": "020 7946 0958", "replacement": "", "category": "pii", "strategy": "remove" }
],
"provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 740 },
"provider_response": {}
}
blocked
The request was not forwarded. suggested_safe_version is absent on credential hard-blocks.
{
"status": "blocked",
"decision": {
"status": "blocked",
"score": 999,
"categories": ["OpenAI API key"],
"reason": "Credentials detected - request refused.",
"hardBlock": true,
"hardBlockReasons": ["OpenAI API key"],
"thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
"packIds": ["general"]
},
"message": "Request blocked. The content could not be sent safely even after rewriting."
}
error
Returned with HTTP 4xx or 5xx. See Error codes.
{
"status": "error",
"request_id": "req_019f",
"error": {
"code": "rate_limited",
"message": "Rate limit exceeded (60/60 rpm). Retry in 17s."
}
}
Status values
| status | HTTP | Forwarded | Body contains |
|---|---|---|---|
| allowed | 200 | yes, unchanged | provider_response |
| sanitised | 200 | yes, rewritten | provider_response, sanitised_prompt, transformations |
| warn | 200 | yes, unchanged | provider_response. Escalates to blocked when strict mode is on. |
| blocked | 200 | no | message, optional suggested_safe_version |
| error | 400–502 | varies | error.code, error.message |
Policy decisions
Four layers evaluate the latest user message. Identical layers run in the browser extension against the same packs.
1. Hard patterns: Regex for credentials (sk-, AKIA, bearer tokens, private keys, connection strings, password mentions). First match returns blocked with hardBlock: true.
2. Weighted terms: Per-pack keyword banks (legal, healthcare, general). Duplicate terms across packs resolve to max weight.
3. Context boosters: Adjacency phrases ("our client", "matter number", "DOB") that amplify weights of nearby terms.
4. Scoring: Sum of Layer 2 + Layer 3 compared to pack thresholds warn, sanitise, block. Multiple packs merge to the most-sensitive thresholds.
At the sanitise tier the rewrite is auto-applied before forwarding. If the rewriter cannot preserve intent safely, the decision becomes blocked.
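The merge and threshold rules can be sketched as follows. This is an illustration, not the gateway's engine: it assumes "most-sensitive" means the lowest value per tier, that threshold comparisons are inclusive, and that the score has already been summed from Layers 2 and 3. All function names are hypothetical.

```python
TIERS = ("warn", "sanitise", "block")


def merge_term_weights(banks: list) -> dict:
    """Merge per-pack keyword banks; a term present in several packs keeps its max weight."""
    merged = {}
    for bank in banks:
        for term, weight in bank.items():
            merged[term] = max(weight, merged.get(term, weight))
    return merged


def merge_thresholds(packs: list) -> dict:
    """Merge pack thresholds, taking the lowest value per tier (assumed most sensitive)."""
    return {tier: min(pack[tier] for pack in packs) for tier in TIERS}


def classify(score: int, thresholds: dict, hard_block: bool = False) -> str:
    """Map a score to a status; a hard-pattern match short-circuits to blocked."""
    if hard_block or score >= thresholds["block"]:
        return "blocked"
    if score >= thresholds["sanitise"]:
        return "sanitised"
    if score >= thresholds["warn"]:
        return "warn"
    return "allowed"
```

With the thresholds shown in the response examples (warn 10, sanitise 40, block 85), a score of 72 falls in the sanitise tier and a score of 0 is allowed, which matches the sample payloads.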
Provider routing
The provider is selected from the model prefix.
| Provider | Model prefix | Upstream endpoint |
|---|---|---|
| openai | gpt-*, o1-*, o3-*, chatgpt-* | https://api.openai.com/v1/chat/completions |
| anthropic | claude-* | https://api.anthropic.com/v1/messages (system messages are collected into the system field) |
| gemini | gemini-*, models/gemini-* | generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |
An unmatched prefix returns invalid_request. A prefix outside the key's allowedProviders returns the same.
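Prefix routing of this kind can be reproduced client-side to catch unsupported models early. The sketch below uses the prefixes from the table; `infer_provider` is a hypothetical helper, and case-sensitive matching is an assumption.

```python
from typing import Optional

PROVIDER_PREFIXES = {
    "openai": ("gpt-", "o1-", "o3-", "chatgpt-"),
    "anthropic": ("claude-",),
    "gemini": ("gemini-", "models/gemini-"),
}


def infer_provider(model: str, allowed_providers: Optional[list] = None) -> str:
    """Return the provider for a model id, or raise if it would be an invalid_request."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        if model.startswith(prefixes):
            # An empty or missing allow-list permits every configured provider.
            if allowed_providers and provider not in allowed_providers:
                raise ValueError("invalid_request: provider not allowed for this key")
            return provider
    raise ValueError("invalid_request: unsupported model")
```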
Selecting a specific provider credential
A workspace can store multiple credentials per provider (e.g. three OpenAI project keys). The credential used for a request is resolved in this order:
1. metadata.provider_key_id in the request body.
2. X-Ventrin-Provider-Key header.
3. The provider key bound to the authenticating API key (set in API Access → API Keys).
4. The isDefault provider key for the inferred provider, scoped to the workspace.
5. Any enabled provider key for that provider.
Find the id in API Access → Providers. The key must belong to the workspace and match the provider inferred from model; otherwise the request returns provider_not_configured with a message naming the key id.
{
"model": "gpt-4o-mini",
"messages": [{ "role": "user", "content": "..." }],
"metadata": {
"service": "contract-review",
"provider_key_id": "gK2nW...9pQ"
}
}
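The resolution order can be sketched as a single function. The record shape (`id`, `provider`, `enabled`, `isDefault`) and the function name are hypothetical. The sketch also assumes that only an explicit pin (body or header) fails with provider_not_configured when it does not match, while an unusable bound key falls through to the next step; the source does not state how that case is handled.

```python
from typing import Optional


def resolve_provider_key(provider: str, stored_keys: list,
                         metadata_key_id: Optional[str] = None,
                         header_key_id: Optional[str] = None,
                         bound_key_id: Optional[str] = None) -> dict:
    """Pick the stored credential for a request, following the documented order."""
    usable = [k for k in stored_keys if k["provider"] == provider and k["enabled"]]
    by_id = {k["id"]: k for k in usable}

    # Steps 1 and 2: explicit pins must match the workspace and the inferred provider.
    for pinned in (metadata_key_id, header_key_id):
        if pinned is not None:
            if pinned not in by_id:
                raise LookupError(f"provider_not_configured: {pinned}")
            return by_id[pinned]

    # Step 3: key bound to the authenticating API key (assumed to fall through if unusable).
    if bound_key_id in by_id:
        return by_id[bound_key_id]

    # Step 4: the workspace default for this provider.
    for key in usable:
        if key.get("isDefault"):
            return key

    # Step 5: any enabled key for the provider.
    if usable:
        return usable[0]
    raise LookupError("provider_not_configured")
```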
Rate limits
Two limits are enforced. Both use per-minute Firestore-transactional counters.
- Per key: rpmLimit on the key, or the workspace defaultRpmLimit (default 60).
- Per workspace: workspaceRpmLimit (default 600).
Exceeding either limit returns HTTP 429 with error.code: "rate_limited". The message includes seconds until the minute bucket rolls over.
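A per-minute counter with a rollover hint can be sketched in memory as below. This is not the gateway's implementation, which uses Firestore transactions; `MinuteCounter` is a hypothetical name, and aligning buckets to wall-clock minutes is an assumption.

```python
import math
import time
from typing import Optional


class MinuteCounter:
    """In-memory fixed-window counter keyed by (counter name, minute bucket)."""

    def __init__(self) -> None:
        self._counts = {}  # old buckets are never purged in this sketch

    def check(self, limits: dict, now: Optional[float] = None) -> tuple:
        """Return (allowed, retry_after_seconds) for a request counted against each limit.

        `limits` maps a counter name (e.g. a key id and a workspace id) to its rpm limit.
        """
        now = time.time() if now is None else now
        bucket = int(now // 60)
        # Reject before incrementing so a refused request consumes no quota.
        if any(self._counts.get((name, bucket), 0) >= limit
               for name, limit in limits.items()):
            return False, math.ceil((bucket + 1) * 60 - now)
        for name in limits:
            self._counts[(name, bucket)] = self._counts.get((name, bucket), 0) + 1
        return True, 0
```

Passing both the key and the workspace in one `limits` mapping means either limit can refuse the request, which mirrors the two-limit behaviour described above.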
Error codes
| code | HTTP | Cause |
|---|---|---|
| unauthenticated | 401 | Missing or revoked API key. |
| forbidden | 403 | gateway.enabled is false for this workspace. |
| invalid_request | 400, 413 | Missing model or messages, unsupported model, prompt over 60000 chars. |
| provider_not_configured | 400 | No active provider key for the resolved provider. |
| rate_limited | 429 | Per-key or per-workspace rpm exceeded. |
| provider_error | 502 | Upstream provider returned non-2xx. error.message contains the provider message. |
| internal | 500 | Gateway failure. Include request_id when reporting. |
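For rate_limited errors, a client may want to wait for the interval named in the message. The sketch below parses it with a regular expression. It assumes the wording stays as in the sample error body ("Retry in 17s"), which is not guaranteed by this reference, so callers should fall back to their own backoff when it returns None. The helper name is hypothetical.

```python
import re
from typing import Optional

_RETRY_RE = re.compile(r"Retry in (\d+)s")


def retry_delay_seconds(body: dict) -> Optional[int]:
    """Return the retry delay from a rate_limited error body, or None if unavailable."""
    error = body.get("error") or {}
    if body.get("status") != "error" or error.get("code") != "rate_limited":
        return None
    match = _RETRY_RE.search(error.get("message", ""))
    return int(match.group(1)) if match else None
```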
Logging
One row per request is written to gatewayLogs. Fields:
- workspaceId, apiKeyId, service, requestId
- provider, model, decision, categories, score, hardBlock, sanitised
- maskedPreview: always written
- encryptedFullPrompt, encryptedSanitisedPrompt: only when gateway.logMode = "full", AES-256-GCM
- timings: policyMs, rewriteMs, providerMs, totalMs
- providerError: set when the upstream call failed
Revealing a full prompt writes an audit row to gatewayRevealLogs with the admin user id and timestamp. Log retention is not yet enforced.
Examples
curl
curl -X POST \
https://www.ventrin.com/v1/chat/completions \
-H "Authorization: Bearer $VENTRIN_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{ "role": "user", "content": "Summarise arbitration in two sentences." }],
"metadata": { "service": "legal-assistant" }
}'
Node.js
import { randomUUID } from "node:crypto";

// askVentrin is an illustrative wrapper name; the return statements below need a function body.
async function askVentrin(userMessage) {
  const res = await fetch(
    "https://www.ventrin.com/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "authorization": `Bearer ${process.env.VENTRIN_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: userMessage }],
        metadata: { service: "legal-assistant", request_id: randomUUID() },
      }),
    },
  );
  const data = await res.json();
  // Branch on the gateway status, not only on the HTTP code: blocked also returns 200.
  switch (data.status) {
    case "allowed":
    case "warn":
      return data.provider_response;
    case "sanitised":
      return { answer: data.provider_response, rewritten: data.sanitised_prompt };
    case "blocked":
      return { refused: true, message: data.message, retry: data.suggested_safe_version };
    case "error":
      throw Object.assign(new Error(data.error.message), { code: data.error.code });
    default:
      throw new Error(`Unexpected status: ${data.status}`);
  }
}
Python
import os
import uuid

import requests

user_message = "Summarise arbitration in two sentences."  # placeholder input

r = requests.post(
    "https://www.ventrin.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['VENTRIN_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": user_message}],
        "metadata": {"service": "intake", "request_id": str(uuid.uuid4())},
    },
    timeout=60,  # client-side timeout; the gateway itself allows up to 540 seconds
)
data = r.json()
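The parsed body can then be dispatched on its status field, mirroring the Node.js example. This is a minimal sketch; `handle_response` and the shape of its return value are illustrative choices, not part of the API.

```python
def handle_response(data: dict) -> dict:
    """Map a parsed gateway response to a result, raising on error statuses."""
    status = data.get("status")
    if status in ("allowed", "warn"):
        return {"answer": data["provider_response"]}
    if status == "sanitised":
        # The provider answered the rewritten prompt, so surface it alongside the answer.
        return {"answer": data["provider_response"],
                "rewritten": data["sanitised_prompt"]}
    if status == "blocked":
        # suggested_safe_version is optional and absent on credential hard-blocks.
        return {"refused": True,
                "message": data["message"],
                "retry": data.get("suggested_safe_version")}
    if status == "error":
        error = data["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    raise ValueError(f"Unexpected status: {status!r}")
```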
Streaming
Set "stream": true in the request body. The response is Server-Sent Events (text/event-stream).
Ordering:
- Policy is evaluated on the full request before any bytes are streamed. If the decision is blocked, the server returns a normal JSON response (HTTP 200, non-streaming).
- If the decision permits forwarding, the server emits a single Ventrin event carrying the decision envelope, then pipes the provider's SSE events through unchanged.
event: ventrin.decision
data: {"request_id":"req_019f","decision":{"status":"sanitised","score":72,"categories":["Email address"],"reason":"Rewritten to remove sensitive detail.","hardBlock":false,"thresholds":{"warn":10,"sanitise":40,"block":85},"packIds":["general"]},"provider":{"name":"openai","model":"gpt-4o-mini"},"sanitised_prompt":"Please email the client to confirm.","transformations":[...]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" there"}}]}
data: [DONE]
On upstream failure the server emits:
event: ventrin.error
data: {"message":"upstream read error"}
Consume the ventrin.decision event first to surface the status to the user, then parse the provider events with your existing SDK's streaming parser.
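Where no SDK parser is available, the stream can be split by event name. The sketch below assumes standard SSE framing (events separated by a blank line) and an OpenAI-style `[DONE]` sentinel. `parse_sse` and `split_stream` are hypothetical helpers that operate on an iterable of text lines, such as the decoded lines of a streaming HTTP response.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data_string) for each SSE event in an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        yield event, "\n".join(data)


def split_stream(lines):
    """Return (decision_envelope, provider_chunks, error) from a gateway SSE stream."""
    decision, chunks, error = None, [], None
    for event, data in parse_sse(lines):
        if event == "ventrin.decision":
            decision = json.loads(data)
        elif event == "ventrin.error":
            error = json.loads(data)
        elif data != "[DONE]":
            chunks.append(json.loads(data))
    return decision, chunks, error
```

Since a blocked request comes back as ordinary JSON rather than a stream, a caller would check the response Content-Type before handing the body to a parser like this.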
Limitations
- Policy scans the latest user message only. Prior turns in messages are forwarded as-is.
- tools, tool_choice, response_format are dropped before forwarding.
- Token-count context window is not validated; oversize inputs return provider_error from upstream.
- Provider errors are returned verbatim via provider_error. Content-filter refusals from the provider are distinct from Ventrin blocked.
- Token usage is not written to gatewayLogs; reconcile with the provider's usage export by timestamp.
- Log retention is unbounded.
- Webhooks, email, and Slack notifications are not implemented.
- Single region (europe-west1).
Changelog
- v1.0 — 2026-04-23
- Initial release. POST /v1/chat/completions, five status values, streaming, OpenAI / Anthropic / Gemini routing, API key and provider key management, audit logs, rate limits.