API Reference · v1

Ventrin API

A proxy for LLM chat completions. Requests are evaluated against policy packs, then forwarded to OpenAI, Anthropic, or Gemini using credentials stored in your workspace.

JSON over HTTPS · OpenAI-compatible request shape

Overview

The API accepts chat completion requests, runs them through a deterministic policy engine, and proxies them to a provider. Each response includes a status field describing what the gateway did: allowed, sanitised, warn, blocked, or error.

Provider credentials (OpenAI, Anthropic, Gemini) are stored encrypted in the workspace and used to make the upstream call. Ventrin does not run inference itself.

Base URL

url
https://www.ventrin.com

Requests are handled in europe-west1. Provider calls originate from the same region.

Authentication

Every request requires a workspace API key, sent in one of the following headers:

headers
Authorization: Bearer vtk_...
# or
X-Ventrin-Api-Key: vtk_...

Keys are prefixed vtk_. Only the SHA-256 hash of the key is persisted; the plaintext is shown once, at creation. Revoke a key in API Access → API Keys; revocation takes effect immediately.
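
To illustrate why a lost key cannot be recovered, the sketch below stores only a SHA-256 digest and compares digests at lookup time. It is a minimal illustration, not the gateway's implementation: the function names and the in-memory `stored` mapping (key id to digest) are hypothetical, and the real storage layer is not described on this page.

```python
import hashlib
import hmac
from typing import Optional


def key_digest(api_key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def find_key_id(presented: str, stored: dict) -> Optional[str]:
    """Match a presented key against stored digests.

    `stored` is a hypothetical mapping of key id -> hex digest.
    """
    if not presented.startswith("vtk_"):
        return None  # not a Ventrin-style key
    digest = key_digest(presented)
    for key_id, stored_digest in stored.items():
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(digest, stored_digest):
            return key_id
    return None
```

Because only the digest is kept, a revoked or forgotten key can be replaced but never displayed again.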

Key-level controls:

  • packIds — policy packs applied to requests from this key. Empty uses the workspace default.
  • allowedProviders — restrict to a subset of openai, anthropic, gemini. Empty allows all configured providers.
  • rpmLimit — per-key requests per minute. null uses the workspace default.

POST /v1/chat/completions

Method: POST
Path: /v1/chat/completions
Content-Type: application/json
Max user-message length: 60000 characters
Timeout: 540 seconds

Request body

json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful legal assistant." },
    { "role": "user",   "content": "Please draft a reply to the claimant." }
  ],
  "temperature": 0.2,
  "max_tokens": 800,
  "top_p": 1,
  "stream": false,
  "metadata": {
    "service": "internal_legal_tool",
    "request_id": "req_019f"
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Provider-native model id. Determines the provider by prefix. |
| messages | array | yes | Chat messages. Roles: system, user, assistant. Must include at least one user message. |
| temperature | number | no | Passed through. |
| max_tokens | integer | no | Passed through. Required by Anthropic; defaults to 1024 if omitted for that provider. |
| top_p | number | no | Passed through. |
| stream | boolean | no | When true, response is text/event-stream. See Streaming. |
| metadata.service | string | no | Free-text tag. Indexed on the log row. |
| metadata.request_id | string | no | Echoed on the response and the log row. |
| metadata.provider_key_id | string | no | Pins this request to a specific stored provider key. See Provider routing. |

Unrecognised fields (tools, tool_choice, response_format, others) are dropped before forwarding.
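
A client can apply the same rules before sending, so that dropped fields and validation failures are visible locally. The following is a sketch under stated assumptions: `preflight` and the constant names are hypothetical, the recognised-field set is taken from the table above, and because this page does not say whether the 60000-character limit applies to every user message or only the latest, the sketch checks all of them.

```python
# Top-level fields documented in the request table; anything else is dropped.
RECOGNISED_FIELDS = {
    "model", "messages", "temperature", "max_tokens",
    "top_p", "stream", "metadata",
}
MAX_USER_MESSAGE_CHARS = 60000


def preflight(body: dict) -> dict:
    """Validate a request body and return a copy holding only documented fields."""
    if not body.get("model"):
        raise ValueError("model is required")
    messages = body.get("messages") or []
    user_messages = [m for m in messages if m.get("role") == "user"]
    if not user_messages:
        raise ValueError("messages must include at least one user message")
    for message in user_messages:
        # Assumption: the length limit is checked per user message.
        if len(message.get("content", "")) > MAX_USER_MESSAGE_CHARS:
            raise ValueError("user message exceeds 60000 characters")
    return {k: v for k, v in body.items() if k in RECOGNISED_FIELDS}
```

Comparing the input and output of `preflight` shows which fields, such as tools, would not reach the provider.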

Response body

All non-error responses return HTTP 200. The shape is determined by status.

allowed

json
{
  "status": "allowed",
  "request_id": "req_019f",
  "decision": {
    "status": "allowed",
    "score": 0,
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "categories": [],
    "hardBlock": false,
    "hardBlockReasons": [],
    "reason": "No sensitive signals detected.",
    "packIds": ["general"]
  },
  "provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 820 },
  "provider_response": { "id": "chatcmpl-...", "choices": [] }
}

sanitised

The latest user message was rewritten. provider_response answers the rewritten prompt, not the original.

json
{
  "status": "sanitised",
  "decision": {
    "status": "sanitised",
    "score": 72,
    "categories": ["Email address", "Phone number", "UK postcode"],
    "reason": "Rewritten to remove sensitive detail.",
    "hardBlock": false,
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "packIds": ["general", "legal"]
  },
  "sanitised_prompt": "Please email the client to confirm the meeting.",
  "transformations": [
    { "original": "john.smith@example.com", "replacement": "the client", "category": "pii", "strategy": "abstract" },
    { "original": "020 7946 0958",          "replacement": "",           "category": "pii", "strategy": "remove"   }
  ],
  "provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 740 },
  "provider_response": {}
}

blocked

The request was not forwarded. suggested_safe_version is absent on credential hard-blocks.

json
{
  "status": "blocked",
  "decision": {
    "status": "blocked",
    "score": 999,
    "categories": ["OpenAI API key"],
    "reason": "Credentials detected - request refused.",
    "hardBlock": true,
    "hardBlockReasons": ["OpenAI API key"],
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "packIds": ["general"]
  },
  "message": "Request blocked. The content could not be sent safely even after rewriting."
}

error

Returned with HTTP 4xx or 5xx. See Error codes.

json
{
  "status": "error",
  "request_id": "req_019f",
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded (60/60 rpm). Retry in 17s."
  }
}

Status values

| status | HTTP | Forwarded | Body contains |
| --- | --- | --- | --- |
| allowed | 200 | yes, unchanged | provider_response |
| sanitised | 200 | yes, rewritten | provider_response, sanitised_prompt, transformations |
| warn | 200 | yes, unchanged | provider_response. Escalates to blocked when strict mode is on. |
| blocked | 200 | no | message, optional suggested_safe_version |
| error | 400–502 | varies | error.code, error.message |

Policy decisions

Four layers evaluate the latest user message. The browser extension runs the same layers against the same packs.

  1. Hard patterns

    Regex for credentials (sk-, AKIA, bearer tokens, private keys, connection strings, password mentions). First match returns blocked with hardBlock: true.

  2. Weighted terms

    Per-pack keyword banks (legal, healthcare, general). A term that appears in more than one pack takes the highest of its weights.

  3. Context boosters

    Adjacency phrases ("our client", "matter number", "DOB") that amplify weights of nearby terms.

  4. Scoring

    The sum of the Layer 2 and Layer 3 scores is compared against the pack thresholds warn, sanitise, and block. When several packs apply, the most sensitive threshold for each tier is used.

At the sanitise tier the rewrite is auto-applied before forwarding. If the rewriter cannot preserve intent safely, the decision becomes blocked.
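
The scoring step can be sketched as follows. This is an illustration, not the engine's code: `merge_thresholds` and `decide` are hypothetical names, "most sensitive" is assumed to mean the lowest value per tier, and comparisons are assumed to be inclusive (a score equal to a threshold reaches that tier). The fallback from sanitised to blocked when the rewriter fails is not modelled.

```python
TIERS = ("warn", "sanitise", "block")


def merge_thresholds(packs: list) -> dict:
    """Merge per-pack thresholds.

    Assumption: the most sensitive threshold is the lowest value for each tier.
    """
    return {tier: min(pack[tier] for pack in packs) for tier in TIERS}


def decide(score: int, thresholds: dict, hard_block: bool = False,
           strict: bool = False) -> str:
    """Map a score to a status value."""
    if hard_block:
        return "blocked"  # Layer 1 match short-circuits scoring
    if score >= thresholds["block"]:
        return "blocked"
    if score >= thresholds["sanitise"]:
        return "sanitised"
    if score >= thresholds["warn"]:
        # warn escalates to blocked when strict mode is on
        return "blocked" if strict else "warn"
    return "allowed"
```

With the thresholds shown in the response examples (warn 10, sanitise 40, block 85), a score of 0 maps to allowed and a score of 72 maps to sanitised, which matches those examples.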

Provider routing

The provider is selected from the model prefix.

| Provider | Model prefix | Upstream endpoint |
| --- | --- | --- |
| openai | gpt-*, o1-*, o3-*, chatgpt-* | https://api.openai.com/v1/chat/completions |
| anthropic | claude-* | https://api.anthropic.com/v1/messages (system messages are collected into the system field) |
| gemini | gemini-*, models/gemini-* | generativelanguage.googleapis.com/v1beta/models/{model}:generateContent |

A model that matches no prefix returns invalid_request, as does a model whose provider is outside the key's allowedProviders.
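
The prefix rules can be mirrored client-side to predict which provider a model id will reach. The sketch below is hypothetical (`infer_provider` and `PROVIDER_PREFIXES` are illustrative names) and assumes case-sensitive prefix matching; a return value of `None` stands for the invalid_request case.

```python
from typing import Optional

# Prefixes copied from the routing rules in this section.
PROVIDER_PREFIXES = {
    "openai": ("gpt-", "o1-", "o3-", "chatgpt-"),
    "anthropic": ("claude-",),
    "gemini": ("gemini-", "models/gemini-"),
}


def infer_provider(model: str, allowed_providers: tuple = ()) -> Optional[str]:
    """Return the provider for a model id, or None where the gateway would
    answer invalid_request. An empty allowed_providers allows every provider."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        if model.startswith(prefixes):  # str.startswith accepts a tuple
            if allowed_providers and provider not in allowed_providers:
                return None
            return provider
    return None
```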

Selecting a specific provider credential

A workspace can store multiple credentials per provider (e.g. three OpenAI project keys). The credential used for a request is resolved in this order:

  1. metadata.provider_key_id in the request body.
  2. X-Ventrin-Provider-Key header.
  3. The provider key bound to the authenticating API key (set in API Access → API Keys).
  4. The isDefault provider key for the inferred provider, scoped to the workspace.
  5. Any enabled provider key for that provider.

Find the id in API Access → Providers. The key must belong to the workspace and match the provider inferred from model; otherwise the request returns provider_not_configured with a message naming the key id.

json
{
  "model": "gpt-4o-mini",
  "messages": [{ "role": "user", "content": "..." }],
  "metadata": {
    "service": "contract-review",
    "provider_key_id": "gK2nW...9pQ"
  }
}
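
The resolution order above can be sketched as a single function. Everything here is illustrative: the function name, the exception class, and the record fields `id`, `provider`, and `enabled` are hypothetical (only `isDefault` appears on this page). The sketch assumes that the first explicit id wins and is then validated, and that a bound key (step 3) is validated the same way as a pinned one; whether a pinned key must also be enabled is not stated here, so that check is omitted.

```python
class ProviderNotConfigured(Exception):
    """Stands in for the provider_not_configured error code."""


def resolve_provider_key(provider, stored_keys, body_key_id=None,
                         header_key_id=None, bound_key_id=None):
    """Pick the stored credential for a request, following steps 1 to 5."""
    by_id = {key["id"]: key for key in stored_keys}
    # Steps 1-3: request body, then header, then the key bound to the API key.
    for key_id in (body_key_id, header_key_id, bound_key_id):
        if key_id is not None:
            key = by_id.get(key_id)
            if key is None or key["provider"] != provider:
                raise ProviderNotConfigured(
                    f"provider key {key_id} is not usable for {provider}")
            return key
    candidates = [k for k in stored_keys
                  if k["provider"] == provider and k.get("enabled", True)]
    # Step 4: the workspace default for this provider.
    for key in candidates:
        if key.get("isDefault"):
            return key
    # Step 5: any enabled key for this provider.
    if candidates:
        return candidates[0]
    raise ProviderNotConfigured(f"no active provider key for {provider}")
```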

Rate limits

Two limits are enforced. Both use per-minute Firestore-transactional counters.

  • Per key: rpmLimit on the key, or the workspace defaultRpmLimit (default 60).
  • Per workspace: workspaceRpmLimit (default 600).

Exceeding either limit returns HTTP 429 with error.code: "rate_limited". The message includes seconds until the minute bucket rolls over.
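
A client can use the seconds value in the message to schedule a retry. The sketch below parses the format shown in the error example ("Retry in 17s."); treating that wording as stable is an assumption, as is the fallback, which supposes that minute buckets roll over on wall-clock minute boundaries. `retry_delay_seconds` is a hypothetical helper name.

```python
import re
import time
from typing import Optional

# Matches the hint in messages such as "... Retry in 17s."
_RETRY_HINT = re.compile(r"Retry in (\d+)s")


def retry_delay_seconds(message: str, now: Optional[float] = None) -> int:
    """Return how long to wait before retrying a rate_limited request."""
    match = _RETRY_HINT.search(message)
    if match:
        return int(match.group(1))
    # Fallback (assumption): wait until the next wall-clock minute starts.
    now = time.time() if now is None else now
    return 60 - int(now) % 60
```

Sleeping for the returned number of seconds before one retry is usually enough; repeated 429 responses suggest lowering the client's request rate instead.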

Error codes

| code | HTTP | Cause |
| --- | --- | --- |
| unauthenticated | 401 | Missing or revoked API key. |
| forbidden | 403 | gateway.enabled is false for this workspace. |
| invalid_request | 400, 413 | Missing model or messages, unsupported model, prompt over 60000 chars. |
| provider_not_configured | 400 | No active provider key for the resolved provider. |
| rate_limited | 429 | Per-key or per-workspace rpm exceeded. |
| provider_error | 502 | Upstream provider returned non-2xx. error.message contains the provider message. |
| internal | 500 | Gateway failure. Include request_id when reporting. |

Logging

One row per request is written to gatewayLogs. Fields:

  • workspaceId, apiKeyId, service, requestId
  • provider, model, decision, categories, score, hardBlock, sanitised
  • maskedPreview — always written
  • encryptedFullPrompt, encryptedSanitisedPrompt — only when gateway.logMode = "full", AES-256-GCM
  • timings: policyMs, rewriteMs, providerMs, totalMs
  • providerError — set when the upstream call failed

Revealing a full prompt writes an audit row to gatewayRevealLogs with the admin user id and timestamp. Log retention is not yet enforced.

Examples

curl

bash
curl -X POST \
  https://www.ventrin.com/v1/chat/completions \
  -H "Authorization: Bearer $VENTRIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Summarise arbitration in two sentences." }],
    "metadata": { "service": "legal-assistant" }
  }'

Node.js

javascript
import { randomUUID } from "node:crypto";

// Send one user message through the gateway and map the status to a result.
// The wrapper function name is illustrative.
async function askVentrin(userMessage) {
  const res = await fetch(
    "https://www.ventrin.com/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "authorization": `Bearer ${process.env.VENTRIN_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: userMessage }],
        metadata: { service: "legal-assistant", request_id: randomUUID() },
      }),
    },
  );
  const data = await res.json();

  switch (data.status) {
    case "allowed":
    case "warn":
      return data.provider_response;
    case "sanitised":
      return { answer: data.provider_response, rewritten: data.sanitised_prompt };
    case "blocked":
      // suggested_safe_version is optional and may be undefined.
      return { refused: true, message: data.message, retry: data.suggested_safe_version };
    case "error":
      throw Object.assign(new Error(data.error.message), { code: data.error.code });
    default:
      throw new Error(`Unexpected status: ${data.status}`);
  }
}

Python

python
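import os
import uuid

import requests

# Example input; replace with your application's prompt.
user_message = "Summarise arbitration in two sentences."

r = requests.post(
    "https://www.ventrin.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['VENTRIN_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": user_message}],
        "metadata": {"service": "intake", "request_id": str(uuid.uuid4())},
    },
    timeout=60,  # client-side timeout; the gateway itself allows up to 540 seconds
)
# Error responses also carry a JSON body, so parse before checking the status.
data = r.json()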
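
The parsed body can then be dispatched on status, mirroring the Node.js example. This is a minimal sketch: `handle_response` and `VentrinError` are hypothetical names, and the shape of the returned dictionaries is an arbitrary choice for illustration.

```python
class VentrinError(Exception):
    """Raised for responses whose status is "error" (hypothetical helper)."""

    def __init__(self, code, message, request_id=None):
        super().__init__(message)
        self.code = code
        self.request_id = request_id


def handle_response(data: dict) -> dict:
    """Map a parsed gateway response to a simple result dictionary."""
    status = data.get("status")
    if status in ("allowed", "warn"):
        return {"answer": data["provider_response"]}
    if status == "sanitised":
        # The answer addresses the rewritten prompt, not the original.
        return {"answer": data["provider_response"],
                "rewritten": data["sanitised_prompt"]}
    if status == "blocked":
        # suggested_safe_version is optional, so use .get().
        return {"refused": True, "message": data["message"],
                "retry": data.get("suggested_safe_version")}
    if status == "error":
        err = data["error"]
        raise VentrinError(err["code"], err["message"], data.get("request_id"))
    raise ValueError(f"unexpected status: {status!r}")
```

Keeping `request_id` on the exception makes it easier to quote when reporting an internal error.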

Streaming

Set "stream": true in the request body. The response is Server-Sent Events (text/event-stream).

Ordering:

  1. Policy is evaluated on the full request before any bytes are streamed. If the decision is blocked, the server returns a normal JSON response (HTTP 200, non-streaming).
  2. If the decision permits forwarding, the server emits a single Ventrin event carrying the decision envelope, then pipes the provider's SSE events through unchanged.

sse
event: ventrin.decision
data: {"request_id":"req_019f","decision":{"status":"sanitised","score":72,"categories":["Email address"],"reason":"Rewritten to remove sensitive detail.","hardBlock":false,"thresholds":{"warn":10,"sanitise":40,"block":85},"packIds":["general"]},"provider":{"name":"openai","model":"gpt-4o-mini"},"sanitised_prompt":"Please email the client to confirm.","transformations":[...]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" there"}}]}

data: [DONE]

On upstream failure the server emits:

sse
event: ventrin.error
data: {"message":"upstream read error"}

Consume the ventrin.decision event first to surface the status to the user, then parse the provider events with your existing SDK's streaming parser.
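
Where no SDK parser is available, the stream can be split by hand. The sketch below is an assumption-laden illustration: `parse_sse` and `split_stream` are hypothetical names, the input is any iterable of decoded text lines, and provider chunks are assumed to be JSON terminated by an OpenAI-style `[DONE]` sentinel as in the sample above.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        yield event, "\n".join(data_lines)


def split_stream(lines):
    """Return (decision_envelope, provider_chunks) for one streamed response."""
    decision, chunks = None, []
    for event, data in parse_sse(lines):
        if event == "ventrin.decision":
            decision = json.loads(data)
        elif event == "ventrin.error":
            raise RuntimeError(json.loads(data)["message"])
        elif data == "[DONE]":
            break
        else:
            chunks.append(json.loads(data))
    return decision, chunks
```

With the `requests` library, `r.iter_lines(decode_unicode=True)` on a response opened with `stream=True` is one way to supply the lines.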

Limitations

  • Policy scans the latest user message only. Prior turns in messages are forwarded as-is.
  • tools, tool_choice, response_format are dropped before forwarding.
  • The token count is not validated against the model's context window; oversize inputs return provider_error from upstream.
  • Provider errors are returned verbatim via provider_error. Content-filter refusals from the provider are distinct from Ventrin blocked.
  • Token usage is not written to gatewayLogs; reconcile with the provider's usage export by timestamp.
  • Log retention is unbounded.
  • Webhooks, email, and Slack notifications are not implemented.
  • Single region (europe-west1).

Changelog

v1.0 — 2026-04-23
Initial release. POST /v1/chat/completions, five status values, streaming, OpenAI / Anthropic / Gemini routing, API key and provider key management, audit logs, rate limits.