On this page

Overview
Base URL
Authentication
POST /v1/chat/completions
Request body
Response body
Status values
Policy decisions
Provider routing
Rate limits
Error codes
Logging
Examples
Streaming
Limitations
Changelog

API Reference · v1

Ventrin API

A proxy for LLM chat completions. Requests are evaluated against policy packs, then forwarded to OpenAI, Anthropic, or Gemini using credentials stored in your workspace.

JSON over HTTPS · OpenAI-compatible request shape

Overview

The API accepts chat completion requests, runs them through a deterministic policy engine, and proxies them to a provider. Each response includes a status field describing what the gateway did: allowed, sanitised, warn, blocked, or error.

Provider credentials (OpenAI, Anthropic, Gemini) are stored encrypted in the workspace and used to make the upstream call. Ventrin does not run inference itself.

Base URL

url

https://www.ventrin.com

Requests are handled in europe-west1. Provider calls originate from the same region.

Authentication

Every request requires a workspace API key, sent in either of these headers:

headers

Authorization: Bearer vtk_...
# or
X-Ventrin-Api-Key: vtk_...

Keys are prefixed vtk_. Only the SHA-256 hash of the key is persisted; the plaintext is shown once, at creation. Revoke keys in API Access → API Keys; revocation takes effect immediately.
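Storing only a hash means a leaked database row cannot be replayed as a working key. The sketch below shows hash-at-rest verification, assuming a hex-encoded SHA-256 of the UTF-8 key with no salt; Ventrin does not document its exact encoding, and `key_fingerprint` and `key_matches` are hypothetical helpers.

```python
import hashlib
import hmac


def key_fingerprint(api_key: str) -> str:
    """Hex SHA-256 of a vtk_ key (assumed encoding: UTF-8 in, hex out)."""
    if not api_key.startswith("vtk_"):
        raise ValueError("Ventrin API keys are prefixed vtk_")
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def key_matches(presented: str, stored_hex: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(key_fingerprint(presented), stored_hex)


stored = key_fingerprint("vtk_example")  # hypothetical key, shown once at creation
print(key_matches("vtk_example", stored))  # True
print(key_matches("vtk_other", stored))    # False
```

Comparing digests with `hmac.compare_digest` avoids leaking how many leading characters matched through timing.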

Key-level controls:

packIds: policy packs applied to requests from this key. An empty list uses the workspace default.
allowedProviders: restricts the key to a subset of openai, anthropic, gemini. An empty list allows all configured providers.
rpmLimit: per-key requests per minute. null uses the workspace default.
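The fallback rules for these three controls can be sketched as follows. The `ApiKey` dataclass and `effective_controls` function are hypothetical; the default of 60 rpm is the workspace `defaultRpmLimit` default described under Rate limits.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ApiKey:
    """Hypothetical mirror of the key-level controls."""
    pack_ids: List[str] = field(default_factory=list)           # packIds
    allowed_providers: List[str] = field(default_factory=list)  # allowedProviders
    rpm_limit: Optional[int] = None                             # rpmLimit


def effective_controls(
    key: ApiKey,
    workspace_packs: List[str],
    configured_providers: List[str],
    default_rpm: int = 60,
) -> Tuple[List[str], List[str], int]:
    # Empty packIds: fall back to the workspace default packs.
    packs = key.pack_ids or workspace_packs
    # Empty allowedProviders: every configured provider is allowed.
    providers = [
        p for p in configured_providers
        if not key.allowed_providers or p in key.allowed_providers
    ]
    # null rpmLimit: use the workspace defaultRpmLimit.
    rpm = key.rpm_limit if key.rpm_limit is not None else default_rpm
    return packs, providers, rpm


print(effective_controls(ApiKey(), ["general"], ["openai", "gemini"]))
# (['general'], ['openai', 'gemini'], 60)
```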

POST /v1/chat/completions

Method	`POST`
Path	`/v1/chat/completions`
Content-Type	`application/json`
Max user-message length	60000 characters
Timeout	540 seconds

Request body

json

{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful legal assistant." },
    { "role": "user",   "content": "Please draft a reply to the claimant." }
  ],
  "temperature": 0.2,
  "max_tokens": 800,
  "top_p": 1,
  "stream": false,
  "metadata": {
    "service": "internal_legal_tool",
    "request_id": "req_019f"
  }
}

Field	Type	Required	Description
`model`	string	yes	Provider-native model id. Determines the provider by prefix.
`messages`	array	yes	Chat messages. Roles: `system`, `user`, `assistant`. Must include at least one `user`.
`temperature`	number	no	Passed through.
`max_tokens`	integer	no	Passed through. Required by Anthropic; defaults to 1024 if omitted for that provider.
`top_p`	number	no	Passed through.
`stream`	boolean	no	When `true`, response is `text/event-stream`. See Streaming.
`metadata.service`	string	no	Free-text tag. Indexed on the log row.
`metadata.request_id`	string	no	Echoed on the response and the log row.
`metadata.provider_key_id`	string	no	Pins this request to a specific stored provider key. See Provider routing.

Unrecognised fields (tools, tool_choice, response_format, and any others) are dropped before forwarding.
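A client can catch most `invalid_request` errors before sending and strip fields the gateway would drop anyway. This sketch encodes the rules in the table above; `build_request` is a hypothetical helper, and it conservatively applies the 60000-character limit to every user message rather than only the latest.

```python
MAX_USER_MESSAGE_CHARS = 60_000
FORWARDED_FIELDS = {"model", "messages", "temperature", "max_tokens", "top_p", "stream", "metadata"}
VALID_ROLES = {"system", "user", "assistant"}


def build_request(body: dict) -> dict:
    """Client-side pre-check mirroring the documented rules (hypothetical helper)."""
    if "model" not in body or "messages" not in body:
        raise ValueError("model and messages are required")
    messages = body["messages"]
    if any(m.get("role") not in VALID_ROLES for m in messages):
        raise ValueError("roles must be system, user or assistant")
    user_turns = [m for m in messages if m["role"] == "user"]
    if not user_turns:
        raise ValueError("at least one user message is required")
    # Conservative: apply the length limit to every user message.
    if any(len(m["content"]) > MAX_USER_MESSAGE_CHARS for m in user_turns):
        raise ValueError("user message longer than 60000 characters")
    # The gateway drops unrecognised fields (tools, response_format, ...) anyway.
    return {k: v for k, v in body.items() if k in FORWARDED_FIELDS}
```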

Response body

All non-error responses return HTTP 200. The shape is determined by status.

allowed

json

{
  "status": "allowed",
  "request_id": "req_019f",
  "decision": {
    "status": "allowed",
    "score": 0,
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "categories": [],
    "hardBlock": false,
    "hardBlockReasons": [],
    "reason": "No sensitive signals detected.",
    "packIds": ["general"]
  },
  "provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 820 },
  "provider_response": { "id": "chatcmpl-...", "choices": [] }
}

sanitised

The latest user message was rewritten. provider_response answers the rewritten prompt, not the original.

json

{
  "status": "sanitised",
  "decision": {
    "status": "sanitised",
    "score": 72,
    "categories": ["Email address", "Phone number", "UK postcode"],
    "reason": "Rewritten to remove sensitive detail.",
    "hardBlock": false,
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "packIds": ["general", "legal"]
  },
  "sanitised_prompt": "Please email the client to confirm the meeting.",
  "transformations": [
    { "original": "john.smith@example.com", "replacement": "the client", "category": "pii", "strategy": "abstract" },
    { "original": "020 7946 0958",          "replacement": "",           "category": "pii", "strategy": "remove"   }
  ],
  "provider": { "name": "openai", "model": "gpt-4o-mini", "latencyMs": 740 },
  "provider_response": {}
}

blocked

The request was not forwarded. suggested_safe_version is absent on credential hard-blocks.

json

{
  "status": "blocked",
  "decision": {
    "status": "blocked",
    "score": 999,
    "categories": ["OpenAI API key"],
    "reason": "Credentials detected - request refused.",
    "hardBlock": true,
    "hardBlockReasons": ["OpenAI API key"],
    "thresholds": { "warn": 10, "sanitise": 40, "block": 85 },
    "packIds": ["general"]
  },
  "message": "Request blocked. The content could not be sent safely even after rewriting."
}

error

Returned with HTTP 4xx or 5xx. See Error codes.

json

{
  "status": "error",
  "request_id": "req_019f",
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded (60/60 rpm). Retry in 17s."
  }
}

Status values

status	HTTP	Forwarded	Body contains
`allowed`	200	yes, unchanged	`provider_response`
`sanitised`	200	yes, rewritten	`provider_response`, `sanitised_prompt`, `transformations`
`warn`	200	yes, unchanged	`provider_response`. Escalates to `blocked` when strict mode is on.
`blocked`	200	no	`message`, optional `suggested_safe_version`
`error`	400–502	varies	`error.code`, `error.message`
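When writing client code or tests, it can help to check that a response carries the fields this table lists for its status. A sketch follows; `missing_fields` and `was_forwarded` are hypothetical names, and `was_forwarded` returns `None` for `error` because the table says forwarding varies.

```python
from typing import Optional

# Fields each status is documented to carry (from the Status values table).
EXPECTED_FIELDS = {
    "allowed": {"provider_response"},
    "warn": {"provider_response"},
    "sanitised": {"provider_response", "sanitised_prompt", "transformations"},
    "blocked": {"message"},  # suggested_safe_version is optional
    "error": {"error"},
}


def missing_fields(body: dict) -> set:
    """Documented fields absent from a parsed response body."""
    status = body.get("status")
    if status not in EXPECTED_FIELDS:
        raise ValueError(f"unknown status: {status!r}")
    return EXPECTED_FIELDS[status] - set(body)


def was_forwarded(body: dict) -> Optional[bool]:
    status = body["status"]
    if status == "error":
        return None  # the table says "varies"
    return status in {"allowed", "warn", "sanitised"}
```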

Policy decisions

Four layers evaluate the latest user message. The browser extension runs the same layers against the same packs.

Hard patterns

Regular expressions match credentials (sk-, AKIA, bearer tokens, private keys, connection strings, password mentions). The first match returns blocked with hardBlock: true.

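The patterns below only illustrate the idea of a credential hard block. They are not Ventrin's actual pattern set, which is not published, and `first_hard_block` is a hypothetical helper.

```python
import re
from typing import Optional

# Illustrative patterns only; not Ventrin's rule set.
HARD_PATTERNS = [
    ("OpenAI API key", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")),
    ("AWS access key id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("Private key", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
]


def first_hard_block(text: str) -> Optional[str]:
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in HARD_PATTERNS:
        if pattern.search(text):
            return label
    return None


print(first_hard_block("use AKIAIOSFODNN7EXAMPLE for the bucket"))  # AWS access key id
```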
Weighted terms

Per-pack keyword banks (legal, healthcare, general). A term that appears in several packs takes its maximum weight.

Context boosters

Adjacency phrases ("our client", "matter number", "DOB") that amplify the weights of nearby terms.

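A toy version of weighted terms and context boosters follows. It assumes single-word terms, a fixed word window around each booster phrase, and a multiplicative boost; Ventrin's actual window, boost formula, and weights are not documented. `merge_banks` shows the max-weight rule for terms shared between packs, and both functions are hypothetical.

```python
import re
from typing import Dict, Iterable


def merge_banks(*banks: Dict[str, float]) -> Dict[str, float]:
    """A term present in several packs keeps its maximum weight."""
    merged: Dict[str, float] = {}
    for bank in banks:
        for term, weight in bank.items():
            merged[term] = max(weight, merged.get(term, weight))
    return merged


def toy_score(text: str, terms: Dict[str, float], boosters: Iterable[str],
              window: int = 5, boost: float = 2.0) -> float:
    """Sum term weights; terms within `window` words of a booster count `boost` times."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    starts = []  # word index where each booster phrase begins
    for phrase in boosters:
        p = phrase.lower().split()
        starts += [i for i in range(len(words) - len(p) + 1) if words[i:i + len(p)] == p]
    total = 0.0
    for i, w in enumerate(words):
        if w in terms:
            near = any(abs(i - s) <= window for s in starts)
            total += terms[w] * (boost if near else 1.0)
    return total
```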
Scoring

The weighted-term score, after context-booster amplification, is compared with the pack thresholds (warn, sanitise, block). When several packs apply, each tier uses the most sensitive threshold among them.

At the sanitise tier the rewrite is auto-applied before forwarding. If the rewriter cannot preserve intent safely, the decision becomes blocked.
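The threshold comparison and multi-pack merge can be sketched as follows. Assumptions: a score equal to a threshold reaches that tier, "most sensitive" means the lowest value per tier, hard-pattern matches have already short-circuited to blocked, and `rewrite_ok` is a hypothetical flag standing in for the rewriter's intent check. Strict-mode escalation of warn is not modelled.

```python
from typing import Dict

TIERS = ("warn", "sanitise", "block")


def merge_thresholds(*packs: Dict[str, int]) -> Dict[str, int]:
    """Most-sensitive merge, read here as the lowest threshold for each tier."""
    return {tier: min(p[tier] for p in packs) for tier in TIERS}


def decide(score: float, thresholds: Dict[str, int], rewrite_ok: bool = True) -> str:
    # Assumes hard-pattern matches have already returned blocked.
    if score >= thresholds["block"]:
        return "blocked"
    if score >= thresholds["sanitise"]:
        # rewrite_ok stands in for "the rewriter preserved intent safely".
        return "sanitised" if rewrite_ok else "blocked"
    if score >= thresholds["warn"]:
        return "warn"
    return "allowed"


t = {"warn": 10, "sanitise": 40, "block": 85}
print(decide(72, t))  # sanitised, matching the sanitised response example
```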

Provider routing

The provider is selected from the model prefix.

Provider	Model prefix	Upstream endpoint
`openai`	`gpt-`, `o1-`, `o3-`, `chatgpt-`	`https://api.openai.com/v1/chat/completions`
`anthropic`	`claude-`	`https://api.anthropic.com/v1/messages`; `system` messages are collected into the top-level `system` field
`gemini`	`gemini-`, `models/gemini-`	`https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`

An unmatched prefix returns invalid_request, as does a model whose provider is not in the key's allowedProviders.
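Prefix routing and the allowedProviders check reduce to a few lines. The prefixes are copied from the table; `InvalidRequest` is a hypothetical stand-in for the `invalid_request` error code.

```python
from typing import Iterable

# Prefixes copied from the routing table.
PREFIXES = {
    "openai": ("gpt-", "o1-", "o3-", "chatgpt-"),
    "anthropic": ("claude-",),
    "gemini": ("gemini-", "models/gemini-"),
}


class InvalidRequest(Exception):
    """Hypothetical stand-in for the invalid_request error code."""


def infer_provider(model: str, allowed_providers: Iterable[str] = ()) -> str:
    allowed = set(allowed_providers)
    for provider, prefixes in PREFIXES.items():
        if model.startswith(prefixes):
            # An empty allowedProviders list means every provider is allowed.
            if allowed and provider not in allowed:
                raise InvalidRequest(f"{provider} is not allowed for this key")
            return provider
    raise InvalidRequest(f"no provider matches model {model!r}")


print(infer_provider("claude-sonnet-4-6"))  # anthropic
```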

Selecting a specific provider credential

A workspace can store multiple credentials per provider (e.g. three OpenAI project keys). The credential used for a request is resolved in this order:

1. metadata.provider_key_id in the request body.
2. The X-Ventrin-Provider-Key header.
3. The provider key bound to the authenticating API key (set in API Access → API Keys).
4. The isDefault provider key for the inferred provider, scoped to the workspace.
5. Any enabled provider key for that provider.

Find the id in API Access → Providers. The key must belong to the workspace and match the provider inferred from model; otherwise the request returns provider_not_configured with a message naming the key id.

json

{
  "model": "gpt-4o-mini",
  "messages": [{ "role": "user", "content": "..." }],
  "metadata": {
    "service": "contract-review",
    "provider_key_id": "gK2nW...9pQ"
  }
}
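The five-step resolution order can be sketched as below. Two behaviours are assumptions not stated in this reference: an explicit id from the body or header that is unknown, disabled, or for another provider fails with `provider_not_configured` instead of falling through, while a bound key that does not fit the provider is skipped. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderKey:
    id: str
    provider: str  # "openai" | "anthropic" | "gemini"
    enabled: bool = True
    is_default: bool = False  # isDefault


class ProviderNotConfigured(Exception):
    """Hypothetical stand-in for the provider_not_configured error code."""


def resolve_provider_key(provider: str, workspace_keys: List[ProviderKey],
                         body_key_id: Optional[str] = None,    # metadata.provider_key_id
                         header_key_id: Optional[str] = None,  # X-Ventrin-Provider-Key
                         bound_key_id: Optional[str] = None) -> ProviderKey:
    by_id = {k.id: k for k in workspace_keys}  # only this workspace's keys
    # Steps 1 and 2: the body wins over the header; an explicit id must fit.
    explicit = body_key_id if body_key_id is not None else header_key_id
    if explicit is not None:
        key = by_id.get(explicit)
        if key is None or not key.enabled or key.provider != provider:
            raise ProviderNotConfigured(f"provider key {explicit} cannot serve {provider}")
        return key
    # Step 3: the key bound to the API key (assumed skipped if it does not fit).
    bound = by_id.get(bound_key_id) if bound_key_id else None
    if bound and bound.enabled and bound.provider == provider:
        return bound
    # Steps 4 and 5: the workspace default, then any enabled key.
    candidates = [k for k in workspace_keys if k.enabled and k.provider == provider]
    for k in candidates:
        if k.is_default:
            return k
    if candidates:
        return candidates[0]
    raise ProviderNotConfigured(f"no enabled {provider} key in this workspace")
```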

Rate limits

Two limits are enforced. Both use per-minute counters updated in Firestore transactions.

Per key — rpmLimit on the key, or the workspace defaultRpmLimit (default 60).
Per workspace — workspaceRpmLimit (default 600).

Exceeding either limit returns HTTP 429 with error.code: "rate_limited". The message states how many seconds remain until the minute bucket rolls over.
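The per-minute buckets behave like fixed windows. The in-memory sketch below mirrors only the arithmetic (it does not reproduce Ventrin's Firestore transactions); `MinuteBuckets` is a hypothetical name, refused requests are assumed not to consume quota, and old buckets are never evicted.

```python
import time
from typing import Dict, Optional, Tuple


class MinuteBuckets:
    """In-memory fixed-window counters for the per-key and per-workspace limits."""

    def __init__(self, key_rpm: int = 60, workspace_rpm: int = 600):
        self.limits = {"key": key_rpm, "workspace": workspace_rpm}
        self.counts: Dict[Tuple[str, str, int], int] = {}

    def check(self, key_id: str, workspace_id: str,
              now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, seconds until the minute bucket rolls over if refused)."""
        now = time.time() if now is None else now
        minute = int(now // 60)
        scopes = {"key": key_id, "workspace": workspace_id}
        for scope, ident in scopes.items():
            if self.counts.get((scope, ident, minute), 0) >= self.limits[scope]:
                return False, 60 - int(now % 60)
        # Only admitted requests consume quota (an assumption).
        for scope, ident in scopes.items():
            bucket = (scope, ident, minute)
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True, 0
```

A request refused at second 43 of a minute gets a 17-second hint, matching the "Retry in 17s." example message.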

Error codes

code	HTTP	Cause
`unauthenticated`	401	Missing or revoked API key.
`forbidden`	403	`gateway.enabled` is false for this workspace.
`invalid_request`	400, 413	Missing `model` or `messages`, unsupported model, or provider outside the key's `allowedProviders` (400); user message over 60000 characters (413).
`provider_not_configured`	400	No active provider key for the resolved provider.
`rate_limited`	429	Per-key or per-workspace rpm exceeded.
`provider_error`	502	Upstream provider returned non-2xx. `error.message` contains the provider message.
`internal`	500	Gateway failure. Include `request_id` when reporting.
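One reasonable client policy is to retry only `rate_limited` and `internal` and to surface every other code immediately. The sketch below implements that policy, which the gateway does not prescribe. `send` is a hypothetical callable returning `(http_status, parsed_json_body)`, and reading the wait from "Retry in Ns" assumes the message keeps the wording shown under Response body.

```python
import re
import time
from typing import Callable, Tuple

RETRYABLE = {"rate_limited", "internal"}  # a client policy choice, not a gateway rule


class VentrinError(Exception):
    def __init__(self, code: str, message: str, http_status: int):
        super().__init__(f"{code} ({http_status}): {message}")
        self.code, self.message, self.http_status = code, message, http_status


def raise_for_error(http_status: int, body: dict) -> dict:
    """Turn a status "error" body into an exception; pass other bodies through."""
    if body.get("status") == "error":
        raise VentrinError(body["error"]["code"], body["error"]["message"], http_status)
    return body


def call_with_retry(send: Callable[[], Tuple[int, dict]], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    for attempt in range(attempts):
        http_status, body = send()
        try:
            return raise_for_error(http_status, body)
        except VentrinError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt  # exponential backoff
            # Assumes the rate-limit message keeps the "Retry in 17s." wording.
            hint = re.search(r"Retry in (\d+)s", err.message)
            sleep(int(hint.group(1)) if err.code == "rate_limited" and hint else delay)
    raise ValueError("attempts must be at least 1")
```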

Logging

One row per request is written to gatewayLogs. Fields:

workspaceId, apiKeyId, service, requestId
provider, model, decision, categories, score, hardBlock, sanitised
maskedPreview: always written
encryptedFullPrompt, encryptedSanitisedPrompt: written only when gateway.logMode = "full"; encrypted with AES-256-GCM
timings: policyMs, rewriteMs, providerMs, totalMs
providerError: set when the upstream call failed

Revealing a full prompt writes an audit row to gatewayRevealLogs with the admin user id and timestamp. Log retention is not yet enforced.

Examples

curl

bash

curl -X POST \
  https://www.ventrin.com/v1/chat/completions \
  -H "Authorization: Bearer $VENTRIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{ "role": "user", "content": "Summarise arbitration in two sentences." }],
    "metadata": { "service": "legal-assistant" }
  }'

Node.js

javascript

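// ES module (Node 18+): uses the global fetch and top-level await.
import { randomUUID } from "node:crypto";

// Sends one user message through the gateway and branches on `status`.
async function askVentrin(userMessage) {
  const res = await fetch(
    "https://www.ventrin.com/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "authorization": `Bearer ${process.env.VENTRIN_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: userMessage }],
        metadata: { service: "legal-assistant", request_id: randomUUID() },
      }),
    },
  );
  // 4xx/5xx responses also carry a JSON body with status "error".
  const data = await res.json();

  switch (data.status) {
    case "allowed":
    case "warn":
      return data.provider_response;
    case "sanitised":
      // provider_response answers the rewritten prompt, not the original.
      return { answer: data.provider_response, rewritten: data.sanitised_prompt };
    case "blocked":
      // suggested_safe_version is absent on credential hard-blocks.
      return { refused: true, message: data.message, retry: data.suggested_safe_version };
    case "error":
      throw Object.assign(new Error(data.error.message), { code: data.error.code });
    default:
      throw new Error(`Unexpected status: ${data.status}`);
  }
}

console.log(await askVentrin("Summarise arbitration in two sentences."));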

Python

python

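import os
import uuid

import requests

user_message = "Summarise the attached intake notes."  # placeholder input

r = requests.post(
    "https://www.ventrin.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['VENTRIN_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": user_message}],
        "metadata": {"service": "intake", "request_id": str(uuid.uuid4())},
    },
    timeout=60,  # client-side limit; the gateway itself allows up to 540 s
)
data = r.json()  # 4xx/5xx responses also return a JSON body with status "error"

if data["status"] == "error":
    raise RuntimeError(f"{data['error']['code']}: {data['error']['message']}")
elif data["status"] == "blocked":
    print(data["message"])
else:  # allowed, warn, sanitised
    print(data["provider_response"])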

Streaming

Set "stream": true in the request body. The response is Server-Sent Events (text/event-stream).

Ordering:

Policy is evaluated on the full request before any bytes are streamed. If the decision is blocked, the server returns a normal JSON response (HTTP 200, non-streaming).
If the decision permits forwarding, the server first emits a single ventrin.decision event carrying the decision envelope, then pipes the provider's SSE events through unchanged.

sse

event: ventrin.decision
data: {"request_id":"req_019f","decision":{"status":"sanitised","score":72,"categories":["Email address"],"reason":"Rewritten to remove sensitive detail.","hardBlock":false,"thresholds":{"warn":10,"sanitise":40,"block":85},"packIds":["general"]},"provider":{"name":"openai","model":"gpt-4o-mini"},"sanitised_prompt":"Please email the client to confirm.","transformations":[...]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" there"}}]}

data: [DONE]

On upstream failure the server emits:

sse

event: ventrin.error
data: {"message":"upstream read error"}

Consume the ventrin.decision event first to surface the status to the user, then parse the provider events with your existing SDK's streaming parser.
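A minimal stream consumer using only the standard library is sketched below. It checks the Content-Type first, because a blocked request comes back as plain JSON, and it follows the SSE rule of stripping one optional space after the field colon. For simplicity it collects the whole stream from pre-split lines; a real client would handle events as they arrive. Chunk shapes are whatever the provider emits (the OpenAI shape in the example above), so they are only JSON-decoded. Both function names are hypothetical.

```python
import json
from typing import Iterable, List, Optional, Tuple


def is_event_stream(content_type: str) -> bool:
    """Blocked requests come back as plain JSON even when stream is true."""
    return content_type.split(";")[0].strip().lower() == "text/event-stream"


def parse_ventrin_stream(lines: Iterable[str]) -> Tuple[Optional[dict], List[dict]]:
    """Return (decision envelope, provider chunks) from a Ventrin SSE stream."""
    decision: Optional[dict] = None
    chunks: List[dict] = []
    event, data = None, []
    for raw in list(lines) + [""]:  # trailing blank line flushes the last event
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                payload = "\n".join(data)
                if event == "ventrin.decision":
                    decision = json.loads(payload)
                elif event == "ventrin.error":
                    raise RuntimeError(json.loads(payload)["message"])
                elif payload != "[DONE]":
                    chunks.append(json.loads(payload))  # provider-specific shape
            event, data = None, []
        elif line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
    return decision, chunks
```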

Limitations

Policy scans the latest user message only. Prior turns in messages are forwarded as-is.
tools, tool_choice, response_format are dropped before forwarding.
The context window (token count) is not validated; oversize inputs come back as provider_error from the upstream provider.
Provider errors are returned verbatim via provider_error. Content-filter refusals from the provider are distinct from a Ventrin blocked status.
Token usage is not written to gatewayLogs; reconcile with the provider's usage export by timestamp.
Log retention is unbounded.
Webhooks, email, and Slack notifications are not implemented.
Single region (europe-west1).

Changelog

v1.0 — 2026-04-23: Initial release. POST /v1/chat/completions, five status values, streaming, OpenAI / Anthropic / Gemini routing, API key and provider key management, audit logs, rate limits.
