
Batch API

OpenAI-compatible asynchronous bulk inference. Submit many chat-completion or embedding requests in a single batch; each item is fanned out to a queue and executed asynchronously. Poll for status, then download the results as JSONL.

All endpoints are API-token authenticated (Authorization: Bearer cpeer_…) and use snake_case request/response fields, mirroring OpenAI's /v1/batches surface.

Supported endpoints

A batch runs every line against a single target endpoint:

/v1/chat/completions · /v1/embeddings

Streaming (body.stream: true) is rejected. Each line runs as a non-streaming request, and every item is charged against the submitting token's budget quota.

Status lifecycle

Batch (status):

| Status | Meaning |
|---|---|
| validating | Reserved pre-run state. |
| in_progress | Items are queued or executing. Batches start here on create. |
| completed | All items finished (terminal). |
| failed | Batch-level failure (terminal). |
| cancelling | Cancellation requested; in-flight items drain. |
| cancelled | Cancellation finished (terminal). |

Item (status in items/results): pending · running · succeeded · failed · cancelled.

The batch is finalized automatically once the completed, failed, and cancelled counts add up to total: it moves to completed normally, or to cancelled if a cancel was requested.
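The finalization rule can be expressed as a small predicate over request_counts. This is a minimal sketch for client-side reasoning only; is_drained and finalized_status are hypothetical names, and the gateway applies the real rule server-side.

```python
from typing import Optional

# Batch statuses the status table marks as terminal.
TERMINAL = {"completed", "failed", "cancelled"}


def is_drained(counts: dict) -> bool:
    """True once completed + failed + cancelled items reach the total."""
    done = counts["completed"] + counts["failed"] + counts["cancelled"]
    return done >= counts["total"]


def finalized_status(status: str, counts: dict) -> Optional[str]:
    """Status the batch would be finalized to, or None while it is still running."""
    if status in TERMINAL or not is_drained(counts):
        return None
    # A pending cancel (status "cancelling") ends in "cancelled"; otherwise "completed".
    return "cancelled" if status == "cancelling" else "completed"
```

For example, a cancelling batch with total 3, completed 1, failed 0, cancelled 2 has drained and would be finalized to cancelled.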


Create

POST /api/client/v1/batches

Provide requests either inline via requests or as a JSONL object stored in a Document Store bucket via input_file, but not both. Validation stops at the first invalid line, and the whole submission is rejected.

```json
{
  "endpoint": "/v1/chat/completions",
  "requests": [
    {
      "custom_id": "req-1",
      "body": {
        "model": "gpt-4o-mini",
        "messages": [{ "role": "user", "content": "Hello" }]
      }
    }
  ],
  "completion_window": "24h",
  "metadata": { "job": "nightly-summaries" }
}
```

Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Target endpoint: /v1/chat/completions or /v1/embeddings. |
| requests | object[] | Conditional | Inline request lines. Each is { custom_id?, body }; an object without a body key is treated as the body itself. Required unless input_file is given. |
| input_file | object | Conditional | JSONL source in a bucket: { bucket_key, object_key }. Mutually exclusive with requests. |
| output_bucket_key | string | No | When set, the result JSONL is written to this bucket on completion (the object_key is filled in by the finalizer). |
| completion_window | string | No | Informational only (OpenAI compatibility); defaults to 24h. Items run as soon as workers are free. |
| metadata | object | No | Arbitrary key/value metadata echoed back on the batch. |

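Because requests and input_file are mutually exclusive and endpoint is limited to two values, a client can assemble the create body defensively before sending it. A minimal sketch; build_create_payload is a hypothetical helper, not part of any SDK, and the server still performs its own validation.

```python
from typing import Optional

ENDPOINTS = {"/v1/chat/completions", "/v1/embeddings"}


def build_create_payload(endpoint: str,
                         requests: Optional[list] = None,
                         input_file: Optional[dict] = None,
                         output_bucket_key: Optional[str] = None,
                         metadata: Optional[dict] = None,
                         completion_window: str = "24h") -> dict:
    """Assemble a create-batch body (hypothetical client helper)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    # Exactly one of requests / input_file must be supplied.
    if (requests is None) == (input_file is None):
        raise ValueError("provide exactly one of requests or input_file")
    payload = {"endpoint": endpoint, "completion_window": completion_window}
    if requests is not None:
        payload["requests"] = requests
    else:
        payload["input_file"] = {
            "bucket_key": input_file["bucket_key"],
            "object_key": input_file["object_key"],
        }
    # Optional fields are only sent when provided.
    if output_bucket_key is not None:
        payload["output_bucket_key"] = output_bucket_key
    if metadata is not None:
        payload["metadata"] = metadata
    return payload
```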

Each request line body must include a model. For /v1/chat/completions, body.messages (array) is required; for /v1/embeddings, body.input is required.
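These per-line rules can be pre-checked on the client so an invalid line is caught before the whole submission is rejected. A sketch assuming inline lines are plain dicts; validate_body and validate_requests are hypothetical names, and the gateway's exact checks (for example, whether an empty messages array is accepted) may differ.

```python
def validate_body(endpoint: str, body: dict) -> None:
    """Raise ValueError if a request body breaks the documented per-line rules."""
    if not isinstance(body, dict) or not body.get("model"):
        raise ValueError("body.model is required")
    if body.get("stream"):
        raise ValueError("streaming is not supported in batches")
    if endpoint == "/v1/chat/completions" and not isinstance(body.get("messages"), list):
        raise ValueError("body.messages (array) is required")
    if endpoint == "/v1/embeddings" and "input" not in body:
        raise ValueError("body.input is required")


def validate_requests(endpoint: str, requests: list) -> None:
    """Check every inline line, stopping at the first invalid one like the server does."""
    for i, line in enumerate(requests):
        # An object without a "body" key is treated as the body itself.
        body = line["body"] if "body" in line else line
        try:
            validate_body(endpoint, body)
        except ValueError as exc:
            raise ValueError(f"line {i}: {exc}") from None
```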

JSONL input lines follow OpenAI's batch input format: { custom_id?, method?, url?, body }. When present, url must match endpoint and method must be POST. A batch may contain at most 10,000 request lines by default (the limit is configurable).
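To use input_file, the requests must first be serialized as JSONL in this format and stored in a Document Store bucket. A minimal sketch of the serialization step only; uploading to the bucket is out of scope, to_jsonl is a hypothetical name, and MAX_LINES mirrors the documented default rather than your deployment's configured limit.

```python
import json

MAX_LINES = 10_000  # documented default; the server-side limit is configurable


def to_jsonl(endpoint: str, requests: list) -> str:
    """Serialize {custom_id?, body} dicts as OpenAI-style batch input JSONL."""
    if not requests:
        raise ValueError("batch is empty")
    if len(requests) > MAX_LINES:
        raise ValueError(f"too many lines: {len(requests)} > {MAX_LINES}")
    lines = []
    for req in requests:
        record = {}
        if "custom_id" in req:  # custom_id is optional
            record["custom_id"] = req["custom_id"]
        record["method"] = "POST"  # must be POST when present
        record["url"] = endpoint   # must equal the batch endpoint when present
        record["body"] = req["body"]
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines) + "\n"
```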

Response

201 Created with the batch object.

```json
{
  "id": "665f…",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "status": "in_progress",
  "completion_window": "24h",
  "input_file": null,
  "output_file": null,
  "error_message": null,
  "request_counts": { "total": 1, "completed": 0, "failed": 0, "cancelled": 0 },
  "usage": { "input_tokens": 0, "output_tokens": 0, "total_tokens": 0 },
  "metadata": { "job": "nightly-summaries" },
  "created_at": 1718409600,
  "started_at": 1718409600,
  "completed_at": null,
  "cancelled_at": null
}
```

input_file / output_file are { bucket_key, object_key } when present, otherwise null. Timestamps are Unix seconds (or null).
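Since request_counts and the Unix-second timestamps are plain fields, a progress display can be derived directly from the batch object. A sketch; summarize_batch, to_datetime, and the summary keys are illustrative, not part of the API.

```python
from datetime import datetime, timezone
from typing import Optional


def to_datetime(ts: Optional[int]) -> Optional[datetime]:
    """Convert a Unix-seconds timestamp (or null) to an aware UTC datetime."""
    return None if ts is None else datetime.fromtimestamp(ts, tz=timezone.utc)


def summarize_batch(batch: dict) -> dict:
    """Derive a progress fraction and readable timestamps from a batch object."""
    counts = batch["request_counts"]
    done = counts["completed"] + counts["failed"] + counts["cancelled"]
    return {
        "id": batch["id"],
        "status": batch["status"],
        "progress": done / counts["total"] if counts["total"] else 0.0,
        "total_tokens": (batch.get("usage") or {}).get("total_tokens", 0),
        "created": to_datetime(batch.get("created_at")),
        "completed": to_datetime(batch.get("completed_at")),
    }
```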


List

GET /api/client/v1/batches?status=in_progress&limit=50

| Query | Type | Description |
|---|---|---|
| status | string | Optional status filter. |
| limit | number | Page size; default 50, clamped to 1–500. |

```json
{ "object": "list", "data": [ /* batch objects */ ] }
```
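A sketch of building the list URL, clamping limit on the client the same way the server does so the requested page size matches what comes back. list_batches_url is a hypothetical helper; gateway.example.com is the placeholder host used in the curl examples.

```python
from typing import Optional
from urllib.parse import urlencode


def list_batches_url(base_url: str, status: Optional[str] = None, limit: int = 50) -> str:
    """Build GET /api/client/v1/batches with an optional status filter."""
    limit = max(1, min(500, int(limit)))  # server clamps to 1..500
    query = {}
    if status:
        query["status"] = status
    query["limit"] = limit
    return f"{base_url.rstrip('/')}/api/client/v1/batches?{urlencode(query)}"
```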

Retrieve (status)

GET /api/client/v1/batches/:batchId

Returns the batch object (same shape as create). Poll this for status and request_counts. 404 if not found.
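A polling loop against this endpoint, sketched with Python's standard library. fetch_batch and wait_for_batch are hypothetical names; the 5-second interval and 1-hour timeout are arbitrary defaults, not values the API prescribes. The fetch step is passed in as a callable so the loop can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL = {"completed", "failed", "cancelled"}


def fetch_batch(base_url: str, token: str, batch_id: str) -> dict:
    """GET /api/client/v1/batches/:batchId and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/client/v1/batches/{batch_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 404
        return json.load(resp)


def wait_for_batch(get: Callable[[], dict], interval: float = 5.0,
                   timeout: float = 3600.0) -> dict:
    """Call get() until the batch is terminal; raise TimeoutError past the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        batch = get()
        if batch["status"] in TERMINAL:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch['id']} still {batch['status']}")
        time.sleep(interval)
```

Usage: wait_for_batch(lambda: fetch_batch("https://gateway.example.com", "cpeer_your_token", batch_id)).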


Cancel

POST /api/client/v1/batches/:batchId/cancel

Cancellation is cooperative: pending items are marked cancelled without running, while items already running finish normally. The batch moves to cancelling and is finalized to cancelled once the counters have drained. Only batches in in_progress or validating can be cancelled; any other status returns 400. Returns the updated batch object.
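Because only in_progress and validating batches can be cancelled, a client can skip the request (and the 400) for other statuses. A sketch; can_cancel and cancel_request are hypothetical helpers, cancel_request only builds the urllib Request without sending it, and it assumes the endpoint needs no request body.

```python
import urllib.request

CANCELLABLE = {"in_progress", "validating"}


def can_cancel(batch: dict) -> bool:
    """Mirror the documented rule; any other status would return 400."""
    return batch["status"] in CANCELLABLE


def cancel_request(base_url: str, token: str, batch_id: str) -> urllib.request.Request:
    """Build POST /api/client/v1/batches/:batchId/cancel (send with urlopen)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/client/v1/batches/{batch_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```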


Items

GET /api/client/v1/batches/:batchId/items?status=failed&limit=100&skip=0

Per-line execution status.

| Query | Type | Description |
|---|---|---|
| status | string | Optional item status filter. |
| limit | number | Page size; default 100, clamped to 1–1000. |
| skip | number | Offset for pagination. |

```json
{
  "object": "list",
  "data": [
    {
      "id": "665f…",
      "object": "batch.item",
      "index": 0,
      "custom_id": "req-1",
      "status": "succeeded",
      "response_status_code": 200,
      "response_body": { "id": "chatcmpl-…", "choices": [ /* … */ ] },
      "error_message": null,
      "usage": { "input_tokens": 12, "output_tokens": 8, "total_tokens": 20 },
      "started_at": 1718409601,
      "ended_at": 1718409603
    }
  ]
}
```

usage is null until the item runs. 404 if the batch is not found.
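Paging through items with limit and skip can be wrapped in a generator that stops at the first short page. A sketch assuming fetch_page(skip, limit) is a hypothetical callable you supply that calls this endpoint with those values and returns the decoded list object. With a status filter on a batch that is still running, items can change status mid-scan, so the listing is only stable once the batch is terminal.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every item, advancing skip until a page shorter than limit arrives."""
    limit = max(1, min(1000, limit))  # server clamps to 1..1000
    skip = 0
    while True:
        page = fetch_page(skip, limit)["data"]
        yield from page
        if len(page) < limit:
            return
        skip += len(page)
```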


Results

GET /api/client/v1/batches/:batchId/results?status=succeeded

Returns finished items (succeeded and failed) as a JSONL document (Content-Type: application/jsonl) in OpenAI's batch output format, one JSON object per line:

```json
{"id":"batch_req_665f…","custom_id":"req-1","response":{"status_code":200,"body":{ /* … */ }},"error":null}
{"id":"batch_req_6660…","custom_id":"req-2","response":{"status_code":500,"body":null},"error":{"code":"failed","message":"…"}}
```

On success, response.body holds the model output and error is null. On failure, response.body is null and error is { code, message }, where code is the item status (for example, failed). The optional status query parameter limits which item statuses are included. Returns 404 if the batch is not found.
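A sketch of splitting a downloaded results document into successes and failures keyed by custom_id, falling back to the line id when custom_id was omitted on input. parse_results is a hypothetical name.

```python
import json


def parse_results(jsonl_text: str):
    """Return (ok, failed): custom_id -> response body, custom_id -> error object."""
    ok, failed = {}, {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate a trailing newline or blank lines
        rec = json.loads(line)
        key = rec.get("custom_id") or rec["id"]  # custom_id is optional on input
        if rec.get("error") is None:
            ok[key] = rec["response"]["body"]
        else:
            failed[key] = rec["error"]
    return ok, failed
```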


Errors

| Status | Description |
|---|---|
| 400 | Validation error: bad endpoint, both or neither of requests and input_file supplied, missing body.model, missing messages/input, streaming requested, unreadable input file, empty batch, too many request lines, or cancelling a non-cancellable batch. |
| 401 | Missing or invalid API token. |
| 404 | Batch not found. |
| 500 | Internal error. |

Example

Create a batch:

```bash
curl -X POST https://gateway.example.com/api/client/v1/batches \
  -H "Authorization: Bearer cpeer_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint": "/v1/chat/completions",
    "requests": [
      { "custom_id": "req-1", "body": { "model": "gpt-4o-mini", "messages": [{ "role": "user", "content": "Summarize: ..." }] } },
      { "custom_id": "req-2", "body": { "model": "gpt-4o-mini", "messages": [{ "role": "user", "content": "Translate: ..." }] } }
    ]
  }'
```

Poll for completion:

```bash
curl https://gateway.example.com/api/client/v1/batches/665f... \
  -H "Authorization: Bearer cpeer_your_token"
# -> { "status": "completed", "request_counts": { "total": 2, "completed": 2, ... } }
```

Download results (JSONL):

```bash
curl https://gateway.example.com/api/client/v1/batches/665f.../results \
  -H "Authorization: Bearer cpeer_your_token"
```
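The same create call can be made with Python's standard library. A sketch; create_batch_request and send are hypothetical names, the host and token are the placeholders from the curl examples, and send raises urllib.error.HTTPError for 4xx/5xx responses such as the 400 validation errors listed above.

```python
import json
import urllib.request


def create_batch_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) POST /api/client/v1/batches with a JSON body."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/client/v1/batches",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the batch object (201 Created on success)."""
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)
```

Usage: send(create_batch_request("https://gateway.example.com", "cpeer_your_token", payload)) returns the batch object whose id you then poll.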

Community edition is AGPL-3.0. Commercial licensing and support are available separately.