
OCR API

Extract text, tables, layout, and key/value pairs from PDFs or images.

client.ocr.extract(params)

Set the document field to either a remote URL ({ url }) or a base64-encoded payload ({ data }). When a base64 payload is supplied, the SDK sends the request as multipart/form-data.

typescript
import { readFileSync } from 'node:fs';
import { ConsoleClient } from '@cognipeer/console-sdk';

const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: process.env.COGNIPEER_BASE_URL,
});

// 1. From a local file, base64-encoded (sent as multipart/form-data)
const result = await client.ocr.extract({
  model: 'cognipeer-ocr',
  document: {
    data: readFileSync('invoice.pdf').toString('base64'),
    fileName: 'invoice.pdf',
    contentType: 'application/pdf',
  },
  features: ['text', 'tables', 'kv_pairs'],
  pages: [1, 2],
});

console.log(result.text);

// 2. From a public URL
const remote = await client.ocr.extract({
  model: 'cognipeer-ocr',
  document: { url: 'https://example.com/receipt.png' },
  features: ['text'],
});
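
The two calls above differ only in how the document value is built. A minimal sketch of a helper that produces either variant is shown below. `toOcrDocument` and the `OcrDocument` type are hypothetical names introduced for illustration and are not part of the SDK; the type simply mirrors the documented union, and the restriction to http(s) URLs is an assumption.

```typescript
import { Buffer } from 'node:buffer';

// Local copy of the documented `document` union (hypothetical type name).
type OcrDocument =
  | { url: string; contentType?: string }
  | { data: string; fileName?: string; contentType?: string };

// Hypothetical helper, not an SDK function: a string is treated as a remote
// URL, raw bytes are base64-encoded into the `data` variant.
function toOcrDocument(
  source: string | Uint8Array,
  fileName?: string,
  contentType?: string,
): OcrDocument {
  if (typeof source === 'string') {
    // new URL() throws a TypeError on malformed input.
    const parsed = new URL(source);
    // Assumption: only http(s) URLs are meaningful as a remote document.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error(`Unsupported URL protocol: ${parsed.protocol}`);
    }
    return { url: source, ...(contentType !== undefined ? { contentType } : {}) };
  }
  return {
    data: Buffer.from(source).toString('base64'),
    ...(fileName !== undefined ? { fileName } : {}),
    ...(contentType !== undefined ? { contentType } : {}),
  };
}
```

The result can be passed as the document field, for example `document: toOcrDocument(readFileSync('invoice.pdf'), 'invoice.pdf', 'application/pdf')`.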

Parameters

| Name | Type | Description |
| --- | --- | --- |
| model | `string` | OCR model key. |
| document | `{ url: string; contentType?: string } \| { data: string; fileName?: string; contentType?: string }` | Document source. |
| pages | `number[]` | 1-based page numbers to OCR. |
| language | `string` | Source language hint. |
| features | `Array<'text' \| 'tables' \| 'kv_pairs' \| 'layout' \| 'reading_order' \| 'handwriting'>` | OCR features to enable. |
| prompt | `string` | Optional natural-language extraction instructions. |
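
Because pages is 1-based and features accepts a fixed set of values, it can help to check both before sending a request. The following is a sketch of such a client-side check. `validateOcrOptions` and `OCR_FEATURES` are hypothetical names, not SDK exports, and how the API itself responds to invalid values is not documented here.

```typescript
// Feature names copied from the parameters above.
const OCR_FEATURES = [
  'text',
  'tables',
  'kv_pairs',
  'layout',
  'reading_order',
  'handwriting',
] as const;

// Hypothetical options shape covering only the two fields checked here.
interface OcrOptionsToCheck {
  pages?: number[];
  features?: string[];
}

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when nothing looks wrong.
function validateOcrOptions(options: OcrOptionsToCheck): string[] {
  const problems: string[] = [];
  for (const page of options.pages ?? []) {
    // Page numbers are 1-based, so 0, negatives, and fractions are rejected.
    if (!Number.isInteger(page) || page < 1) {
      problems.push(`pages: ${page} is not a 1-based page number`);
    }
  }
  for (const feature of options.features ?? []) {
    if (!(OCR_FEATURES as readonly string[]).includes(feature)) {
      problems.push(`features: unknown feature "${feature}"`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue at once.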

Response shape

typescript
interface OcrExtractResponse {
  text?: string;
  pages?: Array<{
    page: number;
    text?: string;
    blocks?: Array<Record<string, unknown>>;
  }>;
  language?: string;
  features?: Record<string, unknown>;
  request_id?: string;
}
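
Every field in the response is optional, so consumers should not assume that text is always present. The sketch below shows one way to read the text defensively: it prefers the top-level text and otherwise joins the per-page text in page order. `fullText` is a hypothetical helper, the interface is repeated only to keep the snippet self-contained, and joining pages with a blank line is an arbitrary choice rather than documented behavior.

```typescript
// Repeated from the response shape above so the snippet stands alone.
interface OcrExtractResponse {
  text?: string;
  pages?: Array<{
    page: number;
    text?: string;
    blocks?: Array<Record<string, unknown>>;
  }>;
  language?: string;
  features?: Record<string, unknown>;
  request_id?: string;
}

// Hypothetical helper: returns the top-level text when present, otherwise
// the per-page text sorted by page number and separated by blank lines.
function fullText(response: OcrExtractResponse): string {
  if (response.text !== undefined) {
    return response.text;
  }
  return [...(response.pages ?? [])] // copy so the caller's array is not reordered
    .sort((a, b) => a.page - b.page)
    .map((p) => p.text)
    .filter((t): t is string => t !== undefined)
    .join('\n\n');
}
```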

Released under the MIT License.