
Document Engine

Upload Any Document. AI Reads the Rest.

PDFs, Word files, CSVs, scanned images — uploaded, parsed, chunked, and indexed automatically. Your team searches content, not file systems.

Capabilities

From raw files to searchable knowledge in minutes.

Every Format That Matters

PDF, DOCX, CSV, XLSX, and scanned images via OCR. One upload endpoint handles all of them.

Structure-Aware Chunking

Documents split at paragraph and section boundaries — never mid-sentence, never across page breaks.

Background Processing

Upload and walk away. Documents process asynchronously with real-time status updates in the dashboard.
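As a rough illustration of the upload-and-walk-away pattern, here is a minimal sketch, assuming an in-memory job table and made-up stage names. The `jobs` dict, `upload`, and `process` are hypothetical and are not the product's API; a real service would persist job state and expose it through the dashboard.

```python
import asyncio
import uuid

# Hypothetical in-memory job table; a real deployment would persist this.
jobs: dict[str, str] = {}

async def process(job_id: str, filename: str) -> None:
    # Each stage stands in for real work (parsing, chunking, indexing).
    for stage in ("parsing", "chunking", "indexing"):
        jobs[job_id] = stage
        await asyncio.sleep(0)  # yield control, as real I/O would
    jobs[job_id] = "done"

async def upload(filename: str) -> tuple[str, asyncio.Task]:
    # Return a job id immediately; processing continues in the background.
    job_id = uuid.uuid4().hex
    jobs[job_id] = "queued"
    task = asyncio.create_task(process(job_id, filename))
    return job_id, task

async def main() -> list[str]:
    job_id, task = await upload("report.pdf")
    seen = [jobs[job_id]]   # status right after upload returns
    await task              # a client would poll the status instead
    seen.append(jobs[job_id])
    return seen

statuses = asyncio.run(main())
```

The caller gets a job id back before any processing has started, then reads the status whenever it likes.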

Vector Indexing

1024-dimension embeddings stored in pgvector with HNSW indexing. HNSW needs no training step, so similarity search is fast from the first upload.

Rich Metadata

File name, page numbers, chunk positions, upload timestamps — all preserved and searchable.

Per-Client Isolation

Each client's documents live in a separate database schema. Structural isolation, not just access control.
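One way schema-per-client isolation can look in PostgreSQL, sketched under assumptions: the `client_<id>` naming scheme, the `chunks` table, and the helper names below are hypothetical, chosen only for illustration. The code builds SQL strings and does not connect to a database.

```python
import re

def schema_name(client_id: str) -> str:
    # Hypothetical naming scheme: one schema per client, e.g. client_acme.
    # Strict validation, because identifiers cannot be bound as parameters.
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,40}", client_id):
        raise ValueError(f"unsafe client id: {client_id!r}")
    return f"client_{client_id}"

def provisioning_sql(client_id: str) -> list[str]:
    # Statements that create the client's own schema and a table inside it.
    schema = schema_name(client_id)
    return [
        f'CREATE SCHEMA IF NOT EXISTS "{schema}"',
        f'CREATE TABLE IF NOT EXISTS "{schema}".chunks ('
        "id bigserial PRIMARY KEY, content text NOT NULL)",
    ]

def scoped_session_sql(client_id: str) -> str:
    # Unqualified table names in this session resolve against this schema.
    return f'SET search_path TO "{schema_name(client_id)}"'
```

With this layout, a query for one client has no table in scope that holds another client's rows, which is the structural part of the isolation.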

The Processing Pipeline

Four steps from raw file to searchable knowledge.

1. Upload

Drop files via the dashboard or send them by email. The API accepts single files or batches.

2. Parse

Unstructured.io extracts text, tables, and structure from any supported format — including OCR for scans.

3. Chunk

Our RecursiveChunker splits content at natural boundaries, preserving headings, lists, and page context.
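The general idea of recursive, boundary-aware splitting can be sketched as follows. This is not the RecursiveChunker itself: the separator list, the `max_chars` budget, and the function name are assumptions for illustration, and the sketch ignores headings and page boundaries. It tries paragraph breaks first, then line breaks, then sentence ends, and keeps a sentence whole even if it exceeds the budget.

```python
import re

# Separators tried from coarsest to finest: paragraphs, lines, sentences.
SEPARATORS = [r"\n\s*\n", r"\n", r"(?<=[.!?])\s+"]

def split_recursive(text: str, max_chars: int, level: int = 0) -> list[str]:
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars or level >= len(SEPARATORS):
        return [text]  # fits, or no finer boundary left: keep the unit whole
    pieces = [p for p in re.split(SEPARATORS[level], text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Oversized pieces are split again at the next finer boundary.
        for part in split_recursive(piece, max_chars, level + 1):
            candidate = f"{current} {part}" if current else part
            if len(candidate) <= max_chars:
                current = candidate  # pack small units together
            else:
                if current:
                    chunks.append(current)
                current = part
    if current:
        chunks.append(current)
    return chunks

sample = "Intro sentence one. Intro sentence two.\n\nSecond paragraph here."
by_paragraph = split_recursive(sample, max_chars=40)
by_sentence = split_recursive(sample, max_chars=25)
```

With a 40-character budget the sample splits at the paragraph break; with 25 characters the first paragraph is split further at its sentence boundary, and no chunk ends mid-sentence.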

4. Index

Mistral Embed generates vectors. pgvector stores them with HNSW indexing for sub-second retrieval.
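For readers who want to see what the index step involves, here is a minimal sketch of the pgvector statements, assuming a `chunks` table and cosine distance. The table name, index name, column names, and choice of `vector_cosine_ops` are illustrative assumptions and are not confirmed by this page. The code only builds SQL and a vector literal; it does not call an embedding model or a database.

```python
EMBEDDING_DIM = 1024  # dimension stated in the specs

# Setup statements using pgvector's vector type and HNSW index method.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE IF NOT EXISTS chunks ("
    "id bigserial PRIMARY KEY, content text NOT NULL, "
    f"embedding vector({EMBEDDING_DIM}) NOT NULL)",
    # HNSW has no training step, so this works on an empty table.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
]

# <=> is pgvector's cosine distance operator; the nearest rows come first.
SEARCH_SQL = (
    "SELECT id, content FROM chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)

def to_vector_literal(values: list[float]) -> str:
    # pgvector accepts text input of the form '[v1,v2,...]'.
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

query_param = to_vector_literal([0.5] * EMBEDDING_DIM)
```

A database driver would run `SEARCH_SQL` with `query_param` and a row limit as its two bound parameters.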

Specs at a Glance

For the engineers doing due diligence.

Supported Formats

PDF, DOCX, CSV, XLSX, images (OCR)

Embeddings

1024 dimensions, Mistral Embed

Vector Index

HNSW via pgvector; the index can be created before any rows exist

Processing

Async with real-time status via API

Chunking

Recursive, structure-aware, page-boundary safe

Storage

PostgreSQL per-client schema isolation

See document processing in action.

Upload a sample document during your demo and watch it become indexed and searchable in real time.

Document Engine | AI Loopwise