Document Service
Orchestrates the full document lifecycle: upload to blob storage, trigger parsing and embedding via RabbitMQ, track processing status, serve downloads, manage folders, and stream real-time status updates via SSE.
- Tech: NestJS 11, TypeORM, PostgreSQL, RabbitMQ, Azure Blob/S3
- Port: 4000
- Auth: JWT, API Key, Public
- Database: Shared document database (shared with parser-service, embedding-service, rag-service)
Document Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/documents/init | JWT | Initialize a new document record (PENDING_UPLOAD) |
| POST | /api/v1/documents/upload | JWT, API Key | Stream-upload file to blob storage |
| POST | /api/v1/documents/process | JWT | Trigger parsing + embedding pipeline |
| GET | /api/v1/documents/tree | JWT | Get full document/folder tree for current user |
| GET | /api/v1/documents/bulk?ids= | JWT, API Key | Get multiple documents by IDs |
| GET | /api/v1/documents/stream?documentIds= | JWT | SSE stream for real-time processing status |
| POST | /api/v1/documents/accessible | JWT | Filter document IDs to only accessible ones |
| GET | /api/v1/documents/:documentId | JWT | Get single document |
| GET | /api/v1/documents/:documentId/status | JWT | Get processing status |
| GET | /api/v1/documents/:documentId/sas-token | API Key | Generate SAS token for blob storage |
| GET | /api/v1/documents/:documentId/download | JWT | Download original file |
| GET | /api/v1/documents/:documentId/base64 | Public | Get document as base64 |
| PUT | /api/v1/documents/:documentId | JWT | Update document metadata |
| DELETE | /api/v1/documents/:documentId | JWT | Soft-delete |
| PUT | /api/v1/documents/bulk | JWT | Bulk update |
| DELETE | /api/v1/documents/bulk | JWT | Bulk soft-delete |
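The endpoints above imply a three-call sequence for getting a new file indexed: init, upload, then process. Below is a minimal TypeScript sketch of a client that drives that sequence. It assumes a standard Bearer Authorization header carrying the JWT. The request and response field names (fileName, documentId), passing documentId to /upload as a query parameter, and the helper names (HttpClient, uploadAndProcess) are all hypothetical. An injected HTTP function keeps the sketch runnable without a network.

```typescript
// Minimal fetch-like types so the sketch can run against a stub instead of a network.
type HttpInit = { method: string; headers: Record<string, string>; body?: unknown };
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpClient = (url: string, init: HttpInit) => Promise<HttpResponse>;

async function call(http: HttpClient, url: string, init: HttpInit): Promise<unknown> {
  const res = await http(url, init);
  if (!res.ok) throw new Error(`${init.method} ${url} failed with HTTP ${res.status}`);
  return res.json();
}

// Drives one document through init -> upload -> process.
// ASSUMPTIONS: the field names (fileName, documentId) and passing documentId to
// /upload as a query parameter are illustrative, not the documented contract.
async function uploadAndProcess(
  http: HttpClient,
  baseUrl: string,
  jwt: string,
  fileName: string,
  fileBody: unknown,
): Promise<string> {
  const auth = { Authorization: `Bearer ${jwt}` };
  const jsonHeaders = { ...auth, "Content-Type": "application/json" };
  const docs = `${baseUrl}/api/v1/documents`;

  // 1. Create the record in PENDING_UPLOAD.
  const created = (await call(http, `${docs}/init`, {
    method: "POST",
    headers: jsonHeaders,
    body: JSON.stringify({ fileName }),
  })) as { documentId: string };
  const id = created.documentId;

  // 2. Send the file (a real client would stream multipart data here).
  await call(http, `${docs}/upload?documentId=${encodeURIComponent(id)}`, {
    method: "POST",
    headers: auth,
    body: fileBody,
  });

  // 3. Kick off parsing and embedding.
  await call(http, `${docs}/process`, {
    method: "POST",
    headers: jsonHeaders,
    body: JSON.stringify({ documentId: id }),
  });
  return id;
}
```

In a real runtime, a thin wrapper around the global fetch can serve as the HttpClient. Status can then be followed with GET /api/v1/documents/:documentId/status or the SSE stream.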
Document Status Flow
- PENDING_UPLOAD --> UPLOADED --> PROCESSING --> PROCESSED
- PROCESSING --> FAILED (on error)
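The status flow can be enforced as an explicit transition table. With such a table, a late or duplicated result message cannot move a document backwards. This is a minimal sketch; the names canTransition and transition are hypothetical. It assumes that only PROCESSING can move to FAILED and that PROCESSED and FAILED are terminal, since retry behavior is not documented here.

```typescript
// Status values as listed in the flow above.
type DocumentStatus = "PENDING_UPLOAD" | "UPLOADED" | "PROCESSING" | "PROCESSED" | "FAILED";

// Allowed transitions. ASSUMPTION: only PROCESSING can fail, and both end states
// are terminal (no documented retry path out of FAILED).
const TRANSITIONS: Record<DocumentStatus, readonly DocumentStatus[]> = {
  PENDING_UPLOAD: ["UPLOADED"],
  UPLOADED: ["PROCESSING"],
  PROCESSING: ["PROCESSED", "FAILED"],
  PROCESSED: [],
  FAILED: [],
};

function canTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws so the caller can drop or log the stale message.
function transition(from: DocumentStatus, to: DocumentStatus): DocumentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal status transition ${from} -> ${to}`);
  }
  return to;
}
```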
POST /api/v1/documents/process
Triggers the processing pipeline for an uploaded document:
- Publishes a job to the parsing_jobs RabbitMQ queue (or falls back to HTTP POST /parser/parse if RabbitMQ is unavailable).
- The parser-service worker processes the document and publishes results to parsing_results.
- On successful parsing, document-service triggers auto-summarization via completion-service.
- document-service publishes an embedding job to embedding_jobs.
- The embedding-service worker processes the job and publishes results to embedding_results.
- The document status changes to PROCESSED.
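The first step of this pipeline, publishing to parsing_jobs with an HTTP fallback, can be sketched as follows. The QueuePublisher and ParserHttpClient interfaces are hypothetical ports standing in for the real AMQP channel and HTTP client. The ParsingJob field names are assumptions loosely based on the payload column of the RabbitMQ table.

```typescript
// ASSUMED payload shape: the produced-queue table lists document ID, file content,
// parser method and options, but these exact field names are illustrative.
interface ParsingJob {
  documentId: string;
  fileContent: string; // encoding (base64, blob reference, ...) is not documented
  parserMethod: string;
  options: Record<string, unknown>;
}

// Hypothetical ports standing in for the AMQP channel and the HTTP client.
interface QueuePublisher {
  publish(queue: string, payload: unknown): Promise<void>;
}
interface ParserHttpClient {
  post(path: string, body: unknown): Promise<void>;
}

type DispatchRoute = "rabbitmq" | "http";

// Try the parsing_jobs queue first; if publishing fails, call the parser directly.
async function dispatchParsingJob(
  job: ParsingJob,
  mq: QueuePublisher,
  parser: ParserHttpClient,
): Promise<DispatchRoute> {
  try {
    await mq.publish("parsing_jobs", job);
    return "rabbitmq";
  } catch {
    await parser.post("/parser/parse", job);
    return "http";
  }
}
```

Catching every publish error treats a malformed payload the same way as a broker outage. A production version would likely narrow the catch to connection-level errors.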
GET /api/v1/documents/stream
SSE endpoint. The client subscribes by passing document IDs in the documentIds query parameter and receives real-time status updates as parsing and embedding progress.
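On the wire, each update is a Server-Sent Events frame: optional field lines such as `event:`, a `data:` line, and a terminating blank line. In NestJS, a handler decorated with @Sse() returns an Observable of MessageEvent objects, and the framework writes these frames itself. The sketch below only makes the format and the per-subscriber filtering visible. The StatusUpdate fields and the helper names are assumptions. Browser clients would read the stream with EventSource, where frames without an `event:` line arrive on onmessage.

```typescript
// ASSUMED update shape; the actual SSE payload fields are not documented here.
interface StatusUpdate {
  documentId: string;
  status: string;
  stage?: "parsing" | "embedding";
  progress?: number;
}

// One SSE frame: an optional "event:" line, a "data:" line, then a blank line.
// JSON.stringify emits no newlines, so a single data line is enough.
function toSseFrame(update: StatusUpdate, eventName?: string): string {
  const eventLine = eventName ? `event: ${eventName}\n` : "";
  return `${eventLine}data: ${JSON.stringify(update)}\n\n`;
}

// Forward only updates for the IDs the client listed in ?documentIds=.
function framesFor(subscribed: ReadonlySet<string>, updates: StatusUpdate[]): string[] {
  return updates.filter((u) => subscribed.has(u.documentId)).map((u) => toSseFrame(u));
}
```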
Folder Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/folders | JWT | Create folder |
| GET | /api/v1/folders | JWT | List folders (filter by parentId, folderType) |
| GET | /api/v1/folders/roots | JWT | Get root-level folders |
| GET | /api/v1/folders/bulk?ids= | -- | Get folders by IDs |
| POST | /api/v1/folders/bulk | JWT | Bulk create |
| DELETE | /api/v1/folders/bulk | -- | Bulk soft-delete |
| DELETE | /api/v1/folders/bulk/hard | -- | Bulk hard-delete |
| PUT | /api/v1/folders/bulk | JWT | Bulk update |
| GET | /api/v1/folders/:id | -- | Get folder |
| GET | /api/v1/folders/:parentId/children | -- | Get folder children |
| PUT | /api/v1/folders/:id | -- | Update folder |
| DELETE | /api/v1/folders/:id | -- | Soft-delete |
Folder types: document, agent, default.
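Folders are linked by parentId. A client that fetches a flat list (for example from GET /api/v1/folders) can therefore assemble it into a tree locally. This is a minimal sketch; the FolderRow shape and buildFolderTree are hypothetical and not the service's own tree logic.

```typescript
type FolderType = "document" | "agent" | "default";

// ASSUMED row shape for a folder as returned by the list endpoints.
interface FolderRow {
  id: string;
  name: string;
  parentId: string | null;
  folderType: FolderType;
}

interface FolderNode extends FolderRow {
  children: FolderNode[];
}

// Two passes, O(n): index every row by id, then attach each node to its parent.
// Rows whose parent is absent from the list are treated as roots.
function buildFolderTree(rows: FolderRow[]): FolderNode[] {
  const byId = new Map<string, FolderNode>();
  for (const row of rows) byId.set(row.id, { ...row, children: [] });

  const roots: FolderNode[] = [];
  for (const row of rows) {
    const node = byId.get(row.id)!;
    const parent = row.parentId !== null ? byId.get(row.parentId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  return roots;
}
```

Treating orphans as roots keeps folders visible when their parent was soft-deleted or filtered out. A parent cycle would still leave the folders involved unreachable from any root.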
Chunk Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/v1/chunks/:documentId/content-at-index/:chunkIndex | JWT | Get specific chunk content |
Export Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/export | -- | Export document as PDF or DOCX (returns binary stream) |
Uses Puppeteer for PDF generation and the docx library for DOCX generation.
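POST /api/v1/export returns a binary stream, so the response headers determine how a browser saves the file. The sketch below builds those headers for either format. exportHeaders and its filename-sanitizing rule are hypothetical, while the two MIME types are the standard ones for PDF and DOCX. In NestJS the result could be applied with res.set(...) or expressed through StreamableFile options.

```typescript
type ExportFormat = "pdf" | "docx";

// Standard MIME types for the two export formats.
const CONTENT_TYPES: Record<ExportFormat, string> = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
};

// Headers for returning the generated file as a download.
// ASSUMPTION: replacing unsafe filename characters with "_" is illustrative only.
function exportHeaders(format: ExportFormat, baseName: string, byteLength: number): Record<string, string> {
  const safeName = baseName.replace(/[^A-Za-z0-9._-]/g, "_") || "export";
  return {
    "Content-Type": CONTENT_TYPES[format],
    "Content-Disposition": `attachment; filename="${safeName}.${format}"`,
    "Content-Length": String(byteLength),
  };
}
```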
Parsing Technique Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/v1/parsing-techniques | JWT | List all techniques |
| GET | /api/v1/parsing-techniques/enabled | JWT | List enabled only |
| GET | /api/v1/parsing-techniques/:id | JWT | Get by ID |
| POST | /api/v1/parsing-techniques/by-ids | JWT | Get by IDs |
RabbitMQ (Producer and Consumer)
Produces
| Queue | Trigger | Payload |
|---|---|---|
| parsing_jobs | POST /documents/process | Document ID, file content, parser method, options |
| embedding_jobs | After parsing completes | Document ID, parsed text, chunking config |
Consumes
| Queue | Action |
|---|---|
| {prefix}-document-upload | Processes document upload messages |
| parsing_results | Updates document, stores chunks, triggers embedding |
| parsing_progress | Updates SSE stream |
| embedding_results | Marks document as PROCESSED |
| embedding_progress | Updates SSE stream |
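The two result handlers carry the status changes described above. This is a minimal sketch of their control flow. The message shapes, the Ports interface (standing in for TypeORM repositories, the AMQP channel, and the SSE hub), and marking a failed parse as FAILED are all assumptions. The auto-summarization call to completion-service is left out to keep it short.

```typescript
type FinalStatus = "PROCESSED" | "FAILED";

// ASSUMED message shapes; the real payload fields are not documented here.
interface ParsingResult {
  documentId: string;
  success: boolean;
  text?: string;
  chunks?: string[];
}
interface EmbeddingResult {
  documentId: string;
  success: boolean;
}

// Hypothetical ports standing in for repositories, the AMQP channel and the SSE hub.
interface Ports {
  setStatus(documentId: string, status: FinalStatus): Promise<void>;
  saveChunks(documentId: string, chunks: string[]): Promise<void>;
  publish(queue: string, payload: unknown): Promise<void>;
  notify(documentId: string, status: FinalStatus): void;
}

// parsing_results: store chunks, then hand off to embedding (summarization omitted).
async function onParsingResult(msg: ParsingResult, p: Ports): Promise<void> {
  if (!msg.success) {
    await p.setStatus(msg.documentId, "FAILED");
    p.notify(msg.documentId, "FAILED");
    return;
  }
  await p.saveChunks(msg.documentId, msg.chunks ?? []);
  // The chunking config belongs in this payload per the Produces table; left empty here.
  await p.publish("embedding_jobs", { documentId: msg.documentId, text: msg.text ?? "", chunking: {} });
}

// embedding_results: the final step that moves the document to PROCESSED (or FAILED).
async function onEmbeddingResult(msg: EmbeddingResult, p: Ports): Promise<void> {
  const status: FinalStatus = msg.success ? "PROCESSED" : "FAILED";
  await p.setStatus(msg.documentId, status);
  p.notify(msg.documentId, status);
}
```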
Inter-Service Communication
| Target | Protocol | Purpose |
|---|---|---|
| parser-service | RabbitMQ (primary), HTTP (fallback) | Document parsing |
| embedding-service | RabbitMQ | Embedding generation |
| completion-service | HTTP | Document summarization |
| user-service | HTTP | Access control (shared-with-me, bulk users, check-access) |
| admin-base-ms | HTTP | Organization-level parsing technique settings |