# Workers
Two services run background workers that consume RabbitMQ queues. Workers are separate processes from the HTTP API servers.
## Parser Service Worker
- Location: `parser-service/worker.py`
- Runtime: Python (uvloop event loop)
- Queue: consumes from `parsing_jobs`
- Publishes to: `parsing_results`, `parsing_progress`
### What it does
- Picks up a parsing job from the queue
- Validates the document metadata
- Selects the appropriate parser (Azure Document Intelligence, PyMuPDF, MinerU, or Marker)
- Parses the document content to markdown
- Optionally extracts images and generates captions via the completion service
- Publishes the parsed result back to `parsing_results`
- Sends progress updates to `parsing_progress` during processing
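The job-handling steps above can be sketched as a single handler function. This is a minimal illustration, not the service's actual code: the job fields, registry keys, and stubbed parsing step are assumptions; only the parser names and the validate → select → parse → publish flow come from the docs.

```python
# Hypothetical dispatch registry; the four parser names are from the docs,
# the lookup keys are illustrative.
PARSERS = {
    "azure": "Azure Document Intelligence",
    "pymupdf": "PyMuPDF",
    "mineru": "MinerU",
    "marker": "Marker",
}

def handle_parsing_job(job: dict) -> dict:
    """Sketch of one worker iteration: validate, select a parser,
    parse to markdown, and shape the message for parsing_results."""
    # 1. Validate the document metadata (assumed field names).
    if "document_id" not in job or "parser" not in job:
        raise ValueError("invalid job: missing document_id or parser")
    # 2. Select the appropriate parser.
    parser = PARSERS.get(job["parser"])
    if parser is None:
        raise ValueError(f"unknown parser: {job['parser']}")
    # 3. Parse the document content to markdown (stubbed here).
    markdown = f"# parsed by {parser}"
    # 4. This dict stands in for the message published to parsing_results.
    return {"document_id": job["document_id"], "markdown": markdown}
```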
### Scaling
- Multiple worker instances can run in parallel
- Each worker handles one job at a time
- Horizontal scaling: add more worker containers
- Idempotency keys prevent duplicate processing
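The idempotency-key check can be sketched as follows. This single-process version uses an in-memory set as the seen-keys store, which is an assumption; with multiple parallel workers the store would need to be shared (e.g. a database table or cache), and the docs do not specify which is used.

```python
processed: set[str] = set()  # stand-in for a shared idempotency store

def process_once(job_id: str, handler) -> bool:
    """Run handler(job_id) unless this idempotency key was already seen.

    Returns True if the job was processed, False if it was a duplicate
    delivery (the worker would still ack the message in that case so it
    leaves the queue without being reprocessed).
    """
    if job_id in processed:
        return False  # duplicate: skip the work
    handler(job_id)
    processed.add(job_id)
    return True
```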
## Embedding Service Worker
- Location: `embedding-service/worker.py`
- Runtime: Python (asyncio)
- Queue: consumes from `embedding_jobs`
- Publishes to: `embedding_results`, `embedding_progress`
### What it does
- Picks up an embedding job from the queue
- Chunks the parsed text using the configured method (`recursive`, `semantic`, or `fixed_size`)
- Optionally translates non-English chunks to English
- Generates vector embeddings via OpenAI/Azure OpenAI embedding API
- Stores chunks and embeddings in PostgreSQL (pgvector)
- Publishes the result back to `embedding_results`
- Sends progress updates to `embedding_progress`
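Of the three chunking methods, `fixed_size` is the simplest to sketch. The function below is an illustration under assumed defaults (chunk size and overlap are configurable in the service; these exact parameter names and values are not from the docs), and the `recursive` and `semantic` strategies are more involved.

```python
def chunk_fixed_size(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size windows with a small overlap, so that
    sentences cut at a boundary still appear whole in one of the chunks.

    Sketch of the fixed_size strategy only; sizes are illustrative.
    """
    step = size - overlap  # advance less than `size` to create the overlap
    # max(..., 1) ensures text shorter than the overlap yields one chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be sent to the embedding API and stored in the pgvector-backed table alongside its vector.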
### Scaling
- Multiple worker instances can run in parallel
- Uses DLQ for failed messages
- Retry with exponential backoff (configurable attempts and delays)
- Each worker prefetches a configurable number of messages
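The retry-then-DLQ behavior can be sketched as a wrapper around the message handler. This is a simplified synchronous illustration: the function name, parameters, and defaults are assumptions (the docs only say attempts and delays are configurable), and the final re-raise marks the point where the worker would nack the message so the broker routes it to the dead-letter queue.

```python
import time

def retry_with_backoff(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponentially growing delays.

    After the last attempt the exception propagates; in the worker this
    is where the message would be rejected and land in the DLQ.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: hand off to the DLQ
            time.sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ...
```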
## Deployment
Both workers are deployed as separate containers from their respective HTTP API services. They share the same codebase but run different entry points:
- Parser: `python worker.py` (worker) vs. `uvicorn app.main:app` (API)
- Embedding: `python worker.py` (worker) vs. `uvicorn app.main:app` (API)
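A docker-compose fragment for one of the services might look like the following. This is a hypothetical sketch: the service names, build contexts, and flags are illustrative; only the two entry-point commands and the shared-codebase layout come from the docs.

```yaml
# Illustrative only: two containers built from the same codebase,
# differing only in their entry points.
services:
  parser-api:
    build: ./parser-service
    command: uvicorn app.main:app --host 0.0.0.0
  parser-worker:
    build: ./parser-service
    command: python worker.py
```

The embedding service would follow the same pattern, letting the worker containers scale independently of the API.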