Async Processing & Concurrency Limits

Complete documentation for async processing architecture and concurrency limits across all product generation methods.


📋 Table of Contents

  1. Overview
  2. Async Architecture
  3. Concurrency Limits
  4. Timeout Configuration
  5. Rate Limiting
  6. Shared Services
  7. Performance Optimization
  8. Comparison Table
  9. Best Practices
  10. Summary

Overview

All three product generation methods (PDF Processing, Web Scraping, and XML Import) use fully async processing with unified concurrency limits, timeouts, and rate limits.

Key Principles

✅ Fully Async: All I/O operations use async/await
✅ Semaphore-based: Concurrency controlled via asyncio.Semaphore
✅ Batch Processing: Large datasets processed in batches
✅ Retry Logic: Automatic retry with exponential backoff
✅ Circuit Breakers: Prevent cascading failures
✅ Timeout Guards: Prevent infinite hangs
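
The semaphore principle can be illustrated with a minimal sketch. The names `run_bounded` and `PeakTracker` are hypothetical and do not come from the codebase; the sketch only shows how `asyncio.Semaphore` caps the number of coroutines in flight.

```python
import asyncio


async def run_bounded(items, worker, limit):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # waits while `limit` workers are already active
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))


class PeakTracker:
    """Hypothetical helper that records the highest observed concurrency."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def work(self, item):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # stand-in for real I/O
        self.active -= 1
        return item * 2
```

Calling `asyncio.run(run_bounded(range(12), tracker.work, limit=5))` with a `PeakTracker` instance should leave `tracker.peak` at no more than 5, regardless of how many items are submitted.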


Async Architecture

1. Main Processing Flow

All three methods use AsyncQueueService for background job processing. PDF Processing uses process_pdf_document(), Web Scraping uses process_scraping_session(), and XML Import uses process_import_job(). All are fully async and use the same downstream services for chunking, embedding, and image processing.

2. Background Job Processing

The AsyncQueueService is shared across all three methods. It queues jobs for chunking, embedding generation, and product enrichment via its queue_ai_analysis_jobs() method.

3. Progress Tracking

All methods update the background_jobs table in real-time with the current job status, progress percentage, and metadata about the current stage.


Concurrency Limits

1. Image Classification (AI-based filtering)

Applies to: PDF, Web Scraping, XML Import
Service: ImageProcessingService

| Limit | Value | Purpose |
|---|---|---|
| Qwen Vision Concurrent | 5 | Fast material classification |
| Claude Validation Concurrent | 2 | Validation for uncertain cases |
| Batch Size | 15 images | Memory optimization |
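
A rough sketch of how these three limits could combine is shown below. It is an illustration, not the ImageProcessingService code: `classify_all`, the fake model calls, and especially the `UNCERTAIN_BELOW` cut-off are assumptions introduced here, since this document does not say how an "uncertain case" is detected.

```python
import asyncio

BATCH_SIZE = 15          # images per batch
QWEN_CONCURRENCY = 5     # concurrent primary classifications
CLAUDE_CONCURRENCY = 2   # concurrent validations
UNCERTAIN_BELOW = 0.7    # hypothetical confidence cut-off, not from this document


async def classify_all(images, classify, validate):
    """Classify images batch by batch; validate only low-confidence results."""
    qwen_sem = asyncio.Semaphore(QWEN_CONCURRENCY)
    claude_sem = asyncio.Semaphore(CLAUDE_CONCURRENCY)

    async def classify_one(image):
        async with qwen_sem:
            label, confidence = await classify(image)
        if confidence < UNCERTAIN_BELOW:
            async with claude_sem:
                label = await validate(image, label)
        return label

    labels = []
    for start in range(0, len(images), BATCH_SIZE):
        batch = images[start:start + BATCH_SIZE]
        # Only one batch is in flight at a time, which bounds memory use.
        labels.extend(await asyncio.gather(*(classify_one(img) for img in batch)))
    return labels


# Stand-in model calls so the sketch runs without any network access.
async def fake_classify(image_id):
    return ("tile", 0.9) if image_id % 2 == 0 else ("unknown", 0.4)


async def fake_validate(image_id, label):
    return "validated"
```

Using two separate semaphores keeps the slower validation path from consuming slots meant for the primary classifier.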

Why these limits?


2. Image Upload to Storage

Applies to: PDF, Web Scraping, XML Import
Service: ImageProcessingService

| Limit | Value | Purpose |
|---|---|---|
| Concurrent Uploads | 10 | Supabase Storage upload limit |

Why 10?


3. CLIP Embeddings Generation

Applies to: PDF, Web Scraping, XML Import
Service: ImageProcessingService

| Limit | Value | Purpose |
|---|---|---|
| Batch Size | 20 images | Memory optimization |
| Max Retries | 3 | Retry failed embeddings |
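
The retry limit pairs naturally with exponential backoff. The sketch below assumes "Max Retries 3" means one initial attempt plus up to three retries; the helper name `with_retries`, the `base_delay` value, and the `FlakyOperation` stand-in are all hypothetical.

```python
import asyncio


async def with_retries(operation, max_retries=3, base_delay=0.5):
    """Await operation(); on failure retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)


class FlakyOperation:
    """Stand-in for an embedding call that fails a fixed number of times."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("transient failure")
        return "embedding"
```

An operation that fails twice succeeds on the third call; one that keeps failing is attempted four times in total before the error propagates.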

Why 20?


4. Image Downloads (XML Import Only)

Applies to: XML Import
Service: ImageDownloadService

| Limit | Value | Purpose |
|---|---|---|
| Concurrent Downloads | 5 | Network bandwidth optimization |
| Max Retries | 3 | Retry failed downloads |
| Timeout | 30 seconds | Prevent hanging downloads |
| Max File Size | 10 MB | Prevent large file downloads |
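
The timeout and size cap can be sketched with `asyncio.wait_for`. This is an illustration under assumptions: `guarded_download`, `DownloadRejected`, and `fake_fetch` are hypothetical names, 10 MB is taken as 10 × 1024 × 1024 bytes, and the `fetch` callable stands in for whatever HTTP client ImageDownloadService actually uses.

```python
import asyncio

DOWNLOAD_TIMEOUT_S = 30
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumes binary megabytes


class DownloadRejected(Exception):
    """Raised when a download times out or exceeds the size cap."""


async def guarded_download(url, fetch, timeout=DOWNLOAD_TIMEOUT_S,
                           max_bytes=MAX_FILE_BYTES):
    """Await fetch(url) under a timeout, then enforce the size cap."""
    try:
        data = await asyncio.wait_for(fetch(url), timeout=timeout)
    except asyncio.TimeoutError:
        raise DownloadRejected(f"timed out after {timeout}s: {url}") from None
    # This check runs after the body has arrived; a real client would
    # ideally inspect Content-Length or stream the body to stop early.
    if len(data) > max_bytes:
        raise DownloadRejected(f"{len(data)} bytes exceeds cap of {max_bytes}: {url}")
    return data


async def fake_fetch(url):
    """Stand-in fetcher: takes 50 ms and returns 2 KiB of data."""
    await asyncio.sleep(0.05)
    return b"x" * 2048
```

Raising a single exception type for both rejections lets a caller route them through the same retry or skip logic.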

Why these limits?


5. Product Batch Processing (XML Import Only)

Applies to: XML Import
Service: DataImportService

| Limit | Value | Purpose |
|---|---|---|
| Batch Size | 10 products | Memory optimization |
| Image Downloads per Batch | 5 concurrent | Network optimization |

Why 10 products?


6. PDF Processing Workers

Applies to: PDF Processing
Service: PDFProcessor

| Limit | Value | Purpose |
|---|---|---|
| Max Workers | 2 | Memory optimization (reduced from 4) |
| Pages per Worker | 5 | Batch size for page processing |
| Max Pages in Memory | 10 | 2 workers × 5 pages |
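
The "2 workers × 5 pages" bound can be sketched as follows. Whether PDFProcessor uses threads, processes, or async tasks is not stated here, so the `ThreadPoolExecutor` below is an assumption, and `page_batches` and `process_pdf_pages` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2
PAGES_PER_WORKER = 5


def page_batches(num_pages, batch_size=PAGES_PER_WORKER):
    """Yield ranges of page indices, batch_size pages at a time."""
    for start in range(0, num_pages, batch_size):
        yield range(start, min(start + batch_size, num_pages))


def process_pdf_pages(num_pages, process_batch):
    """Run process_batch over page batches with at most MAX_WORKERS active."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = []
        # map keeps batch order; only MAX_WORKERS batches execute at once,
        # so at most MAX_WORKERS * PAGES_PER_WORKER pages are being worked on,
        # provided pages are loaded inside process_batch.
        for batch_result in pool.map(process_batch, page_batches(num_pages)):
            results.extend(batch_result)
        return results
```

A 12-page document splits into batches of 5, 5, and 2 pages.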

Why 2 workers?


Timeout Configuration

1. Product Discovery Timeouts

Applies to: PDF, Web Scraping
Service: ProductDiscoveryService

| Operation | Timeout | Purpose |
|---|---|---|
| Product Discovery | 300s (5 min) | AI analysis of full document |
| Per-product Extraction | 60s | Individual product metadata |

2. PDF Extraction Timeouts

Applies to: PDF Processing
Service: PDFProcessor

| Operation | Timeout | Purpose |
|---|---|---|
| Full PDF Extraction | 7200s (2 hours) | Large PDFs with OCR |
| Per-page Extraction | Dynamic | Based on file size |

The per-page timeout is calculated dynamically as max(300, file_size_mb * 10 + num_pages * 5) seconds. A small PDF (10 pages, 5 MB) computes to 100s and is raised to the 300s floor, while a large PDF (500 pages, 50 MB) gets 3000s (50 min).
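
The formula and both worked examples can be checked directly. The function name `extraction_timeout_s` is hypothetical; the expression itself is the one given above.

```python
def extraction_timeout_s(file_size_mb, num_pages):
    """Dynamic timeout in seconds, with a 300 s floor."""
    return max(300, file_size_mb * 10 + num_pages * 5)


small = extraction_timeout_s(file_size_mb=5, num_pages=10)    # 5*10 + 10*5 = 100, floor applies
large = extraction_timeout_s(file_size_mb=50, num_pages=500)  # 50*10 + 500*5 = 3000
print(small, large)  # 300 3000
```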


3. Image Download Timeouts

Applies to: XML Import
Service: ImageDownloadService

| Operation | Timeout | Purpose |
|---|---|---|
| Per-image Download | 30s | Single image download |

4. AI Classification Timeouts

Applies to: PDF, Web Scraping, XML Import
Service: QwenEndpointService, AIClientService

| Operation | Timeout | Purpose |
|---|---|---|
| Qwen Vision Request | 120s | Image classification |
| Claude Request | 120s | Validation |

Rate Limiting

1. HuggingFace Endpoint (Qwen3-VL Vision)

Applies to: PDF, Web Scraping, XML Import
Service: QwenEndpointService

| Limit | Value | Purpose |
|---|---|---|
| Requests per Minute | 10 | API rate limit |
| Burst Limit | 5 | Short-term burst |
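
One common way to combine a per-minute rate with a burst allowance is a token bucket. The sketch below is an interpretation of the two limits above, not the QwenEndpointService implementation; `TokenBucket` and its injectable `clock` parameter are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket: refills at rate_per_minute, holds at most `burst` tokens."""

    def __init__(self, rate_per_minute=10, burst=5, clock=time.monotonic):
        self.refill_per_s = rate_per_minute / 60
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With the defaults, five requests pass immediately, the sixth is refused, and a new token becomes available roughly every six seconds. An async caller could sleep and retry whenever `try_acquire()` returns False.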

2. Claude API (Circuit Breaker)

Applies to: PDF, Web Scraping, XML Import
Service: CircuitBreaker

| Limit | Value | Purpose |
|---|---|---|
| Failure Threshold | 5 | Open circuit after 5 failures |
| Recovery Timeout | 60s | Try again after 60s |
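
A minimal circuit breaker with these two settings might look like the sketch below. It is not the project's CircuitBreaker class: `SimpleCircuitBreaker`, `CircuitOpenError`, and the injectable `clock` are hypothetical, and the sketch assumes consecutive failures are what trip the breaker.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = None  # time the circuit opened, or None if closed

    async def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Recovery window elapsed: let one trial call through (half-open).
        try:
            result = await operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


async def demo():
    """Trip the breaker, observe a skipped call, then recover after 61 s."""
    now = [0.0]
    breaker = SimpleCircuitBreaker(clock=lambda: now[0])

    async def fail():
        raise RuntimeError("upstream error")

    async def ok():
        return "ok"

    for _ in range(5):
        try:
            await breaker.call(fail)
        except RuntimeError:
            pass
    try:
        await breaker.call(ok)
        blocked = False
    except CircuitOpenError:
        blocked = True
    now[0] = 61.0  # simulate the recovery timeout passing
    return blocked, await breaker.call(ok)
```

While the circuit is open, calls fail fast without reaching the API, which is what prevents a struggling upstream from being hammered.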

3. Image Export Rate Limiting

Applies to: All methods
Service: images.py API

| Limit | Value | Purpose |
|---|---|---|
| Exports per Hour | 5 | Prevent abuse |

Shared Services

All three methods use the SAME services with SAME limits:

1. ImageProcessingService

Used by: PDF, Web Scraping, XML Import

Shared limits across all methods: 5 concurrent HuggingFace Endpoint (Qwen3-VL) requests, 2 concurrent Claude requests, 10 concurrent uploads, classification batch size of 15 images, and CLIP batch size of 20 images.


2. RealEmbeddingsService

Used by: PDF, Web Scraping, XML Import

Uses SigLIP2 via the SLIG cloud endpoint (a HuggingFace Inference Endpoint, 768D) to generate five specialized 768D embedding types, written directly to VECS: image_slig_embeddings (visual), image_color_embeddings, image_texture_embeddings, image_style_embeddings, and image_material_embeddings. It also produces an understanding embedding (1024D, Voyage AI, derived from the Qwen3-VL vision_analysis output), stored in image_understanding_embeddings. Updated 2026-04: the legacy google/siglip-so400m-patch14-384 (1152D) and CLIP 512D collections were dropped.


3. AsyncQueueService

Used by: PDF, Web Scraping, XML Import

Shared background job processing for chunking, embedding generation, and product enrichment.


4. ChunkingService

Used by: PDF, Web Scraping, XML Import

Shared chunking logic with chunk size of 1000 characters and overlap of 200 characters.
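
Fixed-size chunking with overlap can be sketched as a sliding window. The function `chunk_text` is a hypothetical illustration using the stated sizes; the real ChunkingService may split on sentence or paragraph boundaries instead.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunk_size-character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 800 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,500-character input this yields three chunks starting at offsets 0, 800, and 1600, where the last 200 characters of each chunk repeat as the first 200 of the next.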


Performance Optimization

1. Memory Optimization

Batch Processing: All methods process data in batches to prevent OOM

| Method | Batch Size | Memory Impact |
|---|---|---|
| PDF Image Classification | 15 images | ~500MB per batch |
| CLIP Embeddings | 20 images | ~300MB per batch |
| XML Product Import | 10 products | ~200MB per batch |

2. Network Optimization

Concurrent Downloads: Controlled via semaphores

| Operation | Concurrency | Throughput |
|---|---|---|
| Image Downloads (XML) | 5 concurrent | ~5 images/sec |
| Image Uploads | 10 concurrent | ~10 images/sec |

3. API Optimization

Rate Limiting: Prevent API throttling

| API | Limit | Strategy |
|---|---|---|
| HuggingFace (Qwen3-VL) | 10 req/min | Semaphore (5 concurrent) |
| Claude | Circuit breaker | Semaphore (2 concurrent) |
| OpenAI | No limit | Batch processing |

Comparison Table

Async Processing

| Feature | PDF | Web | XML |
|---|---|---|---|
| Main Processing | ✅ Fully async | ✅ Fully async | ✅ Fully async |
| Background Jobs | ✅ AsyncQueueService | ✅ AsyncQueueService | ✅ AsyncQueueService |
| Product Discovery | ✅ Async + timeout | ✅ Async + timeout | ✅ Async (queued) |
| Image Processing | ✅ Async + semaphores | ✅ Async + semaphores | ✅ Async + semaphores |
| Chunking | ✅ Async | ✅ Async | ✅ Async (queued) |
| Embeddings | ✅ Async | ✅ Async | ✅ Async (queued) |

Concurrency Limits

| Limit | PDF | Web | XML |
|---|---|---|---|
| Qwen Vision | 5 concurrent | 5 concurrent | 5 concurrent |
| Claude Validation | 2 concurrent | 2 concurrent | 2 concurrent |
| Image Classification Batch | 15 images | 15 images | 15 images |
| Image Uploads | 10 concurrent | 10 concurrent | 10 concurrent |
| CLIP Batch | 20 images | 20 images | 20 images |
| Image Downloads | N/A | N/A | 5 concurrent |
| Product Batch | N/A | N/A | 10 products |
| PDF Workers | 2 workers | N/A | N/A |

Timeout Limits

| Timeout | PDF | Web | XML |
|---|---|---|---|
| Product Discovery | 300s (5 min) | 300s (5 min) | N/A |
| PDF Extraction | 7200s (2 hours) | N/A | N/A |
| Image Download | N/A | N/A | 30s |
| AI Classification | 120s | 120s | 120s |

Best Practices

1. Monitoring

Always log batch progress including the current batch number, total batches, and progress percentage to enable real-time tracking.

2. Error Handling

Use try/except blocks with detailed logging around every batch processing call. On failure, log the error with context and continue processing the next batch rather than aborting the entire job.
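
The monitoring and error-handling advice can be combined in one loop. The sketch below is illustrative only: `process_in_batches`, the logger name, and the failing handler in `demo` are hypothetical.

```python
import asyncio
import logging
import math

logger = logging.getLogger("batch_processing")


async def process_in_batches(items, batch_size, handle_batch):
    """Process every batch, logging progress; return numbers of failed batches."""
    total_batches = math.ceil(len(items) / batch_size)
    failed_batches = []
    for number in range(1, total_batches + 1):
        batch = items[(number - 1) * batch_size:number * batch_size]
        try:
            await handle_batch(batch)
        except Exception:
            # Log with traceback and context, then move on to the next batch.
            logger.exception("Batch %d/%d failed; continuing", number, total_batches)
            failed_batches.append(number)
        percent = round(100 * number / total_batches)
        logger.info("Batch %d/%d finished (%d%%)", number, total_batches, percent)
    return failed_batches


async def demo():
    """40 items in batches of 15; the batch containing item 20 fails."""
    async def handle(batch):
        if 20 in batch:
            raise ValueError("bad image")

    return await process_in_batches(list(range(40)), 15, handle)
```

Returning the failed batch numbers lets the caller decide whether to retry them or mark the job as partially complete.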

3. Resource Cleanup

After each batch, explicitly delete the batch data reference and call the garbage collector to free memory, particularly important for large image batches.
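
A small sketch of the cleanup step follows, assuming batches are produced lazily so that no other container keeps them alive. The names `iter_batches` and `process_with_cleanup` are hypothetical.

```python
import gc


def iter_batches(items, batch_size):
    """Lazily yield consecutive slices of items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_with_cleanup(items, batch_size, handle_batch):
    """Handle each batch, then release it before the next one is created."""
    processed = 0
    for batch in iter_batches(items, batch_size):
        handle_batch(batch)
        processed += len(batch)
        del batch     # drop this function's reference to the batch
        gc.collect()  # reclaim any reference cycles right away
    return processed
```

The `del` only frees memory if nothing else still references the batch, which is why the batches come from a generator rather than a prebuilt list of lists.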

4. Progress Tracking

Always update the background_jobs table after each stage with the current progress percentage and stage name so the frontend can display accurate real-time progress.


Summary

✅ All methods fully async: PDF, Web Scraping, XML Import
✅ Same concurrency limits: 5 HuggingFace/Qwen, 2 Claude, 10 uploads, 20 SLIG
✅ Same timeout guards: 300s discovery, 120s AI, 30s downloads
✅ Same rate limiting: 10 req/min HuggingFace/Qwen, circuit breaker Claude
✅ Same shared services: ImageProcessingService, RealEmbeddingsService, AsyncQueueService
✅ Memory optimized: Batch processing prevents OOM
✅ Network optimized: Semaphores prevent congestion
✅ API optimized: Rate limiting prevents throttling

The architecture is unified, consistent, and production-ready! 🚀