Image Embedding Generation

Image embedding generation system with batching, retry logic, and checkpoint recovery for reliable CLIP embedding coverage.


Overview

The image embedding system generates visual embeddings for all processed images using CLIP models. The system includes batch processing, automatic retry with exponential backoff, and checkpoint recovery to ensure complete embedding coverage.

Features

1. Batch Processing

Implementation:

Images are processed in configurable batches (default batch_size=20) rather than one long sequential pass, with progress logged after each batch.

Benefits:

Bounds memory usage on large documents and provides natural progress-reporting points; batch size can be lowered for memory-constrained environments.

2. Retry Logic with Exponential Backoff

Implementation:

Each image is attempted up to max_retries times (default 3), waiting 2^retry_count seconds between attempts so transient failures have time to clear.

Benefits:

Network timeouts, API rate limits, and brief service outages recover automatically instead of leaving gaps in embedding coverage.

3. Checkpoint Recovery

Implementation:

Before processing, the system counts images that already have has_slig_embedding = TRUE in document_images and skips that many, resuming where the previous run stopped.

Benefits:

Interrupted or failed runs resume without re-saving images or regenerating embeddings that already exist.

4. Detailed Error Tracking

Implementation:

Every permanent failure is recorded with the image's page number and error message in a failed_images array included in the final summary.

Benefits:

Failed images are easy to audit from logs or the summary and can be manually retried later.

Implementation Details

New Methods

_get_embedding_checkpoint(document_id: str) -> Optional[int]

Queries the document_images table to count images with has_slig_embedding = TRUE for the given document, and returns that count as the checkpoint index. (Updated 2026-04: the legacy visual_clip_embedding_512 column was dropped; VECS is now the single source of truth for image vectors, and per-image presence is tracked via boolean flags on document_images.)

_process_single_image_with_retry(...) -> Tuple[bool, bool, Optional[str]]

Processes a single image with retry logic, attempting up to max_retries times. On each failure, it waits 2^retry_count seconds before the next attempt (exponential backoff). Returns a tuple of (image_saved, embedding_generated, error_message).

save_images_and_generate_clips(...) -> Dict[str, Any]

Main method with batching + retry + checkpointing. Signature: save_images_and_generate_clips(material_images, document_id, workspace_id, batch_size=20, max_retries=3). First checks the checkpoint to skip already-processed images, then processes remaining images in batches, calling _process_single_image_with_retry for each. Returns a dict with images_saved, clip_embeddings_generated, and failed_images.
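A sketch of how the three pieces compose. The checkpoint lookup and per-image retry worker are injected as callables here because the real database and CLIP calls are not available in an example; the summary-dict keys follow the contract described above, everything else is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

def save_images_and_generate_clips(
    material_images: List[Dict[str, Any]],
    document_id: str,
    workspace_id: str,
    batch_size: int = 20,
    max_retries: int = 3,
    *,
    get_checkpoint: Callable[[str], Optional[int]],
    process_with_retry: Callable[..., Tuple[bool, bool, Optional[str]]],
) -> Dict[str, Any]:
    # Skip images the previous run already completed.
    checkpoint = get_checkpoint(document_id) or 0
    remaining = material_images[checkpoint:]

    summary: Dict[str, Any] = {
        "images_saved": checkpoint,
        "clip_embeddings_generated": checkpoint,
        "failed_images": [],
    }
    for start in range(0, len(remaining), batch_size):
        for image in remaining[start : start + batch_size]:
            saved, embedded, error = process_with_retry(
                image, document_id, workspace_id, max_retries
            )
            summary["images_saved"] += int(saved)
            summary["clip_embeddings_generated"] += int(embedded)
            if error:
                summary["failed_images"].append(
                    {"page": image.get("page"), "error": error}
                )
    return summary

# Demo: 5 images, 2 already done per checkpoint, one permanent failure on page 4.
images = [{"page": p} for p in range(1, 6)]
def fake_process(image, doc_id, ws_id, retries):
    if image["page"] == 4:
        return False, False, "Invalid image format"
    return True, True, None

result = save_images_and_generate_clips(
    images, "doc-1", "ws-1", batch_size=2,
    get_checkpoint=lambda d: 2, process_with_retry=fake_process,
)
print(result)  # images_saved=4, clip_embeddings_generated=4, 1 failure (page 4)
```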

Configuration

Default Parameters

batch_size = 20 (images processed per batch) and max_retries = 3 (attempts per image before it is recorded as a permanent failure).

Customization

All parameters are configurable via method arguments. For memory-constrained environments, use a smaller batch_size (e.g., 10). For unreliable networks, increase max_retries (e.g., 5).

Performance Impact

Before (Sequential Processing)

After (Batched with Retry)

Resource Usage

Testing Results

NOVA Test Case

Before Fix:

After Fix (Expected):

Error Handling

Retry Scenarios

  1. Network Timeout - Retries with exponential backoff
  2. API Rate Limit - Waits and retries
  3. Temporary Service Unavailable - Retries after delay
  4. Invalid Image Data - Fails after max retries, logs error

Permanent Failures

Images that fail after all retries are:

  1. Logged with detailed error messages
  2. Included in the failed_images array
  3. Reported in the final summary
  4. Available for manual retry later

Monitoring

Log Output

The log shows progress per batch: saving each image to DB with its UUID, generating CLIP embeddings per image, and batch completion messages. The final summary reports total images saved, total CLIP embeddings generated, and a list of failed images with their page numbers and error reasons (e.g., "Network timeout after 3 retries", "Invalid image format").

Integration

Pipeline Integration

The improved method is automatically used in the PDF processing pipeline at Stage 30: save-images-db (POST /api/internal/save-images-db/{job_id}), which calls save_images_and_generate_clips with the document's material images, document ID, and workspace ID.

Manual Usage

The service can also be called directly for reprocessing existing documents. After calling save_images_and_generate_clips, inspect the returned dict for clip_embeddings_generated, images_saved, and failed_images counts.
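A small helper for inspecting the returned summary during manual reprocessing can look like this; only the dict shape comes from the method's contract, the helper itself is hypothetical.

```python
def summarize_result(result: dict) -> str:
    """Render the save_images_and_generate_clips summary for a quick manual check."""
    failed = result.get("failed_images", [])
    line = (
        f"{result.get('images_saved', 0)} images saved, "
        f"{result.get('clip_embeddings_generated', 0)} CLIP embeddings generated, "
        f"{len(failed)} failed"
    )
    for item in failed:
        line += f"\n  page {item.get('page')}: {item.get('error')}"
    return line

print(summarize_result({
    "images_saved": 12,
    "clip_embeddings_generated": 11,
    "failed_images": [{"page": 7, "error": "Network timeout after 3 retries"}],
}))
```

A non-empty failed list is the signal to re-run the method (the checkpoint ensures completed images are skipped) or to investigate the individual errors.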

Understanding Embeddings (Qwen → Voyage AI)

Overview

Understanding embeddings capture the structured knowledge from Qwen3-VL's vision analysis. Rather than embedding the raw image pixels (as SLIG does), they embed the semantic description of what was detected: material types, colors, textures, dimensions, finishes, and OCR text.

How It Works

  1. Qwen3-VL Analysis → Produces structured JSON (vision_analysis) with material type, colors, textures, properties, OCR text
  2. JSON → Text Conversion → Converts structured fields into descriptive text (e.g., "Material: porcelain tile. Colors: white, grey. Texture: matte. Dimensions: 60x120cm.")
  3. Voyage AI Embedding → Embeds the text via voyage-3.5 with input_type="document" → 1024D vector
  4. VECS Storage → Stored in image_understanding_embeddings collection (1024D, HNSW index)
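Step 2 (JSON → text) can be sketched as a straightforward field flattener. The vision_analysis field names below are assumptions inferred from the example output, and the Voyage AI call is only indicated in a comment.

```python
from typing import Any, Dict, List

def vision_analysis_to_text(analysis: Dict[str, Any]) -> str:
    """Flatten Qwen3-VL's structured output into embeddable descriptive text."""
    parts: List[str] = []
    if analysis.get("material_type"):
        parts.append(f"Material: {analysis['material_type']}.")
    if analysis.get("colors"):
        parts.append(f"Colors: {', '.join(analysis['colors'])}.")
    if analysis.get("texture"):
        parts.append(f"Texture: {analysis['texture']}.")
    if analysis.get("dimensions"):
        parts.append(f"Dimensions: {analysis['dimensions']}.")
    if analysis.get("ocr_text"):
        parts.append(f"Text on image: {analysis['ocr_text']}.")
    return " ".join(parts)

text = vision_analysis_to_text({
    "material_type": "porcelain tile",
    "colors": ["white", "grey"],
    "texture": "matte",
    "dimensions": "60x120cm",
})
print(text)
# Material: porcelain tile. Colors: white, grey. Texture: matte. Dimensions: 60x120cm.

# The text would then be embedded (step 3), e.g. with the Voyage AI client:
#   voyageai.Client().embed([text], model="voyage-3.5", input_type="document")
```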

Search Flow

  1. Query → Embedded via Voyage AI with input_type="query" → 1024D vector
  2. VECS Search → Similarity search against understanding collection
  3. Score Fusion → Combined with 6 other embedding scores using weighted fusion
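Step 3's weighted fusion can be sketched as a normalized weighted sum over the seven per-embedding scores. The weights and score names below are placeholders, not the production values.

```python
from typing import Dict

def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted fusion: normalized weighted sum over available embedding scores.

    Missing scores are skipped and the remaining weights renormalized, so a
    result lacking one embedding type is not unfairly penalized.
    """
    total_weight = sum(w for name, w in weights.items() if name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(scores[name] * w for name, w in weights.items() if name in scores)
    return weighted / total_weight

# Placeholder weights: the understanding score plus six other embedding scores.
WEIGHTS = {
    "understanding": 0.25, "visual_clip": 0.25, "text_dense": 0.15,
    "text_sparse": 0.10, "title": 0.10, "metadata": 0.10, "color": 0.05,
}

# With only two scores present, their weights are renormalized to sum to 1.
print(fuse_scores({"understanding": 0.9, "visual_clip": 0.7}, WEIGHTS))
```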

Pipeline Integration (updated 2026-04)

Benefits

Future Enhancements

  1. Parallel Batch Processing - Process multiple batches concurrently
  2. Adaptive Batch Size - Adjust batch size based on available memory
  3. Smart Retry Strategy - Different retry logic for different error types
  4. Automatic Reprocessing - Background job to retry failed images
  5. Metrics Dashboard - Real-time monitoring of embedding generation

Related Documentation