This plan outlines a comprehensive strategy for extracting metadata from multiple sources to ensure the highest quality and completeness of product metadata. The system will leverage:
- `text_1536` (OpenAI)
- `image_slig_embeddings`
- `image_color_embeddings`
- `image_texture_embeddings`
- `image_material_embeddings`
- `image_style_embeddings`
- `image_understanding_embeddings`

Legacy 1152D SigLIP-SO400M and 512D CLIP collections, as well as the fused `multimodal_2048` vector, were dropped in 2026-04.
The specialized embeddings are generated using text prompts that focus the model's attention: color embeddings focus on "color palette and color relationships", texture embeddings on "surface patterns and texture details", material embeddings on "material type and physical properties", and style embeddings on "design style and aesthetic elements".
Current Issue: Specialized embeddings (color, texture, material, style) are generated but NOT converted to text metadata.
Impact: Search and filtering rely on text metadata, so the visual information captured by these embeddings goes unused.
```
┌─────────────────────────────────────────────────────────────┐
│                      METADATA SOURCES                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. AI Text Extraction (Claude/GPT)                         │
│     ├─ Product Discovery (Stage 0)                          │
│     ├─ Dynamic Metadata Extractor (Stage 4)                 │
│     └─ Confidence: 0.85-0.95                                │
│                                                             │
│  2. Visual Embedding Analysis (SigLIP)                      │
│     ├─ Color Embedding → Color Text                         │
│     ├─ Texture Embedding → Finish/Texture Text              │
│     ├─ Material Embedding → Material Type Text              │
│     ├─ Style Embedding → Design Style Text                  │
│     └─ Confidence: 0.75-0.90                                │
│                                                             │
│  3. Pattern Matching (Chunks)                               │
│     ├─ Regex patterns for technical specs                   │
│     ├─ NLP extraction from chunk text                       │
│     └─ Confidence: 0.60-0.80                                │
│                                                             │
│  4. Factory-Level Defaults                                  │
│     ├─ Global metadata from factory documents               │
│     ├─ Applied when product-specific data missing           │
│     └─ Confidence: 0.50-0.70                                │
│                                                             │
│  5. Manual Overrides (Admin)                                │
│     ├─ User-provided corrections                            │
│     └─ Confidence: 1.00                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                METADATA CONSOLIDATION ENGINE                │
├─────────────────────────────────────────────────────────────┤
│  • Merge metadata from all sources                          │
│  • Resolve conflicts using confidence scores                │
│  • Track extraction source for each field                   │
│  • Generate final product.metadata JSONB                    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   FINAL PRODUCT METADATA                    │
├─────────────────────────────────────────────────────────────┤
│  {                                                          │
│    "color": "beige",                                        │
│    "finish": "matte",                                       │
│    "material": "ceramic",                                   │
│    "style": "modern minimalist",                            │
│    "slip_resistance": "R11",                                │
│    "_extraction_metadata": {                                │
│      "color": {                                             │
│        "source": "visual_embedding",                        │
│        "confidence": 0.88,                                  │
│        "alternatives": ["warm beige", "sand"]               │
│      },                                                     │
│      "slip_resistance": {                                   │
│        "source": "ai_text_extraction",                      │
│        "confidence": 0.95                                   │
│      }                                                      │
│    }                                                        │
│  }                                                          │
└─────────────────────────────────────────────────────────────┘
```
File: mivaa-pdf-extractor/app/services/embedding_to_text_service.py
Purpose: Convert specialized embeddings to textual metadata
Method: Use SigLIP's text-image matching in reverse:
Example output: `["beige", "warm tones", "sand"]` with scores `[0.92, 0.85, 0.78]`.

Predefined Vocabularies:
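Under this vocabulary approach, each candidate term carries a pre-computed text embedding from the same encoder, and the image's specialized embedding is scored against them by cosine similarity. A minimal sketch, with made-up 3-D vectors standing in for the real 768-D embeddings and illustrative function names:

```python
import math

def match_embedding_to_terms(image_emb, vocab):
    """Rank vocabulary terms by cosine similarity to an image embedding.

    vocab maps term -> pre-computed text embedding (same encoder as the
    image embedding). Returns [(term, score), ...], best match first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scored = [(term, round(cosine(image_emb, emb), 4)) for term, emb in vocab.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

# Toy 3-D vectors; the real vocabulary stores 768-D halfvec embeddings.
vocab = {
    "beige": [0.9, 0.1, 0.0],
    "sand": [0.8, 0.2, 0.1],
    "navy": [0.0, 0.1, 0.9],
}
ranked = match_embedding_to_terms([0.85, 0.15, 0.05], vocab)
```

In production the same ranking would be done by a pgvector similarity query rather than in Python.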
File: mivaa-pdf-extractor/app/services/metadata_consolidation_service.py
Purpose: Merge metadata from all sources with conflict resolution
Algorithm: The consolidate_metadata(sources) function receives a dictionary of source names to their extracted metadata dictionaries. For each metadata field, it collects all candidate values from all sources along with their confidence scores (determined by source type). The candidate with the highest confidence becomes the final value, with alternatives tracked. The output is a flat metadata dictionary plus _extraction_metadata tracking source, confidence, and alternatives for each field.
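A minimal rule-based sketch of this selection step (the actual service is AI-prompt-driven; the base confidences here are the upper bounds of the per-source ranges given in this plan, and the function shape is illustrative):

```python
# Illustrative per-source base confidence (upper bound of each range).
SOURCE_CONFIDENCE = {
    "manual_override": 1.00,
    "ai_text_extraction": 0.95,
    "visual_embedding": 0.90,
    "pattern_matching": 0.80,
    "factory_default": 0.70,
}

def consolidate_metadata(sources):
    """sources: {source_name: {field: value}} -> (metadata, extraction_metadata)."""
    # Gather every candidate value per field, tagged with its confidence.
    candidates = {}
    for source, fields in sources.items():
        conf = SOURCE_CONFIDENCE.get(source, 0.5)
        for field, value in fields.items():
            candidates.setdefault(field, []).append((conf, source, value))

    metadata, extraction = {}, {}
    for field, cands in candidates.items():
        cands.sort(key=lambda c: c[0], reverse=True)  # highest confidence wins
        conf, source, value = cands[0]
        metadata[field] = value
        extraction[field] = {
            "source": source,
            "confidence": conf,
            "alternatives": [v for _, _, v in cands[1:] if v != value],
        }
    return metadata, extraction

sources = {
    "ai_text_extraction": {"slip_resistance": "R11"},
    "visual_embedding": {"color": "beige", "slip_resistance": "R10"},
    "factory_default": {"color": "white"},
}
metadata, extraction = consolidate_metadata(sources)
```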
Confidence Levels by Source:
| Source | Confidence Range | Use Case |
|---|---|---|
| Manual Overrides | 1.00 | Admin corrections |
| AI Text Extraction (Claude/GPT) | 0.85-0.95 | Explicit text in PDF |
| Visual Embedding Analysis | 0.75-0.90 | Image-based inference |
| Pattern Matching | 0.60-0.80 | Regex/NLP from chunks |
| Factory Defaults | 0.50-0.70 | Fallback values |
Confidence Modifiers:
Current Pipeline (9 stages):
New Stage 6.5: EMBEDDING_TO_TEXT_CONVERSION
Enhanced Stage 8: PRODUCTS_CREATED
A new `metadata_vocabulary` table stores: `id` (UUID), `field_name` (e.g., 'color', 'texture', 'material', 'style'), `value` (e.g., 'beige', 'matte', 'ceramic', 'modern'), `embedding` (HALFVEC(768) — pre-computed SLIG / SigLIP2 embedding, updated 2026-04), `category` (e.g., 'warm_colors', 'neutral_colors'), and `synonyms` (TEXT array). Indexes are created on `field_name` and on the `embedding` column using ivfflat with `halfvec_cosine_ops`.
`products.metadata` Structure: The enriched metadata includes the core fields (color, finish, material, style, slip_resistance, fire_rating), plus an `_extraction_metadata` dictionary tracking source, confidence, alternatives, and extraction timestamp for each field, plus `_sources_used` (array of source names used) and `_overall_confidence` (float).
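An illustrative instance of this enriched structure (all values are made up for the sketch):

```python
# Hypothetical example of the enriched products.metadata JSONB payload.
enriched = {
    "color": "beige",
    "finish": "matte",
    "material": "ceramic",
    "style": "modern minimalist",
    "slip_resistance": "R11",
    "fire_rating": "A1",
    "_extraction_metadata": {
        "color": {
            "source": "visual_embedding",
            "confidence": 0.88,
            "alternatives": ["warm beige", "sand"],
            "extracted_at": "2026-04-01T00:00:00Z",  # illustrative timestamp
        },
    },
    "_sources_used": ["ai_text_extraction", "visual_embedding"],
    "_overall_confidence": 0.91,  # illustrative summary score
}
```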
Action: Create AI prompts for embedding interpretation and metadata consolidation
Prompts Created:
Embedding-to-Text Interpretation (stage: image_analysis, category: embedding_to_text) — Contains vocabulary of 50+ colors, 30+ finishes, 40+ materials, 25+ styles. AI interprets embedding patterns and returns structured JSON with confidence scoring 0.60–1.00.
Metadata Consolidation (stage: entity_creation, category: metadata_consolidation) — Priority order: manual > AI text > visual > pattern > factory defaults. Agreement bonus: +0.05 when sources agree. Conflict penalty: -0.10 when sources disagree. Returns consolidated metadata with extraction tracking.
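The agreement bonus and conflict penalty can be sketched as a small adjustment function (the real behavior lives in the AI prompt; this clamped-arithmetic version and its name are illustrative):

```python
def adjust_confidence(base, values):
    """Apply the +0.05 agreement bonus or -0.10 conflict penalty.

    values: the candidate values proposed by all sources for one field.
    The result is clamped to the [0.0, 1.0] confidence range.
    """
    if len(values) < 2:
        return base  # only one source: nothing to agree or conflict with
    if len(set(values)) == 1:
        adjusted = base + 0.05  # all sources agree
    else:
        adjusted = base - 0.10  # sources disagree
    return max(0.0, min(1.0, adjusted))
```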
File: mivaa-pdf-extractor/app/services/embedding_to_text_service.py
Architecture: Prompt-based AI interpretation (not vocabulary similarity search)
Key Methods:
- `convert_embeddings_to_metadata(image_id, embeddings)` - Main conversion using AI
- `_load_prompt()` - Load prompt from database
- `_calculate_cost(usage)` - Track AI costs

How It Works:
- Prompt is loaded from the `prompts` table (category: embedding_to_text)

File: mivaa-pdf-extractor/app/services/metadata_consolidation_service.py
Architecture: Prompt-based AI consolidation (not hardcoded rules)
Key Methods:
- `consolidate_metadata(product_id, sources, existing_metadata)` - Main consolidation using AI
- `_load_prompt()` - Load prompt from database
- `_calculate_cost(usage)` - Track AI costs

How It Works:
- Prompt is loaded from the `prompts` table (category: metadata_consolidation)

Migration: Added JSONB column `visual_metadata` (default `{}`) to `document_images` to store AI-extracted metadata from embeddings. The structure contains per-field objects with a primary value, a secondary alternatives array, and a confidence score.
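The per-field shape of `document_images.visual_metadata` described in the migration might look like this (values are illustrative):

```python
# Hypothetical example of the visual_metadata JSONB structure:
# per field, a primary value, secondary alternatives, and a confidence.
visual_metadata = {
    "color": {"primary": "beige", "secondary": ["warm beige", "sand"], "confidence": 0.88},
    "finish": {"primary": "matte", "secondary": ["satin"], "confidence": 0.82},
}
```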
Stage 3.5: Embedding-to-Text Conversion (added to stage_3_images.py)
- Calls `EmbeddingToTextService` to convert embeddings to text
- Stores the result in `document_images.visual_metadata`

Stage 4: Metadata Consolidation (modified stage_4_products.py)
- Calls `MetadataConsolidationService` to merge intelligently
- Writes the merged result to `products.metadata`

Before: Product metadata only contains explicitly stated fields such as designer, dimensions, and slip_resistance, and is missing color, finish, material, and style even though the visual embeddings for those properties exist.
After: Product metadata includes all of the above plus visually derived fields (color, finish, material, style, texture), each tracked in `_extraction_metadata` with its source and confidence, along with an `_overall_confidence` summary score.
Gained: 5 additional metadata fields from visual analysis!
Original Plan: Use vocabulary database with similarity search
Actual Implementation: Use AI with database prompts (follows platform standards)
Why Changed:
Test End-to-End with Harmony PDF
Monitor Performance
Iterate on Prompts