The MIVAA platform extracts 200+ metadata fields from PDF catalogs using AI-powered dynamic discovery. All metadata is organized into 9 functional categories and stored in the products.metadata JSONB field in the database.
Stage 0: Product Discovery & Metadata Extraction

0A: Product Discovery (Claude/GPT)
├── Identify product names
├── Extract page ranges
├── Extract basic metadata (designer, dimensions)
└── Classify content by category

0B: Metadata Enrichment (DynamicMetadataExtractor)
├── For each discovered product:
│   ├── Extract product-specific text from PDF
│   ├── Call DynamicMetadataExtractor (Claude/GPT)
│   ├── Extract 200+ fields across 9 categories
│   └── Merge with discovery metadata
│
└── Store enriched products in database
AI models: Claude (claude-sonnet-4-5) and GPT (gpt-4o).

When merging metadata from multiple sources, the system applies this priority (highest first):
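Discovery Metadata (Highest Priority): basic fields found during Stage 0A product discovery, such as designer and dimensions
Critical Metadata (High Priority): material_category, factory_name, factory_group_name
Discovered Metadata (Standard Priority): all other fields found dynamically by DynamicMetadataExtractor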
Category: Material Properties
Purpose: Physical and structural characteristics of the material
Fields (11 total):
material_type - Type of material (e.g., "ceramic", "porcelain", "wood")
composition - Material composition (e.g., "100% ceramic", "oak wood")
type - Specific type classification
blend - Material blend information
fiber_content - Fiber composition (for textiles)
texture - Surface texture (e.g., "smooth", "rough", "embossed")
finish - Surface finish (e.g., "matte", "glossy", "satin")
pattern - Pattern type (e.g., "wood grain", "marble veins")
weight - Material weight (e.g., "800 kg/m³")
density - Material density
durability_rating - Durability classification

Category: Dimensions
Purpose: Physical measurements and sizing information
Fields (8 total):
size - Overall size (e.g., "15×38 cm", "20×40 cm")
length - Length measurement
width - Width measurement
height - Height measurement
thickness - Thickness (e.g., "8mm", "10mm")
diameter - Diameter (for circular products)
area - Surface area (e.g., "0.57 m²")
volume - Volume measurement

Category: Appearance
Purpose: Visual and aesthetic characteristics
Fields (7 total):
color - Color name (e.g., "beige", "white", "gray")
color_code - Color code (e.g., "RAL 9010", "#F5F5DC")
gloss_level - Gloss level (e.g., "60%", "matte")
sheen - Sheen level (e.g., "satin", "semi-gloss")
transparency - Transparency level
grain - Grain pattern (e.g., "wood grain", "marble veins")
visual_effect - Special visual effects

Category: Compliance
Purpose: Regulatory compliance and environmental certifications
Fields (6 total):
certifications - Certifications held (e.g., "ISO 9001:2015", "CE certified")
standards - Standards compliance (e.g., "EN 14411", "ISO 10545")
eco_friendly - Eco-friendly status (true/false)
sustainability_rating - Sustainability rating
voc_rating - VOC (Volatile Organic Compounds) rating (e.g., "low VOC", "zero VOC")
safety_rating - Safety rating

Category: Design
Purpose: Design attribution and aesthetic classification
Fields (6 total):
designer - Designer name (e.g., "SG NY", "Patricia Urquiola")
studio - Design studio
collection - Collection name (e.g., "Harmony Collection", "Urban Series")
series - Series name
aesthetic_style - Aesthetic style (e.g., "contemporary", "minimalist", "rustic")
design_era - Design era (e.g., "modern", "vintage")

Category: Manufacturing
Purpose: Production and sourcing information
Fields (6 total):
factory - Factory name (e.g., "Castellón Factory")
manufacturer - Manufacturer name
factory_group - Factory group/parent company (e.g., "Harmony Group")
country_of_origin - Country of origin (e.g., "Spain", "Italy")
manufacturing_process - Manufacturing process description
construction - Construction method

Category: Commercial
Purpose: Business and commercial information
Fields (5 total):
pricing - Price information (e.g., "€45/m²", "$50/sqft")
availability - Availability status (e.g., "in stock", "made to order")
supplier - Supplier name
sku - SKU/product code
warranty - Warranty information (e.g., "5-year warranty", "lifetime warranty")

All metadata is stored in the metadata JSONB field of the products table. The products table has the following columns: id (UUID), sku, name, description, category, type, status, metadata (JSONB; holds all 200+ metadata fields), properties (JSONB), specifications (JSONB), created_at, and updated_at.
A complete product record has a metadata JSONB field containing fields from all 9 categories: material properties (material_type, composition, texture, finish, pattern, weight, density), dimensions (size, thickness, area), appearance (color, color_code, gloss_level, grain), performance (water_absorption, fire_rating, slip_resistance, wear_rating, breaking_strength), application (recommended_use, installation_method, room_type, traffic_level), compliance (certifications, standards, eco_friendly, voc_rating), design (designer, collection, aesthetic_style), manufacturing (factory, factory_group, country_of_origin), commercial (pricing, availability, warranty), and _extraction_metadata (extraction_timestamp, extraction_method, model_used, confidence_score, validation_passed).
Endpoint: POST /api/rag/process-pdf
Upload a PDF file together with the extract_categories parameter. The response contains job_id, status, message, products_discovered (a count), and metadata_extraction (a status).
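Below is a minimal standard-library sketch of the upload call. It rests on two assumptions: the PDF is sent in a multipart form field named `file`, and `extract_categories` is sent as a plain form value (for example, a comma-separated list). Neither the field name nor the value format is confirmed by this document. The helper names are hypothetical, and the builder only constructs the request without sending it.

```python
import json
import urllib.request
import uuid


def build_process_pdf_request(base_url: str, pdf_name: str, pdf_bytes: bytes,
                              extract_categories: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/rag/process-pdf."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="extract_categories"\r\n\r\n'
        f"{extract_categories}\r\n"
        f"--{boundary}\r\n"
        # Assumption: the upload field is called "file".
        f'Content-Disposition: form-data; name="file"; filename="{pdf_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/rag/process-pdf",
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON job summary (job_id, status, ...)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Passing the built request to `submit` returns the decoded response. From it, `job_id` can be kept for polling and `products_discovered` read as the count.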
Endpoint: GET /api/products/{product_id}
Returns the product record with its complete metadata object containing all extracted fields.
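Here is a small sketch of fetching one product with the standard library. It assumes the endpoint returns the product record itself as the JSON body, not wrapped in an envelope. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def product_url(base_url: str, product_id: str) -> str:
    """Build the GET /api/products/{product_id} URL, escaping the id."""
    return f"{base_url.rstrip('/')}/api/products/{urllib.parse.quote(product_id, safe='')}"


def parse_product(raw: bytes) -> dict:
    """Decode the JSON body; assumes the record is returned unwrapped."""
    record = json.loads(raw.decode("utf-8"))
    record.setdefault("metadata", {})  # guard against products with no metadata yet
    return record


def get_product(base_url: str, product_id: str) -> dict:
    """Fetch and decode one product record."""
    with urllib.request.urlopen(product_url(base_url, product_id)) as resp:
        return parse_product(resp.read())
```

Keeping URL building and parsing separate from the network call lets each piece be checked without a running server.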
Endpoint: POST /api/search/products
Send a filters object with dot-notation keys like "metadata.slip_resistance": "R11", "metadata.fire_rating": "A1", or "metadata.country_of_origin": "Spain" to filter products by their metadata values.
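The sketch below builds that request body. It assumes flat metadata keys, as the dot-notation suggests. The local `matches` helper adds two more assumptions: the server combines filters with AND, and it compares values exactly. This document does not state the endpoint's real matching rules. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_search_payload(**metadata_filters: Any) -> Dict[str, Dict[str, Any]]:
    """Prefix each keyword with 'metadata.' to form the dot-notation filters object."""
    return {"filters": {f"metadata.{key}": value for key, value in metadata_filters.items()}}


def build_search_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the JSON POST to /api/search/products."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/search/products",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def matches(product: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Local illustration of exact-match, all-filters-must-hold semantics (assumed)."""
    for dotted_key, expected in filters.items():
        value: Any = product
        for part in dotted_key.split("."):  # walk "metadata.fire_rating" key by key
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if value != expected:
            return False
    return True
```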
The frontend displays metadata organized by category in the ProductDetailModal component:
Location: src/components/AI/ProductDetailModal.tsx
Features:
Example UI:

NOVA - Product Details
[Product Image]

Material Properties
  Material Type: ceramic
  Texture: smooth
  Finish: matte
  Weight: 800 kg/m³

Dimensions
  Size: 15×38 cm
  Thickness: 8mm
  Area: 0.057 m²

Performance
  Slip Resistance: R11
  Fire Rating: A1
  Water Absorption: Class 3
  Breaking Strength: 1200 N

... (6 more categories)
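In the example UI, Size: 15×38 cm and Area: 0.057 m² agree, since 0.15 m × 0.38 m = 0.057 m². A consistency check along these lines could flag extracted dimension pairs that disagree. This is a hypothetical sketch, not MIVAA behavior: the helper names, the accepted size formats, and the 2% tolerance are all assumptions.

```python
import re

# Accepts strings like "15×38 cm" or "20x40 cm"; other formats are out of scope.
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*[×xX]\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)\s*")
_TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0}


def area_m2_from_size(size: str) -> float:
    """Return the face area in m² of a rectangular size string."""
    match = _SIZE_RE.fullmatch(size)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    scale = _TO_METRES[match.group(3)]
    return float(match.group(1)) * scale * float(match.group(2)) * scale


def area_matches(size: str, area: str, rel_tol: float = 0.02) -> bool:
    """Check an extracted area such as '0.057 m²' against the extracted size."""
    stated = float(area.replace("m²", "").strip())
    computed = area_m2_from_size(size)
    return abs(computed - stated) <= rel_tol * stated
```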
User uploads PDF → MIVAA API receives file → Job created
The ProductDiscoveryService analyzes the PDF and returns products with basic metadata including name, page_range, and initial fields (designer, dimensions, variants).
For each product, the system extracts product-specific text from the page range, initializes DynamicMetadataExtractor, and runs extraction to get 200+ fields organized into critical (material_category, factory_name, factory_group_name), discovered (all dynamic fields), and metadata (extraction tracking info).
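The per-product loop can be sketched as follows. Two parts are assumptions. First, page_range is treated as a 1-based, inclusive [start, end] pair over a list of per-page texts. Second, the `extract` callable stands in for DynamicMetadataExtractor, whose constructor and method names are not given here. Function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence


def product_text(pages: Sequence[str], page_range: Sequence[int]) -> str:
    """Join the text of a product's pages; page_range is assumed 1-based and inclusive."""
    start, end = page_range
    if start < 1 or end < start or end > len(pages):
        raise ValueError(f"page_range {list(page_range)} outside 1..{len(pages)}")
    return "\n\n".join(pages[start - 1:end])


def enrich_products(products: List[Dict[str, Any]], pages: Sequence[str],
                    extract: Callable[[str], Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Run the extractor once per discovered product, on that product's text only."""
    results = []
    for product in products:
        text = product_text(pages, product["page_range"])
        # `extract` stands in for DynamicMetadataExtractor and is expected to return
        # {"critical": {...}, "discovered": {...}, "metadata": {...}}.
        results.append({"product": product, "extraction": extract(text)})
    return results
```

Restricting each call to its product's pages is what keeps fields from one product from leaking into another's metadata.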
Metadata is merged with this priority: discovered fields as base, then critical fields override, then discovery metadata (highest priority) overrides those, plus _extraction_metadata added separately.
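This merge order maps onto successive dict updates, where later updates win. Below is a minimal sketch with a hypothetical function name. In this form, a higher-priority source overwrites a lower one even when its value is empty; the real implementation may treat empty values differently.

```python
from typing import Any, Dict


def merge_metadata(discovery: Dict[str, Any], critical: Dict[str, Any],
                   discovered: Dict[str, Any],
                   extraction_info: Dict[str, Any]) -> Dict[str, Any]:
    """Merge metadata sources; later updates take precedence over earlier ones."""
    merged: Dict[str, Any] = {}
    merged.update(discovered)  # base layer: all dynamically discovered fields
    merged.update(critical)    # critical fields override discovered values
    merged.update(discovery)   # discovery metadata has the final say
    # Extraction tracking info lives under its own key, not mixed into the fields.
    merged["_extraction_metadata"] = dict(extraction_info)
    return merged
```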
The product record is stored with its complete metadata JSONB containing all 200+ fields.
The ProductDetailModal component reads the metadata object and renders each category section dynamically, showing only categories that have data.
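The component itself is TypeScript/React and is not reproduced here. This Python sketch only illustrates the grouping rule it is described as applying: map flat metadata keys to their category, and drop categories with no values. The category map below comes from the field lists in this document. Whether the component uses the same mapping, and what it counts as "empty", are assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field-to-category map built from this document's field lists.
CATEGORY_FIELDS: Dict[str, List[str]] = {
    "Material Properties": ("material_type composition type blend fiber_content texture "
                            "finish pattern weight density durability_rating").split(),
    "Dimensions": "size length width height thickness diameter area volume".split(),
    "Appearance": "color color_code gloss_level sheen transparency grain visual_effect".split(),
    "Performance": ("water_resistance water_absorption fire_rating slip_resistance wear_rating "
                    "abrasion_resistance tensile_strength breaking_strength hardness").split(),
    "Application": ("recommended_use application installation_method room_type traffic_level "
                    "care_instructions maintenance").split(),
    "Compliance": ("certifications standards eco_friendly sustainability_rating voc_rating "
                   "safety_rating").split(),
    "Design": "designer studio collection series aesthetic_style design_era".split(),
    "Manufacturing": ("factory manufacturer factory_group country_of_origin "
                      "manufacturing_process construction").split(),
    "Commercial": "pricing availability supplier sku warranty".split(),
}

EMPTY = (None, "", [], {})  # assumed "no data" values; False and 0 are kept


def group_for_display(metadata: Dict[str, Any]) -> List[Tuple[str, List[Tuple[str, Any]]]]:
    """Return (category, [(label, value), ...]) pairs, skipping categories with no data."""
    sections = []
    for category, fields in CATEGORY_FIELDS.items():
        rows = [(field.replace("_", " ").title(), metadata[field])
                for field in fields
                if field in metadata and metadata[field] not in EMPTY]
        if rows:
            sections.append((category, rows))
    return sections
```

Keys outside the map, such as `_extraction_metadata`, are ignored, so tracking info never appears as a display row.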
Each extracted metadata field has a confidence score (0.0-1.0):
Confidence scores are stored alongside field values, tracking both the value and the source location (e.g., "page 6, line 23" or "inferred from image description").
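One way to model a field's value together with its confidence and source, and to pick out fields worth reviewing, is sketched below. The `FieldEvidence` shape and the 0.7 review threshold are hypothetical; the document specifies neither the exact JSON layout nor any cutoff.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldEvidence:
    value: Any
    confidence: float  # 0.0-1.0
    source: str        # e.g. "page 6, line 23" or "inferred from image description"

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be within 0.0-1.0, got {self.confidence}")


def fields_needing_review(fields: Dict[str, FieldEvidence],
                          threshold: float = 0.7) -> List[str]:
    """Names of fields below the threshold, lowest confidence first."""
    low = [name for name, ev in fields.items() if ev.confidence < threshold]
    return sorted(low, key=lambda name: fields[name].confidence)
```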
Issue: Metadata not extracted
Issue: Incorrect metadata values
Issue: Missing metadata fields
Issue: Low confidence scores
Last Updated: 2025-01-12
Version: 2.0 (Comprehensive Metadata Extraction)
Category: Performance
Purpose: Technical performance metrics and ratings
Fields (9 total):
water_resistance - Water resistance rating
water_absorption - Water absorption class (e.g., "Class 3", "<0.5%")
fire_rating - Fire resistance rating (e.g., "A1", "B-s1,d0")
slip_resistance - Slip resistance (e.g., "R11", "R10")
wear_rating - Wear resistance rating (e.g., "PEI 4", "Class 3")
abrasion_resistance - Abrasion resistance level
tensile_strength - Tensile strength measurement
breaking_strength - Breaking strength (e.g., "1200 N")
hardness - Material hardness (e.g., "Mohs 7")

Category: Application
Purpose: Usage recommendations and installation guidance
Fields (7 total):
recommended_use - Recommended applications (e.g., "residential flooring", "wall cladding")
application - Application type
installation_method - Installation method (e.g., "adhesive", "floating", "nailed")
room_type - Suitable room types (e.g., "bathroom", "kitchen", "living room")
traffic_level - Traffic level suitability (e.g., "high traffic", "residential")
care_instructions - Care and maintenance instructions
maintenance - Maintenance requirements