Edge Function for parsing XML files and orchestrating product imports into the Material-KAI platform. This function handles the initial XML parsing, validation, and job creation, then delegates to the Python API for batch processing.
┌─────────────────────────────────────────────────────────────┐
│ LAYER 1: DATA INGESTION (EDGE FUNCTION)                     │
│ ├─ Parse XML file (Deno XML parser)                         │
│ ├─ Validate structure                                       │
│ ├─ Extract product elements                                 │
│ ├─ Create data_import_jobs record                           │
│ └─ Return job_id to frontend                                │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2: DATA PROCESSING (PYTHON API)                       │
│ ├─ Batch process products (10 at a time)                    │
│ ├─ Download images (5 concurrent)                           │
│ ├─ Extract metadata (AI-based)                              │
│ ├─ Normalize to NormalizedProductData                       │
│ ├─ Queue for product creation                               │
│ └─ Update job status in real-time                           │
└─────────────────────────────────────────────────────────────┘
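The "10 at a time" batching in Layer 2 can be pictured with a small helper. This is only an illustrative sketch in TypeScript: the real batching lives in the Python API, and the name `toBatches` is hypothetical.

```typescript
// Hypothetical helper: split a list of extracted products into fixed-size
// batches, mirroring the "10 at a time" step in the diagram.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) {
    throw new Error("Batch size must be positive");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    // slice() clamps the end index, so the last batch may be shorter.
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

With 25 products and a batch size of 10, this yields two full batches and a final batch of 5.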
The function supports multiple common XML schemas. Each format uses a different root and item element name (e.g., <products>/<product>, <items>/<item>, <materials>/<material>), and each item may contain varying field names for name, manufacturer, category, description, and image URLs.
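As a rough illustration of how the root/item pairs might be told apart, the sketch below checks a document against the three pairs named above. It is an assumption-laden simplification: it uses regular expressions rather than the Deno XML parser the function actually relies on, and `KNOWN_SCHEMAS` and `detectSchema` are hypothetical names.

```typescript
// Root/item element pairs taken from the examples in the text above.
const KNOWN_SCHEMAS: ReadonlyArray<{ root: string; item: string }> = [
  { root: "products", item: "product" },
  { root: "items", item: "item" },
  { root: "materials", item: "material" },
];

// Returns the first pair whose root and item tags both occur in the
// document, or null if none match. A regex check is used for brevity;
// it is not a substitute for real XML parsing.
function detectSchema(xml: string): { root: string; item: string } | null {
  for (const schema of KNOWN_SCHEMAS) {
    const rootTag = new RegExp("<" + schema.root + "[\\s>]");
    // The trailing character class stops "<item" from matching "<items>".
    const itemTag = new RegExp("<" + schema.item + "[\\s>/]");
    if (rootTag.test(xml) && itemTag.test(xml)) {
      return schema;
    }
  }
  return null;
}
```

A document that matches none of the pairs returns `null`, which corresponds to the "No product elements found" error case.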
Each product must have:
POST /functions/v1/xml-import-orchestrator
The request body is a JSON object containing: workspace_id (UUID), category (e.g., "materials"), xml_content (base64-encoded XML string), and optionally source_name (e.g., "supplier_catalog.xml").
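A minimal sketch of assembling that body is shown below. The field names come from the description above; the helper names `toBase64` and `buildImportRequest` are hypothetical, and encoding the XML as UTF-8 before base64 is an assumption about what the function expects.

```typescript
interface ImportRequest {
  workspace_id: string;
  category: string;
  xml_content: string; // base64-encoded XML
  source_name?: string;
}

// Encode text as UTF-8 bytes first so non-ASCII characters survive btoa().
function toBase64(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

function buildImportRequest(
  workspaceId: string,
  category: string,
  xml: string,
  sourceName?: string,
): ImportRequest {
  const body: ImportRequest = {
    workspace_id: workspaceId,
    category: category,
    xml_content: toBase64(xml),
  };
  // source_name is optional, so it is only set when provided.
  if (sourceName !== undefined) {
    body.source_name = sourceName;
  }
  return body;
}
```

The resulting object would be serialized with `JSON.stringify` and sent as the body of the POST request, together with whatever authorization header the project requires.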
The success response contains: success (true), job_id (UUID), message confirming the import job was created and processing started, and total_products count.
The error response contains: success (false) and an error message describing the failure (e.g., "Product validation failed: Product 1: Missing factory_name").
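The two response shapes can be modelled as a discriminated union on `success`, which lets the compiler check that `job_id` is only read on the success path. This is a sketch based on the fields listed above; `summarizeResponse` is a hypothetical helper.

```typescript
type ImportResponse =
  | { success: true; job_id: string; message: string; total_products: number }
  | { success: false; error: string };

// Turn either response shape into a short message for logs or UI.
function summarizeResponse(res: ImportResponse): string {
  if (res.success === true) {
    return "Job " + res.job_id + " created for " + res.total_products + " products";
  }
  return "Import rejected: " + res.error;
}
```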
After receiving the job_id, track progress by querying the data_import_jobs table from Supabase, selecting all fields for the given job ID. The record exposes status (pending, processing, completed, failed) and progress as processed_products divided by total_products.
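Once the job record has been fetched, the progress calculation described above is a simple ratio. The sketch below assumes the record has already been loaded; `importProgress` and `isFinished` are hypothetical helper names, and the guard for `total_products` of zero is an added safety assumption.

```typescript
type JobStatus = "pending" | "processing" | "completed" | "failed";

interface ImportJob {
  id: string;
  status: JobStatus;
  total_products: number;
  processed_products: number;
}

// Progress as a fraction between 0 and 1.
function importProgress(job: ImportJob): number {
  if (job.total_products <= 0) {
    return 0; // avoid dividing by zero before the count is known
  }
  return Math.min(1, job.processed_products / job.total_products);
}

// A job needs no further polling once it has completed or failed.
function isFinished(job: ImportJob): boolean {
  return job.status === "completed" || job.status === "failed";
}
```

A polling loop would typically re-query the record until `isFinished` returns true.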
Tracks import job status and progress. Query the table with fields id, status, total_products, processed_products, failed_products, created_at, and completed_at, filtering by workspace_id and ordering by created_at descending.
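The query described above could be composed roughly as follows. To keep the sketch free of external dependencies, it declares a hypothetical minimal interface (`JobQueryClient`) that mirrors the `from`/`select`/`eq`/`order` call chain of supabase-js; whether a real client satisfies these exact types is an assumption.

```typescript
// Hypothetical minimal shapes mirroring the supabase-js call chain.
interface JobQuery {
  eq(column: string, value: string): JobQuery;
  order(column: string, options: { ascending: boolean }): JobQuery;
}

interface JobQueryClient {
  from(table: string): { select(columns: string): JobQuery };
}

// Columns listed in the text above.
const JOB_COLUMNS =
  "id, status, total_products, processed_products, failed_products, created_at, completed_at";

// Build the "most recent jobs for a workspace" query. The caller is
// expected to await the returned builder to execute it.
function recentJobsQuery(client: JobQueryClient, workspaceId: string): JobQuery {
  return client
    .from("data_import_jobs")
    .select(JOB_COLUMNS)
    .eq("workspace_id", workspaceId)
    .order("created_at", { ascending: false });
}
```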
Tracks individual product imports. Query with fields id, job_id, product_id, processing_status, source_data, and normalized_data, filtering by job_id.
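After fetching the per-product rows for a job, a quick tally by `processing_status` gives an overview of the import. The sketch below assumes the rows are already loaded; `countByStatus` is a hypothetical helper, and since the source does not list the possible status values, none are hard-coded.

```typescript
interface ImportedProductRow {
  id: string;
  job_id: string;
  processing_status: string;
}

// Count rows per processing_status value, whatever values occur.
function countByStatus(rows: readonly ImportedProductRow[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.processing_status] = (counts[row.processing_status] ?? 0) + 1;
  }
  return counts;
}
```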
The function validates:
Common errors:
- Missing required parameters - Check request body
- Authentication failed - Check authorization header
- XML parsing error - Invalid XML format
- Product validation failed - Missing required fields
- No product elements found - Unsupported XML schema

For larger files, split into multiple smaller files.
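A client can map the error strings listed above to their suggested remedies with a simple prefix lookup. This is a sketch: it assumes the `error` field of the response begins with one of these phrases, as in the "Product validation failed: ..." example, and `hintFor` is a hypothetical helper.

```typescript
// Error prefixes and hints copied from the list of common errors.
const ERROR_HINTS: ReadonlyArray<{ prefix: string; hint: string }> = [
  { prefix: "Missing required parameters", hint: "Check request body" },
  { prefix: "Authentication failed", hint: "Check authorization header" },
  { prefix: "XML parsing error", hint: "Invalid XML format" },
  { prefix: "Product validation failed", hint: "Missing required fields" },
  { prefix: "No product elements found", hint: "Unsupported XML schema" },
];

// Returns the hint for a known error message, or null for unknown errors.
function hintFor(error: string): string | null {
  for (const entry of ERROR_HINTS) {
    if (error.indexOf(entry.prefix) === 0) {
      return entry.hint;
    }
  }
  return null;
}
```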
Required in Supabase Edge Function settings:
- SUPABASE_URL - Supabase project URL
- SUPABASE_SERVICE_ROLE_KEY - Service role key for database access
- PYTHON_API_URL - Python API endpoint (default: https://v1api.materialshub.gr)

After job creation:
See Python API documentation for details on batch processing.
The XML import pipeline includes production hardening for reliability and monitoring:
Every product, chunk, and image is tagged with source information. When inserting records into the products, document_chunks, and document_images tables, each record includes source_type: 'xml_import' and source_job_id linking to the originating import job.
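The tagging step amounts to merging two fields into every record before insertion. The sketch below illustrates the idea in TypeScript; the actual tagging happens in the import service, and `withSource` is a hypothetical helper.

```typescript
interface SourceTag {
  source_type: "xml_import";
  source_job_id: string;
}

// Return a copy of the record with the source-tracking fields added.
function withSource<T extends object>(record: T, jobId: string): T & SourceTag {
  return {
    ...record,
    source_type: "xml_import" as const,
    source_job_id: jobId,
  };
}
```

The same helper could be applied to rows destined for `products`, `document_chunks`, and `document_images`, so that all three tables link back to the same import job.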
Benefits:
The import updates the last_heartbeat field after every batch (10 products) so that stuck jobs can be detected. Each update writes the current timestamp, the current progress percentage, and the processing counts (processed, failed, total) to the job's background_jobs record.
Implementation:
data_import_service.py line 584

Comprehensive error tracking and performance monitoring using Sentry transactions for the overall import job, breadcrumbs for each batch, and exception capture with full stack traces.
Features:
| Feature | Status | Details |
|---|---|---|
| Source Tracking | ✅ COMPLETE | All tables have source_type='xml_import' and source_job_id |
| Heartbeat Monitoring | ✅ COMPLETE | Updates every batch (10 products), 30-minute stuck threshold |
| Sentry Tracking | ✅ COMPLETE | Transactions, breadcrumbs, exception capture |
| Error Handling | ✅ COMPLETE | Comprehensive try-catch with Sentry integration |
| Progress Tracking | ✅ COMPLETE | Real-time progress updates via background_jobs table |
| Checkpoint Recovery | ✅ COMPLETE | Resume from last successful batch |
| Auto-Recovery | ✅ COMPLETE | Automatic retry of stuck/failed jobs |
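
The 30-minute stuck threshold in the Heartbeat Monitoring row can be expressed as a comparison between the current time and `last_heartbeat`. The sketch below is illustrative only: the real check lives in the Python service, `isStuck` is a hypothetical name, and treating only jobs with status `processing` as candidates is an assumption.

```typescript
// 30-minute threshold, as stated in the table above.
const STUCK_THRESHOLD_MS = 30 * 60 * 1000;

interface HeartbeatJob {
  status: string;
  last_heartbeat: string; // ISO 8601 timestamp
}

// A job counts as stuck when it is still processing but its last
// heartbeat is older than the threshold.
function isStuck(job: HeartbeatJob, now: Date): boolean {
  if (job.status !== "processing") {
    return false; // assumption: only in-flight jobs can be stuck
  }
  const silentForMs = now.getTime() - Date.parse(job.last_heartbeat);
  return silentForMs > STUCK_THRESHOLD_MS;
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the check deterministic and easy to test.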