The Material KAI Vision Platform uses a unified job tracking system across all data import pipelines.
All jobs are tracked in the background_jobs table, with links to specialized tables for each job type.
The background_jobs table is the primary table for all background jobs across the platform.
Key columns: id, job_type ('pdf_processing', 'web_scraping', 'xml_import'), status ('pending', 'processing', 'completed', 'failed'), progress (0-100), current_stage, last_heartbeat (updated every 30s during processing), document_id (for PDF jobs), filename (for PDF jobs), metadata JSONB (job-specific data), created_at, started_at, completed_at, failed_at, updated_at, error, and retry_count.
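Because last_heartbeat is refreshed every 30 seconds while a job is processing, a monitor can flag jobs whose heartbeat has gone quiet. The following is a minimal sketch of that idea, not the platform's actual logic: the `BackgroundJob` interface covers only a subset of the columns listed above, and `isStalled` and its `missedBeats` threshold are hypothetical names and values chosen for illustration.

```typescript
// Subset of the background_jobs columns described above (illustrative typing).
type JobStatus = "pending" | "processing" | "completed" | "failed";

interface BackgroundJob {
  id: string;
  job_type: "pdf_processing" | "web_scraping" | "xml_import";
  status: JobStatus;
  progress: number; // 0-100
  last_heartbeat: string | null; // ISO timestamp, null if never reported
}

// The documented heartbeat interval: 30 seconds.
const HEARTBEAT_INTERVAL_MS = 30000;

// Assumption: a job counts as stalled after `missedBeats` missed heartbeats.
// The default of 4 (two minutes) is an arbitrary illustrative choice.
function isStalled(job: BackgroundJob, now: Date, missedBeats: number = 4): boolean {
  // Only jobs that are actively processing are expected to send heartbeats.
  if (job.status !== "processing") return false;
  // A processing job that never reported a heartbeat is treated as stalled.
  if (job.last_heartbeat === null) return true;
  const ageMs = now.getTime() - new Date(job.last_heartbeat).getTime();
  return ageMs > HEARTBEAT_INTERVAL_MS * missedBeats;
}
```

A monitor could run such a check periodically and surface stalled jobs for retry or manual inspection.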
The scraping_sessions table tracks web scraping sessions with page-level details.
Key columns: id, background_job_id (references background_jobs), source_url, status ('pending', 'processing', 'scraping', 'completed', 'failed'), total_pages, completed_pages, failed_pages, materials_processed, progress_percentage, scraping_config JSONB (service, max_pages, categories, model), created_at, updated_at, and error_message.
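The page counters and progress_percentage are related, though the source does not state the exact formula. As a sketch under the assumption that both completed and failed pages count as handled, the percentage could be derived as follows (`scrapingProgress` is a hypothetical helper):

```typescript
// Page counters from the scraping_sessions columns listed above.
interface ScrapingCounters {
  total_pages: number;
  completed_pages: number;
  failed_pages: number;
}

// Assumption: a page is "handled" once it has either completed or failed,
// and progress is the share of handled pages, rounded and capped at 100.
function scrapingProgress(counters: ScrapingCounters): number {
  if (counters.total_pages <= 0) return 0; // nothing discovered yet
  const handled = counters.completed_pages + counters.failed_pages;
  return Math.min(100, Math.round((handled / counters.total_pages) * 100));
}
```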
The data_import_jobs table tracks XML import jobs with product-level details.
Key columns: id, background_job_id (references background_jobs), source_name, import_type ('xml', 'csv', 'json'), status ('pending', 'processing', 'completed', 'failed'), total_products, processed_products, failed_products, field_mappings JSONB (XML field to DB field mappings), created_at, updated_at, and error_message.
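To illustrate how a field_mappings value might be used, the sketch below renames the fields of one parsed XML product record to database column names. It assumes the JSONB is a flat object of the form `{ xmlField: dbField }`; that shape, the `applyFieldMappings` helper, and the sample field names are assumptions, not the documented format.

```typescript
// Assumed shape of field_mappings: XML field name -> DB column name.
type FieldMappings = Record<string, string>;

interface MappedRecord {
  row: Record<string, string>; // values keyed by DB column name
  unmapped: string[]; // XML fields that have no mapping
}

function applyFieldMappings(
  xmlRecord: Record<string, string>,
  mappings: FieldMappings
): MappedRecord {
  const row: Record<string, string> = {};
  const unmapped: string[] = [];
  for (const xmlField of Object.keys(xmlRecord)) {
    // Use hasOwnProperty so inherited keys such as "constructor" never match.
    if (Object.prototype.hasOwnProperty.call(mappings, xmlField)) {
      row[mappings[xmlField]] = xmlRecord[xmlField];
    } else {
      unmapped.push(xmlField);
    }
  }
  return { row, unmapped };
}

// Example with hypothetical field names:
const mapped = applyFieldMappings(
  { ProductName: "Oak Veneer", Price: "12.50", InternalCode: "X-1" },
  { ProductName: "name", Price: "price" }
);
// mapped.row is { name: "Oak Veneer", price: "12.50" }
// mapped.unmapped is ["InternalCode"]
```

Returning the unmapped fields separately makes it possible to log or reject records instead of silently dropping data.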
The webhook_calls table tracks webhook/API calls made during job processing.
Key columns: id, job_id (links to background_jobs or data_import_jobs), job_type, webhook_url, request_body JSONB, response_status, response_body JSONB, response_time_ms, status ('pending', 'success', 'failed', 'retrying'), retry_count, next_retry_at, created_at, completed_at, and error_message.
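The status, retry_count, and next_retry_at columns suggest a retry schedule for failed webhook calls, but the source does not describe the policy. The sketch below assumes exponential backoff with a fixed retry limit; `planRetry`, the 1-second base delay, and the limit of 5 are all hypothetical.

```typescript
// Fields of a webhook_calls row that a retry decision would update.
interface RetryPlan {
  status: "retrying" | "failed";
  retry_count: number;
  next_retry_at: Date | null;
}

// Assumption: delay doubles on every attempt (base, 2x base, 4x base, ...),
// and the call is marked failed once maxRetries attempts have been used.
function planRetry(
  retryCount: number,
  failedAt: Date,
  baseDelayMs: number = 1000,
  maxRetries: number = 5
): RetryPlan {
  if (retryCount >= maxRetries) {
    return { status: "failed", retry_count: retryCount, next_retry_at: null };
  }
  const delayMs = baseDelayMs * Math.pow(2, retryCount);
  return {
    status: "retrying",
    retry_count: retryCount + 1,
    next_retry_at: new Date(failedAt.getTime() + delayMs),
  };
}
```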
PDF processing status flow: pending → processing → completed/failed (9 checkpoint stages during processing)
Stages:
pdf_loaded - PDF file loaded
text_extracted - Text extraction complete
tiles_generated - Image tiles created
embeddings_created - Vector embeddings generated
materials_extracted - Materials discovered and saved
Monitoring:
Web scraping status flow: pending → processing → scraping → completed/failed (page-by-page processing)
Flow:
Monitoring:
XML import status flow: pending → processing → completed/failed (product-by-product processing)
Flow:
Monitoring:
/admin/async-queue-monitor: Currently shows only PDF processing jobs.
Features:
/scraper: Dedicated UI for web scraping.
Features:
/admin/data-import: Handles PDF and XML imports.
Features:
Unified Job Monitor - Extend AsyncJobQueueMonitor to show all job types:
All job types can be queried through Supabase using the background_jobs table. Scraping sessions can be queried from scraping_sessions with a join to background_jobs. Import jobs can be queried from data_import_jobs with joins to both background_jobs and webhook_calls.
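The joined queries described above could look roughly like the following sketch. It relies on the supabase-js pattern `from(table).select(columns)` with embedded resources in the select string. The `SelectableClient` interface is a hypothetical stand-in so the helpers do not depend on the library, and the embeds assume foreign key relationships that PostgREST can detect (in particular for webhook_calls, whose job_id may point at either table).

```typescript
// Minimal structural stand-in for the part of a Supabase client used here.
// R is whatever the client's select() returns (a query builder in supabase-js).
interface SelectableClient<R> {
  from(table: string): { select(columns: string): R };
}

// Scraping sessions with their parent background job embedded.
function selectScrapingSessionsWithJobs<R>(client: SelectableClient<R>): R {
  return client.from("scraping_sessions").select("*, background_jobs(*)");
}

// Import jobs with the parent background job and related webhook calls.
// Assumption: both relationships are declared as foreign keys; if
// webhook_calls has no such key, query it separately by job_id instead.
function selectImportJobsWithDetails<R>(client: SelectableClient<R>): R {
  return client
    .from("data_import_jobs")
    .select("*, background_jobs(*), webhook_calls(*)");
}
```

With a real client, the returned builder can be awaited or narrowed further with filters before execution.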
All job failures are automatically reported to Sentry with full context.
See monitoring-and-alerting.md for details.
All admin UIs use Supabase real-time subscriptions for live updates on the background_jobs table, automatically updating the UI when any job status changes.
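One way a UI might fold real-time change events into its local job list is a small pure reducer, sketched below. The `JobChange` type mirrors the `eventType`, `new`, and `old` fields of a Supabase `postgres_changes` payload in simplified form; `JobRow` and `applyJobChange` are hypothetical names, and the actual admin components may handle updates differently.

```typescript
// Simplified view of a background_jobs row as held in UI state.
interface JobRow {
  id: string;
  status: string;
  progress: number;
}

// Simplified shape of a real-time change event for background_jobs.
type JobChange =
  | { eventType: "INSERT" | "UPDATE"; new: JobRow }
  | { eventType: "DELETE"; old: { id: string } };

// Returns a new array; the input list is never mutated.
function applyJobChange(jobs: JobRow[], change: JobChange): JobRow[] {
  if (change.eventType === "DELETE") {
    const removedId = change.old.id;
    return jobs.filter((job) => job.id !== removedId);
  }
  const row = change.new;
  const exists = jobs.some((job) => job.id === row.id);
  // Replace a known job in place, otherwise append the new one.
  return exists ? jobs.map((job) => (job.id === row.id ? row : job)) : [...jobs, row];
}

// Possible wiring with supabase-js v2 (not executed here; `supabase` and
// `setJobs` are assumed to exist in the component):
//   supabase
//     .channel("background-jobs")
//     .on("postgres_changes",
//         { event: "*", schema: "public", table: "background_jobs" },
//         (payload) => setJobs((prev) => applyJobChange(prev, payload)))
//     .subscribe();
```

Keeping the update logic in a pure function makes it easy to unit test independently of the subscription itself.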