MIVAA API Endpoints Reference

Last Updated: 2026-01-21 | API Version: v2.6.0 | Total Endpoints: 140+

Complete reference of all consolidated API endpoints with detailed usage information, database operations, and integration points.

Recent Updates (v2.6.0 - January 2026):

Previous Updates (v2.5.0 - December 30, 2025):

Previous Updates (v2.3.0 - November 22, 2025):

Previous Updates (v2.3.0 - Knowledge Base System):

Previous Updates (v2.2.0):

Total API Endpoints: 140+ endpoints across 19 categories


📋 Table of Contents

✨ CONSOLIDATED ENDPOINTS (One Endpoint, One Purpose, No Duplicates)

  1. Core Endpoints - Health, Status
  2. Knowledge Base Routes - Document Management, Semantic Search, Categories, Attachments ✨ NEW v2.3.0
  3. RAG Routes - Document Upload, Search, Query (CONSOLIDATED)
  4. Admin Routes - Admin management
  5. Search Routes - Semantic, Vector, Hybrid Search (CONSOLIDATED)
  6. Document Entities Routes - Certificates, Logos, Specifications
  7. Products Routes - Product management
  8. Images Routes - Image processing
  9. Embeddings Routes - Embedding generation
  10. HuggingFace/Qwen Routes - Qwen3-VL integration
  11. Anthropic Routes - Anthropic integration
  12. Monitoring Routes - System monitoring
  13. AI Metrics Routes - AI performance metrics
  14. Duplicate Detection Routes - Duplicate detection and product merging
  15. Data Import Routes - XML import, web scraping, batch processing
  16. Job Health Routes - Job monitoring and health checks
  17. Suggestions Routes - Search suggestions and auto-complete
  18. Messaging Routes - SMS, WhatsApp via Twilio ✨ NEW v2.6.0

1. Core Endpoints

1.1 GET /health

Purpose: Unified health check for all MIVAA services Replaces: 10+ individual health check endpoints (/api/pdf/health, /api/rag/health, /api/search/health, etc.)

The response includes status, timestamp, per-service health details with response times (database, storage, AI models including Claude/GPT/QWEN, and RAG), and a version field.
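Only the top-level fields of the health response are listed above. The sketch below shows how a client could triage that response. It assumes `services` maps each service name to an object with `status` and `response_time_ms`, and that healthy services report `"healthy"`. These key names and values are assumptions, and both helpers are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Assumed (hypothetical) response shape:
# {"status": "healthy", "timestamp": "...", "version": "...",
#  "services": {"database": {"status": "healthy", "response_time_ms": 12}, ...}}


def unhealthy_services(health: dict[str, Any]) -> list[str]:
    """Return the names of services whose status is not "healthy", sorted."""
    services = health.get("services", {})
    return sorted(name for name, info in services.items() if info.get("status") != "healthy")


def slowest_service(health: dict[str, Any]) -> str | None:
    """Return the service with the highest response time, or None if none is reported."""
    timings = {
        name: info["response_time_ms"]
        for name, info in health.get("services", {}).items()
        if "response_time_ms" in info
    }
    return max(timings, key=timings.get) if timings else None
```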

Benefits:


2. Knowledge Base Routes ✨ NEW v2.3.0

Base Path: /api/kb Purpose: Document management system with AI embeddings, semantic search, and product attachments Philosophy: Complete knowledge base for documentation, guides, specifications, and product information

2.1 POST /api/kb/documents

Purpose: Create or upsert a knowledge base document with automatic embedding generation Used In: Knowledge Base admin panel, Documentation editor, Pricing doc ingestion Flow: User creates document → Generate 1024D Voyage AI embedding → Store in database

Upsert semantics (2026-04): if a doc with the same (workspace_id, title, category_id) already exists, this endpoint updates it in place and only re-embeds when content changes. Re-uploading a quarterly price list with the same title refreshes prices without creating duplicates.
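The in-memory stand-in below illustrates the upsert rule. It is a sketch, not the MIVAA implementation. `InMemoryKb`, the SHA-256 content hash used to detect changes, and the `embed` callable (which stands in for the Voyage AI call) are all hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KbDoc:
    workspace_id: str
    title: str
    category_id: str | None
    content: str
    embedding: list[float] = field(default_factory=list)
    content_hash: str = ""


class InMemoryKb:
    """Hypothetical store keyed by (workspace_id, title, category_id)."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._docs: dict[tuple[str, str, str | None], KbDoc] = {}
        self._embed = embed
        self.embed_calls = 0

    def upsert(self, workspace_id: str, title: str,
               category_id: str | None, content: str) -> KbDoc:
        key = (workspace_id, title, category_id)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        doc = self._docs.get(key)
        if doc is None:
            # No doc with this key yet: create it (embedded below).
            doc = KbDoc(workspace_id, title, category_id, content)
            self._docs[key] = doc
        elif doc.content_hash == digest:
            return doc  # Same key, same content: keep the existing embedding.
        # New doc or changed content: update in place and re-embed.
        doc.content = content
        doc.content_hash = digest
        doc.embedding = self._embed(content)
        self.embed_calls += 1
        return doc

    def __len__(self) -> int:
        return len(self._docs)
```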

Request fields: workspace_id, title, content, content_markdown, summary, category_id, seo_keywords, status, visibility, metadata, price_doc_type (optional, pricing category only; one of price_list | discount_rule | contract_terms | promotion)

Response fields: id, workspace_id, title, content, text_embedding, embedding_status, embedding_generated_at, embedding_model, created_at, view_count, price_doc_type

Database Operations:


2.2 GET /api/kb/documents/{doc_id}

Purpose: Retrieve a single knowledge base document by ID Used In: Document viewer, Edit modal

Request: GET /api/kb/documents/{doc_id}?workspace_id=uuid

Response fields: id, workspace_id, title, content, content_markdown, summary, category_id, embedding_status, created_at, updated_at, view_count


2.3 PATCH /api/kb/documents/{doc_id}

Purpose: Update document with smart embedding regeneration Smart Detection: Only regenerates embedding if content changed (title, content, summary, keywords, category) Used In: Document editor

Request fields: title, content, content_markdown, summary, category_id, seo_keywords, status, visibility, metadata, price_doc_type (2026-04)

Response fields: id, title, content, embedding_status, embedding_generated_at, updated_at, price_doc_type
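The Smart Detection rule amounts to a check on which fields the patch changes. This is a minimal sketch. It assumes that "keywords" and "category" in that rule correspond to the `seo_keywords` and `category_id` request fields. `needs_reembedding` is a hypothetical helper.

```python
from __future__ import annotations

# Fields that, per the Smart Detection rule, affect the embedding.
# Mapping "keywords" -> seo_keywords and "category" -> category_id is an assumption.
EMBEDDING_FIELDS = frozenset({"title", "content", "summary", "seo_keywords", "category_id"})


def needs_reembedding(current: dict, patch: dict) -> bool:
    """True if the PATCH body changes at least one embedding-relevant field."""
    return any(
        key in EMBEDDING_FIELDS and current.get(key) != value
        for key, value in patch.items()
    )
```

Under this rule, status-only or visibility-only edits never trigger an embedding call.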

Database Operations:


2.4 DELETE /api/kb/documents/{doc_id}

Purpose: Delete a knowledge base document Used In: Document management, Admin panel

Request: DELETE /api/kb/documents/{doc_id}?workspace_id=uuid

Response: 204 No Content

Database Operations:


2.5 POST /api/kb/documents/from-pdf

Purpose: Create document from PDF with text extraction Used In: PDF upload modal in Knowledge Base Flow: Upload PDF → Extract text using PyMuPDF → Generate embedding → Store document

Request: Multipart form-data with fields: file (PDF), workspace_id, title, category_id (optional), status

Response fields: id, title, content (extracted text), embedding_status, created_at

Database Operations:


2.6 POST /api/kb/search

Purpose: Search knowledge base documents using semantic, full-text, or hybrid search Used In: Knowledge Base search interface, AI agent queries Flow: Frontend → MIVAA API → Generate query embedding → Supabase vector search → Return results

Architecture:

  1. Frontend calls MIVAA API with search query
  2. MIVAA generates embedding for query using Voyage AI voyage-3.5 (updated 2026-04)
  3. MIVAA calls Supabase kb_match_docs() RPC function with query embedding
  4. Supabase performs vector similarity search using pgvector <=> operator
  5. Returns ranked results with similarity scores
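In step 4, pgvector's `<=>` operator returns cosine distance, so similarity is commonly reported as `1 - distance`. The plain-Python sketch below reproduces that ranking to show how `match_threshold` (default 0.5) prunes results. Whether `kb_match_docs()` computes similarity exactly this way is an assumption. `match_docs` is hypothetical, and the sketch assumes non-zero vectors.

```python
from __future__ import annotations

import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, the quantity pgvector's <=> operator returns (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def match_docs(query: list[float], docs: dict[str, list[float]],
               match_threshold: float = 0.5, limit: int = 10) -> list[tuple[str, float]]:
    """Rank docs by similarity = 1 - cosine distance, keeping those >= match_threshold."""
    scored = [(doc_id, 1.0 - cosine_distance(query, emb)) for doc_id, emb in docs.items()]
    kept = [item for item in scored if item[1] >= match_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]
```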

Why MIVAA Backend is Required:

Request fields: workspace_id, query, search_type (default semantic), limit, category_id (optional), category_slug (optional, e.g. "pricing"), price_doc_type (optional, pricing sub-type filter), allowed_access_levels (optional, defaults to admin+agent+public), require_published (default false for admin management; set true to exclude drafts), match_threshold (default 0.5 for semantic)

Search Types:

Response fields: results (array with id, title, content, summary, category_id, category_slug, category_name, status, visibility, embedding_status, price_doc_type, similarity), search_time_ms, total_results

Database Operations:


2.7 POST /api/kb/categories

Purpose: Create a new category Used In: Category management UI, one-time Pricing category setup

Request fields: workspace_id, name, slug (recommended; used by search category filters, e.g. "pricing"), description, parent_category_id, color, icon, sort_order, access_level (admin | agent | public, default agent), trigger_keyword (optional agent gate)

Response fields: id, name, slug, access_level, trigger_keyword, description, color, icon, created_at

Pricing category seed: to enable the price_lookup agent tool, create a category with slug: "pricing", access_level: "admin", trigger_keyword: "price". Docs under this category accept the price_doc_type field on §2.1/§2.3.
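The seed and a pricing document can be written as two request bodies. This is a sketch only. The helper names and the display name `"Pricing"` are hypothetical. The slug, access level, trigger keyword, and the four `price_doc_type` values come from this section.

```python
from __future__ import annotations

# Allowed price_doc_type values, as listed for POST /api/kb/documents.
PRICE_DOC_TYPES = {"price_list", "discount_rule", "contract_terms", "promotion"}


def pricing_category_payload(workspace_id: str) -> dict:
    """Body for POST /api/kb/categories that seeds the Pricing category."""
    return {
        "workspace_id": workspace_id,
        "name": "Pricing",  # hypothetical display name
        "slug": "pricing",
        "access_level": "admin",
        "trigger_keyword": "price",
    }


def pricing_doc_payload(workspace_id: str, title: str, content: str,
                        category_id: str, price_doc_type: str) -> dict:
    """Body for POST /api/kb/documents under the Pricing category."""
    if price_doc_type not in PRICE_DOC_TYPES:
        raise ValueError(f"unsupported price_doc_type: {price_doc_type!r}")
    return {
        "workspace_id": workspace_id,
        "title": title,
        "content": content,
        "category_id": category_id,
        "price_doc_type": price_doc_type,
    }
```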


2.8 GET /api/kb/categories

Purpose: List all categories for a workspace Used In: Category dropdown, Category management

Request: GET /api/kb/categories?workspace_id=uuid

Response: success and a categories array with fields: id, name, description, parent_category_id, color, icon, sort_order, document_count


2.9 POST /api/kb/attachments

Purpose: Attach a document to one or more products Used In: Product attachment modal

Request fields: workspace_id, document_id, product_id, relationship_type, relevance_score

Relationship Types:

Response fields: id, document_id, product_id, relationship_type, relevance_score, created_at


2.10 GET /api/kb/documents/{doc_id}/attachments

Purpose: Get all products attached to a document Used In: Document viewer, Product links section

Request: GET /api/kb/documents/{doc_id}/attachments?workspace_id=uuid

Response: success and attachments array with fields: id, product_id, product_name, relationship_type, relevance_score


2.11 GET /api/kb/products/{product_id}/documents

Purpose: Get all documents attached to a product Used In: Product page documentation tab

Request: GET /api/kb/products/{product_id}/documents?workspace_id=uuid

Response: success and documents array with fields: id, title, summary, relationship_type, relevance_score, view_count


2.12 GET /api/kb/health

Purpose: Health check for Knowledge Base service Used In: System monitoring

Request: GET /api/kb/health

Response fields: status, service, features (document_crud, embedding_generation, pdf_extraction, semantic_search, categories, attachments), endpoints


3. RAG Routes (CONSOLIDATED)

Base Path: /api/rag or /api/v1/rag Purpose: Core RAG (Retrieval-Augmented Generation) functionality for document processing, querying, and management Philosophy: One endpoint per function with parameters for different modes/strategies

3.1 POST /api/rag/documents/upload

Purpose: CONSOLIDATED upload endpoint for all document processing scenarios Replaces:

Used In: Main PDF upload modal, Product catalog processing, Simple document upload Flow: User uploads PDF → AI discovery → Category extraction → Chunking → Image processing → Product creation

Request: Multipart form-data. Choose one source: file (PDF file) or file_url (URL to PDF). Additional parameters:

All uploads use deep processing mode with complete AI analysis, image embeddings (CLIP), advanced product enrichment, quality validation, and full RAG pipeline.

Response fields: job_id, document_id, status: "processing", message

Database Operations:

Metadata Fields Set:

Processing Stages:

  1. Stage 0 (0-15%): Product Discovery - Claude/GPT analyzes entire PDF
  2. Stage 1 (15-30%): Focused Extraction - Extract only product pages
  3. Stage 2 (30-50%): Chunking - Create chunks for vector DB
  4. Stage 3 (50-70%): Image Processing - Qwen3-VL + CLIP embeddings
  5. Stage 4 (70-90%): Product Creation - Create product records
  6. Stage 5 (90-100%): Quality Enhancement - Claude validation (async)
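If a UI has only the numeric `progress`, it can derive a stage label from the ranges above. This sketch assumes half-open ranges, so a boundary value such as 15 belongs to the later stage. The job's own `current_stage` metadata should take precedence when present. `stage_for_progress` is hypothetical.

```python
from __future__ import annotations

# (exclusive upper bound, stage label), following the Processing Stages list.
STAGES = [
    (15, "Stage 0: Product Discovery"),
    (30, "Stage 1: Focused Extraction"),
    (50, "Stage 2: Chunking"),
    (70, "Stage 3: Image Processing"),
    (90, "Stage 4: Product Creation"),
    (100, "Stage 5: Quality Enhancement"),
]


def stage_for_progress(progress: float) -> str:
    """Map a 0-100 progress value to its stage; boundary values go to the later stage."""
    if not 0 <= progress <= 100:
        raise ValueError("progress must be between 0 and 100")
    for upper, label in STAGES:
        if progress < upper:
            return label
    return STAGES[-1][1]  # progress == 100
```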

Frontend Integration:


3.2 GET /api/rag/documents/job/{job_id}

Purpose: Get job status and metadata for async processing Used In: Progress tracking, completion detection, error handling Flow: Frontend polls this endpoint every 2 seconds during processing

Request: GET /api/rag/documents/job/{job_id}

Response fields: job_id, status (processing | completed | failed | interrupted), document_id, progress, error, metadata (chunks_created, products_created, images_extracted, processing_time, current_stage, pages_completed, pages_failed, pages_skipped), checkpoints, created_at, updated_at
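A client-side polling loop for this endpoint might look like the sketch below. It treats `completed`, `failed`, and `interrupted` from the status list as terminal, and it polls every 2 seconds by default. The HTTP call is passed in as a `fetch_status` function, which keeps the loop transport-agnostic. `wait_for_job` and its timeout default are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "interrupted"}


def wait_for_job(fetch_status: Callable[[str], dict], job_id: str,
                 interval_s: float = 2.0, timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /api/rag/documents/job/{job_id} until the job reaches a terminal status.

    fetch_status(job_id) is supplied by the caller: it performs the HTTP GET
    and returns the decoded JSON body.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` also lets tests run the loop instantly.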

Database Operations:


3.3 POST /documents/upload-focused

Purpose: Upload PDF and extract only pages containing a specific product Used In: Single product extraction from multi-product catalogs Flow: User specifies product → PDF scanned → Extract matching pages → Process focused PDF

Request: Multipart form-data with fields: file (required), product_name (required), designer (optional), search_terms (optional), title, description, tags

Response fields: job_id, document_id, status: "processing", product_name, pages_found

Database Operations:

Use Case: Extract "NOVA" product from Harmony PDF (pages 5-11)


3.4 GET /documents/job/{job_id}

Purpose: Get job status and metadata for async processing Used In: Progress tracking, completion detection, error handling Flow: Frontend polls this endpoint every 2 seconds during processing

Request: GET /api/rag/documents/job/{job_id}

Response fields: job_id, status, document_id, progress, error, metadata, checkpoints, created_at, updated_at

Database Operations: SELECT FROM background_jobs WHERE id = ?

Critical Fields: ✅ VERIFIED

Frontend Integration:


3.5 GET /chunks

Purpose: Get document chunks with pagination Used In: Knowledge Base viewer, Chunk inspector, Admin dashboard Flow: User views document → Fetch chunks → Display in UI

Request: GET /api/rag/chunks?document_id={uuid}&limit=100&offset=0

Response: chunks array (id, document_id, content, chunk_index, metadata, quality_score, created_at) and total count
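A generator can walk every chunk using `limit` and `offset`. This sketch assumes the response carries the page under `chunks` and the overall count under `total`, as described above. `iter_chunks` and the `fetch_page` function passed in by the caller are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_chunks(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every chunk by paging through GET /api/rag/chunks.

    fetch_page(limit, offset) performs the request and returns the decoded
    body, assumed to look like {"chunks": [...], "total": N}.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        chunks = page.get("chunks", [])
        yield from chunks
        offset += len(chunks)
        # Stop on an empty page or once every reported chunk has been seen.
        if not chunks or offset >= page.get("total", 0):
            break
```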

Database Operations:

Frontend Integration:


3.6 GET /images

Purpose: Get document images with analysis results Used In: Image gallery, Image inspector, Admin dashboard Flow: User views document → Fetch images → Display gallery

Request: GET /api/rag/images?document_id={uuid}&limit=100&offset=0

Response: images array (id, document_id, image_url, page_number, QWEN_analysis, clip_embedding, quality_score, created_at) and total count

Database Operations:

Frontend Integration:


3.7 GET /products

Purpose: Get products extracted from document Used In: Products tab, Product catalog, Materials page Flow: User views products → Fetch from database → Display cards

Request: GET /api/rag/products?document_id={uuid}&limit=100&offset=0

Response: products array (id, name, description, source_document_id, metadata, quality_score, created_at) and total count

Database Operations:

Frontend Integration:


3.8 GET /product-image-relationships ✨ NEW

Purpose: Get product-to-image relationships for validation and testing Used In: Test scripts, Admin dashboard, Relationship viewer Flow: Query relationships → Return product-image links with scores

Request: GET /api/rag/product-image-relationships?document_id={uuid}&limit=100&offset=0&min_score=0.0

Query Parameters:

Response: document_id, product_id, relationships array (with product and image details), count, limit, offset, and statistics (total_relationships, by_relationship_type, min_score_filter)

Database Operations:

Frontend Integration:


3.9 GET /chunk-product-relationships ✨ NEW

Purpose: Get chunk-to-product relationships for validation and testing Used In: Test scripts, Admin dashboard, Content analysis Flow: Query relationships → Return chunk-product links

Request: GET /api/rag/chunk-product-relationships?document_id={uuid}&limit=100&offset=0

Query Parameters:

Response: document_id, product_id, relationships array (with chunk content and product name details), count, limit, offset

Database Operations:

Frontend Integration:


3.10 POST /query

Purpose: Query documents using RAG (Retrieval-Augmented Generation) Used In: Main search interface, Q&A functionality Flow: User asks question → Semantic search → Retrieve relevant chunks → Generate answer with AI

Request fields: query, document_ids, top_k, model

Response fields: answer, sources (chunk_id, content, score, document_id), model_used

Database Operations: SELECT FROM document_chunks, embeddings Frontend Integration: SearchInterface.tsx, QAModal.tsx


3.11 POST /chat

Purpose: Conversational interface for document Q&A with context Used In: Chat interface, conversational search Flow: User sends message → Maintain conversation history → Generate contextual response

Request fields: message, conversation_id, document_ids

Response fields: response, conversation_id, sources, model_used

Database Operations: SELECT FROM document_chunks, embeddings Frontend Integration: ChatInterface.tsx


3.12 POST /search

Purpose: Semantic search across document collection Used In: Search page, knowledge base search Flow: User enters search term → Semantic/hybrid/keyword search → Return ranked results

Request fields: query, search_type, filters (document_ids, tags), top_k

Response: results array (chunk_id, content, score, metadata) and total

Database Operations: SELECT FROM document_chunks, embeddings Frontend Integration: SearchPage.tsx, KnowledgeBase.tsx


3.13 POST /search/advanced

Purpose: Advanced query search with query expansion and optimization Used In: Advanced search interface Flow: User query → Query expansion → Multi-strategy search → Ranked results

Request fields: query, expand_query, rerank, filters

Database Operations: SELECT FROM document_chunks, embeddings Frontend Integration: AdvancedSearch.tsx


3.14 POST /search/mmr

Purpose: MMR (Maximal Marginal Relevance) search for diverse results Used In: Search with diversity requirements Flow: User query → Semantic search → MMR reranking → Diverse results

Request fields: query, lambda_param, top_k

Database Operations: SELECT FROM document_chunks, embeddings Frontend Integration: SearchPage.tsx (diversity mode)
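MMR balances relevance against redundancy through `lambda_param`. At each step it picks the candidate d that maximizes `lambda * sim(query, d) - (1 - lambda) * max sim(d, selected)`. The sketch below is the textbook greedy MMR over cosine similarity and may not match the exact scoring MIVAA uses. `mmr` is hypothetical. With `lambda_param=1.0`, it reduces to plain relevance ranking.

```python
from __future__ import annotations

import math


def _cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: dict[str, list[float]],
        lambda_param: float = 0.5, top_k: int = 5) -> list[str]:
    """Greedy Maximal Marginal Relevance selection over candidate embeddings."""
    selected: list[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_k:
        best_id, best_score = None, -math.inf
        for doc_id, emb in remaining.items():
            relevance = _cos(query, emb)
            # Redundancy: similarity to the closest already-selected result.
            redundancy = max((_cos(emb, candidates[s]) for s in selected), default=0.0)
            score = lambda_param * relevance - (1 - lambda_param) * redundancy
            if score > best_score:
                best_id, best_score = doc_id, score
        selected.append(best_id)
        del remaining[best_id]
    return selected
```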


3.15 GET /documents

Purpose: List and filter documents in collection Used In: Documents page, admin dashboard Flow: User views documents → Fetch with filters → Display list

Request: GET /api/rag/documents?limit=20&offset=0&search=harmony&tags=catalog

Response: documents array (id, title, filename, page_count, chunks_count, images_count, products_count, created_at) and total

Database Operations: SELECT FROM documents Frontend Integration: DocumentsPage.tsx, AdminDashboard.tsx


3.16 DELETE /documents/{document_id}

Purpose: Delete document and all associated data Used In: Document management, cleanup Flow: User deletes document → Remove from database → Delete from storage → Cleanup embeddings

Request: DELETE /api/rag/documents/{document_id}

Response: success and deleted counts (document, chunks, images, products, embeddings)

Database Operations:

Frontend Integration: DocumentsPage.tsx (delete button)


3.17 GET /health

Purpose: Health check for RAG services Used In: Monitoring, admin dashboard Flow: System checks → Verify all services → Return status

Request: GET /api/rag/health

Response fields: status, services (rag, embeddings, vector_store, database), timestamp

Database Operations: None (service checks only) Frontend Integration: AdminDashboard.tsx (health monitor)


3.18 GET /stats

Purpose: Get RAG system statistics Used In: Admin dashboard, analytics Flow: Fetch system metrics → Calculate statistics → Return summary

Request: GET /api/rag/stats

Response fields: documents, chunks, images, products, embeddings, storage_used_mb, avg_processing_time

Database Operations: SELECT COUNT FROM documents, document_chunks, document_images, products, embeddings Frontend Integration: AdminDashboard.tsx (statistics panel)


3.19 GET /job/{job_id}/ai-tracking

Purpose: Get detailed AI model tracking for a job Used In: Job monitoring, AI usage analytics Flow: Fetch job → Get AI tracking data → Return model usage details

Request: GET /api/rag/job/{job_id}/ai-tracking

Response: job_id and models_used (per-model: calls, tokens, cost, stages) and total_cost

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx, AIUsagePanel.tsx


3.20 GET /job/{job_id}/ai-tracking/model/{model_name}

Purpose: Get AI tracking for specific model Used In: Model-specific analytics Flow: Fetch job → Filter by model → Return model-specific data

Request: GET /api/rag/job/{job_id}/ai-tracking/model/QWEN

Database Operations: SELECT FROM background_jobs Frontend Integration: AIUsagePanel.tsx (model filter)


3.21 GET /job/{job_id}/ai-tracking/stage/{stage}

Purpose: Get AI tracking for specific processing stage Used In: Stage-specific analytics Flow: Fetch job → Filter by stage → Return stage-specific AI usage

Request: GET /api/rag/job/{job_id}/ai-tracking/stage/image_analysis

Database Operations: SELECT FROM background_jobs Frontend Integration: StageMonitor.tsx


3.22 GET /jobs/{job_id}/checkpoints

Purpose: Get all checkpoints for a job Used In: Job recovery, debugging Flow: Fetch job → Get checkpoint history → Return checkpoint data

Request: GET /api/rag/jobs/{job_id}/checkpoints

Response: checkpoints array (stage, progress, data, completed_at) and count

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx (checkpoint viewer)


3.23 POST /jobs/{job_id}/restart

Purpose: Manually restart job from last checkpoint Used In: Job recovery, error handling Flow: User triggers restart → Load checkpoint → Resume processing

Request: POST /api/rag/jobs/{job_id}/restart

Response: success, job_id, resumed_from, progress

Database Operations:

Frontend Integration: JobMonitor.tsx (restart button)


3.24 POST /documents/job/{job_id}/resume

Purpose: Resume job from last checkpoint (alias for restart) Used In: Job recovery Flow: Same as /jobs/{job_id}/restart

Database Operations: Same as restart endpoint Frontend Integration: JobMonitor.tsx


3.25 GET /documents/jobs

Purpose: List all background jobs with filtering Used In: Admin dashboard, job management Flow: Fetch jobs → Apply filters → Return paginated list

Request: GET /api/rag/documents/jobs?limit=20&offset=0&status=processing

Response: jobs array (id, document_id, filename, status, progress, created_at) and total

Database Operations: SELECT FROM background_jobs Frontend Integration: AdminDashboard.tsx (jobs panel)


3.26 GET /documents/documents/{document_id}/content

Purpose: Get complete document content with all AI analysis Used In: Document viewer, export functionality Flow: Fetch document → Get all related data → Return comprehensive content

Request: GET /api/rag/documents/documents/{document_id}/content?include_chunks=true&include_images=true&include_products=true

Response: document, chunks, images, products, embeddings

Database Operations:

Frontend Integration: DocumentViewer.tsx, ExportModal.tsx


3.27 POST /documents/upload

Purpose: Upload and process document for RAG Used In: Simple document upload Flow: Upload → Process → Generate embeddings → Complete

Request: Multipart form-data with fields: file (PDF), title, chunk_size (default: 2048), chunk_overlap (default: 200)

Database Operations:

Frontend Integration: SimpleUploadForm.tsx


4. Admin Routes (18 endpoints)

Base Path: /api/admin Purpose: Administrative functions for system management Used In: Admin dashboard, system configuration, job management

4.1 GET /jobs

Purpose: List all jobs with filtering and pagination Used In: Admin dashboard jobs panel Flow: Admin views jobs → Apply filters → Display paginated list

Request: GET /api/admin/jobs?status=processing&limit=20&offset=0

Response: jobs array and total count

Database Operations: SELECT FROM background_jobs Frontend Integration: AdminDashboard.tsx (jobs panel)


4.2 GET /jobs/statistics

Purpose: Get comprehensive job statistics and metrics Used In: Admin dashboard analytics Flow: Fetch all jobs → Calculate metrics → Return statistics

Database Operations: SELECT FROM background_jobs Frontend Integration: AdminDashboard.tsx (statistics panel)


4.3 GET /jobs/{job_id}

Purpose: Get detailed status for specific job Used In: Job monitoring, debugging Flow: Fetch job by ID → Return full details

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx


4.4 GET /jobs/{job_id}/status

Purpose: Alternative endpoint for job status Used In: Job monitoring (alternative path) Flow: Same as /jobs/{job_id}

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx


4.5 DELETE /jobs/{job_id}

Purpose: Cancel a running job Used In: Job management, error recovery Flow: User cancels job → Update status → Stop processing

Database Operations: UPDATE background_jobs Frontend Integration: JobMonitor.tsx (cancel button)


4.6 POST /bulk/process

Purpose: Process multiple documents in bulk Used In: Bulk upload, batch processing Flow: Upload multiple URLs → Queue jobs → Process in parallel

Database Operations: INSERT INTO documents, background_jobs Frontend Integration: BulkUploadModal.tsx


4.7 GET /system/health

Purpose: Get comprehensive system health status Used In: Monitoring dashboard, health checks Flow: Check all services → Return health status

Database Operations: None (service checks only) Frontend Integration: AdminDashboard.tsx (health monitor)


4.8 GET /system/metrics

Purpose: Get detailed system performance metrics Used In: Performance monitoring, analytics Flow: Collect metrics → Calculate statistics → Return data

Database Operations: SELECT FROM background_jobs, documents Frontend Integration: AdminDashboard.tsx (metrics panel)


4.9 DELETE /data/cleanup

Purpose: Clean up old data from system Used In: Data maintenance, storage management Flow: Find old data → Delete records → Return summary

Database Operations: DELETE FROM documents, document_chunks, document_images, products, embeddings Frontend Integration: AdminDashboard.tsx (cleanup button)


4.10 POST /data/backup

Purpose: Create backup of system data Used In: Data backup, disaster recovery Flow: Export data → Create backup file → Return download link

Database Operations: SELECT FROM all tables Frontend Integration: AdminDashboard.tsx (backup button)


4.11 GET /data/export

Purpose: Export system data in various formats Used In: Data export, reporting Flow: Fetch data → Format (JSON/CSV) → Return file

Database Operations: SELECT FROM background_jobs, documents Frontend Integration: AdminDashboard.tsx (export button)


4.12 GET /packages/status

Purpose: Get status of all system packages and dependencies Used In: System diagnostics, dependency management Flow: Check installed packages → Return versions and status

Database Operations: None (system checks only) Frontend Integration: AdminDashboard.tsx (packages panel)


4.13 GET /jobs/{job_id}/progress

Purpose: Get detailed progress for specific job Used In: Real-time job monitoring Flow: Fetch job → Extract progress data → Return details

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx (progress bar)


4.14 GET /jobs/progress/active

Purpose: Get progress for all active jobs Used In: Multi-job monitoring Flow: Fetch active jobs → Return progress summary

Database Operations: SELECT FROM background_jobs Frontend Integration: AdminDashboard.tsx (active jobs panel)


4.15 GET /jobs/{job_id}/progress/pages

Purpose: Get page-by-page progress for job Used In: Detailed progress tracking Flow: Fetch job → Extract page progress → Return details

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx (page progress)


4.16 GET /jobs/{job_id}/progress/stream

Purpose: Stream real-time progress updates (SSE) Used In: Real-time monitoring Flow: Open SSE connection → Stream progress updates

Database Operations: SELECT FROM background_jobs Frontend Integration: JobMonitor.tsx (real-time updates)
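To consume the stream, a client groups `data:` lines into events that end at blank lines. The line-based parser below follows the Server-Sent Events format rules: comment lines start with `:`, multi-line data is joined with newlines, and an event not terminated by a blank line before the stream ends is dropped. The assumption that each event's data is a JSON progress object is not confirmed above. `parse_sse` is hypothetical, and it ignores other fields such as `event:` and `id:`.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per complete Server-Sent Event."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the data.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # Data still buffered at end of stream is discarded, as the SSE rules require.
```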


4.17 POST /test-product-creation

Purpose: Test endpoint for enhanced product creation Used In: Testing, development Flow: Test product detection → Return results

Database Operations: SELECT/INSERT products Frontend Integration: Development tools


4.18 POST /admin/images/{image_id}/process-ocr

Purpose: Manually reprocess image with OCR Used In: Image reprocessing, error recovery Flow: Fetch image → Run OCR → Update database

Database Operations: UPDATE document_images, document_chunks Frontend Integration: ImageViewer.tsx (reprocess button)


4.19 POST /api/rag/metadata/detect-scope

Purpose: Detect metadata scope for text chunks (product-specific vs catalog-general) Used In: PDF processing pipeline, metadata classification Flow: Analyze chunk → Classify scope → Return scope with confidence

Request fields: chunk_content, product_names, document_context

Response: success and data with scope, confidence, reasoning, applies_to, extracted_metadata, is_override

Scope Types:

Database Operations: None (AI-powered classification) Frontend Integration: Admin metadata management, PDF processing monitoring


4.20 POST /api/rag/metadata/apply-to-products

Purpose: Apply metadata to products with scope-aware override logic Used In: PDF processing pipeline (Stage 4), metadata management Flow: Detect scope → Apply in order → Track overrides → Update database

Request fields: document_id, chunks_with_scope (array of chunk_id, content, scope, metadata, applies_to)

Response: success and data with products_updated, metadata_fields_applied, overrides_tracked, catalog_general_count, product_specific_count, processing_time_ms

Processing Order:

  1. Catalog-general (explicit) - Lowest priority
  2. Catalog-general (implicit) - Low priority
  3. Category-specific - Medium priority
  4. Product-specific - Highest priority (can override)
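The processing order applies metadata from lowest to highest priority, so values from later scopes overwrite earlier ones. The sketch below handles a single product and omits the `applies_to` targeting. Only `catalog_general_implicit` appears verbatim in this document. The other three scope identifiers and `apply_metadata` are assumptions.

```python
from __future__ import annotations

# Lower number = applied earlier = lower priority (order from the Processing Order list).
SCOPE_PRIORITY = {
    "catalog_general_explicit": 1,  # assumed identifier
    "catalog_general_implicit": 2,
    "category_specific": 3,         # assumed identifier
    "product_specific": 4,          # assumed identifier
}


def apply_metadata(chunks: list[dict]) -> tuple[dict, list[str]]:
    """Merge chunk metadata for one product; higher-priority scopes override lower ones.

    Returns the merged metadata and the keys whose earlier value was overridden.
    """
    merged: dict = {}
    overridden: list[str] = []
    # sorted() is stable, so chunks with equal scope keep their input order.
    for chunk in sorted(chunks, key=lambda c: SCOPE_PRIORITY[c["scope"]]):
        for key, value in chunk["metadata"].items():
            if key in merged and merged[key] != value:
                overridden.append(key)
            merged[key] = value
    return merged, overridden
```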

Database Operations:

Frontend Integration: PDF processing pipeline, admin metadata management


4.21 GET /api/rag/metadata/list

Purpose: List metadata with filtering and pagination Used In: Admin metadata viewer, metadata analytics Flow: Query database → Filter → Paginate → Return results

Request: GET /api/rag/metadata/list?document_id=uuid&scope=catalog_general_implicit&limit=50&offset=0

Query Parameters:

Response: success and data with items array (product_id, product_name, metadata_key, metadata_value, scope, source_chunk_id, is_override, created_at), total, limit, offset

Database Operations:

Frontend Integration: Admin metadata management page, metadata analytics dashboard


4.22 GET /api/rag/metadata/statistics

Purpose: Get metadata statistics and analytics Used In: Admin dashboard, metadata analytics Flow: Aggregate metadata → Calculate stats → Return summary

Request: GET /api/rag/metadata/statistics?document_id=uuid

Query Parameters:

Response: success and data with total_products, total_metadata_fields, catalog_general_count, product_specific_count, category_specific_count, override_count, most_common_fields, scope_distribution

Database Operations:

Frontend Integration: Admin dashboard, metadata analytics page


5. Search Routes (CONSOLIDATED)

Base Path: /api/rag Purpose: Unified search and query functionality across documents Philosophy: Single search endpoint with strategy parameter instead of multiple separate endpoints

5.1 POST /search

Purpose: CONSOLIDATED search endpoint for all 6 search strategies ✅ Status: All strategies implemented (100% complete) Replaces:

Available Strategies:

Strategy | Status | Use Case | Performance
semantic | ✅ | Natural language queries | <150ms
vector | ✅ | Exact similarity matching | <100ms
multi_vector | ✅ | Text + visual understanding | <200ms
hybrid | ✅ | Technical terms + semantics | <180ms
material | ✅ | Property-based filtering | <50ms
image | ✅ | Visual similarity | <150ms
all | ✅ | All strategies combined | <800ms

Request: POST /api/rag/search?strategy={strategy} with body fields:

Response: query, enhanced_query, results array (id, name, description, relevance_score, metadata, score_breakdown for multi_vector, found_in_strategies for all), total_results, search_type, processing_time, strategies_executed, strategies_count
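A small request builder can reject unknown strategies before the call goes out. In this sketch, `search_request` and the base URL are hypothetical. Treating `workspace_id` as required is an assumption based on the usage examples, which always pass it.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Strategy values from the Available Strategies table.
STRATEGIES = {"semantic", "vector", "multi_vector", "hybrid", "material", "image", "all"}


def search_request(base_url: str, strategy: str, body: dict) -> tuple[str, dict]:
    """Return (url, json_body) for POST /api/rag/search?strategy=..."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {sorted(STRATEGIES)}")
    if "workspace_id" not in body:
        raise ValueError("workspace_id is required")  # assumption, see lead-in
    url = f"{base_url.rstrip('/')}/api/rag/search?{urlencode({'strategy': strategy})}"
    return url, body
```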

Usage Examples:

1. Semantic Search (Natural Language): POST to ?strategy=semantic with query and workspace_id

2. Multi-Vector Search (Text + Visual): POST to ?strategy=multi_vector with query, workspace_id, and optional weight overrides

3. Hybrid Search (Semantic + Keyword): POST to ?strategy=hybrid with query, workspace_id, semantic_weight, keyword_weight

4. Material Property Search: POST to ?strategy=material with query: "", workspace_id, and material_filters object

5. Image Search (Visual Similarity): POST to ?strategy=image with query: "", workspace_id, and image_url
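The hybrid example passes `semantic_weight` and `keyword_weight`. Weighted linear score fusion is one common way to use such weights, and the sketch below shows it. Whether MIVAA fuses scores this way is an assumption. The sketch also assumes both score maps are on a comparable 0 to 1 scale, and `hybrid_scores` is hypothetical.

```python
from __future__ import annotations


def hybrid_scores(semantic: dict[str, float], keyword: dict[str, float],
                  semantic_weight: float, keyword_weight: float) -> list[tuple[str, float]]:
    """Weighted linear fusion of two score maps, highest fused score first.

    A document missing from one map contributes 0 for that component.
    """
    ids = semantic.keys() | keyword.keys()
    fused = {
        doc_id: semantic_weight * semantic.get(doc_id, 0.0)
        + keyword_weight * keyword.get(doc_id, 0.0)
        for doc_id in ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```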


5.2 POST /api/rag/search/knowledge-base ✨ NEW

Purpose: Search existing knowledge base without uploading a PDF Added: 2025-12-03 (v2.4.0) Used In: Knowledge base search, entity discovery, product search Flow: User searches → Multi-vector search across products/entities/chunks → Return unified results

Features:

Request fields:

Response fields: query, total_results, products array, entities array, chunks array, images array, processing_time, search_metadata

KB-doc chunk shape (2026-04): each chunks[i] from kb_docs now includes category_slug, category_name, and price_doc_type alongside the existing fields.

Database Operations:

Frontend Integration:

Usage Examples:

Frontend Integration: SearchPage.tsx, KnowledgeBase.tsx, ProductDiscovery.tsx

Related Documentation: Search Strategies Guide


5.3 POST /query

Purpose: CONSOLIDATED query endpoint with auto-detecting modality Replaces: Multiple query endpoints with different modalities

Request fields: query, modality (auto | text | image | multimodal), limit, workspace_id

Response fields: success, answer, sources (chunk_id, content, relevance_score), modality_detected, processing_time_ms


PDF Extraction - DEPRECATED ⚠️ (REMOVED)

All /api/pdf/extract/* endpoints have been removed as of November 7, 2025.

Removed Endpoints:

Replacement: Use POST /api/rag/documents/upload

The RAG endpoint provides identical functionality using the same PyMuPDF4LLM library. It accepts multipart/form-data with a file (PDF) and workspace_id and returns markdown, tables, images, and status.
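The upload endpoint expects `multipart/form-data`. Without a third-party HTTP library, the body can be assembled by hand, as in the sketch below. This is a generic multipart encoder, and only the field names `file` and `workspace_id` come from this reference.

```python
import uuid


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes, content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns the body and the Content-Type header value, which carries the boundary.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n').encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, ctype = encode_multipart({"workspace_id": "ws-123"}, "file", "catalog.pdf", b"%PDF-1.7 ...")
```

Send it with `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")`.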

Benefits of consolidation:


2. Document Management - DEPRECATED ⚠️

All /api/documents/* endpoints have been removed. Use /api/rag/* endpoints instead.

See the RAG System section for current endpoints:


3. Search APIs (8 endpoints)

Semantic Search: POST /api/search/semantic. Body: query, workspace_id, limit, threshold. Response: results array (id, title, score, content).

Vector Search: POST /api/search/vector. Body: embedding (float array), workspace_id, limit, metric. Response: results array (id, similarity_score).

Hybrid Search: POST /api/search/hybrid. Body: query, embedding, workspace_id, limit, semantic_weight. Response: results array.

Visual Search: POST /api/search/visual. Multipart: image file, workspace_id, limit. Response: results array (id, similarity_score, image_url).

Material Search: POST /api/search/materials. Body: query, filters (material_type, color, texture), limit. Response: materials array.

Search Recommendations: GET /api/search/recommendations. Query params: query, workspace_id. Response: suggestions array.

Search Analytics: GET /api/analytics. Query params: workspace_id, date_range. Response: top_queries, search_volume, avg_response_time.
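Three of these endpoints accept the same information in different forms: text (semantic), a precomputed vector (vector), or both (hybrid). The sketch below routes a call to the matching path based on which inputs are supplied. The default `limit` of 10 is an assumption, and responses are not parsed here.

```python
import json
from typing import Optional


def search_request(workspace_id: str, *, query: Optional[str] = None,
                   embedding: Optional[list[float]] = None, limit: int = 10,
                   semantic_weight: Optional[float] = None) -> tuple[str, bytes]:
    """Pick the semantic, vector, or hybrid endpoint based on the inputs supplied."""
    if query is None and embedding is None:
        raise ValueError("pass a query, an embedding, or both")
    body: dict = {"workspace_id": workspace_id, "limit": limit}
    if query is not None and embedding is not None:
        path = "/api/search/hybrid"
        body.update(query=query, embedding=embedding)
        if semantic_weight is not None:
            body["semantic_weight"] = semantic_weight
    elif query is not None:
        path = "/api/search/semantic"
        body["query"] = query
    else:
        path = "/api/search/vector"
        body["embedding"] = embedding
    return path, json.dumps(body).encode("utf-8")
```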


4. Image Analysis (6 endpoints)

Analyze Image: POST /api/images/analyze. Multipart: image file, analysis_type. Response: materials, colors, textures, quality_score.

Batch Image Analysis: POST /api/images/analyze/batch. Multipart: multiple images. Response: results array (image_id, analysis).

Search Similar Images: POST /api/images/search. Multipart: image file, limit. Response: similar_images array.

Upload & Analyze: POST /api/images/upload-and-analyze (multipart). Response: image_id, url, analysis.

Re-classify Image ✨ NEW: POST /api/images/reclassify/{image_id}. Params: image_id, force_validation (optional boolean). Response: success, image_id, classification (is_material, confidence, reason, model), updated_data, message.
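Below is a sketch of building the re-classify call. This reference does not say whether `force_validation` is sent in the query string or in the body. The code assumes a query parameter, so verify that before use. The image ID is percent-encoded as a path segment.

```python
import urllib.parse
import urllib.request


def reclassify_request(base_url: str, image_id: str,
                       force_validation: bool = False) -> urllib.request.Request:
    """Build POST /api/images/reclassify/{image_id}.

    Assumption: force_validation is a query parameter; it may instead be
    expected in the request body.
    """
    path = f"/api/images/reclassify/{urllib.parse.quote(image_id, safe='')}"
    query = "?force_validation=true" if force_validation else ""
    return urllib.request.Request(base_url.rstrip("/") + path + query, method="POST")
```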


5. RAG System (7 endpoints)

Upload Document: POST /api/v1/rag/documents/upload. Multipart: file, title, metadata. Response: document_id, chunks_created, embeddings_generated.

Query RAG: POST /api/v1/rag/query. Body: query, workspace_id, top_k. Response: results array (chunk_id, content, score).

Chat with RAG: POST /api/v1/rag/chat. Body: message, conversation_id, workspace_id. Response: response, sources array.

Search RAG: POST /api/v1/rag/search. Body: query, filters, limit. Response: results array.

List RAG Documents: GET /api/v1/rag/documents. Query params: workspace_id, limit. Response: documents array.

RAG Health: GET /api/v1/rag/health. Response: status, indices_count, memory_usage.

RAG Statistics: GET /api/v1/rag/stats. Response: document_count, chunk_count, embedding_count.
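The chat endpoint takes a `conversation_id` on every turn. One way to keep turns together is for the client to generate the ID once and reuse it, as sketched below. That client-side ID generation and the injected `transport` callable, which would perform the HTTP call and JSON decoding, are assumptions for illustration.

```python
import uuid
from typing import Any, Callable

# Hypothetical transport: takes (path, body) and returns the decoded JSON response.
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]


class RagChat:
    """Multi-turn chat against POST /api/v1/rag/chat.

    Assumption: the client generates conversation_id and reuses it on every turn.
    """

    def __init__(self, workspace_id: str, transport: Transport) -> None:
        self.workspace_id = workspace_id
        self.conversation_id = str(uuid.uuid4())
        self._send = transport

    def ask(self, message: str) -> tuple[str, list[Any]]:
        body = {"message": message, "conversation_id": self.conversation_id,
                "workspace_id": self.workspace_id}
        reply = self._send("/api/v1/rag/chat", body)
        return reply["response"], reply.get("sources", [])
```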


6. Embeddings (3 endpoints)

Generate Embedding: POST /api/embeddings/generate. Body: text. Response: embedding (float array), dimension.

Batch Embeddings: POST /api/embeddings/batch. Body: texts (string array). Response: embeddings (array of float arrays).

CLIP Embeddings: POST /api/embeddings/clip-generate. Multipart: image file, embedding_type. Response: embedding, type, dimension.
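Embeddings returned by these endpoints are plain float arrays, so comparing two of them locally is a cosine similarity. The sketch checks that the dimensions match. Vectors from different models (for example, a text embedding and a CLIP image embedding) live in different spaces, so they should not be compared even when their dimensions happen to agree.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar
```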


7. Products (6 endpoints)

Create Product: POST /api/products. Body: name, description, metafields, images, chunks. Response: product_id, created_at.

Get Product: GET /api/products/{id}. Response: id, name, description, metafields, images, chunks.

Update Product: PATCH /api/products/{id}. Body: name, description, metafields. Response: success, updated_at.

Delete Product: DELETE /api/products/{id}. Response: success.

List Products: GET /api/products. Query params: workspace_id, limit, offset. Response: products array, total_count.

Find Similar Products: GET /api/products/{id}/similar. Query params: limit. Response: similar_products array.
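List Products pages results with `limit` and `offset` and reports `total_count`. The generator sketched below walks every page. `fetch` is a hypothetical callable that performs GET /api/products with the given `limit` and `offset` and returns the decoded JSON. The loop also stops on an empty page, which guards against a `total_count` that changes mid-scan.

```python
from typing import Any, Callable, Iterator

# Hypothetical fetcher: (limit, offset) -> decoded GET /api/products response.
Fetch = Callable[[int, int], dict[str, Any]]


def iter_products(fetch: Fetch, page_size: int = 50) -> Iterator[dict[str, Any]]:
    """Yield every product by walking limit/offset pages until total_count is reached."""
    offset = 0
    while True:
        page = fetch(page_size, offset)
        products = page.get("products", [])
        yield from products
        offset += len(products)
        if not products or offset >= page.get("total_count", 0):
            break
```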


8. Admin & Monitoring (8 endpoints)

Get Job Progress: GET /api/admin/jobs/{id}/progress. Response: job_id, status, progress_percent, current_stage.

Get Page Progress: GET /api/admin/jobs/{id}/progress/pages. Response: pages array (page_number, status, progress).

Stream Progress: GET /api/admin/jobs/{id}/progress/stream. Response: Server-Sent Events stream.

Get Chunk Quality: GET /api/admin/chunks/quality. Query params: workspace_id. Response: chunks array (id, quality_score, status).

AI Metrics: GET /api/admin/ai-metrics. Response: models_used, total_tokens, cost_estimate, processing_time.

System Health: GET /health. Response: status, uptime, database, api_latency.

Performance Metrics: GET /metrics. Response: requests_per_second, avg_latency, error_rate.

Performance Summary: GET /performance/summary. Response: summary_stats.
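Stream Progress returns Server-Sent Events. The parser below implements the standard SSE framing over an iterable of text lines. It assumes that each event's `data` is a JSON object (plausibly with the same fields as Get Job Progress), which this reference does not state.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded JSON payload per Server-Sent Event.

    Follows SSE framing: an event is dispatched at a blank line, multiple
    data: lines are joined with newlines, one space after the colon is
    dropped, and an unterminated trailing event is discarded.
    Assumption: each event's data is a JSON object.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: fields and ":" comment lines are ignored in this sketch.
```

With `urllib`, the response object can be passed as `(line.decode("utf-8") for line in response)`.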


7. Document Entities Routes (5 endpoints)

Base Path: /api/document-entities Purpose: Manage document entities (certificates, logos, specifications) as separate knowledge base Used In: Docs Admin Page, Agentic queries, Product-document relationships Architecture: Document entities are stored separately from products and linked via relationships

7.1 GET /api/document-entities/

Purpose: Get all document entities for a workspace with filtering Used In: Docs Admin Page, Agentic queries Flow: Query entities → Apply filters → Return paginated results

Request: GET /api/document-entities/?workspace_id={uuid}&entity_type=certificate&factory_name=Castellón Factory&limit=100&offset=0

Query Parameters:

Response: Array of entities with fields: id, entity_type, name, description, page_range, factory_name, factory_group, manufacturer, metadata, created_at
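The example request above contains a space and an accented character in `factory_name`. The sketch below builds the same URL with proper query-string encoding. The base URL is a placeholder.

```python
import urllib.parse
from typing import Optional


def entities_url(base_url: str, workspace_id: str, entity_type: Optional[str] = None,
                 factory_name: Optional[str] = None, limit: int = 100, offset: int = 0) -> str:
    """Build GET /api/document-entities/ with encoded filters (UTF-8, spaces as '+')."""
    params: dict[str, object] = {"workspace_id": workspace_id, "limit": limit, "offset": offset}
    if entity_type:
        params["entity_type"] = entity_type
    if factory_name:
        params["factory_name"] = factory_name
    return f"{base_url.rstrip('/')}/api/document-entities/?{urllib.parse.urlencode(params)}"
```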

Database Operations:

Frontend Integration: DocsManagement.tsx (Docs Admin Page)

Agentic Query Examples:


7.2 GET /api/document-entities/{entity_id}

Purpose: Get a specific document entity by ID Used In: Entity detail view, relationship management Flow: Fetch entity by ID → Return entity details

Request: GET /api/document-entities/{entity_id}

Response fields: id, entity_type, name, description, page_range, factory_name, metadata, created_at

Database Operations:

Frontend Integration: EntityDetailModal.tsx


7.3 GET /api/document-entities/product/{product_id}

Purpose: Get all document entities linked to a specific product Used In: Product detail page, agentic queries Flow: Fetch product relationships → Get linked entities → Return entities

Request: GET /api/document-entities/product/{product_id}?entity_type=certificate

Query Parameters:

Response: Array of entities with fields: entity_type, name, description, page_range, factory_name, metadata

Database Operations:

Frontend Integration: ProductDetailPage.tsx

Agentic Query Example:


7.4 GET /api/document-entities/factory/{factory_name}

Purpose: Get all document entities for a specific factory Used In: Factory-specific queries, compliance reports Flow: Query by factory name → Filter by entity type → Return entities

Request: GET /api/document-entities/factory/Castellón Factory?entity_type=certificate

Query Parameters:

Response: Array of entities with fields: entity_type, name, factory_name, factory_group, metadata
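Here the factory name is a path segment rather than a query parameter, so it needs path encoding: spaces become `%20`, and `/` must be escaped too. Below is a sketch, with a placeholder base URL.

```python
import urllib.parse
from typing import Optional


def factory_entities_url(base_url: str, factory_name: str,
                         entity_type: Optional[str] = None) -> str:
    """Build GET /api/document-entities/factory/{factory_name} with a safely encoded segment."""
    segment = urllib.parse.quote(factory_name, safe="")  # also escapes "/"
    url = f"{base_url.rstrip('/')}/api/document-entities/factory/{segment}"
    if entity_type:
        url += "?" + urllib.parse.urlencode({"entity_type": entity_type})
    return url
```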

Database Operations:

Frontend Integration: FactoryComplianceReport.tsx

Agentic Query Example:


7.5 GET /api/document-entities/relationships/product/{product_id}

Purpose: Get all product-document relationships for a product Used In: Relationship management, linking visualization Flow: Fetch relationships → Return relationship details with scores

Request: GET /api/document-entities/relationships/product/{product_id}

Response: Array of relationships with fields: id, product_id, document_entity_id, relationship_type, relevance_score, metadata (linking_method, confidence), created_at
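Each relationship carries a `relevance_score`, so a client can keep only the strongest links. The 0.5 cutoff below is an arbitrary illustrative default, not a documented threshold.

```python
from typing import Any


def strongest_links(relationships: list[dict[str, Any]],
                    min_score: float = 0.5) -> list[dict[str, Any]]:
    """Keep relationships at or above min_score, highest relevance_score first."""
    kept = [r for r in relationships if r.get("relevance_score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["relevance_score"], reverse=True)
```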

Database Operations:

Frontend Integration: RelationshipViewer.tsx


🔐 Authentication

All endpoints require one of:

  1. Supabase JWT (Frontend), sent as Authorization: Bearer {supabase_jwt_token}
  2. MIVAA JWT (Internal), sent as Authorization: Bearer {mivaa_jwt_token}
  3. API Key (External), sent as X-API-Key: {api_key}
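Below is a sketch that produces the header for exactly one of these methods. The function name and signature are hypothetical. Only the header names and the Bearer scheme come from the list above.

```python
from typing import Optional


def auth_headers(*, jwt: Optional[str] = None, api_key: Optional[str] = None) -> dict[str, str]:
    """Return the header for one supported auth method.

    Supabase and MIVAA JWTs both use the Bearer scheme; external callers use X-API-Key.
    """
    if (jwt is None) == (api_key is None):
        raise ValueError("provide exactly one of jwt or api_key")
    if jwt is not None:
        return {"Authorization": f"Bearer {jwt}"}
    return {"X-API-Key": api_key}
```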

📊 Response Format

All endpoints return JSON with fields: success (boolean), data (object), error (null or string), timestamp.
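The sketch below unwraps this envelope and turns failures into exceptions. `MivaaError` is a hypothetical name. Several endpoints in this reference list their fields at the top level (for example, `success, answer, sources`) rather than under `data`, so check each endpoint's documented shape before applying it.

```python
from typing import Any


class MivaaError(RuntimeError):
    """Raised when an endpoint reports a failure in its response envelope."""


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return envelope["data"], raising MivaaError if success is false or error is set."""
    if not envelope.get("success") or envelope.get("error"):
        raise MivaaError(envelope.get("error") or "request failed without an error message")
    return envelope.get("data")
```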


⚑ Rate Limiting


15. Duplicate Detection Routes

Purpose: Detect and merge duplicate products from the same factory/manufacturer

CRITICAL RULE: Duplicates are ONLY detected when products have the same factory/manufacturer in metadata. Visual similarity alone does NOT constitute a duplicate.

15.1 POST /api/duplicates/detect

Purpose: Detect potential duplicates for a specific product

Request fields: product_id, workspace_id, similarity_threshold

Response: success, product_id, duplicates_found, duplicates array (product_id, name, factory, overall_similarity, confidence_level)
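The critical rule above can be expressed as a predicate: a factory/manufacturer match is a precondition, and similarity is only considered after it. This is an illustrative client-side sketch, not the service's implementation. The `metadata` keys `factory` and `manufacturer`, the case-insensitive comparison, and the 0.85 default threshold are all assumptions.

```python
from typing import Any, Optional


def _factory(product: dict[str, Any]) -> Optional[str]:
    """Read factory/manufacturer from product metadata (assumed keys), normalized."""
    meta = product.get("metadata") or {}
    value = meta.get("factory") or meta.get("manufacturer")
    return value.strip().casefold() if isinstance(value, str) and value.strip() else None


def is_duplicate_candidate(a: dict[str, Any], b: dict[str, Any], similarity: float,
                           threshold: float = 0.85) -> bool:
    """Same factory/manufacturer is required; visual or text similarity alone is never enough."""
    fa, fb = _factory(a), _factory(b)
    if fa is None or fa != fb:
        return False
    return similarity >= threshold
```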


15.2 POST /api/duplicates/batch-detect

Purpose: Scan entire workspace for duplicate products

Request fields: workspace_id, similarity_threshold, limit

Response: success, workspace_id, duplicate_pairs_found, duplicate_pairs array


15.3 GET /api/duplicates/cached

Purpose: Get cached duplicate detections

Query Parameters:

Response: success, workspace_id, cached_duplicates, duplicates array


15.4 POST /api/duplicates/update-status

Purpose: Update duplicate detection status

Request fields: cache_id, status, user_id

Valid Statuses: 'pending', 'reviewed', 'merged', 'dismissed'


15.5 POST /api/duplicates/merge

Purpose: Merge duplicate products into a single product

Request fields: target_product_id, source_product_ids, workspace_id, user_id, merge_strategy, merge_reason

Response: success, history_id, target_product, merged_count, message


15.6 POST /api/duplicates/undo-merge

Purpose: Undo a product merge operation

Request fields: history_id, user_id

Response: success, message, restored_products
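The sketch below shows the request bodies for the review-and-merge flow. The status check mirrors the Valid Statuses list. The target/source overlap check is a client-side sanity check, not documented server behavior, and `merge_strategy` is passed through because its allowed values are not listed here. The undo call reuses the `history_id` returned by the merge.

```python
from typing import Any

VALID_STATUSES = {"pending", "reviewed", "merged", "dismissed"}


def status_update(cache_id: str, status: str, user_id: str) -> dict[str, Any]:
    """Body for POST /api/duplicates/update-status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid status {status!r}; expected one of {sorted(VALID_STATUSES)}")
    return {"cache_id": cache_id, "status": status, "user_id": user_id}


def merge_body(target_id: str, source_ids: list[str], workspace_id: str, user_id: str,
               merge_strategy: str, reason: str) -> dict[str, Any]:
    """Body for POST /api/duplicates/merge; the target must not also be a source."""
    if target_id in source_ids:
        raise ValueError("target_product_id must not appear in source_product_ids")
    return {"target_product_id": target_id, "source_product_ids": source_ids,
            "workspace_id": workspace_id, "user_id": user_id,
            "merge_strategy": merge_strategy, "merge_reason": reason}


def undo_body(merge_response: dict[str, Any], user_id: str) -> dict[str, Any]:
    """Body for POST /api/duplicates/undo-merge, using history_id from the merge response."""
    return {"history_id": merge_response["history_id"], "user_id": user_id}
```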


15.7 GET /api/duplicates/merge-history

Purpose: Get merge history for a workspace

Query Parameters:

Response: success, workspace_id, merge_count, merges array


16. Data Import Routes

Category: Data Import (XML, Web Scraping) Total Endpoints: 4 Status: ✅ Phase 1 & 2 Complete (XML Import with Dynamic Mapping & Backend Processing)

16.1 POST /api/import/process

Purpose: Start processing an import job (called by Edge Function)

Request fields: job_id, workspace_id

Response: success, message, job_id

Features:

Database Operations:


16.2 GET /api/import/jobs/{job_id}

Purpose: Get import job status and progress

Path Parameters:

Response fields: job_id, status, import_type, source_name, total_products, processed_products, failed_products, progress_percentage, current_stage, started_at, completed_at, error_message, estimated_time_remaining
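A polling loop for this endpoint is sketched below. The terminal status set (`completed`, `failed`, `cancelled`) is an assumption, because the Status Values list for this endpoint is authoritative. The 5-second interval and 30-minute timeout are also assumptions. `get_job` is a hypothetical callable that performs the GET and returns the decoded JSON.

```python
import time
from typing import Any, Callable

# Assumption: terminal statuses; check the documented Status Values.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_import(get_job: Callable[[str], dict[str, Any]], job_id: str,
                    interval_s: float = 5.0, timeout_s: float = 1800.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll GET /api/import/jobs/{job_id} until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import job {job_id} still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```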

Status Values:

Database Operations:


16.3 GET /api/import/history

Purpose: Get import history for a workspace with pagination and filters

Query Parameters:

Response: imports array (job_id, import_type, source_name, status, total_products, processed_products, failed_products, created_at, completed_at, is_scheduled, next_run_at), total_count, page, page_size

Database Operations:


16.4 GET /api/import/health

Purpose: Health check for data import API

Response fields: status, service, version, features (xml_import, web_scraping, batch_processing, concurrent_image_downloads, checkpoint_recovery, real_time_progress)

Features Status:


Edge Function Endpoints

POST /xml-import-orchestrator

Purpose: Parse XML, detect fields, suggest mappings, create import jobs

Hosted: Supabase Edge Function (Deno)

Request fields: workspace_id, category, xml_content (base64 encoded), preview_only, field_mappings, mapping_template_id, parent_job_id

Response (Preview Mode): success, detected_fields array (xml_field, suggested_mapping, confidence, sample_values), total_products

Response (Import Mode): success, job_id, total_products
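The orchestrator is used in two passes: a preview (`preview_only: true`) that returns `detected_fields`, followed by an import with the chosen `field_mappings`. The sketch below builds both bodies. The `field_mappings` shape (XML field to target field), the 0.8 confidence cutoff, and the 0-to-1 confidence scale are assumptions.

```python
import base64
from typing import Any, Optional


def xml_import_body(workspace_id: str, category: str, xml_text: str, *, preview_only: bool,
                    field_mappings: Optional[dict[str, str]] = None) -> dict[str, Any]:
    """Body for POST /xml-import-orchestrator; xml_content must be base64-encoded."""
    body: dict[str, Any] = {
        "workspace_id": workspace_id,
        "category": category,
        "xml_content": base64.b64encode(xml_text.encode("utf-8")).decode("ascii"),
        "preview_only": preview_only,
    }
    if field_mappings is not None:
        body["field_mappings"] = field_mappings
    return body


def mappings_from_preview(preview: dict[str, Any], min_confidence: float = 0.8) -> dict[str, str]:
    """Accept suggested mappings whose confidence clears min_confidence (assumed 0..1)."""
    return {f["xml_field"]: f["suggested_mapping"]
            for f in preview.get("detected_fields", [])
            if f.get("suggested_mapping") and f.get("confidence", 0) >= min_confidence}
```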

Features:

Database Operations:


POST /scheduled-import-runner

Purpose: Run scheduled imports via Supabase Cron

Hosted: Supabase Edge Function (Deno)

Trigger: Supabase Cron (every 15 minutes)

Features:

Database Operations:


Total Endpoints: 119 (115 + 4 Data Import) Last Updated: November 10, 2025

See Also:



19. Messaging Routes (SMS, WhatsApp) ✨ NEW v2.6.0

Base Path: Supabase Edge Function /functions/v1/messaging-api Purpose: Multi-channel messaging via Twilio (SMS, WhatsApp) Provider: Twilio - Single API for all channels Philosophy: Unified messaging API with templates, campaigns, analytics, and compliance

Environment Variables (Supabase Secrets)

Required secrets: TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN

19.1 POST /functions/v1/messaging-api (action: send)

Purpose: Send a single message via SMS or WhatsApp Used In: Test messages, transactional notifications, OTP delivery

Request fields: action: "send", channel (sms | whatsapp), to (phone number), content, from (optional), messageType (transactional | marketing | otp | notification), variables (for template rendering), templateSlug (optional), mediaUrl (optional), tags (optional), whatsappContentSid (optional, for WhatsApp pre-approved templates)

Response: success, messageId, logId
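Below is a sketch of the `send` call against the Edge Function, which validates `channel` and `messageType` against the documented values. The project URL is a placeholder. The `Authorization: Bearer` header follows the usual Supabase Edge Function convention, but this reference does not state whether the function accepts the anon key or requires a user JWT.

```python
import json
import urllib.request
from typing import Any, Optional

CHANNELS = {"sms", "whatsapp"}
MESSAGE_TYPES = {"transactional", "marketing", "otp", "notification"}


def send_message_request(project_url: str, access_token: str, channel: str, to: str,
                         content: str, message_type: str = "transactional",
                         variables: Optional[dict[str, str]] = None) -> urllib.request.Request:
    """Build (not send) an action=send call to the messaging-api Edge Function."""
    if channel not in CHANNELS:
        raise ValueError(f"unsupported channel {channel!r}")
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unsupported messageType {message_type!r}")
    body: dict[str, Any] = {"action": "send", "channel": channel, "to": to,
                            "content": content, "messageType": message_type}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        f"{project_url.rstrip('/')}/functions/v1/messaging-api",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {access_token}"},
        method="POST",
    )
```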

Twilio API Endpoints Used:


19.2 POST /functions/v1/messaging-api (action: send-bulk)

Purpose: Send messages to multiple recipients in bulk Used In: Marketing campaigns, mass notifications

Request fields: action: "send-bulk", channel, recipients (array of {to, variables}), content, templateSlug (optional), messageType, from (optional)

Response: success, bulkId, total, sent, failed, optedOut, results array (to, status, messageId or error)
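Below is a sketch of building a `send-bulk` body. Removing repeated phone numbers before sending is an illustrative client-side safeguard against double-messaging, not documented server behavior. `variables` defaults to an empty object when a recipient has none.

```python
from typing import Any, Optional


def bulk_body(channel: str, recipients: list[dict[str, Any]], content: str,
              message_type: str = "marketing",
              template_slug: Optional[str] = None) -> dict[str, Any]:
    """Body for action=send-bulk; keeps the first entry for each phone number."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for r in recipients:
        if r["to"] not in seen:
            seen.add(r["to"])
            unique.append({"to": r["to"], "variables": r.get("variables", {})})
    body: dict[str, Any] = {"action": "send-bulk", "channel": channel,
                            "recipients": unique, "content": content,
                            "messageType": message_type}
    if template_slug:
        body["templateSlug"] = template_slug
    return body
```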


19.3 POST /functions/v1/messaging-api (action: channels)

Purpose: List all configured messaging channels Used In: Channel management UI, channel selection dropdowns

Request fields: action: "channels", channelType (optional filter: sms | whatsapp)

Response: success, channels array (id, channel_type, provider, sender_id, display_name, is_active, is_default, daily_quota, max_send_rate, config)

Database Table: messaging_channels


19.4 POST /functions/v1/messaging-api (action: templates)

Purpose: List all messaging templates Used In: Template management UI, campaign creation

Request fields: action: "templates", channelType (optional filter)

Response: success, templates array (id, name, slug, channel_type, content, variables, category, whatsapp_template_name, is_approved, is_active)

Database Table: messaging_templates


19.5 POST /functions/v1/messaging-api (action: logs)

Purpose: Get message delivery logs with filtering Used In: Message logs tab, delivery tracking, debugging

Request fields: action: "logs", channelType (optional), status (optional: queued | sent | delivered | read | failed | rejected), messageType (optional), limit

Response: success, logs array (id, channel_type, provider_message_id, from_number, to_number, content, status, sent_at, delivered_at, cost, currency)

Database Table: messaging_logs


19.6 POST /functions/v1/messaging-api (action: analytics)

Purpose: Get aggregated messaging analytics Used In: Analytics dashboard, reporting

Request fields: action: "analytics", channelType (optional), dateRange (start, end)

Response: success, totalSent, totalDelivered, totalRead (WhatsApp only), totalFailed, totalCost, deliveryRate, readRate, failureRate, dailyData array

Database Table: messaging_analytics


19.7 POST /functions/v1/messaging-api (action: balance)

Purpose: Get Twilio account balance Used In: Header balance display, billing monitoring

Request fields: action: "balance"

Response: success, balance, currency

Twilio API Endpoint: GET /2010-04-01/Accounts/{AccountSid}/Balance.json
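For reference, the underlying Twilio call uses HTTP Basic auth, with the Account SID as the username and the Auth Token as the password. The sketch below builds that request in Python for illustration. In this system the call runs server-side in the Deno Edge Function, using the `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` secrets, and these credentials should never reach the browser.

```python
import base64
import urllib.request


def twilio_balance_request(account_sid: str, auth_token: str) -> urllib.request.Request:
    """Build GET .../Balance.json with HTTP Basic auth (Account SID + Auth Token)."""
    creds = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Balance.json",
        headers={"Authorization": f"Basic {creds}"},
        method="GET",
    )
```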


19.8 POST /functions/v1/messaging-api (action: sync-senders)

Purpose: Sync senders/numbers from Twilio account to local database Used In: Channel sync button, initial setup

Request fields: action: "sync-senders", autoImport (boolean; set to true to automatically import the synced senders into the database)

Response: success, senders (sms and whatsapp arrays with sender_id, display_name, status), total, imported

Twilio API Endpoints Used:


19.9 POST /functions/v1/messaging-api (action: whatsapp-templates)

Purpose: Fetch WhatsApp templates from Twilio Content API Used In: WhatsApp template selection, template sync

Request fields: action: "whatsapp-templates"

Response: success, templates array (sid, friendly_name, language, types with body)

Twilio API Endpoint: GET https://content.twilio.com/v1/Content (Twilio Content API)


19.10 POST /functions/v1/messaging-api (action: send-test)

Purpose: Send a test message for a campaign Used In: Campaign testing, preview verification

Request fields: action: "send-test", campaignId, testNumber

Response: success, messageId


Twilio API Reference (Internal)

Base URL: https://api.twilio.com Authentication: HTTP Basic Auth (Account SID + Auth Token)

Endpoints used (method, path: purpose):
POST /2010-04-01/Accounts/{AccountSid}/Messages.json: Send SMS/WhatsApp messages
GET /2010-04-01/Accounts/{AccountSid}/IncomingPhoneNumbers.json: List phone numbers
GET /2010-04-01/Accounts/{AccountSid}/Balance.json: Get account balance
GET https://content.twilio.com/v1/Content: List WhatsApp content templates (Content API, served from content.twilio.com rather than the base URL above)

Documentation: https://www.twilio.com/docs/messaging/api
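Sending through `Messages.json` is a form-encoded POST with `To`, `From`, and either `Body` or a `ContentSid`. WhatsApp addresses take a `whatsapp:` prefix. The sketch below builds such a request. How the Edge Function maps `whatsappContentSid` and `variables` onto `ContentSid` and `ContentVariables` is an assumption.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional


def twilio_message_request(account_sid: str, auth_token: str, channel: str, to: str, sender: str,
                           body: Optional[str] = None, content_sid: Optional[str] = None,
                           content_variables: Optional[dict[str, str]] = None) -> urllib.request.Request:
    """Build a Messages.json POST; WhatsApp numbers carry the "whatsapp:" prefix."""
    prefix = "whatsapp:" if channel == "whatsapp" else ""
    form = {"To": prefix + to, "From": prefix + sender}
    if content_sid:  # pre-approved content template instead of free-form Body
        form["ContentSid"] = content_sid
        if content_variables:
            form["ContentVariables"] = json.dumps(content_variables)
    elif body:
        form["Body"] = body
    else:
        raise ValueError("either body or content_sid is required")
    creds = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json",
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```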


Database Tables

The messaging system uses the following tables:


Summary

Total Endpoints: 140+ Latest Version: v2.6.0 Last Updated: January 2026

New in v2.6.0:

New in v2.5.0:

Key Features: