Last Updated: 2026-01-21
API Version: v2.6.0
Total Endpoints: 140+
Complete reference of all consolidated API endpoints with detailed usage information, database operations, and integration points.
Recent Updates (v2.6.0 - January 2026):
- POST /functions/v1/messaging-api - Unified messaging API (action-based routing)

Previous Updates (v2.5.0 - December 30, 2025):
- POST /api/images/reclassify/{image_id} - Re-run material vs non-material classification

Previous Updates (v2.3.0 - November 22, 2025):
- GET /api/rag/product-image-relationships - Query product-to-image relationships
- GET /api/rag/chunk-product-relationships - Query chunk-to-product relationships

Previous Updates (v2.3.0 - Knowledge Base System):

Previous Updates (v2.2.0):
- /api/pdf/extract/* endpoints removed - use /api/rag/documents/upload
- /api/rag/documents/upload endpoint replaces 3 separate upload endpoints
- /api/rag/search endpoint with strategy parameter replaces 8+ search endpoints
- /health endpoint replaces 10+ individual health checks

Total API Endpoints: 140+ endpoints across 19 categories
✨ CONSOLIDATED ENDPOINTS (One Endpoint, One Purpose, No Duplicates)
Purpose: Unified health check for all MIVAA services
Replaces: 10+ individual health check endpoints (/api/pdf/health, /api/rag/health, /api/search/health, etc.)
The response includes status, timestamp, per-service health details with response times (database, storage, AI models including Claude/GPT/QWEN, and RAG), and a version field.
Benefits:
Base Path: /api/kb
Purpose: Document management system with AI embeddings, semantic search, and product attachments
Philosophy: Complete knowledge base for documentation, guides, specifications, and product information
Purpose: Create or upsert a knowledge base document with automatic embedding generation
Used In: Knowledge Base admin panel, Documentation editor, Pricing doc ingestion
Flow: User creates document → Generate 1024D Voyage AI embedding → Store in database
Upsert semantics (2026-04): if a doc with the same (workspace_id, title, category_id) already exists, this endpoint updates it in place and only re-embeds when content changes. Re-uploading a quarterly price list with the same title refreshes prices without creating duplicates.
Request fields: workspace_id, title, content, content_markdown, summary, category_id, seo_keywords, status, visibility, metadata, price_doc_type (optional, pricing category only; one of price_list | discount_rule | contract_terms | promotion)
Response fields: id, workspace_id, title, content, text_embedding, embedding_status, embedding_generated_at, embedding_model, created_at, view_count, price_doc_type
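The upsert rule can be illustrated with a minimal in-memory sketch. It assumes nothing about the real backend: `InMemoryKb`, `KbDoc`, and `embedding_version` are hypothetical names used only to show the `(workspace_id, title, category_id)` key and the "re-embed only when content changes" behavior described above.

```python
from dataclasses import dataclass


@dataclass
class KbDoc:
    workspace_id: str
    title: str
    category_id: str
    content: str
    embedding_version: int = 1  # hypothetical counter, bumped on each (re-)embed


class InMemoryKb:
    """Hypothetical stand-in for the kb_docs table, keyed like the upsert rule."""

    def __init__(self) -> None:
        self.docs: dict[tuple[str, str, str], KbDoc] = {}

    def upsert(self, workspace_id: str, title: str, category_id: str, content: str):
        """Create or update a doc; return (doc, reembedded)."""
        key = (workspace_id, title, category_id)
        existing = self.docs.get(key)
        if existing is None:
            # No doc with this key yet: create it and embed it once.
            doc = KbDoc(workspace_id, title, category_id, content)
            self.docs[key] = doc
            return doc, True
        # Same key: update in place, re-embed only if the content differs.
        reembedded = existing.content != content
        if reembedded:
            existing.content = content
            existing.embedding_version += 1
        return existing, reembedded
```

Re-uploading a price list with an unchanged title therefore touches one row rather than adding a second one.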
Database Operations:
- kb_docs
- kb_doc_versions (version history)

Purpose: Retrieve a single knowledge base document by ID
Used In: Document viewer, Edit modal
Request: GET /api/kb/documents/{doc_id}?workspace_id=uuid
Response fields: id, workspace_id, title, content, content_markdown, summary, category_id, embedding_status, created_at, updated_at, view_count
Purpose: Update document with smart embedding regeneration
Smart Detection: Only regenerates embedding if content changed (title, content, summary, keywords, category)
Used In: Document editor
Request fields: title, content, content_markdown, summary, category_id, seo_keywords, status, visibility, metadata, price_doc_type (2026-04)
Response fields: id, title, content, embedding_status, embedding_generated_at, updated_at, price_doc_type
Database Operations:
- kb_docs with new content
- kb_doc_versions (version history)

Purpose: Delete a knowledge base document
Used In: Document management, Admin panel
Request: DELETE /api/kb/documents/{doc_id}?workspace_id=uuid
Response: 204 No Content
Database Operations:
- kb_docs (cascades to attachments, versions, comments)

Purpose: Create document from PDF with text extraction
Used In: PDF upload modal in Knowledge Base
Flow: Upload PDF → Extract text using PyMuPDF → Generate embedding → Store document
Request: Multipart form-data with fields: file (PDF), workspace_id, title, category_id (optional), status
Response fields: id, title, content (extracted text), embedding_status, created_at
Database Operations:
- kb_docs with extracted text

Purpose: Search knowledge base documents using semantic, full-text, or hybrid search
Used In: Knowledge Base search interface, AI agent queries
Flow: Frontend → MIVAA API → Generate query embedding → Supabase vector search → Return results
Architecture:
- kb_match_docs() RPC function with query embedding
- <=> operator

Why MIVAA Backend is Required:
- kb_docs.text_embedding (generated when doc created)

Request fields: workspace_id, query, search_type (default semantic), limit, category_id (optional), category_slug (optional, e.g. "pricing"), price_doc_type (optional, pricing sub-type filter), allowed_access_levels (optional, defaults to admin+agent+public), require_published (default false for admin management; set true to exclude drafts), match_threshold (default 0.5 for semantic)
Search Types:
- semantic - Vector similarity using pgvector cosine distance (default)
- full_text - ILIKE-based keyword matching
- hybrid - Combination of semantic + full-text

Response fields: results (array with id, title, content, summary, category_id, category_slug, category_name, status, visibility, embedding_status, price_doc_type, similarity), search_time_ms, total_results
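To make the semantic mode concrete, here is a small pure-Python sketch of cosine-similarity ranking with a `match_threshold` cut-off. It is only an illustration: in the documented flow the comparison runs in the database through pgvector, and the function names, the `>=` threshold comparison, and the tiny 2D vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1 minus cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding, docs, match_threshold=0.5, limit=10):
    """Rank docs by similarity, dropping those below match_threshold.

    `docs` is a list of dicts with `id`, `title`, and `text_embedding`,
    mirroring the kb_docs fields named in this section.
    """
    results = []
    for doc in docs:
        similarity = cosine_similarity(query_embedding, doc["text_embedding"])
        if similarity >= match_threshold:  # assumed inclusive comparison
            results.append({"id": doc["id"], "title": doc["title"], "similarity": similarity})
    results.sort(key=lambda r: r["similarity"], reverse=True)
    return results[:limit]
```

Raising `match_threshold` trades recall for precision: fewer, closer matches are returned.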
Database Operations:
- <=> operator
- kb_search_analytics

Purpose: Create a new category
Used In: Category management UI, one-time Pricing category setup
Request fields: workspace_id, name, slug (recommended; used by search category filters, e.g. "pricing"), description, parent_category_id, color, icon, sort_order, access_level (admin | agent | public, default agent), trigger_keyword (optional agent gate)
Response fields: id, name, slug, access_level, trigger_keyword, description, color, icon, created_at
Pricing category seed: to enable the price_lookup agent tool, create a category with slug: "pricing", access_level: "admin", trigger_keyword: "price". Docs under this category accept the price_doc_type field on §2.1/§2.3.
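The seed values above translate into a request body along these lines. This is a sketch only: the helper names are hypothetical, the category `name` of "Pricing" is an assumption (the source fixes only slug, access_level, and trigger_keyword), and the body still has to be sent to the category creation endpoint by whatever HTTP client is in use.

```python
ALLOWED_ACCESS_LEVELS = {"admin", "agent", "public"}


def build_category_payload(workspace_id, name, slug, access_level="agent",
                           trigger_keyword=None, description=None):
    """Build a category creation body, validating access_level client-side."""
    if access_level not in ALLOWED_ACCESS_LEVELS:
        raise ValueError(f"access_level must be one of {sorted(ALLOWED_ACCESS_LEVELS)}")
    payload = {
        "workspace_id": workspace_id,
        "name": name,
        "slug": slug,
        "access_level": access_level,
    }
    # Optional fields are only included when provided.
    if trigger_keyword is not None:
        payload["trigger_keyword"] = trigger_keyword
    if description is not None:
        payload["description"] = description
    return payload


def pricing_category_payload(workspace_id):
    """Seed body for the pricing category; the display name is an assumption."""
    return build_category_payload(
        workspace_id,
        name="Pricing",
        slug="pricing",
        access_level="admin",
        trigger_keyword="price",
    )
```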
Purpose: List all categories for a workspace
Used In: Category dropdown, Category management
Request: GET /api/kb/categories?workspace_id=uuid
Response: success and a categories array with fields: id, name, description, parent_category_id, color, icon, sort_order, document_count
Purpose: Attach a document to one or more products
Used In: Product attachment modal
Request fields: workspace_id, document_id, product_id, relationship_type, relevance_score
Relationship Types:
- primary - Main documentation for product
- supplementary - Additional information
- related - Related documentation
- certification - Certification documents
- specification - Technical specifications

Response fields: id, document_id, product_id, relationship_type, relevance_score, created_at

Purpose: Get all products attached to a document
Used In: Document viewer, Product links section
Request: GET /api/kb/documents/{doc_id}/attachments?workspace_id=uuid
Response: success and attachments array with fields: id, product_id, product_name, relationship_type, relevance_score
Purpose: Get all documents attached to a product
Used In: Product page documentation tab
Request: GET /api/kb/products/{product_id}/documents?workspace_id=uuid
Response: success and documents array with fields: id, title, summary, relationship_type, relevance_score, view_count
Purpose: Health check for Knowledge Base service
Used In: System monitoring
Request: GET /api/kb/health
Response fields: status, service, features (document_crud, embedding_generation, pdf_extraction, semantic_search, categories, attachments), endpoints
Base Path: /api/rag or /api/v1/rag
Purpose: Core RAG (Retrieval-Augmented Generation) functionality for document processing, querying, and management
Philosophy: One endpoint per function with parameters for different modes/strategies
Purpose: CONSOLIDATED upload endpoint for all document processing scenarios
Replaces:
- /api/documents/process (removed)
- /api/documents/process-url (removed)
- /api/documents/upload (removed)
- /api/documents/{document_id}/query (removed)
- /api/documents/{document_id}/related (removed)
- /api/documents/{document_id}/summarize (removed)
- /api/documents/{document_id}/extract-entities (removed)
- /api/documents/compare (removed)

Used In: Main PDF upload modal, Product catalog processing, Simple document upload
Flow: User uploads PDF → AI discovery → Category extraction → Chunking → Image processing → Product creation
Request: Multipart form-data. Choose one source: file (PDF file) or file_url (URL to PDF). Additional parameters:
- categories: products | certificates | logos | specifications | all | extract_only
- discovery_model: claude | gpt | haiku (default: claude)
- agent_prompt: Custom prompt for AI processing (optional)
- enable_prompt_enhancement: true | false (default: true)
- title, description, tags, workspace_id
- chunk_size (default: 2048), chunk_overlap (default: 200)

All uploads use deep processing mode with complete AI analysis, image embeddings (CLIP), advanced product enrichment, quality validation, and full RAG pipeline.
Response fields: job_id, document_id, status: "processing", message
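As a sketch of how a client might assemble the non-file form fields for this upload, the helper below enforces the "choose one source" rule and fills in the documented defaults. The function name is hypothetical, and treating `categories` as a single value and encoding booleans as `"true"`/`"false"` strings are assumptions; the PDF itself would be attached separately as the multipart `file` part.

```python
ALLOWED_CATEGORIES = {"products", "certificates", "logos", "specifications", "all", "extract_only"}
ALLOWED_DISCOVERY_MODELS = {"claude", "gpt", "haiku"}


def build_upload_fields(workspace_id, categories, *, has_file=False, file_url=None,
                        discovery_model="claude", enable_prompt_enhancement=True,
                        chunk_size=2048, chunk_overlap=200, agent_prompt=None):
    """Return the text form fields for POST /api/rag/documents/upload."""
    # Exactly one source must be chosen: an attached file or a file_url.
    if has_file == (file_url is not None):
        raise ValueError("provide exactly one of file or file_url")
    if categories not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown categories value: {categories}")
    if discovery_model not in ALLOWED_DISCOVERY_MODELS:
        raise ValueError(f"unknown discovery_model: {discovery_model}")
    fields = {
        "workspace_id": workspace_id,
        "categories": categories,
        "discovery_model": discovery_model,
        "enable_prompt_enhancement": "true" if enable_prompt_enhancement else "false",
        "chunk_size": str(chunk_size),
        "chunk_overlap": str(chunk_overlap),
    }
    if file_url is not None:
        fields["file_url"] = file_url
    if agent_prompt is not None:
        fields["agent_prompt"] = agent_prompt
    return fields
```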
Database Operations:
Metadata Fields Set:
- chunks_created (int) - Number of chunks created
- products_created (int) - Number of products identified
- images_extracted (int) - Number of images extracted ✅ FIXED
- processing_time (float) - Total processing time in seconds

Processing Stages:
Frontend Integration:
- PDFUploadModal.tsx
- GET /api/rag/documents/job/{job_id} for progress

Purpose: Get job status and metadata for async processing
Used In: Progress tracking, completion detection, error handling
Flow: Frontend polls this endpoint every 2 seconds during processing
Request: GET /api/rag/documents/job/{job_id}
Response fields: job_id, status (processing | completed | failed | interrupted), document_id, progress, error, metadata (chunks_created, products_created, images_extracted, processing_time, current_stage, pages_completed, pages_failed, pages_skipped), checkpoints, created_at, updated_at
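The 2-second polling flow can be sketched as a small loop. The `fetch_status` callable is a placeholder for whatever performs `GET /api/rag/documents/job/{job_id}` and returns the parsed JSON; treating `interrupted` as a state that ends polling, and the `max_polls` cap, are assumptions of this example.

```python
import time

# Statuses after which polling stops (assumption: "interrupted" also ends polling).
TERMINAL_STATUSES = {"completed", "failed", "interrupted"}


def wait_for_job(fetch_status, job_id, interval_seconds=2.0, max_polls=300, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job leaves the processing state.

    `sleep` is injectable so the loop can be exercised without real delays.
    """
    for _ in range(max_polls):
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still processing after {max_polls} polls")
```

A caller would typically inspect `job["status"]` and `job["error"]` on return, and read `job["metadata"]` for the chunk, product, and image counts.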
Database Operations:
Purpose: Upload PDF and extract only pages containing a specific product
Used In: Single product extraction from multi-product catalogs
Flow: User specifies product → PDF scanned → Extract matching pages → Process focused PDF
Request: Multipart form-data with fields: file (required), product_name (required), designer (optional), search_terms (optional), title, description, tags
Response fields: job_id, document_id, status: "processing", product_name, pages_found
Database Operations:
Use Case: Extract "NOVA" product from Harmony PDF (pages 5-11)
Purpose: Get job status and metadata for async processing
Used In: Progress tracking, completion detection, error handling
Flow: Frontend polls this endpoint every 2 seconds during processing
Request: GET /api/rag/documents/job/{job_id}
Response fields: job_id, status, document_id, progress, error, metadata, checkpoints, created_at, updated_at
Database Operations: SELECT FROM background_jobs WHERE id = ?
Critical Fields: ✅ VERIFIED
- metadata.chunks_created - Used by test validation
- metadata.products_created - Used by test validation
- metadata.images_extracted - Used by test validation (FIXED: was images_stored)

Frontend Integration:
- PDFUploadModal.tsx, ProcessingStatus.tsx

Purpose: Get document chunks with pagination
Used In: Knowledge Base viewer, Chunk inspector, Admin dashboard
Flow: User views document → Fetch chunks → Display in UI
Request: GET /api/rag/chunks?document_id={uuid}&limit=100&offset=0
Response: chunks array (id, document_id, content, chunk_index, metadata, quality_score, created_at) and total count
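The `limit`/`offset` pagination used by this and the following list endpoints can be walked with a generic iterator. This is a sketch under assumptions: `fetch_page` is a hypothetical callable wrapping the HTTP request, and the count field is assumed to be named `total`.

```python
from typing import Callable, Iterator


def iter_paginated(fetch_page: Callable[[int, int], dict], items_key: str,
                   limit: int = 100) -> Iterator[dict]:
    """Yield every item from a limit/offset endpoint, one page at a time.

    fetch_page(limit, offset) must return a dict holding the item array under
    `items_key` (e.g. "chunks") and, optionally, a "total" count.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get(items_key, [])
        yield from items
        offset += len(items)
        total = page.get("total")
        # Stop on an empty page, or once the reported total has been reached.
        if not items or (total is not None and offset >= total):
            return
```

The same helper would apply to the images and products endpoints by passing `"images"` or `"products"` as `items_key`.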
Database Operations:
Frontend Integration:
- KnowledgeBase.tsx, ChunkViewer.tsx

Purpose: Get document images with analysis results
Used In: Image gallery, Image inspector, Admin dashboard
Flow: User views document → Fetch images → Display gallery
Request: GET /api/rag/images?document_id={uuid}&limit=100&offset=0
Response: images array (id, document_id, image_url, page_number, QWEN_analysis, clip_embedding, quality_score, created_at) and total count
Database Operations:
Frontend Integration:
- ImageGallery.tsx, ImageViewer.tsx

Purpose: Get products extracted from document
Used In: Products tab, Product catalog, Materials page
Flow: User views products → Fetch from database → Display cards
Request: GET /api/rag/products?document_id={uuid}&limit=100&offset=0
Response: products array (id, name, description, source_document_id, metadata, quality_score, created_at) and total count
Database Operations:
Frontend Integration:
- ProductsTab.tsx, MaterialsPage.tsx

Purpose: Get product-to-image relationships for validation and testing
Used In: Test scripts, Admin dashboard, Relationship viewer
Flow: Query relationships → Return product-image links with scores
Request: GET /api/rag/product-image-relationships?document_id={uuid}&limit=100&offset=0&min_score=0.0
Query Parameters:
- document_id (optional) - Filter by document ID
- product_id (optional) - Filter by product ID
- limit (optional) - Maximum results (default: 100, max: 1000)
- offset (optional) - Pagination offset (default: 0)
- min_score (optional) - Minimum relevance score (default: 0.0, range: 0.0-1.0)

Response: document_id, product_id, relationships array (with product and image details), count, limit, offset, and statistics (total_relationships, by_relationship_type, min_score_filter)
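A client-side helper for assembling this query string might look like the sketch below. The function name is hypothetical, and clamping `limit` to the documented maximum and rejecting an out-of-range `min_score` before sending are design choices of the example, not documented server behavior.

```python
from urllib.parse import urlencode

MAX_LIMIT = 1000  # documented maximum for `limit`


def product_image_relationships_url(document_id=None, product_id=None,
                                    limit=100, offset=0, min_score=0.0):
    """Build the request path and query for the product-image relationships endpoint."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be within 0.0-1.0")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    params = {}
    # Optional filters are only sent when set.
    if document_id is not None:
        params["document_id"] = document_id
    if product_id is not None:
        params["product_id"] = product_id
    params["limit"] = max(1, min(limit, MAX_LIMIT))
    params["offset"] = offset
    params["min_score"] = min_score
    return "/api/rag/product-image-relationships?" + urlencode(params)
```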
Database Operations:
Frontend Integration:
Purpose: Get chunk-to-product relationships for validation and testing
Used In: Test scripts, Admin dashboard, Content analysis
Flow: Query relationships → Return chunk-product links
Request: GET /api/rag/chunk-product-relationships?document_id={uuid}&limit=100&offset=0
Query Parameters:
- document_id (optional) - Filter by document ID
- product_id (optional) - Filter by product ID
- limit (optional) - Maximum results (default: 100, max: 1000)
- offset (optional) - Pagination offset (default: 0)

Response: document_id, product_id, relationships array (with chunk content and product name details), count, limit, offset
Database Operations:
Frontend Integration:
Purpose: Query documents using RAG (Retrieval-Augmented Generation)
Used In: Main search interface, Q&A functionality
Flow: User asks question → Semantic search → Retrieve relevant chunks → Generate answer with AI
Request fields: query, document_ids, top_k, model
Response fields: answer, sources (chunk_id, content, score, document_id), model_used
Database Operations: SELECT FROM document_chunks, embeddings
Frontend Integration: SearchInterface.tsx, QAModal.tsx
Purpose: Conversational interface for document Q&A with context
Used In: Chat interface, conversational search
Flow: User sends message → Maintain conversation history → Generate contextual response
Request fields: message, conversation_id, document_ids
Response fields: response, conversation_id, sources, model_used
Database Operations: SELECT FROM document_chunks, embeddings
Frontend Integration: ChatInterface.tsx
Purpose: Semantic search across document collection
Used In: Search page, knowledge base search
Flow: User enters search term → Semantic/hybrid/keyword search → Return ranked results
Request fields: query, search_type, filters (document_ids, tags), top_k
Response: results array (chunk_id, content, score, metadata) and total
Database Operations: SELECT FROM document_chunks, embeddings
Frontend Integration: SearchPage.tsx, KnowledgeBase.tsx
Purpose: Advanced query search with query expansion and optimization
Used In: Advanced search interface
Flow: User query → Query expansion → Multi-strategy search → Ranked results
Request fields: query, expand_query, rerank, filters
Database Operations: SELECT FROM document_chunks, embeddings
Frontend Integration: AdvancedSearch.tsx
Purpose: MMR (Maximal Marginal Relevance) search for diverse results
Used In: Search with diversity requirements
Flow: User query → Semantic search → MMR reranking → Diverse results
Request fields: query, lambda_param, top_k
Database Operations: SELECT FROM document_chunks, embeddings
Frontend Integration: SearchPage.tsx (diversity mode)
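For readers unfamiliar with MMR, the reranking step can be sketched in a few lines. This is the textbook greedy formulation, not necessarily what the backend runs: each pick maximizes `lambda_param * relevance - (1 - lambda_param) * redundancy`, where redundancy is the highest similarity to anything already selected. The function names and vector inputs are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_rerank(query_vec, candidates, lambda_param=0.5, top_k=5):
    """Greedy MMR over `candidates`, a list of (id, vector) pairs.

    lambda_param=1.0 ranks purely by relevance; lower values favor diversity.
    Returns the selected ids in pick order.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_k:
        best_index, best_score = 0, float("-inf")
        for index, (_, vec) in enumerate(remaining):
            relevance = cosine(query_vec, vec)
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max((cosine(vec, chosen) for _, chosen in selected), default=0.0)
            score = lambda_param * relevance - (1 - lambda_param) * redundancy
            if score > best_score:
                best_index, best_score = index, score
        selected.append(remaining.pop(best_index))
    return [item_id for item_id, _ in selected]
```

With a low `lambda_param`, a near-duplicate of the top hit is pushed below a less relevant but different result.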
Purpose: List and filter documents in collection
Used In: Documents page, admin dashboard
Flow: User views documents → Fetch with filters → Display list
Request: GET /api/rag/documents?limit=20&offset=0&search=harmony&tags=catalog
Response: documents array (id, title, filename, page_count, chunks_count, images_count, products_count, created_at) and total
Database Operations: SELECT FROM documents
Frontend Integration: DocumentsPage.tsx, AdminDashboard.tsx
Purpose: Delete document and all associated data
Used In: Document management, cleanup
Flow: User deletes document → Remove from database → Delete from storage → Cleanup embeddings
Request: DELETE /api/rag/documents/{document_id}
Response: success and deleted counts (document, chunks, images, products, embeddings)
Database Operations:
Frontend Integration: DocumentsPage.tsx (delete button)
Purpose: Health check for RAG services
Used In: Monitoring, admin dashboard
Flow: System checks → Verify all services → Return status
Request: GET /api/rag/health
Response fields: status, services (rag, embeddings, vector_store, database), timestamp
Database Operations: None (service checks only)
Frontend Integration: AdminDashboard.tsx (health monitor)
Purpose: Get RAG system statistics
Used In: Admin dashboard, analytics
Flow: Fetch system metrics → Calculate statistics → Return summary
Request: GET /api/rag/stats
Response fields: documents, chunks, images, products, embeddings, storage_used_mb, avg_processing_time
Database Operations: SELECT COUNT FROM documents, document_chunks, document_images, products, embeddings
Frontend Integration: AdminDashboard.tsx (statistics panel)
Purpose: Get detailed AI model tracking for a job
Used In: Job monitoring, AI usage analytics
Flow: Fetch job → Get AI tracking data → Return model usage details
Request: GET /api/rag/job/{job_id}/ai-tracking
Response: job_id, models_used (per model: calls, tokens, cost, stages), and total_cost
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx, AIUsagePanel.tsx
Purpose: Get AI tracking for specific model
Used In: Model-specific analytics
Flow: Fetch job → Filter by model → Return model-specific data
Request: GET /api/rag/job/{job_id}/ai-tracking/model/QWEN
Database Operations: SELECT FROM background_jobs
Frontend Integration: AIUsagePanel.tsx (model filter)
Purpose: Get AI tracking for specific processing stage
Used In: Stage-specific analytics
Flow: Fetch job → Filter by stage → Return stage-specific AI usage
Request: GET /api/rag/job/{job_id}/ai-tracking/stage/image_analysis
Database Operations: SELECT FROM background_jobs
Frontend Integration: StageMonitor.tsx
Purpose: Get all checkpoints for a job
Used In: Job recovery, debugging
Flow: Fetch job → Get checkpoint history → Return checkpoint data
Request: GET /api/rag/jobs/{job_id}/checkpoints
Response: checkpoints array (stage, progress, data, completed_at) and count
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx (checkpoint viewer)
Purpose: Manually restart job from last checkpoint
Used In: Job recovery, error handling
Flow: User triggers restart → Load checkpoint → Resume processing
Request: POST /api/rag/jobs/{job_id}/restart
Response: success, job_id, resumed_from, progress
Database Operations:
Frontend Integration: JobMonitor.tsx (restart button)
Purpose: Resume job from last checkpoint (alias for restart)
Used In: Job recovery
Flow: Same as /jobs/{job_id}/restart
Database Operations: Same as restart endpoint
Frontend Integration: JobMonitor.tsx

Purpose: List all background jobs with filtering
Used In: Admin dashboard, job management
Flow: Fetch jobs → Apply filters → Return paginated list
Request: GET /api/rag/documents/jobs?limit=20&offset=0&status=processing
Response: jobs array (id, document_id, filename, status, progress, created_at) and total
Database Operations: SELECT FROM background_jobs
Frontend Integration: AdminDashboard.tsx (jobs panel)

Purpose: Get complete document content with all AI analysis
Used In: Document viewer, export functionality
Flow: Fetch document → Get all related data → Return comprehensive content
Request: GET /api/rag/documents/documents/{document_id}/content?include_chunks=true&include_images=true&include_products=true
Response: document, chunks, images, products, embeddings
Database Operations:
Frontend Integration: DocumentViewer.tsx, ExportModal.tsx
Purpose: Upload and process document for RAG
Used In: Simple document upload
Flow: Upload → Process → Generate embeddings → Complete
Request: Multipart form-data with fields: file (PDF), title, chunk_size (default: 2048), chunk_overlap (default: 200)
Database Operations:
Frontend Integration: SimpleUploadForm.tsx
Base Path: /api/admin
Purpose: Administrative functions for system management
Used In: Admin dashboard, system configuration, job management
Purpose: List all jobs with filtering and pagination
Used In: Admin dashboard jobs panel
Flow: Admin views jobs → Apply filters → Display paginated list
Request: GET /api/admin/jobs?status=processing&limit=20&offset=0
Response: jobs array and total count
Database Operations: SELECT FROM background_jobs
Frontend Integration: AdminDashboard.tsx (jobs panel)
Purpose: Get comprehensive job statistics and metrics
Used In: Admin dashboard analytics
Flow: Fetch all jobs → Calculate metrics → Return statistics
Database Operations: SELECT FROM background_jobs
Frontend Integration: AdminDashboard.tsx (statistics panel)

Purpose: Get detailed status for specific job
Used In: Job monitoring, debugging
Flow: Fetch job by ID → Return full details
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx

Purpose: Alternative endpoint for job status
Used In: Job monitoring (alternative path)
Flow: Same as /jobs/{job_id}
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx

Purpose: Cancel a running job
Used In: Job management, error recovery
Flow: User cancels job → Update status → Stop processing
Database Operations: UPDATE background_jobs
Frontend Integration: JobMonitor.tsx (cancel button)

Purpose: Process multiple documents in bulk
Used In: Bulk upload, batch processing
Flow: Upload multiple URLs → Queue jobs → Process in parallel
Database Operations: INSERT INTO documents, background_jobs
Frontend Integration: BulkUploadModal.tsx
Purpose: Get comprehensive system health status
Used In: Monitoring dashboard, health checks
Flow: Check all services → Return health status
Database Operations: None (service checks only)
Frontend Integration: AdminDashboard.tsx (health monitor)

Purpose: Get detailed system performance metrics
Used In: Performance monitoring, analytics
Flow: Collect metrics → Calculate statistics → Return data
Database Operations: SELECT FROM background_jobs, documents
Frontend Integration: AdminDashboard.tsx (metrics panel)

Purpose: Clean up old data from system
Used In: Data maintenance, storage management
Flow: Find old data → Delete records → Return summary
Database Operations: DELETE FROM documents, document_chunks, document_images, products, embeddings
Frontend Integration: AdminDashboard.tsx (cleanup button)

Purpose: Create backup of system data
Used In: Data backup, disaster recovery
Flow: Export data → Create backup file → Return download link
Database Operations: SELECT FROM all tables
Frontend Integration: AdminDashboard.tsx (backup button)

Purpose: Export system data in various formats
Used In: Data export, reporting
Flow: Fetch data → Format (JSON/CSV) → Return file
Database Operations: SELECT FROM background_jobs, documents
Frontend Integration: AdminDashboard.tsx (export button)

Purpose: Get status of all system packages and dependencies
Used In: System diagnostics, dependency management
Flow: Check installed packages → Return versions and status
Database Operations: None (system checks only)
Frontend Integration: AdminDashboard.tsx (packages panel)
Purpose: Get detailed progress for specific job
Used In: Real-time job monitoring
Flow: Fetch job → Extract progress data → Return details
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx (progress bar)

Purpose: Get progress for all active jobs
Used In: Multi-job monitoring
Flow: Fetch active jobs → Return progress summary
Database Operations: SELECT FROM background_jobs
Frontend Integration: AdminDashboard.tsx (active jobs panel)

Purpose: Get page-by-page progress for job
Used In: Detailed progress tracking
Flow: Fetch job → Extract page progress → Return details
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx (page progress)

Purpose: Stream real-time progress updates (SSE)
Used In: Real-time monitoring
Flow: Open SSE connection → Stream progress updates
Database Operations: SELECT FROM background_jobs
Frontend Integration: JobMonitor.tsx (real-time updates)

Purpose: Test endpoint for enhanced product creation
Used In: Testing, development
Flow: Test product detection → Return results
Database Operations: SELECT/INSERT products
Frontend Integration: Development tools

Purpose: Manually reprocess image with OCR
Used In: Image reprocessing, error recovery
Flow: Fetch image → Run OCR → Update database
Database Operations: UPDATE document_images, document_chunks
Frontend Integration: ImageViewer.tsx (reprocess button)
Purpose: Detect metadata scope for text chunks (product-specific vs catalog-general)
Used In: PDF processing pipeline, metadata classification
Flow: Analyze chunk → Classify scope → Return scope with confidence
Request fields: chunk_content, product_names, document_context
Response: success and data with scope, confidence, reasoning, applies_to, extracted_metadata, is_override
Scope Types:
- product_specific - Mentions specific product name
- catalog_general_explicit - Explicitly says "all products"
- catalog_general_implicit - Metadata mentioned without product context
- category_specific - Applies to product category

Database Operations: None (AI-powered classification)
Frontend Integration: Admin metadata management, PDF processing monitoring

Purpose: Apply metadata to products with scope-aware override logic
Used In: PDF processing pipeline (Stage 4), metadata management
Flow: Detect scope → Apply in order → Track overrides → Update database
Request fields: document_id, chunks_with_scope (array of chunk_id, content, scope, metadata, applies_to)
Response: success and data with products_updated, metadata_fields_applied, overrides_tracked, catalog_general_count, product_specific_count, processing_time_ms
Processing Order:
Database Operations:
- _overrides array

Frontend Integration: PDF processing pipeline, admin metadata management
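One plausible reading of the scope-aware override logic is sketched below. The processing order is an assumption, since it is not spelled out here: catalog-general metadata is applied to every product first, then product-specific metadata overwrites it, and each overwritten key is recorded in an `_overrides` list. The function name, the in-memory `products` mapping, and `applies_to` holding product names are all hypothetical; `category_specific` chunks are ignored for brevity.

```python
def apply_scoped_metadata(products, chunks_with_scope):
    """Apply chunk metadata to `products` (name -> metadata dict), in place.

    Assumed order: catalog_general_* chunks first, product_specific second.
    Returns counters shaped like the response fields named in this section.
    """
    general = [c for c in chunks_with_scope if c["scope"].startswith("catalog_general")]
    specific = [c for c in chunks_with_scope if c["scope"] == "product_specific"]
    overrides_tracked = 0
    updated = set()

    # Pass 1: catalog-wide metadata applies to every product.
    for chunk in general:
        for name, meta in products.items():
            meta.update(chunk["metadata"])
            updated.add(name)

    # Pass 2: product-specific metadata wins and records what it replaced.
    for chunk in specific:
        for name in chunk["applies_to"]:
            meta = products.get(name)
            if meta is None:
                continue
            for key, value in chunk["metadata"].items():
                if key in meta and meta[key] != value:
                    meta.setdefault("_overrides", []).append(key)
                    overrides_tracked += 1
                meta[key] = value
            updated.add(name)

    return {
        "products_updated": len(updated),
        "overrides_tracked": overrides_tracked,
        "catalog_general_count": len(general),
        "product_specific_count": len(specific),
    }
```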
Purpose: List metadata with filtering and pagination
Used In: Admin metadata viewer, metadata analytics
Flow: Query database → Filter → Paginate → Return results
Request: GET /api/rag/metadata/list?document_id=uuid&scope=catalog_general_implicit&limit=50&offset=0
Query Parameters:
- document_id (optional) - Filter by document
- product_id (optional) - Filter by product
- scope (optional) - Filter by scope type
- metadata_key (optional) - Filter by specific metadata field
- limit (optional) - Results per page (default: 50)
- offset (optional) - Pagination offset (default: 0)

Response: success and data with items array (product_id, product_name, metadata_key, metadata_value, scope, source_chunk_id, is_override, created_at), total, limit, offset
Database Operations:
Frontend Integration: Admin metadata management page, metadata analytics dashboard
Purpose: Get metadata statistics and analytics
Used In: Admin dashboard, metadata analytics
Flow: Aggregate metadata → Calculate stats → Return summary
Request: GET /api/rag/metadata/statistics?document_id=uuid
Query Parameters:
- document_id (optional) - Filter by document
- product_id (optional) - Filter by product

Response: success and data with total_products, total_metadata_fields, catalog_general_count, product_specific_count, category_specific_count, override_count, most_common_fields, scope_distribution
Database Operations:
Frontend Integration: Admin dashboard, metadata analytics page
Base Path: /api/rag
Purpose: Unified search and query functionality across documents
Philosophy: Single search endpoint with strategy parameter instead of multiple separate endpoints
Purpose: CONSOLIDATED search endpoint for all 6 search strategies
Status: ✅ All strategies implemented (100% complete)
Replaces:
- /api/search/semantic (deprecated)
- /api/search/similarity (deprecated)
- /api/search/multimodal (deprecated)
- /api/unified-search (deprecated)
- /api/search/materials/visual (deprecated)

Available Strategies:
| Strategy | Status | Use Case | Performance |
|---|---|---|---|
| semantic | ✅ | Natural language queries | <150ms |
| vector | ✅ | Exact similarity matching | <100ms |
| multi_vector | ✅ | Text + visual understanding | <200ms |
| hybrid | ✅ | Technical terms + semantics | <180ms |
| material | ✅ | Property-based filtering | <50ms |
| image | ✅ | Visual similarity | <150ms |
| all | ✅ | All strategies combined | <800ms |
Request: POST /api/rag/search?strategy={strategy} with body fields:
- query: search query text
- workspace_id
- top_k
- similarity_threshold
- text_weight, visual_weight, multimodal_weight (for multi_vector strategy)
- semantic_weight, keyword_weight (for hybrid strategy)
- material_filters: material_type, slip_resistance, finish (for material strategy)
- image_url or image_base64 (for image strategy)

Response: query, enhanced_query, results array (id, name, description, relevance_score, metadata, score_breakdown for multi_vector, found_in_strategies for all), total_results, search_type, processing_time, strategies_executed, strategies_count
Usage Examples:
1. Semantic Search (Natural Language): POST to ?strategy=semantic with query and workspace_id
2. Multi-Vector Search (Text + Visual): POST to ?strategy=multi_vector with query, workspace_id, and optional weight overrides
3. Hybrid Search (Semantic + Keyword): POST to ?strategy=hybrid with query, workspace_id, semantic_weight, keyword_weight
4. Material Property Search: POST to ?strategy=material with query: "", workspace_id, and material_filters object
5. Image Search (Visual Similarity): POST to ?strategy=image with query: "", workspace_id, and image_url
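The usage examples above can be captured in one request builder. This is a sketch with hypothetical names: it returns the path and JSON body for `POST /api/rag/search` and checks only the strategy-specific inputs that the examples call out (`material_filters` for `material`, an image source for `image`); any stricter validation is left to the server.

```python
VALID_STRATEGIES = {"semantic", "vector", "multi_vector", "hybrid", "material", "image", "all"}


def build_search_request(strategy, workspace_id, query="", **options):
    """Return (path, body) for a strategy-based search request.

    Extra keyword options (top_k, semantic_weight, material_filters, image_url, ...)
    are copied into the body when they are not None.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "material" and not options.get("material_filters"):
        raise ValueError("material strategy needs material_filters")
    if strategy == "image" and not (options.get("image_url") or options.get("image_base64")):
        raise ValueError("image strategy needs image_url or image_base64")
    body = {"query": query, "workspace_id": workspace_id}
    body.update({key: value for key, value in options.items() if value is not None})
    return f"/api/rag/search?strategy={strategy}", body
```

For instance, a hybrid query would pass `semantic_weight` and `keyword_weight` as options, while a material query leaves `query` empty and supplies only filters.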
Purpose: Search existing knowledge base without uploading a PDF
Added: 2025-12-03 (v2.4.0)
Used In: Knowledge base search, entity discovery, product search
Flow: User searches → Multi-vector search across products/entities/chunks → Return unified results
Features:
Request fields:
- query (required)
- workspace_id (required)
- search_types (optional): ["products", "entities", "chunks", "images", "kb_docs"] (default: ["products","entities","chunks"])
- categories (optional): ["product", "certificate", "logo", "specification", "general"]
- entity_types (optional): ["certificate", "logo", "specification"]
- top_k (optional, default: 10)
- similarity_threshold (optional, default: 0.7)
- caller (optional): "admin" | "agent" | "public"; controls KB category access gating
- category_id (optional, added 2026-04): restrict KB search to a single category UUID
- category_slug (optional, added 2026-04): restrict by slug, e.g. "pricing"
- price_doc_type (optional, added 2026-04): filter to one of price_list | discount_rule | contract_terms | promotion

Response fields: query, total_results, products array, entities array, chunks array, images array, processing_time, search_metadata
KB-doc chunk shape (2026-04): each chunks[i] from kb_docs now includes category_slug, category_name, and price_doc_type alongside the existing fields.
Database Operations:
Frontend Integration:
Usage Examples:
- search_types: ["products"]
- search_types: ["entities"], entity_types: ["certificate"]
- search_types: ["products", "chunks"], categories: ["product"]
- ?strategy=all with query

Database Operations:
Frontend Integration: SearchPage.tsx, KnowledgeBase.tsx, ProductDiscovery.tsx
Related Documentation: Search Strategies Guide
Purpose: CONSOLIDATED query endpoint with auto-detecting modality
Replaces: Multiple query endpoints with different modalities
Request fields: query, modality (auto | text | image | multimodal), limit, workspace_id
Response fields: success, answer, sources (chunk_id, content, relevance_score), modality_detected, processing_time_ms
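The request and response fields above could be handled along these lines. This is a sketch only: the helper names and the sample response are invented for illustration, and the `limit` default is an assumption.

```python
VALID_MODALITIES = {"auto", "text", "image", "multimodal"}


def build_query_request(query, workspace_id, modality="auto", limit=5):
    """Build a query body; the limit default is illustrative."""
    if modality not in VALID_MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    return {"query": query, "modality": modality,
            "limit": limit, "workspace_id": workspace_id}


def top_sources(response, min_score=0.0):
    """Return sources at or above min_score, best first."""
    if not response.get("success"):
        return []
    kept = [s for s in response.get("sources", [])
            if s.get("relevance_score", 0.0) >= min_score]
    return sorted(kept, key=lambda s: s.get("relevance_score", 0.0), reverse=True)


# Hypothetical response, shaped after the documented response fields.
sample_response = {
    "success": True,
    "answer": "Example answer text.",
    "sources": [
        {"chunk_id": "c1", "content": "first chunk", "relevance_score": 0.62},
        {"chunk_id": "c2", "content": "second chunk", "relevance_score": 0.91},
    ],
    "modality_detected": "text",
    "processing_time_ms": 120,
}

request_body = build_query_request("Which tiles are frost resistant?", "ws-1")
best = top_sources(sample_response, min_score=0.7)
print([s["chunk_id"] for s in best])
```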
All /api/pdf/extract/* endpoints have been removed as of November 7, 2025.
Removed Endpoints:
- POST /api/pdf/extract/markdown (DELETED)
- POST /api/pdf/extract/tables (DELETED)
- POST /api/pdf/extract/images (DELETED)

Replacement: Use POST /api/rag/documents/upload
The RAG endpoint provides identical functionality using the same PyMuPDF4LLM library. It accepts multipart/form-data with a file (PDF) and workspace_id and returns markdown, tables, images, and status.
Benefits of consolidation:
All /api/documents/* endpoints have been removed. Use /api/rag/* endpoints instead.
See Section 2 (RAG System) for current endpoints:
- POST /api/rag/documents/upload
- GET /api/rag/documents
- GET /api/rag/documents/{id}
- DELETE /api/rag/documents/{id}
- POST /api/rag/query
- GET /api/rag/search

Semantic Search: POST /api/search/semantic; body: query, workspace_id, limit, threshold; response: results array (id, title, score, content)
Vector Search: POST /api/search/vector; body: embedding (float array), workspace_id, limit, metric; response: results array (id, similarity_score)
Hybrid Search: POST /api/search/hybrid; body: query, embedding, workspace_id, limit, semantic_weight; response: results array
Visual Search: POST /api/search/visual; multipart: image file, workspace_id, limit; response: results array (id, similarity_score, image_url)
Material Search: POST /api/search/materials; body: query, filters (material_type, color, texture), limit; response: materials array
Search Recommendations: GET /api/search/recommendations; query params: query, workspace_id; response: suggestions array
Search Analytics: GET /api/analytics; query params: workspace_id, date_range; response: top_queries, search_volume, avg_response_time
Analyze Image: POST /api/images/analyze; multipart: image file, analysis_type; response: materials, colors, textures, quality_score
Batch Image Analysis: POST /api/images/analyze/batch; multipart: multiple images; response: results array (image_id, analysis)
Search Similar Images: POST /api/images/search; multipart: image file, limit; response: similar_images array
Upload & Analyze: POST /api/images/upload-and-analyze; multipart; response: image_id, url, analysis
Re-classify Image ✨ NEW: POST /api/images/reclassify/{image_id}; params: image_id, force_validation (optional boolean); response: success, image_id, classification (is_material, confidence, reason, model), updated_data, message
Upload Document: POST /api/v1/rag/documents/upload; multipart: file, title, metadata; response: document_id, chunks_created, embeddings_generated
Query RAG: POST /api/v1/rag/query; body: query, workspace_id, top_k; response: results array (chunk_id, content, score)
Chat with RAG: POST /api/v1/rag/chat; body: message, conversation_id, workspace_id; response: response, sources array
Search RAG: POST /api/v1/rag/search; body: query, filters, limit; response: results array
List RAG Documents: GET /api/v1/rag/documents; query params: workspace_id, limit; response: documents array
RAG Health: GET /api/v1/rag/health; response: status, indices_count, memory_usage
RAG Statistics: GET /api/v1/rag/stats; response: document_count, chunk_count, embedding_count
Generate Embedding: POST /api/embeddings/generate; body: text; response: embedding (float array), dimension
Batch Embeddings: POST /api/embeddings/batch; body: texts (string array); response: embeddings (array of float arrays)
CLIP Embeddings: POST /api/embeddings/clip-generate; multipart: image file, embedding_type; response: embedding, type, dimension
Create Product: POST /api/products; body: name, description, metafields, images, chunks; response: product_id, created_at
Get Product: GET /api/products/{id}; response: id, name, description, metafields, images, chunks
Update Product: PATCH /api/products/{id}; body: name, description, metafields; response: success, updated_at
Delete Product: DELETE /api/products/{id}; response: success
List Products: GET /api/products; query params: workspace_id, limit, offset; response: products array, total_count
Find Similar Products: GET /api/products/{id}/similar; query params: limit; response: similar_products array
Get Job Progress: GET /api/admin/jobs/{id}/progress; response: job_id, status, progress_percent, current_stage
Get Page Progress: GET /api/admin/jobs/{id}/progress/pages; response: pages array (page_number, status, progress)
Stream Progress: GET /api/admin/jobs/{id}/progress/stream; response: Server-Sent Events
Get Chunk Quality: GET /api/admin/chunks/quality; query params: workspace_id; response: chunks array (id, quality_score, status)
AI Metrics: GET /api/admin/ai-metrics; response: models_used, total_tokens, cost_estimate, processing_time
System Health: GET /health; response: status, uptime, database, api_latency
Performance Metrics: GET /metrics; response: requests_per_second, avg_latency, error_rate
Performance Summary: GET /performance/summary; response: summary_stats
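The Stream Progress entry above returns Server-Sent Events. A minimal parser for that wire format might look like the sketch below. It assumes, without confirmation from this reference, that each event's `data` payload is JSON carrying the same fields as Get Job Progress; the sample lines are invented.

```python
import json


def parse_sse(lines):
    """Yield {'event', 'data'} dicts from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Hypothetical stream content; the payload shape is an assumption.
sample_stream = [
    'data: {"job_id": "job-1", "status": "processing", "progress_percent": 40, "current_stage": "chunking"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"job_id": "job-1", "status": "completed", "progress_percent": 100, "current_stage": "done"}\n',
    "\n",
]

updates = [json.loads(evt["data"]) for evt in parse_sse(sample_stream)]
for update in updates:
    print(update["status"], update["progress_percent"])
```

In practice the lines would come from a streaming HTTP response rather than a list; the parser only needs an iterable of text lines.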
Base Path: /api/document-entities
Purpose: Manage document entities (certificates, logos, specifications) as separate knowledge base
Used In: Docs Admin Page, Agentic queries, Product-document relationships
Architecture: Document entities are stored separately from products and linked via relationships
Purpose: Get all document entities for a workspace with filtering
Used In: Docs Admin Page, Agentic queries
Flow: Query entities → Apply filters → Return paginated results
Request: GET /api/document-entities/?workspace_id={uuid}&entity_type=certificate&factory_name=Castellón Factory&limit=100&offset=0
Query Parameters:
- workspace_id: UUID (required)
- entity_type: certificate | logo | specification | marketing | bank_statement (optional)
- factory_name: Filter by factory name (optional)
- factory_group: Filter by factory group (optional)
- limit: Maximum results (default: 100)
- offset: Pagination offset (default: 0)

Response: Array of entities with fields: id, entity_type, name, description, page_range, factory_name, factory_group, manufacturer, metadata, created_at
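The example request above contains a space and an accented character in `factory_name`, both of which need percent-encoding in a real URL. The sketch below builds the query string with the standard library; the helper name `document_entities_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def document_entities_url(workspace_id, entity_type=None, factory_name=None,
                          factory_group=None, limit=100, offset=0):
    """Build the list URL with optional filters, percent-encoding values."""
    params = {"workspace_id": workspace_id, "limit": limit, "offset": offset}
    optional = {
        "entity_type": entity_type,
        "factory_name": factory_name,
        "factory_group": factory_group,
    }
    # Only include the filters that were actually provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return "/api/document-entities/?" + urlencode(params, quote_via=quote)


url = document_entities_url(
    "ws-1", entity_type="certificate", factory_name="Castellón Factory"
)
print(url)
```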
Database Operations:
Frontend Integration: DocsManagement.tsx (Docs Admin Page)
Agentic Query Examples:
- ?entity_type=certificate&factory_name=Castellón Factory
- ?entity_type=logo&factory_group=Harmony Group

Purpose: Get a specific document entity by ID
Used In: Entity detail view, relationship management
Flow: Fetch entity by ID → Return entity details
Request: GET /api/document-entities/{entity_id}
Response fields: id, entity_type, name, description, page_range, factory_name, metadata, created_at
Database Operations:
Frontend Integration: EntityDetailModal.tsx
Purpose: Get all document entities linked to a specific product
Used In: Product detail page, agentic queries
Flow: Fetch product relationships → Get linked entities → Return entities
Request: GET /api/document-entities/product/{product_id}?entity_type=certificate
Query Parameters:
- entity_type: Filter by entity type (optional)

Response: Array of entities with fields: entity_type, name, description, page_range, factory_name, metadata
Database Operations:
Frontend Integration: ProductDetailPage.tsx
Agentic Query Example:
- /product/{nova_id}?entity_type=certificate

Purpose: Get all document entities for a specific factory
Used In: Factory-specific queries, compliance reports
Flow: Query by factory name → Filter by entity type → Return entities
Request: GET /api/document-entities/factory/Castellón Factory?entity_type=certificate
Query Parameters:
- entity_type: Filter by entity type (optional)

Response: Array of entities with fields: entity_type, name, factory_name, factory_group, metadata
Database Operations:
Frontend Integration: FactoryComplianceReport.tsx
Agentic Query Example:
- /factory/Castellón Factory?entity_type=certificate

Purpose: Get all product-document relationships for a product
Used In: Relationship management, linking visualization
Flow: Fetch relationships → Return relationship details with scores
Request: GET /api/document-entities/relationships/product/{product_id}
Response: Array of relationships with fields: id, product_id, document_entity_id, relationship_type, relevance_score, metadata (linking_method, confidence), created_at
Database Operations:
Frontend Integration: RelationshipViewer.tsx
All endpoints require one of:
- Authorization: Bearer {supabase_jwt_token}
- Authorization: Bearer {mivaa_jwt_token}
- X-API-Key: {api_key}

All endpoints return JSON with fields: success (boolean), data (object), error (null or string), timestamp.
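A client might wrap the authentication options and the common response envelope roughly as follows. The names `auth_headers`, `unwrap`, and `ApiError` are hypothetical, and preferring the JWT over the API key when both are supplied is an arbitrary choice for the sketch.

```python
import json


class ApiError(Exception):
    """Raised when the envelope reports success = false."""


def auth_headers(jwt_token=None, api_key=None):
    """Return one of the accepted authentication headers."""
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}
    if api_key:
        return {"X-API-Key": api_key}
    raise ValueError("provide a JWT or an API key")


def unwrap(raw_json):
    """Return the data field of an envelope, or raise ApiError."""
    envelope = json.loads(raw_json)
    if not envelope.get("success"):
        raise ApiError(envelope.get("error") or "unknown error")
    return envelope.get("data")


# Hypothetical response body following the documented envelope fields.
ok_body = '{"success": true, "data": {"id": 1}, "error": null, "timestamp": "2026-01-21T00:00:00Z"}'
print(auth_headers(api_key="example-key"))
print(unwrap(ok_body))
```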
Purpose: Detect and merge duplicate products from the same factory/manufacturer
CRITICAL RULE: Duplicates are ONLY detected when products have the same factory/manufacturer in metadata. Visual similarity alone does NOT constitute a duplicate.
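The rule can be read as a two-step gate: the factory must match first, and only then does similarity count. The sketch below expresses that reading. The metadata key `factory`, the case-insensitive comparison, and the 0.60 threshold (borrowed from the `min_similarity` default of the cached-duplicates listing) are all assumptions, not the service's actual logic.

```python
def is_duplicate_candidate(product_a, product_b, similarity, threshold=0.60):
    """Return True only when both products share a factory AND are similar enough."""
    factory_a = (product_a.get("metadata") or {}).get("factory")
    factory_b = (product_b.get("metadata") or {}).get("factory")
    # Missing factory metadata can never produce a duplicate.
    if not factory_a or not factory_b:
        return False
    # Different factories: similarity alone is not enough.
    if factory_a.strip().lower() != factory_b.strip().lower():
        return False
    return similarity >= threshold


# Hypothetical products for illustration.
oak_a = {"name": "Oak Natural", "metadata": {"factory": "Harmony Group"}}
oak_b = {"name": "Oak Natural 60x60", "metadata": {"factory": "harmony group"}}
oak_c = {"name": "Oak Natural", "metadata": {"factory": "Other Factory"}}

print(is_duplicate_candidate(oak_a, oak_b, similarity=0.82))
print(is_duplicate_candidate(oak_a, oak_c, similarity=0.99))
```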
Purpose: Detect potential duplicates for a specific product
Request fields: product_id, workspace_id, similarity_threshold
Response: success, product_id, duplicates_found, duplicates array (product_id, name, factory, overall_similarity, confidence_level)
Purpose: Scan entire workspace for duplicate products
Request fields: workspace_id, similarity_threshold, limit
Response: success, workspace_id, duplicate_pairs_found, duplicate_pairs array
Purpose: Get cached duplicate detections
Query Parameters:
- workspace_id (required)
- status (optional): 'pending', 'reviewed', 'merged', 'dismissed'
- min_similarity (optional): default 0.60

Response: success, workspace_id, cached_duplicates, duplicates array
Purpose: Update duplicate detection status
Request fields: cache_id, status, user_id
Valid Statuses: 'pending', 'reviewed', 'merged', 'dismissed'
Purpose: Merge duplicate products into a single product
Request fields: target_product_id, source_product_ids, workspace_id, user_id, merge_strategy, merge_reason
Response: success, history_id, target_product, merged_count, message
Purpose: Undo a product merge operation
Request fields: history_id, user_id
Response: success, message, restored_products
Purpose: Get merge history for a workspace
Query Parameters:
- workspace_id (required)
- limit (optional): default 50

Response: success, workspace_id, merge_count, merges array
Category: Data Import (XML, Web Scraping)
Total Endpoints: 4
Status: Phase 1 & 2 Complete (XML Import with Dynamic Mapping & Backend Processing)
Purpose: Start processing an import job (called by Edge Function)
Request fields: job_id, workspace_id
Response: success, message, job_id
Features:
Database Operations:
- data_import_jobs status to 'processing'
- data_import_history
- products table
- document_images table
- chunks table

Purpose: Get import job status and progress
Path Parameters:
- job_id (required): Import job ID

Response fields: job_id, status, import_type, source_name, total_products, processed_products, failed_products, progress_percentage, current_stage, started_at, completed_at, error_message, estimated_time_remaining
Status Values:
- pending - Job created, waiting to start
- processing - Job is being processed
- completed - Job completed successfully
- failed - Job failed with errors

Database Operations:
- data_import_jobs table

Purpose: Get import history for a workspace with pagination and filters
Query Parameters:
- workspace_id (required)
- page (optional, default: 1)
- page_size (optional, default: 20)
- status (optional): pending, processing, completed, failed
- import_type (optional): xml, web_scraping

Response: imports array (job_id, import_type, source_name, status, total_products, processed_products, failed_products, created_at, completed_at, is_scheduled, next_run_at), total_count, page, page_size
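Because the response reports `total_count` alongside `page` and `page_size`, a client can work out how many pages to request. The helpers below are a hypothetical sketch of that arithmetic and do not call the API.

```python
import math


def page_count(total_count, page_size=20):
    """Number of pages needed to cover total_count items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_count / page_size)


def history_queries(workspace_id, total_count, page_size=20, status=None):
    """Yield the query parameters for each page of import history."""
    for page in range(1, page_count(total_count, page_size) + 1):
        params = {"workspace_id": workspace_id, "page": page, "page_size": page_size}
        if status is not None:
            params["status"] = status
        yield params


# Example: 45 imports at the default page size need three requests.
queries = list(history_queries("ws-1", total_count=45, status="completed"))
print(len(queries), queries[-1])
```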
Database Operations:
- data_import_jobs table with filters
- created_at DESC

Purpose: Health check for data import API
Response fields: status, service, version, features (xml_import, web_scraping, batch_processing, concurrent_image_downloads, checkpoint_recovery, real_time_progress)
Features Status:
- xml_import - XML import with dynamic field mapping
- web_scraping - Firecrawl integration (Phase 4)
- batch_processing - Process 10 products at a time
- concurrent_image_downloads - Download 5 images in parallel
- checkpoint_recovery - Resume from last successful batch
- real_time_progress - Real-time progress updates in database

Purpose: Parse XML, detect fields, suggest mappings, create import jobs
Hosted: Supabase Edge Function (Deno)
Request fields: workspace_id, category, xml_content (base64 encoded), preview_only, field_mappings, mapping_template_id, parent_job_id
Response (Preview Mode): success, detected_fields array (xml_field, suggested_mapping, confidence, sample_values), total_products
Response (Import Mode): success, job_id, total_products
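Since `xml_content` must be base64 encoded, a caller could prepare the body as in the sketch below. The helper name and the sample XML are invented, and the shape of `field_mappings` (a dict from XML field name to target field) is an assumption; this reference does not specify it.

```python
import base64


def build_xml_import_body(workspace_id, category, xml_text,
                          preview_only=True, field_mappings=None):
    """Build a request body with the XML payload base64 encoded."""
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    body = {
        "workspace_id": workspace_id,
        "category": category,
        "xml_content": encoded,
        "preview_only": preview_only,
    }
    if field_mappings:
        # Assumed shape: {"xml_field_name": "target_field_name"}
        body["field_mappings"] = field_mappings
    return body


sample_xml = "<products><product><name>Oak Natural</name></product></products>"

# Preview first to inspect detected_fields, then import with mappings.
preview_body = build_xml_import_body("ws-1", "materials", sample_xml)
import_body = build_xml_import_body(
    "ws-1", "materials", sample_xml,
    preview_only=False, field_mappings={"name": "name"},
)
print(preview_body["xml_content"][:24])
```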
Features:
Database Operations:
- data_import_jobs table

Purpose: Run scheduled imports via Supabase Cron
Hosted: Supabase Edge Function (Deno)
Trigger: Supabase Cron (every 15 minutes)
Features:
- next_run_at timestamps
- parent_job_id

Database Operations:
- data_import_jobs for scheduled imports
- last_run_at and next_run_at timestamps

Total Endpoints: 119 (115 + 4 Data Import)
Last Updated: November 10, 2025
See Also:
Base Path: Supabase Edge Function /functions/v1/messaging-api
Purpose: Multi-channel messaging via Twilio (SMS, WhatsApp)
Provider: Twilio - Single API for all channels
Philosophy: Unified messaging API with templates, campaigns, analytics, and compliance
Required secrets: TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN
Purpose: Send a single message via SMS or WhatsApp
Used In: Test messages, transactional notifications, OTP delivery
Request fields: action: "send", channel (sms | whatsapp), to (phone number), content, from (optional), messageType (transactional | marketing | otp | notification), variables (for template rendering), templateSlug (optional), mediaUrl (optional), tags (optional), whatsappContentSid (optional, for WhatsApp pre-approved templates)
Response: success, messageId, logId
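A request body for the `send` action could be assembled as sketched below. The E.164 phone number check is a client-side assumption (Twilio expects E.164 numbers, but this reference does not say what the edge function itself validates), and the helper name is hypothetical.

```python
import re

VALID_CHANNELS = {"sms", "whatsapp"}
VALID_MESSAGE_TYPES = {"transactional", "marketing", "otp", "notification"}
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def build_send_body(channel, to, content, message_type="transactional",
                    variables=None, template_slug=None):
    """Build the JSON body for action "send" (hypothetical helper)."""
    if channel not in VALID_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    if message_type not in VALID_MESSAGE_TYPES:
        raise ValueError(f"unknown messageType: {message_type!r}")
    if not E164_PATTERN.fullmatch(to):
        raise ValueError("phone number should be in E.164 form, e.g. +14155550100")
    body = {
        "action": "send",
        "channel": channel,
        "to": to,
        "content": content,
        "messageType": message_type,
    }
    # Optional template fields are only sent when provided.
    if variables:
        body["variables"] = variables
    if template_slug:
        body["templateSlug"] = template_slug
    return body


otp_body = build_send_body(
    "sms", "+14155550100", "Your code is {{code}}",
    message_type="otp", variables={"code": "123456"},
)
print(otp_body)
```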
Twilio API Endpoints Used:
- POST /2010-04-01/Accounts/{AccountSid}/Messages.json
- POST /2010-04-01/Accounts/{AccountSid}/Messages.json (with whatsapp: prefix)

Purpose: Send messages to multiple recipients in bulk
Used In: Marketing campaigns, mass notifications
Request fields: action: "send-bulk", channel, recipients (array of {to, variables}), content, templateSlug (optional), messageType, from (optional)
Response: success, bulkId, total, sent, failed, optedOut, results array (to, status, messageId or error)
Purpose: List all configured messaging channels
Used In: Channel management UI, channel selection dropdowns
Request fields: action: "channels", channelType (optional filter: sms | whatsapp)
Response: success, channels array (id, channel_type, provider, sender_id, display_name, is_active, is_default, daily_quota, max_send_rate, config)
Database Table: messaging_channels
Purpose: List all messaging templates
Used In: Template management UI, campaign creation
Request fields: action: "templates", channelType (optional filter)
Response: success, templates array (id, name, slug, channel_type, content, variables, category, whatsapp_template_name, is_approved, is_active)
Database Table: messaging_templates
Purpose: Get message delivery logs with filtering
Used In: Message logs tab, delivery tracking, debugging
Request fields: action: "logs", channelType (optional), status (optional: queued | sent | delivered | read | failed | rejected), messageType (optional), limit
Response: success, logs array (id, channel_type, provider_message_id, from_number, to_number, content, status, sent_at, delivered_at, cost, currency)
Database Table: messaging_logs
Purpose: Get aggregated messaging analytics
Used In: Analytics dashboard, reporting
Request fields: action: "analytics", channelType (optional), dateRange (start, end)
Response: success, totalSent, totalDelivered, totalRead (WhatsApp only), totalFailed, totalCost, deliveryRate, readRate, failureRate, dailyData array
Database Table: messaging_analytics
Purpose: Get Twilio account balance
Used In: Header balance display, billing monitoring
Request fields: action: "balance"
Response: success, balance, currency
Twilio API Endpoint: GET /2010-04-01/Accounts/{AccountSid}/Balance.json
Purpose: Sync senders/numbers from Twilio account to local database
Used In: Channel sync button, initial setup
Request fields: action: "sync-senders", autoImport (boolean; set to true to automatically import to database)
Response: success, senders (sms and whatsapp arrays with sender_id, display_name, status), total, imported
Twilio API Endpoints Used:
- GET /2010-04-01/Accounts/{AccountSid}/IncomingPhoneNumbers.json

Purpose: Fetch WhatsApp templates from Twilio Content API
Used In: WhatsApp template selection, template sync
Request fields: action: "whatsapp-templates"
Response: success, templates array (sid, friendly_name, language, types with body)
Twilio API Endpoint: GET /v1/Content (Twilio Content API)
Purpose: Send a test message for a campaign
Used In: Campaign testing, preview verification
Request fields: action: "send-test", campaignId, testNumber
Response: success, messageId
Base URL: https://api.twilio.com
Authentication: HTTP Basic Auth (Account SID + Auth Token)
| Endpoint | Method | Purpose |
|---|---|---|
| /2010-04-01/Accounts/{AccountSid}/Messages.json | POST | Send SMS/WhatsApp messages |
| /2010-04-01/Accounts/{AccountSid}/IncomingPhoneNumbers.json | GET | List phone numbers |
| /2010-04-01/Accounts/{AccountSid}/Balance.json | GET | Get account balance |
| /v1/Content | GET | List WhatsApp content templates (Content API) |
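HTTP Basic Auth for these endpoints means sending the Account SID and Auth Token as a base64-encoded `sid:token` pair. The sketch below builds the URL and header for an account-scoped resource without making a request; the helper name `twilio_request` and the placeholder credentials are invented for the example.

```python
import base64

TWILIO_BASE_URL = "https://api.twilio.com"


def twilio_request(account_sid, auth_token, resource):
    """Return (url, headers) for an account-scoped Twilio resource."""
    url = f"{TWILIO_BASE_URL}/2010-04-01/Accounts/{account_sid}/{resource}"
    # Basic auth: base64("username:password") with SID as username.
    credentials = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return url, {"Authorization": f"Basic {token}"}


# Placeholder credentials; real values come from TWILIO_ACCOUNT_SID
# and TWILIO_AUTH_TOKEN.
balance_url, balance_headers = twilio_request("ACexample", "secret", "Balance.json")
print(balance_url)
```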
Documentation: https://www.twilio.com/docs/messaging/api
The messaging system uses the following tables:
- messaging_channels: Stores SMS and WhatsApp sender configurations (channel_type, provider, sender_id, display_name, is_active, is_default, config JSONB, daily_quota, max_send_rate)
- messaging_templates: Message templates with variables (name, slug, channel_type, content, variables array, category, whatsapp_content_sid, is_approved, is_active)
- messaging_logs: Per-message delivery records (channel_type, provider_message_id, from/to numbers, content, status, timestamps, cost, currency)
- messaging_analytics: Daily aggregated analytics per channel (date, channel_type, total_sent, total_delivered, total_read, total_failed, total_cost)
- messaging_optouts: Compliance opt-out records (phone_number, channel_type, source, opted_out_at)

Total Endpoints: 140+
Latest Version: v2.6.0
Last Updated: January 2026
New in v2.6.0:
New in v2.5.0:
Key Features:
/docs and /redoc