Knowledge Base & Documentation System
📋 Overview
The Knowledge Base & Documentation System manages product documentation, technical guides, and knowledge articles. Documents are embedded as 1024D vectors for semantic search, organized in a category hierarchy, and isolated per workspace.
Database Schema
Tables Created (6 total)
kb_docs - Main documents table
- Embeddings support (1024D vector with ivfflat index)
- Embedding metadata (model, timestamp, status, error tracking)
- Content fields (title, content, markdown, summary)
- Status & visibility control (draft/published/archived, public/private/workspace)
- View tracking and engagement metrics
- price_doc_type (2026-04): optional enum (price_list | discount_rule | contract_terms | promotion) for docs filed under the Pricing category; drives how the price_lookup agent tool combines documents
- RLS policies for workspace isolation
kb_categories - Category hierarchy
- Parent/child relationships for nested categories
- Color coding and icons for visual organization
- Workspace isolation with RLS
- Sort order for custom arrangement
kb_doc_attachments - Product/material links
- Multi-product linking (1 doc → many products)
- Relationship types (primary, supplementary, related, certification, specification)
- Relevance scoring (1-5 scale)
- Workspace isolation
kb_doc_versions - Version history
- Track all changes with timestamps
- Change type and description
- Changed fields tracking
- Immutable (no updates, only inserts)
- Creator tracking
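A minimal sketch of how an insert-only version record with changed-field tracking might be assembled before it is written to kb_doc_versions. The class name, the helper, and the column names other than the ones listed above are assumptions for illustration; the frozen dataclass only mirrors the "no updates, only inserts" rule in application code, while the real guarantee comes from the database policies.

```python
import dataclasses
from datetime import datetime, timezone
from typing import Any, Mapping, Tuple


@dataclasses.dataclass(frozen=True)
class DocVersion:
    """Hypothetical in-memory shape of one kb_doc_versions row."""
    document_id: str
    changed_fields: Tuple[str, ...]
    change_description: str
    created_by: str
    created_at: str


def make_version(
    document_id: str,
    old: Mapping[str, Any],
    new: Mapping[str, Any],
    created_by: str,
    change_description: str,
) -> DocVersion:
    # A field counts as changed if its value differs or it exists on one side only.
    changed = tuple(
        sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
    )
    return DocVersion(
        document_id=document_id,
        changed_fields=changed,
        change_description=change_description,
        created_by=created_by,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
```

Each edit would produce a new record rather than modifying an earlier one.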
kb_doc_comments - Comments & suggestions
- Section-level feedback
- Threading support (parent/child comments)
- @mentions support (mentioned_users array)
- Status tracking (open, resolved, archived)
- Workspace isolation
kb_search_analytics - Search tracking
- Query tracking with search type
- Click tracking (which document was clicked)
- Performance metrics (search_time_ms)
- Immutable (no updates, only inserts)
- User tracking
Indexes Created
- Vector Search: ivfflat index on kb_docs.text_embedding for fast similarity search
- Workspace Isolation: Indexes on all workspace_id columns
- Category Hierarchy: Index on parent_category_id
- Document Relationships: Indexes on document_id and product_id
- User Tracking: Indexes on created_by and user_id
- Performance: Indexes on created_at for time-based queries
RLS Policies
- Workspace Isolation: Users only see data from their workspace
- Creator-Based Access: Users can edit their own documents
- Admin Override: Admins can manage all documents in their workspace
- Immutable Records: Versions and analytics cannot be updated (only inserted)
- Category Management: Only admins can create/update/delete categories
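The rules above are enforced in PostgreSQL through RLS, not in application code. The following Python sketch only restates the decision logic so it is easier to reason about; the User and Doc classes and the function names are hypothetical, and document visibility (public/private/workspace) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    workspace_id: str
    is_admin: bool = False


@dataclass(frozen=True)
class Doc:
    id: str
    workspace_id: str
    created_by: str


def can_view(user: User, doc: Doc) -> bool:
    # Workspace isolation: rows from other workspaces are never visible.
    return user.workspace_id == doc.workspace_id


def can_edit(user: User, doc: Doc) -> bool:
    # Creator-based access, plus an admin override inside the same workspace.
    return can_view(user, doc) and (user.is_admin or doc.created_by == user.id)


def can_manage_categories(user: User) -> bool:
    # Only admins may create, update, or delete categories.
    return user.is_admin
```

Note that the admin override never crosses workspaces in this sketch: an admin from another workspace fails the can_view check first.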
Backend API Endpoints
API Routes Created (16+ endpoints)
Base Path: /api/kb
Document Management (5 endpoints)
POST /api/kb/documents - Create document
- Automatic embedding generation (1024D)
- Embedding status tracking (pending, success, failed)
- Error handling with retry support
- Returns: Document with embedding status
GET /api/kb/documents/{doc_id} - Get document
- Retrieve single document by ID
- Returns: Full document with metadata
PATCH /api/kb/documents/{doc_id} - Update document
- Smart content change detection
- Regenerates embedding ONLY if content changed
- Skips embedding if only metadata changed
- Returns: Updated document with embedding status
DELETE /api/kb/documents/{doc_id} - Delete document
- Cascading delete (removes attachments, versions, comments)
- Returns: 204 No Content
POST /api/kb/documents/from-pdf - Create from PDF
- Extract text using PyMuPDF (text only, no chunking)
- Automatic embedding generation
- Returns: Document with extracted text
Search (1 endpoint)
POST /api/kb/search - Search documents
- Semantic Search: Vector similarity using pgvector cosine distance
- Generates embedding for search query using Voyage AI voyage-3.5 (updated 2026-04)
- Compares against stored document embeddings using the <=> operator
- Returns results with similarity scores (0.0 - 1.0)
- Minimum threshold: 0.5 (configurable)
- Full-Text Search: ILIKE-based keyword matching
- Searches title and content fields
- Case-insensitive matching
- Hybrid Search: Combination of semantic + full-text
- Weighted combination of the semantic and full-text scores
- Category filtering (optional)
- Pagination support (default: 20 results)
- Returns: Results with search time metrics (ms)
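The hybrid mode is described only as weighted scoring, so the exact formula and weights are not specified here. As a sketch under that caveat, one common approach is a linear blend of the semantic similarity with a binary keyword hit. The function names and the 0.7 default weight are assumptions, and the substring check is a stand-in for an ILIKE '%query%' match.

```python
def keyword_match(query: str, title: str, content: str) -> bool:
    """Case-insensitive substring match over title and content."""
    needle = query.casefold()
    return needle in title.casefold() or needle in content.casefold()


def hybrid_score(
    semantic_similarity: float,
    keyword_hit: bool,
    semantic_weight: float = 0.7,  # assumed value, not taken from the system
) -> float:
    """Blend a semantic similarity with a binary keyword score."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    keyword_score = 1.0 if keyword_hit else 0.0
    return semantic_weight * semantic_similarity + (1.0 - semantic_weight) * keyword_score
```

With this blend, a document that matches both ways outranks one with the same semantic similarity and no keyword hit.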
The request body takes workspace_id, query, search_type (semantic, full_text, or hybrid), and optional limit. Additional filters added 2026-04: category_id, category_slug (e.g. "pricing"), price_doc_type (price_list | discount_rule | contract_terms | promotion), allowed_access_levels, require_published (default false for admin management). The response includes results with category_slug, category_name, price_doc_type, and similarity, plus search_time_ms and total_results.
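To make the request shape concrete, here is a small sketch of a client-side helper that assembles and validates a search body from the fields listed above. The helper name is hypothetical, and omitting unset optional fields (rather than sending null) is an assumption about what the API accepts.

```python
from typing import Any, Dict, Optional

SEARCH_TYPES = ("semantic", "full_text", "hybrid")
PRICE_DOC_TYPES = ("price_list", "discount_rule", "contract_terms", "promotion")


def build_search_request(
    workspace_id: str,
    query: str,
    search_type: str,
    limit: Optional[int] = None,
    category_slug: Optional[str] = None,
    price_doc_type: Optional[str] = None,
    require_published: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /api/kb/search."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type!r}")
    if price_doc_type is not None and price_doc_type not in PRICE_DOC_TYPES:
        raise ValueError(f"unknown price_doc_type: {price_doc_type!r}")

    body: Dict[str, Any] = {
        "workspace_id": workspace_id,
        "query": query,
        "search_type": search_type,
        "require_published": require_published,
    }
    # Optional fields are left out entirely when not provided.
    optional = (
        ("limit", limit),
        ("category_slug", category_slug),
        ("price_doc_type", price_doc_type),
    )
    for key, value in optional:
        if value is not None:
            body[key] = value
    return body
```

For example, build_search_request("ws-1", "oak flooring prices", "hybrid", category_slug="pricing", price_doc_type="price_list") restricts a hybrid search to price lists in the Pricing category.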
Architecture:
- Frontend → MIVAA API /api/kb/search
- MIVAA generates query embedding (Voyage AI)
- MIVAA calls Supabase kb_match_docs() RPC function (unified 2026-04; accepts match_category_id, match_category_slug, match_price_doc_type, require_published)
- Supabase performs vector similarity search using pgvector
- Returns ranked results with similarity scores
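In production the ranking step runs inside PostgreSQL, where pgvector's <=> operator returns cosine distance and similarity is 1 minus that distance. The pure-Python sketch below reproduces the same idea on small in-memory vectors to show what thresholding and ranking do. The function names are illustrative, and the 0.5 threshold and limit of 20 follow the defaults stated earlier.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, i.e. 1 - cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    doc_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,
    limit: int = 20,
) -> List[Tuple[str, float]]:
    """Return (doc_id, similarity) pairs above the threshold, best first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, emb))
        for doc_id, emb in doc_embeddings.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

Documents without a stored embedding (for example, ones whose embedding_status is failed) would simply not appear in doc_embeddings and so cannot be returned by semantic search.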
See also: Pricing API for the admin-only flow that ingests docs under a "Pricing" category with price_doc_type sub-types and retrieves them via either the price_lookup agent tool (AI reasoning mode) or search_knowledge_base gateway action (quick-pick direct mode).
Categories (2 endpoints)
POST /api/kb/categories - Create category
- Hierarchical support (parent/child)
- Color and icon customization
- Returns: Created category
GET /api/kb/categories - List categories
- Workspace filtering
- Ordered by sort_order
- Returns: All categories for workspace
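Since the list endpoint returns a flat set of categories, a client has to rebuild the hierarchy itself. The sketch below shows one way to nest rows by parent_category_id while preserving sort_order. The id and name keys and the children key are assumptions about the row shape, and the sketch assumes the data contains no parent cycles.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def build_category_tree(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest flat category rows under their parents, ordered by sort_order."""
    by_parent: Dict[Optional[str], List[Dict[str, Any]]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["sort_order"]):
        by_parent[row.get("parent_category_id")].append(row)

    def attach(parent_id: Optional[str]) -> List[Dict[str, Any]]:
        # Copy each row so the caller's input is not mutated.
        return [
            {**row, "children": attach(row["id"])}
            for row in by_parent.get(parent_id, [])
        ]

    # Top-level categories are the ones without a parent.
    return attach(None)
```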
Product Attachments (3 endpoints)
POST /api/kb/attachments - Attach document to product
- Link document to 1+ products
- Relationship type specification
- Relevance scoring (1-5)
- Returns: Attachment record
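As an illustration of the constraints on an attachment, the following sketch validates the relationship type and the 1-5 relevance score before a request is sent. The helper name and the relationship_type and relevance_score keys are assumed; only the allowed values and the document_id and product_id columns come from the description above.

```python
from typing import Any, Dict

RELATIONSHIP_TYPES = (
    "primary",
    "supplementary",
    "related",
    "certification",
    "specification",
)


def build_attachment(
    document_id: str,
    product_id: str,
    relationship_type: str,
    relevance_score: int,
) -> Dict[str, Any]:
    """Validate and assemble one document-to-product link."""
    if relationship_type not in RELATIONSHIP_TYPES:
        raise ValueError(f"unknown relationship_type: {relationship_type!r}")
    if not 1 <= relevance_score <= 5:
        raise ValueError("relevance_score must be between 1 and 5")
    return {
        "document_id": document_id,
        "product_id": product_id,
        "relationship_type": relationship_type,
        "relevance_score": relevance_score,
    }
```

Linking one document to several products would then mean building one such record per product.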
GET /api/kb/documents/{doc_id}/attachments - Get document attachments
- List all products linked to document
- Returns: Array of attachments
GET /api/kb/products/{product_id}/documents - Get product documents
- List all documents linked to product
- Returns: Array of documents
Health Check (1 endpoint)
GET /api/kb/health - Health check
- Service status
- Feature availability
- Endpoint listing
- Returns: Health status
🔄 Embedding Generation Lifecycle
When Embeddings Are Generated
CREATE Document
- User creates new doc → Backend generates embedding (1024D)
- Sync operation (happens immediately)
- Status: pending → success or failed
PDF Upload
- User uploads PDF → Extract text → Generate embedding
- Sync operation
- Status tracked in database
EDIT/MODIFY Document (Smart Detection)
- User edits content → Check if content changed
- IF content changed: Generate NEW embedding
- IF only metadata changed: Skip embedding
- Content fields that trigger re-embedding: title, content, summary, seo_keywords, category_id
- Metadata fields that DON'T trigger re-embedding: status, visibility, view_count, timestamps
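The smart-detection rule can be sketched as a single comparison over the content fields. This is a minimal illustration, not the code in knowledge_base.py: the function name is hypothetical, and it assumes a PATCH carries only the fields the client wants to change.

```python
from typing import Any, Mapping

# Field names taken from the list of content fields that trigger re-embedding.
CONTENT_FIELDS = frozenset(
    {"title", "content", "summary", "seo_keywords", "category_id"}
)


def needs_reembedding(existing: Mapping[str, Any], patch: Mapping[str, Any]) -> bool:
    """Return True if the patch changes the value of any content field."""
    return any(
        field in CONTENT_FIELDS and existing.get(field) != value
        for field, value in patch.items()
    )
```

Comparing values rather than just checking which keys are present means that re-sending an unchanged title does not trigger a new embedding call.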
SEARCH
- User searches → Generate query embedding
- Perform vector similarity search
- Returns top N results
Embedding Metadata Tracking
Stored in kb_docs table:
text_embedding - The 1024D vector
embedding_model - 'voyage-3.5' (updated 2026-04)
embedding_generated_at - Timestamp
embedding_status - 'pending', 'success', 'failed'
embedding_error_message - Error details if failed
Error Handling
- If embedding generation fails → Document saved WITHOUT embedding
- Embedding status set to failed
- Error message stored in embedding_error_message
- Frontend can provide "Retry Embedding" button
- Admin can regenerate all embeddings via batch endpoint (future)
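The failure path described above could look roughly like the sketch below, which records the outcome on the document instead of rejecting the save. The function name is hypothetical, the embed callable stands in for the real Voyage AI call, and embedding only the content field is an assumption, since the exact text sent to the model is not specified here.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Sequence


def apply_embedding(
    doc: Dict[str, Any],
    embed: Callable[[str], Sequence[float]],
) -> Dict[str, Any]:
    """Return a copy of doc with the embedding metadata fields filled in."""
    record = dict(
        doc,
        text_embedding=None,
        embedding_status="pending",
        embedding_generated_at=None,
        embedding_error_message=None,
    )
    try:
        record["text_embedding"] = list(embed(doc["content"]))
        record["embedding_status"] = "success"
        record["embedding_generated_at"] = datetime.now(timezone.utc).isoformat()
    except Exception as exc:
        # The document is still saved; only the embedding is marked as failed.
        record["embedding_status"] = "failed"
        record["embedding_error_message"] = str(exc)
    return record
```

A "Retry Embedding" action could call the same function again on a record whose embedding_status is failed.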
📊 API Response Formats
Success responses include document fields such as id, workspace_id, title, content, embedding_status, embedding_generated_at, created_at, and view_count. Error responses include a detail message and status_code. Search responses include success, results, total_count, search_time_ms, and search_type.
Implementation Files
Backend Files
mivaa-pdf-extractor/app/api/knowledge_base.py - API endpoints (605 lines)
mivaa-pdf-extractor/app/main.py - Router registration
Database
- 6 tables created via Supabase MCP
- 15+ indexes created
- RLS policies enabled on all tables
Documentation
docs/knowledge-base-implementation.md - This file
Key Features
- Automatic Embedding Generation - Text embeddings (1024D) for semantic search
- Smart Content Detection - Only regenerate embeddings when content changes
- PDF Text Extraction - PyMuPDF integration for text-only extraction
- Semantic Search - Vector similarity search using embeddings
- Product Attachment - Link documents to multiple products
- Category Hierarchy - Parent/child category relationships
- Version History - Track all document changes
- Comments System - Section-level feedback with threading
- Search Analytics - Track queries and clicks
- Workspace Isolation - RLS policies for multi-tenant security
🔧 Technical Stack
- Backend: FastAPI (Python)
- Database: Supabase (PostgreSQL)
- Embeddings: Voyage AI voyage-3.5 (1024D, updated 2026-04)
- PDF Extraction: PyMuPDF (fitz)
- Vector Search: pgvector with ivfflat index
- Security: Row Level Security (RLS)
- Error Tracking: Sentry
Frontend Components
Components (6 total)
KnowledgeBaseManagement.tsx - Main admin page
- Tabbed interface (Documents, Search, Categories, Product Links, Analytics)
- Stats dashboard with real-time metrics
- Integrated with GlobalAdminHeader for consistent UI
- Route: /admin/knowledge-base
DocumentList.tsx - Document management
- Table view with status, embedding status, views, created date
- Status filter (all, draft, published, archived)
- Search filtering by title/content
- Edit and delete actions
- Direct Supabase queries for performance
DocumentEditor.tsx - Document creation/editing
- Modal dialog with full-screen editing
- Title, content, summary, category selection
- PDF upload with automatic text extraction
- Edit/Preview tabs for content
- Status and visibility controls
- Smart embedding generation on save
CategoryManager.tsx - Category management
- Table view with icon, name, description, document count
- Create category dialog
- Color picker and icon selector
- Edit and delete actions
SearchInterface.tsx - Semantic search
- Search type selector (semantic, full-text, hybrid)
- Real-time search with performance metrics
- Results display with similarity scores
- AI indexed badge for documents with embeddings
ProductAttachments.tsx - Product linking
- Link documents to products
- Relationship type selection (primary, supplementary, related, certification, specification)
- Relevance scoring (1-5 stars)
- Table view with product name, relationship, relevance
Service Layer
knowledgeBaseService.ts - API integration service
- Singleton pattern for consistent API access
- All 13 Knowledge Base endpoints integrated
- MIVAA Gateway routing via Supabase Edge Functions
- TypeScript interfaces for type safety
- Error handling and toast notifications
Integration Points
App.tsx - Route registration
- Updated /admin/knowledge-base route to use new component
- Removed old MaterialKnowledgeBase import
- Added AuthGuard and AdminGuard protection
AdminDashboard.tsx - Navigation link
- Updated "PDF Knowledge Base" to "Knowledge Base & Documentation"
- Updated description to reflect new features
- Badge shows "NEW v2.3.0"
MIVAA Gateway - API routing
- 13 Knowledge Base endpoints registered
- Proper path and method mapping
- Version updated to v2.3.0
UI/UX Features
- Consistent admin header with breadcrumbs
- Glass morphism design matching platform style
- Real-time stats dashboard
- Toast notifications for user feedback
- Loading states and error handling
- Responsive design
- Badge indicators for status and embedding state
- Icon-based navigation
- Color-coded categories
- Star rating for relevance scores
System Metrics
- Database Tables: 6 created
- API Endpoints: 15+ created
- Frontend Components: 6 created
- Service Layer: 1 service with 13 methods
- Indexes: 15+ created
- RLS Policies: 24 created
- Lines of Code: 605 (backend) + 1,200+ (frontend)
- Embedding Dimension: 1024D
- Search Types: 3 (semantic, full-text, hybrid)
- Relationship Types: 5 (primary, supplementary, related, certification, specification)