Material Kai Vision Platform - Documentation
AI-Powered Material Intelligence System
Production-grade platform serving 5,000+ users with 99.5%+ uptime. Transforms material catalog PDFs into searchable knowledge using 12 AI models across a 14-stage processing pipeline.
📚 Documentation
🎯 Start Here
INDEX.md - Complete documentation index with learning paths
overview.md - Platform overview and key features
📖 Core Documentation
overview.md - Complete platform overview
- Executive summary with key metrics
- Architecture overview
- AI models integration (8 models across 4 providers)
- 14-stage PDF processing pipeline
- Multi-modal search capabilities
- Database architecture
- Production metrics
system-architecture.md - System architecture & design
- Three-tier architecture
- Hybrid architecture pattern
- Technology stack
- Authentication & security
- API endpoints (115)
- Scalability & monitoring
duplicate-detection-merging.md - Duplicate detection & merging
- Factory-based duplicate detection
- Product merging with undo capability
- Similarity scoring algorithm
- Merge history tracking
- 7 API endpoints
- Database schema
data-import-system.md - Data import system ✨ NEW
- XML import with AI-powered field mapping
- Dynamic field mapping (Claude Sonnet 4.5)
- Batch processing (10 products at a time)
- Concurrent image downloads (5 parallel)
- Cron-based scheduling for recurring imports
- Manual re-run functionality
- Checkpoint recovery
- Real-time progress tracking
- 4 API endpoints
- Phases 1 and 2 complete
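The batch size (10 products) and download concurrency (5 parallel) listed above can be illustrated with a minimal asyncio sketch. It is not the platform's importer: `import_products`, `download_image`, and `save_checkpoint` are hypothetical names, and the downloader and checkpoint writer are passed in so the loop can be shown without a real XML feed or storage backend.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Sequence

BATCH_SIZE = 10              # products per batch, as listed in the data import docs
MAX_PARALLEL_DOWNLOADS = 5   # concurrent image downloads, as listed in the data import docs


def batched(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def import_products(
    products: Sequence[dict],
    download_image: Callable[[str], Awaitable[bytes]],
    save_checkpoint: Callable[[int], None],
) -> List[bytes]:
    """Hypothetical import loop: batches of 10, at most 5 image downloads in flight."""
    semaphore = asyncio.Semaphore(MAX_PARALLEL_DOWNLOADS)

    async def fetch(url: str) -> bytes:
        async with semaphore:  # caps concurrency across all downloads in a batch
            return await download_image(url)

    downloaded: List[bytes] = []
    processed = 0
    for batch in batched(products, BATCH_SIZE):
        urls = [url for product in batch for url in product.get("image_urls", [])]
        # gather() preserves input order, so results line up with `urls`
        downloaded.extend(await asyncio.gather(*(fetch(u) for u in urls)))
        processed += len(batch)
        save_checkpoint(processed)  # resume point if the import is interrupted
    return downloaded
```

Recording a checkpoint after each completed batch means a restarted import only has to redo the batch that was in progress, which is one plausible way to realize the checkpoint recovery listed above.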
ai-models-guide.md - AI models reference
- 12+ AI models across 5 providers, including:
- Anthropic: Claude Sonnet 4.5, Claude Haiku 4.5
- OpenAI: GPT-4o, GPT-4o-mini
- Voyage AI: voyage-3.5 (primary text + understanding embeddings, 1024D)
- HuggingFace Endpoint: Qwen3-VL 32B Vision + SigLIP2 (768D visual embeddings)
- WorldLabs Marble: 3D Gaussian Splat generation
- Model usage by stage
- Cost optimization
agent-system.md - AI Agent system ✨ NEW
- Database-driven agent prompts
- 3 specialized agents (PDF Processor, Search, Product)
- Admin UI for prompt management
- LangChain.js tool orchestration
- Real-time prompt updates (no redeployment required)
- Role-based access control
- Best practices & troubleshooting
search-strategies.md - Search system guide
- 6 search strategies (100% complete)
- Semantic, Vector, Multi-Vector, Hybrid, Material, Image
- All strategies combined mode
- Performance metrics (under 800 ms for every strategy)
- Database schema and indexes
- Usage examples and best practices
comprehensive-metadata-fields-guide.md - Comprehensive metadata fields guide ✨ NEW
- 200+ metadata fields across 9 categories
- Material Properties, Dimensions, Appearance, Performance
- Application, Compliance, Design, Manufacturing, Commercial
- AI-powered dynamic extraction (Claude/GPT)
- Complete field reference with examples
- API usage and frontend display
- Step-by-step extraction process
- Confidence scoring system
- Best practices and troubleshooting
pdf-processing-pipeline.md - PDF processing pipeline
- 14-stage pipeline breakdown
- Products + Metadata extraction (inseparable)
- Document entities (certificates, logos, specs)
- Stage-by-stage details
- Checkpoint recovery (9 checkpoints)
- Performance metrics
- API endpoint
api-endpoints.md - API reference
- 119 endpoints across 16 categories, including:
- RAG Routes (27)
- Admin Routes (18)
- Search Routes (6)
- Document Entities Routes (5)
- Products Routes (3)
- Images Routes (5)
- Embeddings Routes (3)
- AI Services Routes (10)
- Background Jobs (7)
- HuggingFace/Qwen Routes (3)
- Anthropic Routes (3)
- Monitoring Routes (3)
- AI Metrics Routes (2)
- Duplicate Detection Routes (7)
- Data Import Routes (4) ✨ NEW
database-schema-complete.md - Database schema
- Core tables (products, chunks, images, document_entities)
- Products + Metadata architecture (JSONB)
- Document entities (certificates, logos, specifications)
- Product-document relationships
- Relationship tables with relevance scores
- Row-Level Security (RLS)
- Indexes & performance
- Storage capacity
- Backup & recovery
relevancy-system.md - Relevancy & entity linking
- Chunk → Product relationships
- Product → Image relationships
- Chunk → Image relationships
- Relevance scoring algorithms (3 algorithms)
- Relationship types and priorities
- Implementation details
- Best practices
job-queue-system.md - Job queue & async processing
- Supabase-native job queue
- Checkpoint-based recovery
- Auto-recovery for stuck jobs
- Real-time progress tracking
- Priority queuing
- Health monitoring
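Priority queuing and auto-recovery of stuck jobs can be pictured with a small in-memory sketch. The real queue is Supabase-native, so this is only an illustration under stated assumptions: `JobQueue`, `Job`, the heartbeat mechanism, the "lower number runs first" priority rule, and the 300-second timeout are all hypothetical and not taken from job-queue-system.md.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STUCK_AFTER_SECONDS = 300.0  # hypothetical heartbeat timeout


@dataclass
class Job:
    job_id: str
    priority: int                     # lower number runs first (assumption)
    checkpoint: Optional[str] = None  # last completed checkpoint, if any
    status: str = "pending"
    last_heartbeat: float = 0.0


class JobQueue:
    """In-memory illustration of a priority queue with stuck-job recovery."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority
        self.jobs: Dict[str, Job] = {}

    def enqueue(self, job: Job) -> None:
        job.status = "pending"
        self.jobs[job.job_id] = job
        heapq.heappush(self._heap, (job.priority, next(self._seq), job.job_id))

    def claim(self, now: float) -> Optional[Job]:
        """Pop the highest-priority pending job and mark it running."""
        while self._heap:
            _, _, job_id = heapq.heappop(self._heap)
            job = self.jobs[job_id]
            if job.status == "pending":
                job.status = "running"
                job.last_heartbeat = now
                return job
        return None

    def heartbeat(self, job_id: str, now: float, checkpoint: Optional[str] = None) -> None:
        job = self.jobs[job_id]
        job.last_heartbeat = now
        if checkpoint is not None:
            job.checkpoint = checkpoint

    def recover_stuck(self, now: float) -> List[str]:
        """Requeue running jobs with a stale heartbeat; they keep their checkpoint."""
        stuck = [
            job for job in self.jobs.values()
            if job.status == "running" and now - job.last_heartbeat > STUCK_AFTER_SECONDS
        ]
        for job in stuck:
            self.enqueue(job)
        return [job.job_id for job in stuck]
```

Because a recovered job keeps its `checkpoint`, the worker that claims it next can skip the stages that already finished instead of restarting from the beginning.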
monitoring-analytics-system.md - Monitoring & analytics ✨ NEW
- Real-time PDF processing monitor
- Comprehensive analytics dashboard
- AI model usage and cost tracking
- Search analytics and query patterns
- Agent chat analytics
- System performance metrics
- Sentry integration
- 9 checkpoint stages with full metrics
- Real-time updates via Supabase subscriptions
email-system.md - Email system with Amazon SES
- Domain verification and management
- React Email template system
- Delivery tracking and analytics
- Bounce and complaint handling
- SNS webhook integration
- Admin dashboard at /admin/emails
- Complete API reference
campaign-system.md - Email campaign management ✨ NEW
- Bulk email campaigns with targeting
- Recipient tracking and analytics
- Campaign scheduling and automation
- Template personalization
- Real-time delivery monitoring
- Admin interface for campaign management
- Complete workflow documentation
quotes-system-architecture.md - Quotes management system
- Multiple independent quotes per user
- Custom requests and product quotes
- Status tags and timeline tracking
- Upsells/extras management
- Quote acceptance workflow
- Admin management interface
features-guide.md - Platform features
- Intelligent PDF processing
- Multi-modal search
- Materials catalog
- Product management
- Admin dashboard
- RAG system
- Real-time monitoring
- Metadata management
- Image management
- Workspace isolation
- Batch processing
- Security features
deployment-guide.md - Production deployment
- Deployment architecture
- Frontend (Vercel)
- Backend (Self-hosted)
- Database (Supabase)
- CI/CD pipeline
- Secrets management
- Monitoring & alerts
- Rollback procedures
supabase-types-automation.md - Supabase types automation
- Automated TypeScript type generation
- GitHub Actions integration
- Weekly scheduled updates
- Type validation scripts
- Manual generation commands
- Setup instructions
troubleshooting-guide.md - Common issues & solutions
- Critical issues (API down, database, OOM)
- Common issues (PDF processing, search, latency, auth)
- Performance optimization
- Support resources
product-discovery-architecture.md - Product discovery system
- Products + Metadata architecture (inseparable)
- Document entities (certificates, logos, specifications)
- Factory/group identification for agentic queries
- Product-document relationships
- API endpoints for entity management
- Future extensibility (marketing, bank statements)
metadata-management-system.md - Metadata management system
- Dynamic metadata extraction (250+ attributes)
- Scope detection (product-specific vs catalog-general)
- Implicit catalog-general metadata detection
- Override logic and processing order
- Metadata API endpoints
- Integration with PDF processing pipeline
metadata-normalization-system.md - Metadata normalization system ✨ NEW
- Two-layer normalization architecture (prevention + correction)
- Semantic similarity-based field standardization (60% threshold)
- Automatic consolidation (individual fields → objects, single → arrays)
- Integrated into extraction pipeline
- Migration script for existing products
- 95%+ field standardization accuracy
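The similarity-based field standardization with a 60% threshold can be sketched as follows. This is an assumption-laden illustration, not the normalization module itself: `standardize_fields` is a hypothetical name, the embedding function is injected (in production it would presumably be a text embedding model), cosine similarity is assumed as the similarity measure, and "first value wins" on name collisions is an assumed policy.

```python
import math
from typing import Callable, Dict, Iterable, Sequence

SIMILARITY_THRESHOLD = 0.60  # 60% threshold quoted in the normalization docs

Embedder = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def standardize_fields(
    metadata: Dict[str, object],
    canonical_fields: Iterable[str],
    embed: Embedder,
    threshold: float = SIMILARITY_THRESHOLD,
) -> Dict[str, object]:
    """Rename each extracted field to its closest canonical field if similar enough."""
    canonical = {name: embed(name) for name in canonical_fields}
    normalized: Dict[str, object] = {}
    for field, value in metadata.items():
        vec = embed(field)
        best_name, best_score = field, 0.0
        for name, canonical_vec in canonical.items():
            score = cosine(vec, canonical_vec)
            if score > best_score:
                best_name, best_score = name, score
        # Below the threshold, keep the original field name unchanged.
        target = best_name if best_score >= threshold else field
        normalized.setdefault(target, value)  # first value wins on collisions (assumption)
    return normalized
```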
prompt-enhancement-system.md - Dynamic prompt system ✨ NEW
- Database-driven extraction prompts (extraction_prompts table)
- Custom vs default prompt priority (is_custom flag)
- Version control and workspace isolation
- 4 stages: discovery, chunking, image_analysis, entity_creation
- 4 categories: products, certificates, logos, specifications
- Automatic placeholder replacement
- Phase 1 complete (metadata extraction), Phases 2-4 pending
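The custom-over-default priority and placeholder replacement described above might be resolved roughly as shown below. Only `extraction_prompts`, `is_custom`, versioning, workspace isolation, and the stage and category names come from the docs; the `ExtractionPrompt` row shape, `select_prompt`, `render`, and the `{{name}}` placeholder syntax are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtractionPrompt:
    """Hypothetical row shape for the extraction_prompts table."""
    workspace_id: str
    stage: str       # discovery | chunking | image_analysis | entity_creation
    category: str    # products | certificates | logos | specifications
    is_custom: bool
    version: int
    template: str


def select_prompt(
    rows: List[ExtractionPrompt], workspace_id: str, stage: str, category: str
) -> Optional[ExtractionPrompt]:
    """Pick the prompt for one workspace/stage/category combination."""
    candidates = [
        r for r in rows
        if r.workspace_id == workspace_id and r.stage == stage and r.category == category
    ]
    if not candidates:
        return None
    # Custom prompts outrank defaults; within each group the highest version wins.
    return max(candidates, key=lambda r: (r.is_custom, r.version))


PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # hypothetical {{name}} placeholder syntax


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute known placeholders; leave unknown ones intact so gaps stay visible."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```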
🎓 Learning Paths
For New Developers
- overview.md - Understand the platform
- system-architecture.md - Learn the architecture
- pdf-processing-pipeline.md - Understand the pipeline
- agent-system.md - Learn the AI agent system
- job-queue-system.md - Learn async job processing
- api-endpoints.md - Learn the API
- deployment-guide.md - Understand deployment
For API Integration
- api-endpoints.md - All endpoints
- ai-models-guide.md - AI models used
- database-schema-complete.md - Data structure
- system-architecture.md - Authentication
For Operations
- deployment-guide.md - Deployment process
- job-queue-system.md - Job monitoring & recovery
- troubleshooting-guide.md - Common issues
- system-architecture.md - Monitoring
- database-schema-complete.md - Backup strategy
For Product Managers
- overview.md - Platform overview
- features-guide.md - All features
- pdf-processing-pipeline.md - Processing pipeline
- ai-models-guide.md - AI capabilities
📊 Quick Reference
Key Numbers
- 5,000+ users in production
- 99.5%+ uptime SLA
- 8 AI models across 4 providers
- 14 processing pipeline stages
- 108 API endpoints (14 categories)
- 6 embedding types
- 200+ metadata field types
- 95%+ product detection accuracy
- 85%+ search relevance
- 90%+ material recognition accuracy
Technology Stack
- Frontend: React 18, TypeScript, Vite, Shadcn/ui, Vercel
- Backend: FastAPI, Python 3.11, Uvicorn, self-hosted
- Database: PostgreSQL 15, pgvector, Supabase
- AI: Claude Sonnet 4.5 and Haiku 4.5, GPT-4o, Qwen3-VL, SigLIP2, Voyage AI, Multi-Vector CLIP
API Categories
- PDF Processing (12 endpoints)
- Document Management (13 endpoints)
- Search APIs (8 endpoints)
- Image Analysis (5 endpoints)
- RAG System (7 endpoints)
- Embeddings (3 endpoints)
- Products (6 endpoints)
- Admin & Monitoring (8 endpoints)
- AI Services (11+ endpoints)
🔗 External Resources
API Documentation:
- Swagger UI: /docs
- ReDoc: /redoc
- OpenAPI Schema: /openapi.json
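Because `/docs`, `/redoc`, and `/openapi.json` are the standard routes a FastAPI app serves, the live schema is the most reliable way to check endpoint counts per category. The helper below is a sketch: `fetch_schema` and `count_operations` are hypothetical names, and it assumes categories map to OpenAPI tags, which may not match how this document groups endpoints.

```python
import json
from collections import Counter
from typing import Dict, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str) -> dict:
    """Download the OpenAPI document served at /openapi.json."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json") as resp:
        return json.load(resp)


def count_operations(schema: dict) -> Tuple[int, Dict[str, int]]:
    """Return (total operations, operations per tag).

    An operation with several tags is counted once in the total but once per tag,
    so the per-tag numbers can sum to more than the total.
    """
    total = 0
    per_tag: Counter = Counter()
    for path_item in schema.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            total += 1
            for tag in operation.get("tags") or ["untagged"]:
                per_tag[tag] += 1
    return total, dict(per_tag)
```

Usage, with the API host left as a placeholder since it is not given here: `total, per_tag = count_operations(fetch_schema("https://<api-host>"))`.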
Dashboards:
Repositories:
📝 Documentation Standards
All documentation follows these standards:
- Clear, concise language
- Code examples where applicable
- Structured with headers
- Links to related docs
- No task lists or planning documents
- Production-focused content
- Updated regularly
Support
For questions or issues:
Last Updated: January 18, 2026
Version: 3.0.0
Status: Production
Total Documentation: 55+ comprehensive guides
Total Lines: 15,000+
Coverage: 100% of platform features
API Endpoints: 150+ across 16 categories
Recent Additions:
- ✨ NEW: web-scraping-integration.md - Firecrawl integration guide
- ✨ NEW: price-monitoring-system.md - Competitive price tracking
- ✨ NEW: price-monitoring-deployment-guide.md - Setup instructions
- ✨ NEW: saved-searches-deduplication.md - AI-powered search deduplication
- ✨ NEW: interior-design-models.md - 14 AI models inventory
- ✨ NEW: interior-design-data-flow.md - Generation workflow
- ✨ NEW: interior-designer-agent-user-guide.md - User guide
- ✨ NEW: internal-pricing-credit-system.md - Credit system documentation
- ✨ Updated README.md and CHANGELOG.md with all new features
- ✨ Documented async processing and concurrency limits
- ✨ Added production hardening documentation