Recently Implemented Features and Deployment Updates
New Features and Their Deployment Configurations
The KAI platform has recently been enhanced with several new features that require specific deployment configurations. This section provides details on these features and their deployment requirements.
Property-Based Recommendation Engine
The Property-Based Recommendation Engine is integrated with the API server and provides intelligent material recommendations based on property requirements, user preferences, and project context.
Deployment Configuration
The recommendation engine is deployed as part of the API server with the following configuration:
- Deployment Method: Integrated with the API server deployment
- Configuration: ConfigMap mounted to the API server
- Resource Requirements: Included in the API server resource allocation
- Scaling: Scales with the API server
Implementation Details
The recommendation engine is implemented as a ConfigMap that is mounted to the API server:
# ConfigMap for the recommendation engine
apiVersion: v1
kind: ConfigMap
metadata:
name: recommendation-engine-config
labels:
app: api-server
component: recommendation
data:
recommendation_engine_enabled: "true"
recommendation_cache_size: "1000"
recommendation_refresh_interval: "300" # 5 minutes
recommendation_max_concurrent_jobs: "5"
recommendation_similarity_threshold: "0.7"
recommendation_config.json: |
{
"weightFactors": {
"propertyMatch": 0.6,
"userPreference": 0.3,
"projectContext": 0.1
},
"cacheStrategy": "lru",
"defaultLimit": 20,
"enableFeedbackLoop": true,
"minConfidenceScore": 0.65
}
The API server deployment is patched to include the recommendation engine configuration:
# Patch for the API server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
template:
spec:
containers:
- name: api-server
env:
- name: RECOMMENDATION_ENGINE_ENABLED
valueFrom:
configMapKeyRef:
name: recommendation-engine-config
key: recommendation_engine_enabled
# Additional environment variables...
volumeMounts:
- name: recommendation-config
mountPath: /app/config/recommendation
readOnly: true
volumes:
- name: recommendation-config
configMap:
name: recommendation-engine-config
Material Property Analytics
The Material Property Analytics feature provides comprehensive analytics for material properties, including distribution analysis, trend analysis, correlation analysis, and anomaly detection.
Deployment Configuration
Material Property Analytics is deployed as a separate service with the following configuration:
- Deployment Method: Dedicated deployment
- Resource Requirements:
- CPU: 1000m (request) / 4000m (limit)
- Memory: 2Gi (request) / 8Gi (limit)
- Scaling: HorizontalPodAutoscaler with 2-6 replicas
- Storage: 20Gi PersistentVolumeClaim for analytics data
- Workflow Integration: Integrated with the coordinator service via workflow templates
Implementation Details
The Material Property Analytics service is deployed as a separate Kubernetes deployment:
# Deployment for Material Property Analytics
apiVersion: apps/v1
kind: Deployment
metadata:
name: material-property-analytics
labels:
app: material-property-analytics
component: analytics
spec:
replicas: 2
selector:
matchLabels:
app: material-property-analytics
template:
metadata:
labels:
app: material-property-analytics
component: analytics
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9100"
prometheus.io/path: "/metrics"
spec:
containers:
- name: material-property-analytics
image: $/kai/material-property-analytics:latest
# Container configuration...
The service is integrated with the coordinator service via a workflow template:
# Workflow template for Material Property Analytics
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
name: material-property-analytics-template
spec:
entrypoint: material-property-analytics
# Workflow template configuration...
Relationship-Aware Model Training
The Relationship-Aware Model Training feature enhances AI model training by incorporating knowledge from the Property Relationship Graph to improve property prediction and search relevance.
Deployment Configuration
Relationship-Aware Model Training is deployed as a separate service with the following configuration:
- Deployment Method: Dedicated deployment
- Resource Requirements:
- CPU: 4000m (request) / 8000m (limit)
- Memory: 16Gi (request) / 32Gi (limit)
- GPU: 1 NVIDIA GPU (L40S preferred)
- Scaling: HorizontalPodAutoscaler with 1-3 replicas
- Storage: 40Gi PersistentVolumeClaim for model storage
- Node Selection: Runs on nodes with
gpu-type=nvidia-l40s
label - Workflow Integration: Integrated with the coordinator service via workflow templates
Implementation Details
The Relationship-Aware Model Training service is deployed as a separate Kubernetes deployment:
# Deployment for Relationship-Aware Model Training
apiVersion: apps/v1
kind: Deployment
metadata:
name: relationship-aware-training
labels:
app: relationship-aware-training
component: ml-services
spec:
replicas: 1
selector:
matchLabels:
app: relationship-aware-training
template:
metadata:
labels:
app: relationship-aware-training
component: ml-services
gpu-enabled: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9100"
prometheus.io/path: "/metrics"
spec:
containers:
- name: relationship-aware-training
image: $/kai/relationship-aware-training:latest
# Container configuration...
nodeSelector:
gpu-type: nvidia-l40s
runtimeClassName: nvidia-gpu
The service is integrated with the coordinator service via a workflow template:
# Workflow template for Relationship-Aware Model Training
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
name: relationship-aware-training-template
spec:
entrypoint: relationship-aware-training
# Workflow template configuration...
Coordinator Service Integration
The coordinator service has been updated to be aware of the new services and features:
# ML Features Configuration in coordinator-config ConfigMap
ml_features: |
{
"multimodalPatternRecognition": {
"enabled": true,
"workflowTemplate": "multimodal-pattern-recognition-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 8,
"memory": "16Gi",
"gpu": 1
},
"maxConcurrent": 4
},
"domainSpecificNetworks": {
"enabled": true,
"workflowTemplate": "domain-specific-networks-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 8,
"memory": "16Gi",
"gpu": 1
},
"maxConcurrent": 4,
"supportedDomains": ["wood", "metal", "fabric", "stone", "ceramic", "composite"]
},
"relationshipAwareTraining": {
"enabled": true,
"workflowTemplate": "relationship-aware-training-template",
"minQualityTier": "high",
"resourceQuotas": {
"cpu": 8,
"memory": "32Gi",
"gpu": 1
},
"maxConcurrent": 2,
"supportedMaterialTypes": ["tile", "wood", "stone", "laminate", "vinyl", "carpet", "metal", "glass", "concrete", "ceramic", "porcelain"]
},
"materialPropertyAnalytics": {
"enabled": true,
"workflowTemplate": "material-property-analytics-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 4,
"memory": "8Gi",
"gpu": 0
},
"maxConcurrent": 5
}
}
GPU Resource Management
The GPU resource management configuration has been updated to include the new Relationship-Aware Training service:
# GPU Resource Management in coordinator-config ConfigMap
gpu_resources: |
{
"classes": {
"nvidia-l40s": {
"priority": 10,
"models": ["multimodal-pattern-recognition", "domain-specific-networks", "relationship-aware-training"]
},
"nvidia-h100": {
"priority": 20,
"models": ["gaussian-splatting", "triposr", "wonder3d", "instant3d"]
}
},
"scaling": {
"enabled": true,
"minNodes": 1,
"maxNodes": 10,
"scaleDownDelay": "10m"
}
}
GitOps Integration
All the new features are integrated with the GitOps workflow using Flux CD. The deployment configurations are stored in the GitOps repository and automatically applied by Flux.
Deployment Updates
Updated Kustomization Structure
The Flux GitOps repository has been updated with new directories for the recently implemented features:
flux/
├── clusters/
│ ├── production/
│ │ ├── kai/
│ │ │ ├── analytics/ # New directory for Material Property Analytics
│ │ │ ├── api-server/ # Updated with recommendation engine config
│ │ │ ├── coordinator/ # Updated with new workflow templates
│ │ │ ├── ml-services/ # Updated with relationship-aware training
│ │ │ └── kustomization.yaml # Updated to include new components
Deployment Process
The deployment process for the new features follows the existing GitOps workflow:
- Changes are committed to the GitOps repository
- Flux automatically detects the changes and applies them to the cluster
- The new services are deployed and integrated with the existing services
Resource Requirements
The new features have specific resource requirements that should be considered when planning cluster capacity:
Feature | CPU | Memory | GPU | Storage |
---|---|---|---|---|
Property-Based Recommendation Engine | Included in API server | Included in API server | N/A | N/A |
Material Property Analytics | 1-4 cores | 2-8 GB | N/A | 20 GB |
Relationship-Aware Model Training | 4-8 cores | 16-32 GB | 1 NVIDIA L40S | 40 GB |
Monitoring and Logging
All new services are configured with Prometheus metrics and structured logging:
- Prometheus metrics exposed on port 9100
- Metrics path:
/metrics
- Log format: JSON
- Log level: Configurable via environment variables
Health Checks
All new services include appropriate health checks:
- Liveness probe:
/health
- Readiness probe:
/ready
- Startup probe (where applicable):
/startup