Skip to main content

Recently Implemented Features and Deployment Updates

New Features and Their Deployment Configurations

The KAI platform has recently been enhanced with several new features that require specific deployment configurations. This section provides details on these features and their deployment requirements.

Property-Based Recommendation Engine

The Property-Based Recommendation Engine is integrated with the API server and provides intelligent material recommendations based on property requirements, user preferences, and project context.

Deployment Configuration

The recommendation engine is deployed as part of the API server with the following configuration:

  • Deployment Method: Integrated with the API server deployment
  • Configuration: ConfigMap mounted to the API server
  • Resource Requirements: Included in the API server resource allocation
  • Scaling: Scales with the API server

Implementation Details

The recommendation engine is implemented as a ConfigMap that is mounted to the API server:

# ConfigMap for the recommendation engine
apiVersion: v1
kind: ConfigMap
metadata:
name: recommendation-engine-config
labels:
app: api-server
component: recommendation
data:
recommendation_engine_enabled: "true"
recommendation_cache_size: "1000"
recommendation_refresh_interval: "300" # 5 minutes
recommendation_max_concurrent_jobs: "5"
recommendation_similarity_threshold: "0.7"
recommendation_config.json: |
{
"weightFactors": {
"propertyMatch": 0.6,
"userPreference": 0.3,
"projectContext": 0.1
},
"cacheStrategy": "lru",
"defaultLimit": 20,
"enableFeedbackLoop": true,
"minConfidenceScore": 0.65
}

The API server deployment is patched to include the recommendation engine configuration:

# Patch for the API server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
template:
spec:
containers:
- name: api-server
env:
- name: RECOMMENDATION_ENGINE_ENABLED
valueFrom:
configMapKeyRef:
name: recommendation-engine-config
key: recommendation_engine_enabled
# Additional environment variables...
volumeMounts:
- name: recommendation-config
mountPath: /app/config/recommendation
readOnly: true
volumes:
- name: recommendation-config
configMap:
name: recommendation-engine-config

Material Property Analytics

The Material Property Analytics feature provides comprehensive analytics for material properties, including distribution analysis, trend analysis, correlation analysis, and anomaly detection.

Deployment Configuration

Material Property Analytics is deployed as a separate service with the following configuration:

  • Deployment Method: Dedicated deployment
  • Resource Requirements:
    • CPU: 1000m (request) / 4000m (limit)
    • Memory: 2Gi (request) / 8Gi (limit)
  • Scaling: HorizontalPodAutoscaler with 2-6 replicas
  • Storage: 20Gi PersistentVolumeClaim for analytics data
  • Workflow Integration: Integrated with the coordinator service via workflow templates

Implementation Details

The Material Property Analytics service is deployed as a separate Kubernetes deployment:

# Deployment for Material Property Analytics
apiVersion: apps/v1
kind: Deployment
metadata:
name: material-property-analytics
labels:
app: material-property-analytics
component: analytics
spec:
replicas: 2
selector:
matchLabels:
app: material-property-analytics
template:
metadata:
labels:
app: material-property-analytics
component: analytics
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9100"
prometheus.io/path: "/metrics"
spec:
containers:
- name: material-property-analytics
image: $/kai/material-property-analytics:latest
# Container configuration...

The service is integrated with the coordinator service via a workflow template:

# Workflow template for Material Property Analytics
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
name: material-property-analytics-template
spec:
entrypoint: material-property-analytics
# Workflow template configuration...

Relationship-Aware Model Training

The Relationship-Aware Model Training feature enhances AI model training by incorporating knowledge from the Property Relationship Graph to improve property prediction and search relevance.

Deployment Configuration

Relationship-Aware Model Training is deployed as a separate service with the following configuration:

  • Deployment Method: Dedicated deployment
  • Resource Requirements:
    • CPU: 4000m (request) / 8000m (limit)
    • Memory: 16Gi (request) / 32Gi (limit)
    • GPU: 1 NVIDIA GPU (L40S preferred)
  • Scaling: HorizontalPodAutoscaler with 1-3 replicas
  • Storage: 40Gi PersistentVolumeClaim for model storage
  • Node Selection: Runs on nodes with gpu-type=nvidia-l40s label
  • Workflow Integration: Integrated with the coordinator service via workflow templates

Implementation Details

The Relationship-Aware Model Training service is deployed as a separate Kubernetes deployment:

# Deployment for Relationship-Aware Model Training
apiVersion: apps/v1
kind: Deployment
metadata:
name: relationship-aware-training
labels:
app: relationship-aware-training
component: ml-services
spec:
replicas: 1
selector:
matchLabels:
app: relationship-aware-training
template:
metadata:
labels:
app: relationship-aware-training
component: ml-services
gpu-enabled: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9100"
prometheus.io/path: "/metrics"
spec:
containers:
- name: relationship-aware-training
image: $/kai/relationship-aware-training:latest
# Container configuration...
nodeSelector:
gpu-type: nvidia-l40s
runtimeClassName: nvidia-gpu

The service is integrated with the coordinator service via a workflow template:

# Workflow template for Relationship-Aware Model Training
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
name: relationship-aware-training-template
spec:
entrypoint: relationship-aware-training
# Workflow template configuration...

Coordinator Service Integration

The coordinator service has been updated to be aware of the new services and features:

# ML Features Configuration in coordinator-config ConfigMap
ml_features: |
{
"multimodalPatternRecognition": {
"enabled": true,
"workflowTemplate": "multimodal-pattern-recognition-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 8,
"memory": "16Gi",
"gpu": 1
},
"maxConcurrent": 4
},
"domainSpecificNetworks": {
"enabled": true,
"workflowTemplate": "domain-specific-networks-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 8,
"memory": "16Gi",
"gpu": 1
},
"maxConcurrent": 4,
"supportedDomains": ["wood", "metal", "fabric", "stone", "ceramic", "composite"]
},
"relationshipAwareTraining": {
"enabled": true,
"workflowTemplate": "relationship-aware-training-template",
"minQualityTier": "high",
"resourceQuotas": {
"cpu": 8,
"memory": "32Gi",
"gpu": 1
},
"maxConcurrent": 2,
"supportedMaterialTypes": ["tile", "wood", "stone", "laminate", "vinyl", "carpet", "metal", "glass", "concrete", "ceramic", "porcelain"]
},
"materialPropertyAnalytics": {
"enabled": true,
"workflowTemplate": "material-property-analytics-template",
"minQualityTier": "medium",
"resourceQuotas": {
"cpu": 4,
"memory": "8Gi",
"gpu": 0
},
"maxConcurrent": 5
}
}

GPU Resource Management

The GPU resource management configuration has been updated to include the new Relationship-Aware Training service:

# GPU Resource Management in coordinator-config ConfigMap
gpu_resources: |
{
"classes": {
"nvidia-l40s": {
"priority": 10,
"models": ["multimodal-pattern-recognition", "domain-specific-networks", "relationship-aware-training"]
},
"nvidia-h100": {
"priority": 20,
"models": ["gaussian-splatting", "triposr", "wonder3d", "instant3d"]
}
},
"scaling": {
"enabled": true,
"minNodes": 1,
"maxNodes": 10,
"scaleDownDelay": "10m"
}
}

GitOps Integration

All the new features are integrated with the GitOps workflow using Flux CD. The deployment configurations are stored in the GitOps repository and automatically applied by Flux.

Deployment Updates

Updated Kustomization Structure

The Flux GitOps repository has been updated with new directories for the recently implemented features:

flux/
├── clusters/
│ ├── production/
│ │ ├── kai/
│ │ │ ├── analytics/ # New directory for Material Property Analytics
│ │ │ ├── api-server/ # Updated with recommendation engine config
│ │ │ ├── coordinator/ # Updated with new workflow templates
│ │ │ ├── ml-services/ # Updated with relationship-aware training
│ │ │ └── kustomization.yaml # Updated to include new components

Deployment Process

The deployment process for the new features follows the existing GitOps workflow:

  1. Changes are committed to the GitOps repository
  2. Flux automatically detects the changes and applies them to the cluster
  3. The new services are deployed and integrated with the existing services

Resource Requirements

The new features have specific resource requirements that should be considered when planning cluster capacity:

FeatureCPUMemoryGPUStorage
Property-Based Recommendation EngineIncluded in API serverIncluded in API serverN/AN/A
Material Property Analytics1-4 cores2-8 GBN/A20 GB
Relationship-Aware Model Training4-8 cores16-32 GB1 NVIDIA L40S40 GB

Monitoring and Logging

All new services are configured with Prometheus metrics and structured logging:

  • Prometheus metrics exposed on port 9100
  • Metrics path: /metrics
  • Log format: JSON
  • Log level: Configurable via environment variables

Health Checks

All new services include appropriate health checks:

  • Liveness probe: /health
  • Readiness probe: /ready
  • Startup probe (where applicable): /startup