Recently Implemented Features and Deployment Updates

New Features and Their Deployment Configurations

The KAI platform has added three features, each with its own deployment configuration: the Property-Based Recommendation Engine, Material Property Analytics, and Relationship-Aware Model Training. This section describes how each one is deployed.

Property-Based Recommendation Engine

The Property-Based Recommendation Engine is integrated with the API server and provides intelligent material recommendations based on property requirements, user preferences, and project context.

Deployment Configuration

The recommendation engine is deployed as part of the API server with the following configuration:

Deployment Method: Integrated with the API server deployment
Configuration: ConfigMap mounted to the API server
Resource Requirements: Included in the API server resource allocation
Scaling: Scales with the API server

Implementation Details

The recommendation engine runs inside the API server and is configured through a ConfigMap that is mounted into the API server container:

# ConfigMap for the recommendation engine
apiVersion: v1
kind: ConfigMap
metadata:
  name: recommendation-engine-config
  labels:
    app: api-server
    component: recommendation
data:
  recommendation_engine_enabled: "true"
  recommendation_cache_size: "1000"
  recommendation_refresh_interval: "300"  # 5 minutes
  recommendation_max_concurrent_jobs: "5"
  recommendation_similarity_threshold: "0.7"
  recommendation_config.json: |
    {
      "weightFactors": {
        "propertyMatch": 0.6,
        "userPreference": 0.3,
        "projectContext": 0.1
      },
      "cacheStrategy": "lru",
      "defaultLimit": 20,
      "enableFeedbackLoop": true,
      "minConfidenceScore": 0.65
    }

The API server deployment is patched to include the recommendation engine configuration:

# Patch for the API server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      containers:
      - name: api-server
        env:
        - name: RECOMMENDATION_ENGINE_ENABLED
          valueFrom:
            configMapKeyRef:
              name: recommendation-engine-config
              key: recommendation_engine_enabled
        # Additional environment variables...
        volumeMounts:
        - name: recommendation-config
          mountPath: /app/config/recommendation
          readOnly: true
      volumes:
      - name: recommendation-config
        configMap:
          name: recommendation-engine-config
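
The patch above elides the remaining environment variables. As a minimal sketch, the other scalar keys in recommendation-engine-config could be surfaced the same way. The environment variable names RECOMMENDATION_CACHE_SIZE and RECOMMENDATION_REFRESH_INTERVAL are assumptions chosen to mirror the ConfigMap keys; the source does not confirm them.

```yaml
# Sketch only: env var names are hypothetical; keys come from the ConfigMap above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      containers:
      - name: api-server
        env:
        - name: RECOMMENDATION_CACHE_SIZE          # hypothetical name
          valueFrom:
            configMapKeyRef:
              name: recommendation-engine-config
              key: recommendation_cache_size
        - name: RECOMMENDATION_REFRESH_INTERVAL    # hypothetical name
          valueFrom:
            configMapKeyRef:
              name: recommendation-engine-config
              key: recommendation_refresh_interval
```

Values injected through configMapKeyRef are read once at container start, so changing them requires a pod restart. The mounted volume exposes every key as a file, so recommendation_config.json appears at /app/config/recommendation/recommendation_config.json and is refreshed by the kubelet some time after the ConfigMap changes.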

Material Property Analytics

The Material Property Analytics feature provides comprehensive analytics for material properties, including distribution analysis, trend analysis, correlation analysis, and anomaly detection.

Deployment Configuration

Material Property Analytics is deployed as a separate service with the following configuration:

Deployment Method: Dedicated deployment
Resource Requirements:
- CPU: 1000m (request) / 4000m (limit)
- Memory: 2Gi (request) / 8Gi (limit)
Scaling: HorizontalPodAutoscaler with 2-6 replicas
Storage: 20Gi PersistentVolumeClaim for analytics data
Workflow Integration: Integrated with the coordinator service via workflow templates
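
The 2-6 replica range could be expressed with an autoscaling/v2 HorizontalPodAutoscaler. This is a sketch under assumptions: the source gives only the replica bounds, so the HPA name, the CPU metric, and the 70% target are illustrative choices.

```yaml
# Sketch only: metric and 70% target are assumed, not taken from the source
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: material-property-analytics    # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: material-property-analytics
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70         # assumed target
```

CPU utilization is measured against the container's CPU request, so with a 1000m request a 70% target corresponds to roughly 700m of average usage per pod.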

Implementation Details

The Material Property Analytics service is deployed as a separate Kubernetes deployment:

# Deployment for Material Property Analytics
apiVersion: apps/v1
kind: Deployment
metadata:
  name: material-property-analytics
  labels:
    app: material-property-analytics
    component: analytics
spec:
  replicas: 2
  selector:
    matchLabels:
      app: material-property-analytics
  template:
    metadata:
      labels:
        app: material-property-analytics
        component: analytics
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: material-property-analytics
        image: $/kai/material-property-analytics:latest
        # Container configuration...
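
The container configuration is elided above. The following sketch shows how the documented resource requirements and the 20Gi claim might be wired in. The claim name, mount path, and access mode are assumptions; only the CPU, memory, and storage figures come from this document.

```yaml
# Sketch only: claim name, mount path, and access mode are hypothetical
apiVersion: apps/v1
kind: Deployment
metadata:
  name: material-property-analytics
spec:
  template:
    spec:
      containers:
      - name: material-property-analytics
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 8Gi
        volumeMounts:
        - name: analytics-data
          mountPath: /data/analytics      # hypothetical path
      volumes:
      - name: analytics-data
        persistentVolumeClaim:
          claimName: analytics-data       # hypothetical claim name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: analytics-data                    # hypothetical claim name
spec:
  accessModes:
  - ReadWriteMany                         # assumed; see note below
  resources:
    requests:
      storage: 20Gi
```

The access mode matters once the service scales out: a ReadWriteOnce volume can be mounted by only one node at a time, so replicas scheduled on different nodes would need either ReadWriteMany storage or one claim per replica.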

The service is integrated with the coordinator service via a workflow template:

# Workflow template for Material Property Analytics
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: material-property-analytics-template
spec:
  entrypoint: material-property-analytics
  # Workflow template configuration...
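
For illustration, an Argo Workflow can run a WorkflowTemplate by name through workflowTemplateRef. The source does not say how the coordinator submits runs, so this is only a sketch of one way to start the template defined above; the generateName prefix is an assumption.

```yaml
# Sketch only: shows one way to run the template; not confirmed as the coordinator's method
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: material-property-analytics-   # hypothetical prefix
spec:
  workflowTemplateRef:
    name: material-property-analytics-template
```

Because the template declares its own entrypoint, the referencing Workflow does not need to repeat it.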

Relationship-Aware Model Training

The Relationship-Aware Model Training feature enhances AI model training by incorporating knowledge from the Property Relationship Graph to improve property prediction and search relevance.

Deployment Configuration

Relationship-Aware Model Training is deployed as a separate service with the following configuration:

Deployment Method: Dedicated deployment
Resource Requirements:
- CPU: 4000m (request) / 8000m (limit)
- Memory: 16Gi (request) / 32Gi (limit)
- GPU: 1 NVIDIA GPU (L40S preferred)
Scaling: HorizontalPodAutoscaler with 1-3 replicas
Storage: 40Gi PersistentVolumeClaim for model storage
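Node Selection: Scheduled only on nodes labeled gpu-type=nvidia-l40s (the nodeSelector is a hard constraint, so pods stay Pending if no matching node is available)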
Workflow Integration: Integrated with the coordinator service via workflow templates

Implementation Details

The Relationship-Aware Model Training service is deployed as a separate Kubernetes deployment:

# Deployment for Relationship-Aware Model Training
apiVersion: apps/v1
kind: Deployment
metadata:
  name: relationship-aware-training
  labels:
    app: relationship-aware-training
    component: ml-services
spec:
  replicas: 1
  selector:
    matchLabels:
      app: relationship-aware-training
  template:
    metadata:
      labels:
        app: relationship-aware-training
        component: ml-services
        gpu-enabled: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: relationship-aware-training
        image: $/kai/relationship-aware-training:latest
        # Container configuration...
      nodeSelector:
        gpu-type: nvidia-l40s
      runtimeClassName: nvidia-gpu
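
The container configuration is elided above. A sketch of the resources stanza that would match the documented requirements follows. It assumes the cluster runs the NVIDIA device plugin, which advertises GPUs under the extended resource name nvidia.com/gpu; the source does not state this.

```yaml
# Sketch only: assumes GPUs are exposed as nvidia.com/gpu by the NVIDIA device plugin
apiVersion: apps/v1
kind: Deployment
metadata:
  name: relationship-aware-training
spec:
  template:
    spec:
      containers:
      - name: relationship-aware-training
        resources:
          requests:
            cpu: 4000m
            memory: 16Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 8000m
            memory: 32Gi
            nvidia.com/gpu: 1
```

Extended resources such as GPUs cannot be overcommitted: if both a request and a limit are given, they must be equal.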

The service is integrated with the coordinator service via a workflow template:

# Workflow template for Relationship-Aware Model Training
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: relationship-aware-training-template
spec:
  entrypoint: relationship-aware-training
  # Workflow template configuration...

Coordinator Service Integration

The coordinator service has been updated to be aware of the new services and features:

# ML Features Configuration in coordinator-config ConfigMap
ml_features: |
  {
    "multimodalPatternRecognition": {
      "enabled": true,
      "workflowTemplate": "multimodal-pattern-recognition-template",
      "minQualityTier": "medium",
      "resourceQuotas": {
        "cpu": 8,
        "memory": "16Gi",
        "gpu": 1
      },
      "maxConcurrent": 4
    },
    "domainSpecificNetworks": {
      "enabled": true,
      "workflowTemplate": "domain-specific-networks-template",
      "minQualityTier": "medium",
      "resourceQuotas": {
        "cpu": 8,
        "memory": "16Gi",
        "gpu": 1
      },
      "maxConcurrent": 4,
      "supportedDomains": ["wood", "metal", "fabric", "stone", "ceramic", "composite"]
    },
    "relationshipAwareTraining": {
      "enabled": true,
      "workflowTemplate": "relationship-aware-training-template",
      "minQualityTier": "high",
      "resourceQuotas": {
        "cpu": 8,
        "memory": "32Gi",
        "gpu": 1
      },
      "maxConcurrent": 2,
      "supportedMaterialTypes": ["tile", "wood", "stone", "laminate", "vinyl", "carpet", "metal", "glass", "concrete", "ceramic", "porcelain"]
    },
    "materialPropertyAnalytics": {
      "enabled": true,
      "workflowTemplate": "material-property-analytics-template",
      "minQualityTier": "medium",
      "resourceQuotas": {
        "cpu": 4,
        "memory": "8Gi",
        "gpu": 0
      },
      "maxConcurrent": 5
    }
  }

GPU Resource Management

The GPU resource management configuration has been updated to include the new Relationship-Aware Training service:

# GPU Resource Management in coordinator-config ConfigMap
gpu_resources: |
  {
    "classes": {
      "nvidia-l40s": {
        "priority": 10,
        "models": ["multimodal-pattern-recognition", "domain-specific-networks", "relationship-aware-training"]
      },
      "nvidia-h100": {
        "priority": 20,
        "models": ["gaussian-splatting", "triposr", "wonder3d", "instant3d"]
      }
    },
    "scaling": {
      "enabled": true,
      "minNodes": 1,
      "maxNodes": 10,
      "scaleDownDelay": "10m"
    }
  }

GitOps Integration

All the new features are integrated with the GitOps workflow using Flux CD. The deployment configurations are stored in the GitOps repository and automatically applied by Flux.
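
As a sketch of how Flux might be pointed at these configurations, a Flux Kustomization object reconciles a path in a Git source on a fixed interval. The object name, interval, path, and source name below are all assumptions, not values taken from the KAI repository.

```yaml
# Sketch only: name, interval, path, and sourceRef are hypothetical
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kai                               # hypothetical name
  namespace: flux-system
spec:
  interval: 10m                           # assumed reconcile interval
  path: ./flux/clusters/production/kai    # assumed path within the repository
  prune: true                             # remove cluster objects deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system                     # hypothetical source name
```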

Deployment Updates

Updated Kustomization Structure

The Flux GitOps repository has been updated with new directories for the recently implemented features:

flux/
├── clusters/
│   ├── production/
│   │   ├── kai/
│   │   │   ├── analytics/           # New directory for Material Property Analytics
│   │   │   ├── api-server/          # Updated with recommendation engine config
│   │   │   ├── coordinator/         # Updated with new workflow templates
│   │   │   ├── ml-services/         # Updated with relationship-aware training
│   │   │   └── kustomization.yaml   # Updated to include new components
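
A minimal sketch of what the updated kustomization.yaml could look like, assuming it lists each component directory as a resource. The real file may include components that the tree above does not show.

```yaml
# flux/clusters/production/kai/kustomization.yaml (sketch; contents assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- analytics        # Material Property Analytics
- api-server       # includes the recommendation engine ConfigMap
- coordinator      # includes the new workflow templates
- ml-services      # includes relationship-aware training
```

When a directory is listed as a resource, Kustomize expects that directory to contain its own kustomization.yaml.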

Deployment Process

The deployment process for the new features follows the existing GitOps workflow:

1. Changes are committed to the GitOps repository.
2. Flux automatically detects the changes and applies them to the cluster.
3. The new services are deployed and integrated with the existing services.

Resource Requirements

The new features have specific resource requirements that should be considered when planning cluster capacity:

Feature	CPU (request-limit)	Memory (request-limit)	GPU	Storage
Property-Based Recommendation Engine	Included in API server	Included in API server	N/A	N/A
Material Property Analytics	1-4 cores	2-8 GiB	N/A	20 GiB
Relationship-Aware Model Training	4-8 cores	16-32 GiB	1 NVIDIA L40S	40 GiB
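
For capacity planning, the per-pod figures can be combined with the autoscaler bounds stated earlier (2-6 replicas for analytics, 1-3 for training). The worked totals below cover only the two dedicated services and assume every replica uses the documented requests and limits. The scheduler reserves capacity based on requests, while limits bound peak usage.

```latex
\begin{aligned}
\text{Requests at minimum replicas:}\quad
  & \text{CPU} = 2 \times 1 + 1 \times 4 = 6 \text{ cores},
  && \text{Memory} = 2 \times 2 + 1 \times 16 = 20 \text{ GiB},
  && \text{GPU} = 1 \\
\text{Requests at maximum replicas:}\quad
  & \text{CPU} = 6 \times 1 + 3 \times 4 = 18 \text{ cores},
  && \text{Memory} = 6 \times 2 + 3 \times 16 = 60 \text{ GiB},
  && \text{GPU} = 3 \\
\text{Limits at maximum replicas:}\quad
  & \text{CPU} = 6 \times 4 + 3 \times 8 = 48 \text{ cores},
  && \text{Memory} = 6 \times 8 + 3 \times 32 = 144 \text{ GiB},
  && \text{GPU} = 3
\end{aligned}
```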

Monitoring and Logging

All new services are configured with Prometheus metrics and structured logging:

Prometheus metrics exposed on port 9100
Metrics path: /metrics
Log format: JSON
Log level: Configurable via environment variables
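
The prometheus.io/* pod annotations are a convention rather than a built-in Prometheus feature: they take effect only if the scrape configuration relabels on them. The source does not show the Prometheus side, so the following is a sketch of the commonly used pod discovery job that honors these annotations; the job name is an assumption.

```yaml
# Sketch only: a Prometheus scrape job that honors the prometheus.io/* annotations
scrape_configs:
- job_name: kubernetes-pods               # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Use prometheus.io/path as the metrics path when it is set
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Rewrite the scrape address to use the port from prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```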

Health Checks

All new services expose the following health check endpoints:

Liveness probe: /health
Readiness probe: /ready
Startup probe (where applicable): /startup
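
As a sketch, the three endpoints map onto Kubernetes probes as follows, using the analytics deployment as the example. The container port 8080 and all timing values are assumptions; only the paths come from this document.

```yaml
# Sketch only: port and timings are hypothetical; paths are from the list above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: material-property-analytics
spec:
  template:
    spec:
      containers:
      - name: material-property-analytics
        startupProbe:
          httpGet:
            path: /startup
            port: 8080            # hypothetical port
          periodSeconds: 10
          failureThreshold: 30    # allows up to 300s for startup
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
```

When a startup probe is defined, liveness and readiness checks do not begin until it succeeds, which protects slow-starting containers from being restarted prematurely.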
