Skip to main content

Telemetry Service

This document describes the telemetry service implementation in the KAI platform. The telemetry service provides a unified interface for collecting metrics, traces, and logs across the application.

Overview

The telemetry service is designed to collect and analyze application telemetry data, including metrics, traces, and logs. It provides a consistent API for telemetry operations, regardless of the underlying telemetry provider. The service supports both console-based telemetry for development and more advanced telemetry providers for production.

Architecture

The telemetry service follows a provider pattern, allowing different telemetry implementations to be used interchangeably. The service consists of the following components:

  1. Telemetry Service: The main service that provides a unified interface for telemetry operations.
  2. Telemetry Provider Interface: An interface that defines the contract for telemetry providers.
  3. Console Telemetry Provider: A provider that logs telemetry events to the console.
  4. Telemetry Initializer: A utility for initializing the telemetry service with different providers.

Usage

Basic Usage

import  from '@kai/shared';

// Track an event
await telemetry.trackEvent({
type: 'user',
name: 'user_login',
timestamp: Date.now(),
status: 'success',
properties: {
userId: '123',
method: 'password'
}
});

// Track an error
await telemetry.trackEvent({
type: 'error',
name: 'api_error',
timestamp: Date.now(),
status: 'error',
properties: {
endpoint: '/api/users',
method: 'GET'
},
error: {
message: 'Failed to fetch users',
stack: error.stack,
name: error.name
}
});

Performance Measurements

import  from '@kai/shared';

// Manual measurement
const id = telemetry.startMeasurement('database_query', {
query: 'SELECT * FROM users',
params: { limit: 10 }
});

try {
const result = await db.query('SELECT * FROM users', { limit: 10 });
await telemetry.stopMeasurement(id, );
return result;
} catch (error) {
await telemetry.stopMeasurement(id, { success: false, error: error.message });
throw error;
}

// Automatic measurement
const result = await telemetry.measure(
'database_query',
async () => {
return await db.query('SELECT * FROM users', { limit: 10 });
},
{
query: 'SELECT * FROM users',
params: { limit: 10 }
}
);

Configuration

The telemetry service can be configured through environment variables or the unified configuration system. The following configuration options are available:

// In .env file
TELEMETRY_ENABLED=true
TELEMETRY_TYPE=console

Telemetry Providers

Console Telemetry Provider

The console telemetry provider logs telemetry events to the console. It's suitable for development and debugging. Features include:

  • Console-based logging
  • Performance measurements
  • Event buffering
  • Error handling

Implementation Details

Telemetry Service

The telemetry service provides a unified interface for telemetry operations. It delegates all operations to the configured provider and adds additional functionality like event buffering and automatic measurements.

class TelemetryService {
private provider: TelemetryProvider | null = null;
private enabled: boolean = false;

// Set the telemetry provider
setProvider(provider: TelemetryProvider): void;

// Enable telemetry
async enable(): Promise<void>;

// Disable telemetry
async disable(): Promise<void>;

// Track an event
async trackEvent(event: TelemetryEvent): Promise<void>;

// Start a performance measurement
startMeasurement(name: string, properties?: Record<string, any>): string;

// Stop a performance measurement
async stopMeasurement(id: string, additionalProperties?: Record<string, any>): Promise<void>;

// Measure the execution time of a function
async measure<T>(
name: string,
fn: () => Promise<T> | T,
properties?: Record<string, any>
): Promise<T>;

// Flush all pending telemetry
async flush(): Promise<void>;
}

Telemetry Provider Interface

The telemetry provider interface defines the contract for telemetry providers. All providers must implement this interface.

interface TelemetryProvider {
// Initialize the provider
initialize(): Promise<void>;

// Track an event
trackEvent(event: TelemetryEvent): Promise<void>;

// Start a performance measurement
startMeasurement(name: string, properties?: Record<string, any>): string;

// Stop a performance measurement
stopMeasurement(id: string, additionalProperties?: Record<string, any>): Promise<void>;

// Flush all pending telemetry
flush(): Promise<void>;
}

Telemetry Event

The telemetry event interface defines the structure of telemetry events.

interface TelemetryEvent {
// Event type
type: TelemetryEventType;

// Event name
name: string;

// Event timestamp
timestamp: number;

// Event duration in milliseconds (if applicable)
duration?: number;

// Event status (success, error, etc.)
status?: 'success' | 'error' | 'warning' | 'info';

// Event properties
properties?: Record<string, any>;

// Event measurements
measurements?: Record<string, number>;

// Error information (if applicable)
error?: {
message: string;
stack?: string;
code?: string;
name?: string;
};
}

Benefits

The telemetry service provides several benefits:

  1. Improved Monitoring: Comprehensive telemetry data helps identify and diagnose issues.
  2. Consistent API: The unified interface provides a consistent API for telemetry operations, regardless of the underlying provider.
  3. Provider Flexibility: The provider pattern allows different telemetry implementations to be used interchangeably.
  4. Performance Measurements: Built-in support for performance measurements helps identify bottlenecks.
  5. Error Tracking: Structured error tracking helps diagnose and fix issues.
  6. Event Buffering: Event buffering ensures that telemetry data is not lost if the provider is temporarily unavailable.

Next Steps

The following steps are recommended to further improve the telemetry service:

  1. Add More Providers: Add support for more telemetry providers (Application Insights, Datadog, etc.).
  2. Add Distributed Tracing: Add support for distributed tracing to track requests across services.
  3. Add Sampling: Add support for sampling to reduce the volume of telemetry data.
  4. Add Custom Dimensions: Add support for custom dimensions to enrich telemetry data.
  5. Add Correlation: Add support for correlation to link related telemetry events.
  6. Add Alerting: Add support for alerting based on telemetry data.