Distributed Tracing Service
This document describes the distributed tracing service implementation in the KAI platform. The tracing service provides a unified interface for tracking requests across services, enabling end-to-end visibility into request flows.
Overview
The distributed tracing service is designed to track requests as they flow through different services and components of the application. It provides a consistent API for tracing operations, regardless of the underlying tracing provider. The service supports OpenTelemetry for distributed tracing and integrates with the telemetry service.
Architecture
The tracing service follows a provider pattern, allowing different tracing implementations to be used interchangeably. The service consists of the following components:
- Tracing Service: The main service that provides a unified interface for tracing operations.
- Tracing Provider Interface: An interface that defines the contract for tracing providers.
- OpenTelemetry Provider: A provider that implements distributed tracing using OpenTelemetry.
- Tracing Initializer: A utility for initializing the tracing service with different providers.
Usage
Basic Usage
import from '@kai/shared';
// Start a span
const spanResult = tracing.startSpan('operation_name', {
kind: SpanKind.SERVER,
attributes: {
'service.name': 'my-service',
'operation.name': 'process-request'
}
});
// Get the current span
const span = tracing.getCurrentSpan();
// Add attributes to the span
tracing.addSpanAttributes(span, {
'user.id': '123',
'request.id': 'abc-123'
});
// Add events to the span
tracing.addSpanEvent(span, 'processing_started', {
'timestamp': Date.now()
});
// Set span status
tracing.setSpanStatus(span, SpanStatusCode.OK);
// End the span
tracing.endSpan(span);
Automatic Span Management
import from '@kai/shared';
// Automatically manage span lifecycle
const result = await tracing.withSpan(
'database_query',
async () => {
// This code is executed within the span
return await db.query('SELECT * FROM users');
},
{
kind: SpanKind.CLIENT,
attributes: {
'db.system': 'postgresql',
'db.statement': 'SELECT * FROM users'
}
}
);
Trace Context Propagation
import from '@kai/shared';
// Extract trace context from incoming request
const headers = request.headers;
const traceContext = tracing.extractContext(headers);
// Create a child span
const spanResult = tracing.startSpan('child_operation', {
parent: traceContext
});
// Inject trace context into outgoing request
const outgoingHeaders = ;
tracing.injectContext(spanResult.traceContext, outgoingHeaders);
// Make request with trace context
const response = await fetch('https://api.example.com', {
headers: outgoingHeaders
});
Configuration
The tracing service can be configured through environment variables or the unified configuration system. The following configuration options are available:
// In .env file
TRACING_ENABLED=true
TRACING_TYPE=opentelemetry
Tracing Providers
OpenTelemetry Provider
The OpenTelemetry provider implements distributed tracing using OpenTelemetry. It's suitable for production and distributed applications. Features include:
- OpenTelemetry-based tracing
- W3C Trace Context propagation
- Span attributes and events
- Span status and error handling
- Trace context extraction and injection
Integration with API Client
The API client has been integrated with the tracing service to automatically trace HTTP requests. Features include:
- Automatic span creation for HTTP requests
- Trace context propagation in request headers
- Span attributes for request and response details
- Span events for retries and errors
- Span status based on response status
import from '@kai/shared';
// Tracing is enabled by default
const result = await apiClient.get('/api/users');
// Disable tracing for specific client
const customClient = createApiClient({
useTracing: false
});
// Add custom tracing attributes
const customClient = createApiClient({
tracingAttributes: {
'service.name': 'custom-service',
'client.version': '1.0.0'
}
});
Implementation Details
Tracing Service
The tracing service provides a unified interface for tracing operations. It delegates all operations to the configured provider and adds additional functionality like automatic span management and integration with the telemetry service.
class TracingService {
private provider: TracingProvider | null = null;
private enabled: boolean = false;
// Set the tracing provider
setProvider(provider: TracingProvider): void;
// Enable tracing
async enable(): Promise<void>;
// Disable tracing
async disable(): Promise<void>;
// Start a span
startSpan(name: string, options?: SpanOptions): SpanResult | undefined;
// End a span
endSpan(span: Span, endTime?: number): void;
// Set span status
setSpanStatus(span: Span, code: SpanStatusCode, message?: string): void;
// Add span attributes
addSpanAttributes(span: Span, attributes: Record<string, string | number | boolean | string[] | number[] | boolean[]>): void;
// Add span events
addSpanEvent(span: Span, name: string, attributes?: Record<string, string | number | boolean>, timestamp?: number): void;
// Get current span
getCurrentSpan(): Span | undefined;
// Run with span
async withSpan<T>(name: string, fn: () => Promise<T> | T, options?: SpanOptions): Promise<T>;
// Extract trace context from carrier
extractContext(carrier: Record<string, string>): TraceContext | undefined;
// Inject trace context into carrier
injectContext(traceContext: TraceContext, carrier: Record<string, string>): void;
}
Tracing Provider Interface
The tracing provider interface defines the contract for tracing providers. All providers must implement this interface.
interface TracingProvider {
// Initialize the provider
initialize(): Promise<void>;
// Start a span
startSpan(name: string, options?: SpanOptions): Span;
// End a span
endSpan(span: Span, endTime?: number): void;
// Set span status
setSpanStatus(span: Span, code: SpanStatusCode, message?: string): void;
// Add span attributes
addSpanAttributes(span: Span, attributes: Record<string, string | number | boolean | string[] | number[] | boolean[]>): void;
// Add span events
addSpanEvent(span: Span, name: string, attributes?: Record<string, string | number | boolean>, timestamp?: number): void;
// Get current span
getCurrentSpan(): Span | undefined;
// Run with span
withSpan<T>(span: Span, fn: () => Promise<T> | T): Promise<T>;
// Extract trace context from carrier
extractContext(carrier: Record<string, string>): TraceContext | undefined;
// Inject trace context into carrier
injectContext(traceContext: TraceContext, carrier: Record<string, string>): void;
}
Benefits
The distributed tracing service provides several benefits:
- End-to-End Visibility: Track requests as they flow through different services and components.
- Performance Monitoring: Identify bottlenecks and performance issues in the application.
- Error Diagnosis: Quickly identify the root cause of errors and failures.
- Dependency Analysis: Understand the dependencies between different services and components.
- Service Level Objectives: Monitor and enforce service level objectives based on trace data.
- Consistent API: The unified interface provides a consistent API for tracing operations, regardless of the underlying provider.
- Provider Flexibility: The provider pattern allows different tracing implementations to be used interchangeably.
Next Steps
The following steps are recommended to further improve the distributed tracing service:
- Add More Providers: Add support for more tracing providers (Jaeger, Zipkin, etc.).
- Add Sampling Strategies: Add support for different sampling strategies to reduce the volume of trace data.
- Add Trace Analytics: Add support for analyzing trace data to identify patterns and trends.
- Add Trace Visualization: Add support for visualizing trace data to understand request flows.
- Add Trace Alerting: Add support for alerting based on trace data to proactively identify issues.
- Add Trace Retention: Add support for configuring trace data retention to manage storage costs.