# Monitoring
LeanCore includes a comprehensive monitoring stack for operational visibility.
## Monitoring Stack
| Service | Purpose |
|---|---|
| Prometheus | Metrics collection and alerting |
| Grafana | Dashboards and visualization |
| Tempo | Distributed tracing |
| Langfuse | AI pipeline trace capture |
| cAdvisor | Container metrics |
| Node Exporter | Host system metrics |
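As a sketch of how Prometheus might scrape the services above, a minimal configuration could look like the following. Job names, hostnames, ports, and the metrics path are illustrative assumptions, not taken from the LeanCore configuration:

```yaml
# prometheus.yml (sketch; targets and ports are illustrative)
scrape_configs:
  - job_name: cadvisor          # container metrics
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: node-exporter     # host system metrics
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: backend           # JVM/HTTP metrics, e.g. a Micrometer Prometheus endpoint
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["backend:8080"]
```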
## Grafana Dashboards
Three pre-built dashboards provide full operational visibility:
### 1. System Overview
Host and container health at a glance:
| Panel | What it Shows |
|---|---|
| CPU Usage % | Time series with yellow >60%, red >80% |
| Memory Usage % | Time series with yellow >70%, red >85% |
| Disk Usage | Gauge showing partition usage, yellow >75%, red >90% |
| Network I/O | RX/TX bytes per second per interface |
| Container Status | Count of healthy containers |
| Container CPU | Per-container CPU usage |
| Container Memory | Per-container memory usage |
| Container Restarts (15m) | Bar gauge, yellow at 1, red at 3 |
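The container panels above are typically backed by cAdvisor metrics. As one example, per-container CPU usage could be computed with a PromQL query along these lines (metric and label names follow cAdvisor defaults and may differ in this deployment):

```promql
sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) * 100
```

The `name!=""` filter drops the aggregate cgroup series so the panel shows one line per named container.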
### 2. Application Performance
JVM and HTTP performance:
| Panel | What it Shows |
|---|---|
| JVM Heap | Used vs committed vs max memory |
| GC Pause Duration | Average garbage collection pause time |
| HTTP Request Rate | Total req/s and 5xx/s |
| HTTP Latency | p50, p95, p99 response time |
| HTTP Errors | 4xx/5xx breakdown by status code |
| Connection Pool | Active, idle, pending, max connections |
| Thread Count | Live, daemon, peak thread counts |
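Assuming the backend exposes Micrometer-style HTTP histograms (the metric name here is an assumption), the latency percentiles could be derived with a query such as:

```promql
histogram_quantile(0.95,
  sum by (le) (rate(http_server_requests_seconds_bucket[5m])))
```

Swapping 0.95 for 0.5 or 0.99 yields the p50 and p99 panels from the same histogram.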
### 3. MCP Health Grid
Connector status monitoring:
| Panel | What it Shows |
|---|---|
| Server Status | UP/DOWN grid (green/red per server) |
| Health Response Time | Scrape duration per server |
| Container Restarts (1h) | Restart count per MCP container |
| Container Memory | Memory usage per connector |
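The UP/DOWN grid maps naturally onto Prometheus's built-in `up` metric. A sketch, assuming the MCP scrape jobs share an `mcp-` name prefix (the prefix is an assumption):

```promql
up{job=~"mcp-.*"}
```

Each resulting series is 1 while the server's metrics endpoint is reachable and 0 when it is down, which Grafana can render as a green/red status grid.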
## Distributed Tracing
Every HTTP request through the backend produces an OpenTelemetry trace:
- Search traces by service name
- Filter by duration, status code, HTTP method
- View full span waterfall for any request
- Each API call shows: controller -> service -> repository -> external HTTP calls
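In Tempo, the searches described above can be expressed in TraceQL. For example, to find backend spans slower than 500 ms (the service name is illustrative):

```traceql
{ resource.service.name = "backend" && duration > 500ms }
```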
## Alert Rules
14 alert rules are configured for production:
### Critical Alerts (Email + WhatsApp)
- Container down
- OOM killed
- Disk usage >90%
- Memory available <10%
- Backend API down
- MCP server down
### Warning Alerts (Email)
- CPU >80%
- Memory available <20%
- Disk usage >75%
- Restart loop detected
- JVM heap >85%
- Connection pool >80%
- HTTP 5xx rate >5%
- p95 latency >5 seconds
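As an illustration, the 5xx-rate warning could be expressed as a Prometheus alerting rule like the one below. The alert name, group name, and metric names are assumptions, not the actual LeanCore rule definitions:

```yaml
groups:
  - name: leancore-warnings
    rules:
      - alert: HighHttp5xxRate       # fires when >5% of requests return 5xx
        expr: |
          sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 5m                      # must hold for 5 minutes before firing
        labels:
          severity: warning
```

The `severity` label is what routing (e.g. in Alertmanager) would key on to send warnings by email only.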
## AI Pipeline Tracing
Langfuse captures detailed AI pipeline execution data:
- Every pipeline stage is traced (Coordinator, Expert, Validator, Grounding)
- Token usage and cost per model
- Tool call inputs and outputs
- Quality scores and validation results
- Outcome scores for response quality