
feat: add GPU observability config and Grafana dashboard #105

Merged
Defilan merged 1 commit into main from feat/observability-config on Dec 5, 2025

@Defilan commented Dec 5, 2025

Summary

  • Add Grafana dashboard for monitoring GPU-enabled LLMKube deployments
  • Include reference monitoring configs for DCGM and node-exporter
  • Add setup documentation covering Docker and Kubernetes deployment

Relates to #5

Changes

Grafana Dashboard (config/grafana/)

  • GPU utilization, temperature, power, and memory panels
  • System metrics (CPU, memory, disk, network)
  • LLMKube-specific metric placeholders for future integration
  • Setup guide with Docker and Kubernetes deployment options

Monitoring Config (config/monitoring/)

  • DCGM exporter DaemonSet for NVIDIA GPU metrics
  • Node exporter for system metrics
  • Prometheus scrape configuration
  • Kustomization for easy deployment

Test plan

  • Import dashboard into Grafana instance
  • Verify DCGM exporter scrapes GPU metrics
  • Verify node-exporter scrapes system metrics
  • Confirm all dashboard panels display data correctly
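
The exporter checks in the test plan could be partly automated. The following is a minimal sketch, not part of this PR: it parses Prometheus text-format output and reports which expected GPU metrics are absent. The metric names are standard DCGM exporter field names; whether the dashboard in this PR queries exactly these is an assumption, and the helper names `metric_names` and `missing_metrics` are hypothetical.

```python
import re

# Standard DCGM exporter metric names. That this PR's dashboard relies on
# exactly this set is an assumption made for illustration.
EXPECTED_GPU_METRICS = {
    "DCGM_FI_DEV_GPU_UTIL",     # GPU utilization (%)
    "DCGM_FI_DEV_GPU_TEMP",     # GPU temperature (C)
    "DCGM_FI_DEV_POWER_USAGE",  # power draw (W)
    "DCGM_FI_DEV_FB_USED",      # framebuffer memory used (MiB)
}

# Matches "name{labels} value" or "name value". Simplified: it does not
# handle a literal "}" inside a label value.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+\S+")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition_text, expected=EXPECTED_GPU_METRICS):
    """Return the expected metric names that are absent, sorted."""
    return sorted(set(expected) - metric_names(exposition_text))


# Hypothetical sample of exporter output, for illustration only.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="node-a"} 61
"""

print(missing_metrics(SAMPLE))
# prints ['DCGM_FI_DEV_FB_USED', 'DCGM_FI_DEV_POWER_USAGE']
```

Against a live cluster, the text passed to `missing_metrics` would come from the exporter's `/metrics` endpoint (DCGM exporter listens on port 9400 by default; the port used by this PR's DaemonSet is not stated here). An empty result means every expected metric is being exported.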

@Defilan force-pushed the feat/observability-config branch from bd13aea to ed004af on December 5, 2025 16:14
Add monitoring configuration for GPU-enabled LLMKube deployments:

Grafana Dashboard (config/grafana/):
- GPU utilization, temperature, power, and memory panels
- System metrics (CPU, memory, disk, network)
- LLMKube-specific metric placeholders for future integration
- Setup guide with Docker and Kubernetes deployment options

Monitoring Config (config/monitoring/):
- DCGM exporter DaemonSet for NVIDIA GPU metrics
- Node exporter for system metrics
- Prometheus scrape configuration
- Kustomization for easy deployment

The dashboard supports multi-GPU setups and provides real-time
visibility into inference workload performance.

Signed-off-by: Christopher Maher <chris@mahercode.io>
@Defilan force-pushed the feat/observability-config branch from ed004af to ed7d617 on December 5, 2025 16:21
@Defilan merged commit 571643f into main on Dec 5, 2025
4 checks passed
@Defilan deleted the feat/observability-config branch on December 5, 2025 16:28
@github-actions bot mentioned this pull request on Dec 5, 2025
@github-actions bot mentioned this pull request on Mar 4, 2026