GPU low-utilization monitoring
Rolling-window observability for dark, dim, and underfed GPUs
Signals in
- NVML or DCGM exporter
- Low-util policy time
- Idle-state sampling
- Power & energy telemetry
→
Rolling windows
- Configurable short & long
- Low-util percentage
- Idle sampled % & entries
- Power corroboration
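As a rough illustration of the rolling-window metrics listed above, here is a minimal Python sketch. Everything in it is an assumption made for the example, not the tool's actual implementation: the `Sample` shape, the idea that low-util time arrives as a cumulative counter already converted to seconds, and the names `RollingWindow` and `summary`. A short and a long window would simply be two instances with different spans.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    """One telemetry sample (hypothetical shape, for illustration only)."""
    t: float           # sample timestamp, seconds
    low_util_s: float  # cumulative low-util time counter, seconds (assumed unit)
    idle: bool         # True if the GPU was sampled in an idle state
    power_w: float     # board power at sample time, watts


class RollingWindow:
    """Keeps the samples that fall inside the last `span_s` seconds."""

    def __init__(self, span_s: float):
        self.span_s = span_s
        self.samples = deque()

    def add(self, s: Sample) -> None:
        self.samples.append(s)
        # Evict samples older than the window span.
        while self.samples[0].t < s.t - self.span_s:
            self.samples.popleft()

    def summary(self):
        """Return window metrics, or None if there is too little data."""
        if len(self.samples) < 2:
            return None
        first, last = self.samples[0], self.samples[-1]
        elapsed = last.t - first.t
        if elapsed <= 0:
            return None
        # Clamp at zero so a counter reset does not yield a negative share.
        low_delta = max(0.0, last.low_util_s - first.low_util_s)
        flags = [s.idle for s in self.samples]
        # An idle entry is a not-idle -> idle transition between samples.
        entries = sum(1 for prev, cur in zip(flags, flags[1:]) if cur and not prev)
        return {
            "low_util_pct": 100.0 * low_delta / elapsed,
            "idle_sampled_pct": 100.0 * sum(flags) / len(flags),
            "idle_entries": entries,
            "mean_power_w": sum(s.power_w for s in self.samples) / len(self.samples),
        }
```

Usage would look like `short, long = RollingWindow(300), RollingWindow(3600)`, feeding every sample to both and reading `summary()` from each. The spans here are placeholders; the source only says they are configurable.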
→
Operator view
- Long-window low-util KPI
- Dark / dim / bright signal
- Underfeeding or burstiness
- Not a root-cause engine
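One way the dark / dim / bright signal could be derived from power telemetry is sketched below. This is an assumption-laden illustration: the function name `power_band`, the normalization between an idle power level and the power limit, and both thresholds are hypothetical and would need tuning per GPU model. Like the operator view itself, it labels a symptom and does not diagnose a cause.

```python
def power_band(mean_power_w: float,
               idle_power_w: float,
               power_limit_w: float,
               dark_below: float = 0.10,    # hypothetical threshold
               bright_from: float = 0.50):  # hypothetical threshold
    """Classify windowed mean power as 'dark', 'dim', or 'bright'.

    The mean is normalized to the range between idle power and the
    power limit, then compared against two illustrative cut-offs.
    """
    span = power_limit_w - idle_power_w
    if span <= 0:
        raise ValueError("power_limit_w must exceed idle_power_w")
    # Fraction of the usable power range, clamped to [0, 1].
    frac = (mean_power_w - idle_power_w) / span
    frac = min(max(frac, 0.0), 1.0)
    if frac < dark_below:
        return "dark"
    if frac < bright_from:
        return "dim"
    return "bright"
```

For example, with an assumed 70 W idle level and a 700 W limit, a long-window mean of 250 W falls in the "dim" band under these placeholder thresholds.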
At a glance
Standard monitoring can show a GPU as moderately busy while rolling windows over the same period reveal a high share of low-utilization time, repeated idle entries, and dim power draw.
Practical observability for H100, H200, and similar datacenter GPUs, not omniscient truth.