Prometheus and Grafana Storage Monitoring Dashboard: What to Put On It

Design Around Questions

A storage dashboard should answer operational questions quickly. Avoid building one giant dashboard with every metric. Build focused views that help a person decide what to do next.

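Start with these questions: is capacity running out, is performance degrading, is data still protected, and is the monitoring itself working? Each one maps to a dashboard below.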

Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, which handles grouping, silencing, and notification routing. Grafana can visualize Prometheus data and manage alert rules of its own. See the official docs for Prometheus alerting, Grafana alerting, and Grafana with Prometheus.
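As a rough sketch of how that wiring looks on the Prometheus side, the fragment below loads rule files and points Prometheus at one Alertmanager instance. The rule file path and the `alertmanager` hostname are assumptions for illustration; 9093 is Alertmanager's default port.

```yaml
# prometheus.yml (fragment, not a complete config)
rule_files:
  - "rules/storage_*.yml"        # hypothetical path; globs are allowed here

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"  # hypothetical host, default Alertmanager port
```

Routing, grouping, and receivers are configured in Alertmanager itself, not in this file.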

Dashboard 1: Capacity

The capacity dashboard should show:

Good capacity panels are mostly tables and trend lines. Operators need sortable lists more than decorative gauges.
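One way to back a trend line with an alert is a projection rule. The sketch below assumes a gauge named `storage_volume_used_percent`, the same placeholder metric used in the example alert in this article, and uses the PromQL function `predict_linear` to extrapolate the last six hours of samples one day ahead. The window, horizon, and severity are illustrative choices, not recommendations.

```yaml
groups:
  - name: storage_capacity_trend
    rules:
      - alert: StorageVolumePredictedFull
        # Linear fit over the last 6h, extrapolated 86400 s (24h) into the future.
        # storage_volume_used_percent is a placeholder; use your exporter's metric.
        expr: predict_linear(storage_volume_used_percent[6h], 86400) >= 100
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Storage volume projected to fill within 24 hours"
```

For the sortable list, a Grafana table panel running an instant query such as `sort_desc(storage_volume_used_percent)` against the same placeholder metric is usually enough.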

Dashboard 2: Performance

The performance dashboard should show:

Use percentiles where possible. Average latency can hide short but painful spikes.
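To make the difference concrete, the recording rules below compute a 99th percentile and an average from the same data. This is a sketch that assumes the exporter exposes a classic Prometheus histogram called `storage_io_latency_seconds` with a `volume` label; both names are hypothetical. If your exporter only exposes pre-computed averages or gauges, `histogram_quantile` will not apply.

```yaml
groups:
  - name: storage_latency
    rules:
      # p99 latency per volume over 5 minutes (assumes *_bucket series exist)
      - record: volume:storage_io_latency_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (le, volume) (rate(storage_io_latency_seconds_bucket[5m]))
          )
      # Average latency per volume, for comparison on the same panel
      - record: volume:storage_io_latency_seconds:avg_5m
        expr: |
          sum by (volume) (rate(storage_io_latency_seconds_sum[5m]))
          /
          sum by (volume) (rate(storage_io_latency_seconds_count[5m]))
```

Plotting both series on one panel shows spikes that move the p99 while barely shifting the average.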

Dashboard 3: Protection Health

Storage monitoring should include protection state, not only hardware health.

This is the dashboard that helps catch silent data protection drift.
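Drift of this kind is often easiest to catch as "time since the last success". The rule below is a sketch under the assumption that your exporter publishes the Unix timestamp of the last successful snapshot as `storage_last_successful_snapshot_timestamp_seconds`; that metric name, the choice of snapshots as the protection mechanism, and the 24-hour threshold are all illustrative.

```yaml
groups:
  - name: storage_protection
    rules:
      - alert: StorageSnapshotStale
        # time() is the evaluation timestamp in seconds; the difference is the snapshot age.
        # The metric name is hypothetical; substitute whatever your exporter provides.
        expr: time() - storage_last_successful_snapshot_timestamp_seconds > 86400
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "No successful snapshot in the last 24 hours"
```

The same pattern works for replication or backup jobs if they expose a comparable timestamp.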

Dashboard 4: Monitoring Health

Every monitoring stack needs a dashboard for itself.

If monitoring is unhealthy, storage dashboards may look quiet for the wrong reason.
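A minimal sketch of self-monitoring uses the `up` series that Prometheus records for every scrape target. The job name `storage_exporter` is an assumption; replace it with the job name from your scrape config.

```yaml
groups:
  - name: monitoring_health
    rules:
      - alert: StorageExporterDown
        # up is 0 when the most recent scrape of a target failed
        expr: up{job="storage_exporter"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Storage exporter target is failing scrapes"
      - alert: StorageExporterAbsent
        # absent() returns 1 when no series match, e.g. the job was removed from the config
        expr: absent(up{job="storage_exporter"})
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No storage exporter targets are being scraped"
```

The second rule covers the case the first cannot: when the target disappears entirely, there is no `up` series left to compare against 0.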

Example Alert Ideas

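groups:
  - name: storage_capacity
    rules:
      - alert: StorageVolumeAbove90Percent
        # storage_volume_used_percent is a placeholder; use your exporter's metric
        expr: storage_volume_used_percent > 90
        # Condition must hold for 30 minutes before the alert fires
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Storage volume above 90 percent used"
          description: "{{ $labels.instance }} has been above 90% used for 30 minutes (current value: {{ $value }})."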

Tune alert names and expressions to your exporter. The structure matters more than the placeholder metric name.

References
