Snapshot Policy Strategy for Pure FlashArray

Define Business Objectives First

Snapshot policies should never be designed in isolation from business requirements. Before opening the Pure UI or running a CLI script, document these three critical inputs:

Map these requirements to service tiers rather than creating per-application policies. Typical organizations need 3-5 tiers (e.g., Critical, Standard, Low, Archive). Each tier gets a reusable snapshot policy that standardizes RPO/RTO across similar workloads.
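As a rough illustration of tier mapping, the sketch below picks the loosest tier whose RPO still satisfies a workload's requirement. The RPO cutoffs mirror the four-tier matrix in this guide; the function name `assign_tier` and the `compliance` flag are hypothetical, and a real mapping would also weigh RTO and retention needs.

```python
# Hypothetical helper: choose a service tier from a workload's required RPO.
# Tier RPOs (15 min, 1 hour, 24 hours) follow the matrix in this guide.
TIER1 = "Tier 1 - Mission Critical"
TIER2 = "Tier 2 - Production Standard"
TIER3 = "Tier 3 - Dev/Test"
TIER4 = "Tier 4 - Archive/Compliance"


def assign_tier(rpo_minutes: int, compliance: bool = False) -> str:
    """Return the loosest tier whose RPO is no worse than rpo_minutes."""
    if rpo_minutes < 15:
        raise ValueError("no tier in this model offers an RPO under 15 minutes")
    if rpo_minutes < 60:
        return TIER1
    if rpo_minutes < 1440:
        return TIER2
    # Both 24-hour tiers meet the RPO; compliance data needs the longer retention.
    # Compliance data with a tighter RPO is not covered by this simple model.
    return TIER4 if compliance else TIER3


workloads = {"erp-db": 15, "file-share": 60, "dev-sandbox": 1440}
for name, rpo in workloads.items():
    print(f"{name}: {assign_tier(rpo)}")
```

Keeping the mapping in one function makes it easy to review tier assignments with application owners before any policy is created on the array.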

Service Tier Snapshot Matrix

Here's a production-tested four-tier model that balances protection with capacity efficiency:

Tier 1 - Mission Critical (RPO: 15 min)
  Frequency: Every 15 minutes
  Retention: 96 snapshots (24 hours)
  Daily consolidation: 1 snapshot at midnight, keep 7 days
  Weekly: 1 snapshot Sunday midnight, keep 4 weeks
  Use case: Transactional databases, ERP systems, production VMs

Tier 2 - Production Standard (RPO: 1 hour)
  Frequency: Every hour
  Retention: 48 snapshots (2 days)
  Daily consolidation: 1 snapshot at midnight, keep 14 days  
  Weekly: 1 snapshot Sunday midnight, keep 8 weeks
  Use case: Application servers, file shares, general production

Tier 3 - Dev/Test (RPO: 24 hours)
  Frequency: Daily at midnight
  Retention: 7 snapshots (1 week)
  No long-term retention
  Use case: Development environments, test instances, temp data

Tier 4 - Archive/Compliance (RPO: 24 hours)
  Frequency: Daily at midnight
  Retention: 30 daily snapshots
  Monthly: 1 snapshot on 1st of month, keep 12 months (or longer based on compliance)
  Use case: Audit logs, financial records, regulatory data
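
To sanity-check the matrix, it can help to express each tier as data and compute how many snapshots a single volume may hold at steady state. The sketch below is an upper bound that simply adds the frequent, daily, weekly, and monthly counts; the `TierPolicy` class and its field names are hypothetical and not part of any Pure tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    name: str
    frequency_minutes: int   # interval between scheduled snapshots
    keep_all_hours: int      # how long every scheduled snapshot is kept
    daily_days: int = 0      # daily snapshots kept beyond that window
    weekly_weeks: int = 0
    monthly_months: int = 0

    def frequent_snapshots(self) -> int:
        """Snapshots held inside the keep-everything window."""
        return self.keep_all_hours * 60 // self.frequency_minutes

    def max_snapshots_per_volume(self) -> int:
        """Upper bound: counts may overlap slightly in practice."""
        return (self.frequent_snapshots() + self.daily_days
                + self.weekly_weeks + self.monthly_months)


TIERS = [
    TierPolicy("Tier 1", 15, 24, daily_days=7, weekly_weeks=4),
    TierPolicy("Tier 2", 60, 48, daily_days=14, weekly_weeks=8),
    TierPolicy("Tier 3", 1440, 7 * 24),
    TierPolicy("Tier 4", 1440, 30 * 24, monthly_months=12),
]

for tier in TIERS:
    print(tier.name, tier.frequent_snapshots(), tier.max_snapshots_per_volume())
```

Multiplying the per-volume bound by the number of volumes in a tier gives a quick estimate of total snapshot count to compare against the limits of your array.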

Capacity Planning Considerations

Pure Storage snapshots are space-efficient due to redirect-on-write architecture—only changed blocks consume capacity. However, high-change workloads (databases with heavy write activity) can accumulate significant snapshot overhead.

Rule of thumb: Estimate 10-20% additional capacity for snapshot overhead on active workloads. Monitor actual consumed space using purearray list --space and adjust retention if snapshot growth exceeds projections.
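
For a first-pass estimate before real measurements exist, one simple model multiplies volume size by daily change rate and retention days. This is a deliberately pessimistic sketch: it assumes every changed block is unique and ignores data reduction, so treat the result as a ceiling. The function name and the example figures are illustrative assumptions.

```python
def estimate_snapshot_overhead_tb(volume_tb: float,
                                  daily_change_rate: float,
                                  retention_days: int) -> float:
    """Worst-case snapshot space in TB.

    Assumes each day's changed blocks are unique and stay pinned by a
    snapshot for the whole retention window; ignores data reduction.
    """
    if not 0 <= daily_change_rate <= 1:
        raise ValueError("daily_change_rate must be a fraction between 0 and 1")
    if volume_tb < 0 or retention_days < 0:
        raise ValueError("volume_tb and retention_days must be non-negative")
    return volume_tb * daily_change_rate * retention_days


# Illustrative figures: 10 TB volume, 2% daily change, 7 days of retention.
overhead = estimate_snapshot_overhead_tb(10, 0.02, 7)
print(f"Estimated overhead: {overhead:.2f} TB ({overhead / 10:.0%} of volume)")
```

With these example inputs the estimate lands inside the 10-20% rule of thumb; a database with a much higher change rate would exceed it quickly, which is the case the monitoring advice above is meant to catch.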

Implementation with Pure CLI

Pure FlashArray snapshot scheduling uses protection groups and schedules. Here's how to implement the Tier 1 policy from the matrix above:

# Commands below use the Purity//FA purepgroup CLI; confirm flag names
# against the CLI reference for your Purity version before running.

# Create Tier 1 protection group
purepgroup create --vollist vol-db-prod-01,vol-db-prod-02 protgrp-tier1-critical

# Take a snapshot every 15 minutes
purepgroup schedule --snap-frequency 15m protgrp-tier1-critical

# Keep every snapshot for 24 hours, then keep 1 per day for 7 more days
purepgroup retain --all-for 1d --per-day 1 --days 7 protgrp-tier1-critical

# Enable the snapshot schedule (replication stays disabled)
purepgroup enable --snap protgrp-tier1-critical

# Weekly snapshot (keep 4 weeks): a protection group carries a single
# snapshot schedule and retention rule, so the weekly copies need their own
# group (for example protgrp-tier1-critical-weekly) built with the same
# create / schedule / retain / enable steps.

REST API Automation

For Infrastructure-as-Code workflows, use the Pure Storage REST API. This Python example creates a protection group with scheduled snapshots:

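import requests

# Assumption: endpoint paths and field names follow the FlashArray REST 2.x
# reference; confirm them against the API version your array exposes.
array_url = "https://flasharray.example.com/api/2.0"
api_token = "your-api-token"
pgroup = "protgrp-tier2-standard"

session = requests.Session()

# REST 2.x exchanges the API token for a session token (x-auth-token header)
login = session.post(f"{array_url}/login", headers={"api-token": api_token})
login.raise_for_status()
session.headers["x-auth-token"] = login.headers["x-auth-token"]

# Create protection group (the name is passed as a query parameter)
response = session.post(f"{array_url}/protection-groups", params={"names": pgroup})
response.raise_for_status()

# Add member volumes to the group
response = session.post(
    f"{array_url}/protection-groups/volumes",
    params={"group_names": pgroup, "member_names": "vol-app-prod-01,vol-app-prod-02"},
)
response.raise_for_status()

# Add hourly snapshot schedule (Tier 2)
schedule_data = {
    "snapshot_schedule": {"enabled": True, "frequency": 3600 * 1000},  # 1 hour in milliseconds
    "source_retention": {"all_for_sec": 172800},  # Keep for 48 hours
}
response = session.patch(f"{array_url}/protection-groups",
                         params={"names": pgroup}, json=schedule_data)
response.raise_for_status()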

Operational Best Practices

1. Naming Conventions

Use descriptive, sortable names that include the tier level: protgrp-tier1-db-prod, protgrp-tier2-filesvr. This makes auditing easier when you have 50+ protection groups.

2. Test Restore Procedures

Snapshots are not backups until you've proven you can restore from them. Schedule quarterly DR drills where you:

3. Monitor Snapshot Growth Trends

Set alerts when snapshot-consumed space exceeds thresholds. In the Pure GUI, navigate to Storage → Volumes and sort by the Snapshots column. Investigate volumes where snapshots consume more than 30% of total volume capacity; this may indicate an excessive change rate or a retention period that no longer matches the workload.
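
The same 30% check can be scripted against exported space data. The sketch below is a minimal example that assumes each volume is a dictionary with hypothetical `name`, `snapshot_bytes`, and `provisioned_bytes` keys; map these to whatever fields your reporting export actually provides.

```python
def flag_snapshot_heavy_volumes(volumes, threshold=0.30):
    """Return (name, ratio) pairs where snapshot space exceeds the threshold.

    Each item in volumes is a dict with hypothetical keys: name,
    snapshot_bytes, provisioned_bytes. Results are sorted worst first.
    """
    flagged = []
    for vol in volumes:
        provisioned = vol["provisioned_bytes"]
        if provisioned <= 0:
            continue  # skip malformed or empty entries
        ratio = vol["snapshot_bytes"] / provisioned
        if ratio > threshold:
            flagged.append((vol["name"], round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


sample = [
    {"name": "vol-db-prod-01", "snapshot_bytes": 450, "provisioned_bytes": 1000},
    {"name": "vol-app-prod-01", "snapshot_bytes": 100, "provisioned_bytes": 1000},
]
print(flag_snapshot_heavy_volumes(sample))
```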

4. Application-Consistent Snapshots

For databases (Oracle, SQL Server, PostgreSQL), coordinate snapshots with application quiesce commands:
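
Whatever the database, the ordering is what matters: quiesce, snapshot, then resume, with the resume guaranteed even if the snapshot fails. A minimal sketch of that pattern follows; `freeze`, `thaw`, and `take_snapshot` are hypothetical callables standing in for the database-specific commands and the array snapshot call.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(freeze, thaw):
    """Quiesce the application, and always resume it afterwards."""
    freeze()
    try:
        yield
    finally:
        thaw()  # runs even if the snapshot call raises


def application_consistent_snapshot(freeze, thaw, take_snapshot):
    """Take a snapshot while the application is quiesced."""
    with quiesced(freeze, thaw):
        return take_snapshot()


# Stand-in callables that only record the order of operations.
events = []
result = application_consistent_snapshot(
    freeze=lambda: events.append("freeze"),
    thaw=lambda: events.append("thaw"),
    take_snapshot=lambda: events.append("snapshot") or "snap-1",
)
print(events, result)
```

Because array snapshots complete quickly, the quiesce window in this pattern is typically short, but it is still worth timing it in a test environment before applying it to production databases.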

Common Mistakes to Avoid

Over-Snapshotting Low-Change Workloads

Taking 15-minute snapshots of a file share that changes once per day wastes both snapshot slots and administrative overhead. Match snapshot frequency to workload change rate and business RPO—not arbitrary "more is better" thinking.

Treating Snapshots as Backups

Snapshots protect against logical corruption (accidental deletes, ransomware, bad updates) but not against array-level failures or site disasters. Always pair snapshot policies with array replication (ActiveCluster) or external backups to tape/cloud for comprehensive data protection.

Ignoring Compliance Retention

Regulatory requirements (HIPAA, SOX, GDPR) may mandate specific retention periods. Document these requirements in your policy matrix and implement automated retention enforcement. Never rely on manual snapshot management for compliance-critical data—automation prevents gaps.
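
One lightweight form of automated enforcement is a scheduled check that compares configured retention with documented requirements. The sketch below assumes both are available as dictionaries keyed by dataset name; the function name and the day counts are purely illustrative and are not statements of what any regulation requires.

```python
def retention_shortfalls(configured_days, required_days):
    """Return {dataset: missing_days} where configured retention is too short.

    Datasets with no configured policy are treated as having 0 days.
    """
    shortfalls = {}
    for dataset, required in required_days.items():
        configured = configured_days.get(dataset, 0)
        if configured < required:
            shortfalls[dataset] = required - configured
    return shortfalls


# Illustrative values only; take real figures from your compliance team.
configured = {"audit-logs": 365, "financial-records": 30}
required = {"audit-logs": 365, "financial-records": 365, "hr-records": 90}
print(retention_shortfalls(configured, required))
```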

No Snapshot Naming Standards

Manual snapshots without standardized naming become unmanageable quickly. Enforce formats like {volumename}.{timestamp}.{purpose} (e.g., vol-db-prod.20260214-1500.pre-patch).
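
A small helper can generate and validate names in that format so that manual snapshots stay consistent. This is a sketch under assumptions: the regular expression and the function name are hypothetical, and you should confirm the resulting names also satisfy your array's own naming rules.

```python
import re
from datetime import datetime

# {volumename}.{timestamp}.{purpose}, with timestamp as YYYYMMDD-HHMM
NAME_PATTERN = re.compile(
    r"^(?P<volume>[A-Za-z0-9-]+)\."
    r"(?P<timestamp>\d{8}-\d{4})\."
    r"(?P<purpose>[a-z0-9-]+)$"
)


def build_snapshot_name(volume: str, purpose: str, when: datetime) -> str:
    """Build a snapshot name and reject anything that breaks the convention."""
    name = f"{volume}.{when:%Y%m%d-%H%M}.{purpose}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


print(build_snapshot_name("vol-db-prod", "pre-patch", datetime(2026, 2, 14, 15, 0)))
```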

Monitoring and Alerting

Integrate snapshot health monitoring into your observability stack:

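# Prometheus query example: snapshot age threshold
# The metric and the "volume" label are assumptions; confirm both against your exporter.
# max by (volume) keeps only the newest snapshot per volume, so old snapshots alone do not fire.
time() - max by (volume) (pure_volume_snapshot_created_timestamp) > 3600  # Alert if no snapshot in past hour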
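
Where no metrics pipeline is available, the same age check can run as a script. The sketch below assumes you already have a mapping of volume name to the creation time of its newest snapshot (how that mapping is gathered is left open); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_volumes(latest_snapshot_times, max_age, now=None):
    """Return sorted volume names whose newest snapshot is older than max_age.

    latest_snapshot_times maps volume name -> timezone-aware datetime of the
    most recent snapshot.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, created in latest_snapshot_times.items()
        if now - created > max_age
    )


check_time = datetime(2026, 2, 14, 15, 0, tzinfo=timezone.utc)
latest = {
    "vol-a": datetime(2026, 2, 14, 14, 30, tzinfo=timezone.utc),
    "vol-b": datetime(2026, 2, 14, 13, 0, tzinfo=timezone.utc),
}
print(stale_volumes(latest, timedelta(hours=1), now=check_time))
```

Passing `now` explicitly keeps the check deterministic in tests; in production the default of the current UTC time applies.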

Next Steps: Automation and Integration

Once base policies are established, enhance with these advanced capabilities:

A mature snapshot strategy isn't just about taking snapshots—it's about integrating data protection into your operational workflows, monitoring effectiveness, and continuously optimizing based on real-world restore needs.

