Snapshot Policy Strategy for Pure FlashArray
Define Business Objectives First
Snapshot policies should never be designed in isolation from business requirements. Before opening the Pure UI or running a CLI script, document these three critical inputs:
- Recovery Point Objective (RPO): Maximum tolerable data loss measured in time. A 15-minute RPO means you can afford to lose up to 15 minutes of transactions.
- Recovery Time Objective (RTO): How quickly you need to restore from snapshot. Snapshots enable faster recovery than tape or replication in most cases.
- Retention Requirements: Compliance, audit, or operational needs that dictate how long snapshots must be retained (e.g., 30 days for general operations, 7 years for financial records).
Map these requirements to service tiers rather than creating per-application policies. Typical organizations need 3-5 tiers (e.g., Critical, Standard, Low, Archive). Each tier gets a reusable snapshot policy that standardizes RPO/RTO across similar workloads.
Service Tier Snapshot Matrix
Here's a production-tested four-tier model that balances protection with capacity efficiency:
Tier 1 - Mission Critical (RPO: 15 min)
Frequency: Every 15 minutes
Retention: 96 snapshots (24 hours)
Daily consolidation: 1 snapshot at midnight, keep 7 days
Weekly: 1 snapshot Sunday midnight, keep 4 weeks
Use case: Transactional databases, ERP systems, production VMs
Tier 2 - Production Standard (RPO: 1 hour)
Frequency: Every hour
Retention: 48 snapshots (2 days)
Daily consolidation: 1 snapshot at midnight, keep 14 days
Weekly: 1 snapshot Sunday midnight, keep 8 weeks
Use case: Application servers, file shares, general production
Tier 3 - Dev/Test (RPO: 24 hours)
Frequency: Daily at midnight
Retention: 7 snapshots (1 week)
No long-term retention
Use case: Development environments, test instances, temp data
Tier 4 - Archive/Compliance (RPO: 24 hours)
Frequency: Daily at midnight
Retention: 30 daily snapshots
Monthly: 1 snapshot on 1st of month, keep 12 months (or longer based on compliance)
Use case: Audit logs, financial records, regulatory data
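This matrix translates directly into policy-as-code. Here's a minimal sketch encoding the tiers as a Python dictionary (the key names are illustrative, not a Pure API) that the automation examples later in this guide could consume instead of hand-typed values:

# Tier definitions as data; values mirror the matrix above (all in seconds)
TIER_POLICIES = {
    "tier1-critical": {"snap_every_sec": 900,   "keep_all_sec": 86400,
                       "daily_keep_days": 7,    "weekly_keep_weeks": 4},
    "tier2-standard": {"snap_every_sec": 3600,  "keep_all_sec": 172800,
                       "daily_keep_days": 14,   "weekly_keep_weeks": 8},
    "tier3-devtest":  {"snap_every_sec": 86400, "keep_all_sec": 604800},
    "tier4-archive":  {"snap_every_sec": 86400, "keep_all_sec": 2592000,
                       "monthly_keep_months": 12},
}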
Capacity Planning Considerations
Pure Storage snapshots are space-efficient due to redirect-on-write architecture—only changed blocks consume capacity. However, high-change workloads (databases with heavy write activity) can accumulate significant snapshot overhead.
Rule of thumb: Estimate 10-20% additional capacity for snapshot overhead on active workloads. Monitor actual consumed space using purearray list --space and adjust retention if snapshot growth exceeds projections.
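To sanity-check that rule of thumb against a specific workload, multiply the daily change rate by the retention window. A quick illustration (the figures are assumptions; substitute your own measured change rates):

def snapshot_overhead_gb(volume_gb, daily_change_pct, retention_days):
    """Estimate capacity retained by snapshots, before data reduction."""
    return volume_gb * (daily_change_pct / 100) * retention_days

# A 2 TB database changing 3% of its blocks per day, kept for 7 days:
print(snapshot_overhead_gb(2048, 3, 7))  # ~430 GB, i.e. ~21% overhead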
Implementation with Pure CLI
Pure FlashArray snapshot scheduling uses protection groups and schedules. Here's how to implement the Tier 1 policy from the matrix above:
# Create the Tier 1 protection group and add its member volumes
purepgroup create --vollist vol-db-prod-01,vol-db-prod-02 protgrp-tier1-critical

# 15-minute snapshots, retained for 24 hours
# (flag syntax per the Purity//FA CLI; confirm against your Purity version)
purepgroup schedule --snap-frequency 15m protgrp-tier1-critical
purepgroup retain --all-for 1d protgrp-tier1-critical
purepgroup enable --snap protgrp-tier1-critical

# Daily consolidation snapshot at midnight, kept 7 days. Each protection
# group carries a single snapshot schedule, so each cadence gets its own group.
purepgroup create --vollist vol-db-prod-01,vol-db-prod-02 protgrp-tier1-critical-daily
purepgroup schedule --snap-frequency 1d --snap-at 12am protgrp-tier1-critical-daily
purepgroup retain --all-for 7d protgrp-tier1-critical-daily
purepgroup enable --snap protgrp-tier1-critical-daily

# Weekly snapshot, kept 4 weeks (alignment to Sunday depends on when the
# schedule is enabled; verify the first run lands where you expect)
purepgroup create --vollist vol-db-prod-01,vol-db-prod-02 protgrp-tier1-critical-weekly
purepgroup schedule --snap-frequency 7d --snap-at 12am protgrp-tier1-critical-weekly
purepgroup retain --all-for 28d protgrp-tier1-critical-weekly
purepgroup enable --snap protgrp-tier1-critical-weekly
REST API Automation
For Infrastructure-as-Code workflows, use the Pure Storage REST API. This Python example creates a protection group with scheduled snapshots:
import requests

ARRAY = "https://flasharray.example.com"
API_TOKEN = "your-api-token"

# FlashArray REST 2.x: exchange the API token for a session token, which the
# array returns in the x-auth-token response header
login = requests.post(f"{ARRAY}/api/2.0/login", headers={"api-token": API_TOKEN})
login.raise_for_status()
headers = {"x-auth-token": login.headers["x-auth-token"]}

# Create the protection group, then add its member volumes
requests.post(f"{ARRAY}/api/2.0/protection-groups", headers=headers,
              params={"names": "protgrp-tier2-standard"}).raise_for_status()
requests.post(f"{ARRAY}/api/2.0/protection-groups/volumes", headers=headers,
              params={"group_names": "protgrp-tier2-standard",
                      "member_names": "vol-app-prod-01,vol-app-prod-02"}).raise_for_status()

# Hourly snapshots (Tier 2), retained for 48 hours. Frequency is expressed in
# milliseconds and retention in seconds in the REST 2.x schema; verify the
# field names against your Purity version.
schedule_data = {
    "snapshot_schedule": {"enabled": True, "frequency": 3600 * 1000},
    "source_retention": {"all_for_sec": 48 * 3600},
}
requests.patch(f"{ARRAY}/api/2.0/protection-groups", headers=headers,
               params={"names": "protgrp-tier2-standard"},
               json=schedule_data).raise_for_status()
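A read-back confirms the schedule and retention landed as intended. Continuing the session from the example above (field names again per the REST 2.x schema):

check = requests.get(f"{ARRAY}/api/2.0/protection-groups", headers=headers,
                     params={"names": "protgrp-tier2-standard"})
pg = check.json()["items"][0]
print(pg["snapshot_schedule"], pg["source_retention"])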
Operational Best Practices
1. Naming Conventions
Use descriptive, sortable names that include the tier level: protgrp-tier1-db-prod, protgrp-tier2-filesvr. This makes auditing easier when you have 50+ protection groups.
2. Test Restore Procedures
Snapshots are not backups until you've proven you can restore from them. Schedule quarterly DR drills where you do the following (a code sketch of the first steps appears after the list):
- Instantiate a volume from a random snapshot
- Mount it to a test host
- Validate application-level consistency (can the app start? Is data intact?)
- Measure time-to-restore and document findings
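As a hedged sketch of the first two steps: REST 2.x can instantiate a new volume directly from a protection-group snapshot by naming the snapshot as the volume's source. The snapshot suffix and volume names below are hypothetical, and the sketch reuses the ARRAY/headers session from the REST example earlier:

import time
import requests  # assumes the ARRAY and headers session set up above

start = time.monotonic()
requests.post(f"{ARRAY}/api/2.0/volumes", headers=headers,
              params={"names": "vol-db-restore-test"},
              json={"source": {"name": "protgrp-tier1-critical.42.vol-db-prod-01"}}
              ).raise_for_status()
print(f"Volume instantiated in {time.monotonic() - start:.2f}s")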
3. Monitor Snapshot Growth Trends
Set alerts when snapshot-consumed space exceeds thresholds. In the Pure GUI, navigate to Storage → Volumes and sort by the Snapshots column. Investigate volumes where snapshots consume more than 30% of total volume capacity; this may indicate an excessive change rate or a retention period misaligned with the workload.
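The same check can be scripted instead of eyeballed in the GUI. A sketch using the REST session from earlier (the space field names follow the 2.x schema; verify them on your Purity version):

# Flag volumes whose snapshots exceed 30% of provisioned capacity
vols = requests.get(f"{ARRAY}/api/2.0/volumes", headers=headers).json()["items"]
for v in vols:
    snap_bytes, provisioned = v["space"]["snapshots"], v["provisioned"]
    if provisioned and snap_bytes / provisioned > 0.30:
        print(f"{v['name']}: snapshots at {snap_bytes / provisioned:.0%} of capacity")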
4. Application-Consistent Snapshots
For databases (Oracle, SQL Server, PostgreSQL), coordinate snapshots with application quiesce commands (an Oracle-flavored sketch follows this list):
- SQL Server: Use VSS integration with Pure Storage VSS Provider
- Oracle: Put tablespaces in hot backup mode before snapshot, then end backup mode
- VMware: Leverage vSphere snapshot integration with Pure Storage plugin
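For Oracle, that coordination is small enough to script. A hedged sketch that brackets an on-demand protection-group snapshot with hot backup mode (cursor is any DB-API cursor with SYSDBA privileges; the group name is illustrative, and the REST session is the one from earlier):

def consistent_oracle_snapshot(cursor, pgroup="protgrp-tier1-critical"):
    cursor.execute("ALTER DATABASE BEGIN BACKUP")  # quiesce datafile writes
    try:
        # Trigger an on-demand snapshot of the whole protection group
        requests.post(f"{ARRAY}/api/2.0/protection-group-snapshots",
                      headers=headers,
                      params={"source_names": pgroup}).raise_for_status()
    finally:
        cursor.execute("ALTER DATABASE END BACKUP")  # always release backup mode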
Common Mistakes to Avoid
Over-Snapshotting Low-Change Workloads
Taking 15-minute snapshots of a file share that changes once per day wastes retention slots and adds administrative overhead. Match snapshot frequency to workload change rate and business RPO, not arbitrary "more is better" thinking.
Treating Snapshots as Backups
Snapshots protect against logical corruption (accidental deletes, ransomware, bad updates) but not against array-level failures or site disasters. Always pair snapshot policies with array replication (ActiveCluster) or external backups to tape/cloud for comprehensive data protection.
Ignoring Compliance Retention
Regulatory requirements (HIPAA, SOX, GDPR) may mandate specific retention periods. Document these requirements in your policy matrix and implement automated retention enforcement. Never rely on manual snapshot management for compliance-critical data—automation prevents gaps.
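Automated enforcement can start as a scheduled audit comparing configured retention against the policy matrix. A minimal sketch (REST session as before; the group name and 30-day floor are illustrative):

REQUIRED_ALL_FOR_SEC = 30 * 86400  # policy floor for the archive tier

pg = requests.get(f"{ARRAY}/api/2.0/protection-groups", headers=headers,
                  params={"names": "protgrp-tier4-archive"}).json()["items"][0]
actual = pg["source_retention"]["all_for_sec"]
if actual < REQUIRED_ALL_FOR_SEC:
    print(f"COMPLIANCE GAP: retention {actual}s < required {REQUIRED_ALL_FOR_SEC}s")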
No Snapshot Naming Standards
Manual snapshots without standardized naming become unmanageable quickly. Enforce a format like {volumename}.{timestamp}.{purpose} (e.g., vol-db-prod.20260214-1500.pre-patch).
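A pre-flight check in your snapshot tooling keeps the convention honest. An illustrative validator for that format:

import re

# Matches {volumename}.{YYYYMMDD-HHMM}.{purpose}
SNAP_NAME = re.compile(r"^[\w-]+\.\d{8}-\d{4}\.[\w-]+$")

assert SNAP_NAME.match("vol-db-prod.20260214-1500.pre-patch")
assert not SNAP_NAME.match("quick-test-snap")  # rejected: no timestamp/purpose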
Monitoring and Alerting
Integrate snapshot health monitoring into your observability stack:
- Alert on failed snapshot schedules: Use Pure1 or Prometheus exporters to detect protection group failures
- Track snapshot age: Trigger warnings if most recent snapshot is older than expected RPO
- Capacity trending: Dashboard snapshot-consumed space over time to forecast capacity needs
# Prometheus query example: snapshot age threshold
time() - pure_volume_snapshot_created_timestamp > 3600 # Alert if no snapshot in past hour
Next Steps: Automation and Integration
Once base policies are established, enhance with these advanced capabilities:
- Policy-as-Code: Store protection group definitions in Git, deploy via CI/CD pipelines
- Self-Service Portals: Let developers create volumes with auto-assigned snapshot policies based on tags
- Automated Testing: Script restore validation tests that run monthly and report success/fail metrics
- Replication Orchestration: Pair snapshot policies with async replication schedules for DR automation
A mature snapshot strategy isn't just about taking snapshots—it's about integrating data protection into your operational workflows, monitoring effectiveness, and continuously optimizing based on real-world restore needs.