NetApp ONTAP Code Update: A Complete NDU Maintenance Runbook

Published May 25, 2026 | 14 min read

Overview

This runbook walks through a planned NetApp ONTAP nondisruptive upgrade (NDU) from start to finish. It follows the same structure as an enterprise Method of Procedure (MOP): prerequisites, preparatory work, pre-change health checks, communications, the upgrade itself, monitoring, post-validation, and a backout plan.

ONTAP’s automated upgrade path handles node failover, image activation, and giveback in sequence — but the procedure still requires deliberate pre- and post-checks to confirm cluster readiness and capture evidence. NetApp documents the core automated upgrade command as cluster image update.

The example target version throughout this runbook is 9.17.1P3.

Prerequisites

Before opening a maintenance window, confirm the following are in place:

Area	Requirement
Target ONTAP image	Downloaded, staged, and hash-verified (at least 24 hours prior)
Maintenance window	Duration agreed and recorded in your change record (e.g. 4 hours)
Health alerts	No active system health alerts
Storage failover	Healthy and possible for all HA pairs
Aggregates	All online; no broken disks
Data LIFs	All on home ports — exceptions documented
Monitoring	AIQUM maintenance mode plan confirmed for the cluster
Change record	Change ticket created; command outputs will be attached as evidence

Preparatory Work — Image Staging

Complete this at least 24 hours before the maintenance window so any image validation issues can be resolved without time pressure.

1. Download the Image

Download the target ONTAP package from mysupport.netapp.com. Select the correct version family (e.g., NetApp ONTAP 9.17.x) and download both the image (image.tgz) and its MD5 hash file.

2. Verify the Hash

On your staging server, verify the file integrity before uploading to the cluster:

# PowerShell — run on the staging server
Get-FileHash -Algorithm MD5 "C:\staging\NetApp\image.tgz"

Compare the output against the MD5 hash file from the NetApp support portal. Do not proceed if there is a mismatch.
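If the staging server does not run PowerShell, the same comparison can be sketched in Python. This is a minimal illustration, not part of the NetApp procedure: the function names md5_of_file and hash_matches are hypothetical, and the published hash is assumed to be a plain hex string copied from the support portal.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the lowercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so a multi-gigabyte image never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_matches(path: Path, published_hash: str) -> bool:
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_hash.strip().lower()
```

A False result from hash_matches means the same thing as a mismatch in the PowerShell check: stop and download the image again.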

3. Upload the Image to the Cluster

From an ONTAP CLI session, pull the image from your internal staging server:

cluster image package get -url http://<staging-server>/NetApp/image.tgz

4. Validate the Image

cluster image validate -version 9.17.1P3

Review all output. Fix any errors before the maintenance window. Warnings should be reviewed and documented in the change record. Do not ignore them without understanding the impact.

Pre-Change Health Checks

Run the following commands immediately before starting the maintenance window. Attach all output to the change record as your before-state evidence.

1. System Health Alerts

system health alert show

Expected result: no active alerts. Investigate and resolve any alerts before proceeding.

2. Storage Failover Status

storage failover show

Expected result: all HA pairs show state Connected and Takeover possible: true.

3. Aggregate Status

storage aggregate show -state !online

Expected result: no output (all aggregates are online).

4. Broken Disks

storage disk show -broken

Expected result: no output (no broken disks).

5. LIF Home Port Status

net int show -is-home false

Expected result: no output (all LIFs are on home ports). If LIFs are off home ports, either revert them or document the exceptions before starting.

6. Current Cluster Version

cluster image show

Record the current running version for the change record.

7. Cluster and Node Health

cluster show
system node show -fields health

All nodes must be healthy before starting the upgrade.

8. SnapMirror Status (If Applicable)

snapmirror show -fields status,healthy

Confirm there are no unhealthy or unexpected SnapMirror relationships that could be interrupted by a node failover during the upgrade.
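The "no output" checks in steps 3 to 5 lend themselves to a quick pass/fail review of the captured evidence. The sketch below assumes you have already saved each command's output as text. The marker strings are what ONTAP typically prints for an empty result, but treat them as assumptions and confirm them against your own captures. The names EMPTY_MARKERS, EXPECT_EMPTY, and failed_checks are hypothetical.

```python
# Messages assumed to indicate an empty result; verify against your own output.
EMPTY_MARKERS = (
    "There are no entries matching your query.",
    "This table is currently empty.",
)

# Commands from this runbook whose expected result is "no output".
EXPECT_EMPTY = (
    "storage aggregate show -state !online",
    "storage disk show -broken",
    "net int show -is-home false",
)


def is_empty_result(output: str) -> bool:
    """True when the output is blank or contains only a known empty-result message."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return all(line in EMPTY_MARKERS for line in lines)


def failed_checks(captured: dict[str, str]) -> list[str]:
    """Return the expect-empty commands that are missing or returned rows."""
    failures = []
    for command in EXPECT_EMPTY:
        output = captured.get(command)
        # A missing capture counts as a failure: no evidence means no pass.
        if output is None or not is_empty_result(output):
            failures.append(command)
    return failures
```

An empty list from failed_checks means all three checks passed. Anything else names the commands whose output needs a human look before the window opens.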

Communications — Starting Maintenance

Before executing the change:

Email notification — Notify storage operations stakeholders that the maintenance is starting.
Chat notification — Post to your team channel: <CHANGE_ID> - <cluster-hostname> NetApp ONTAP Code Update - Starting
Change record — Transition the change ticket to Implementing.

Change Execution

1. Pause Monitoring

If using AIQUM (Active IQ Unified Manager), log in and enable maintenance mode for the cluster for the duration of the maintenance window. This suppresses expected alerts during the failover/giveback cycles and prevents false escalations.

2. SSH Into the Cluster

Connect to the cluster management LIF. The automated upgrade manages each node in sequence; you do not need to SSH to individual nodes.

3. Generate “Start Maintenance” AutoSupport

This creates a timestamped record in NetApp’s systems and sets a maintenance window suppression period:

system node autosupport invoke -node * -type all -message "MAINT=4h Starting_Code_Update"

Adjust the MAINT=Xh duration to match your approved maintenance window.
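To keep the duration in the message tied to the approved window, the command string can be generated rather than typed by hand. A minimal sketch follows; the helper names maint_start_message and autosupport_command are hypothetical, and the note text is the one used in this runbook.

```python
def maint_start_message(window_hours: int, note: str = "Starting_Code_Update") -> str:
    """Build the AutoSupport message body, e.g. 'MAINT=4h Starting_Code_Update'."""
    if window_hours < 1:
        raise ValueError("maintenance window must be at least 1 hour")
    if any(ch.isspace() for ch in note):
        # This runbook uses underscores in the note, so spaces are rejected here.
        raise ValueError("use underscores instead of spaces in the note")
    return f"MAINT={window_hours}h {note}"


def autosupport_command(message: str) -> str:
    """Wrap a message in the invoke command used in this runbook."""
    return f'system node autosupport invoke -node * -type all -message "{message}"'


if __name__ == "__main__":
    print(autosupport_command(maint_start_message(4)))
```

The same autosupport_command helper can wrap the MAINT=END message at the end of the window.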

4. Initiate the Nondisruptive Upgrade

The following command triggers the full automated upgrade sequence: validation, node failover, image activation, giveback, and repeat for each remaining node.

cluster image update -version 9.17.1P3 -pause-after none -ignore-validation-warning false -skip-confirmation false -stabilize-minutes 4 -nodes *

Key flags:

-pause-after none: upgrade all nodes without a manual pause between them
-ignore-validation-warning false: do not silently bypass warnings
-stabilize-minutes 4: wait 4 minutes after each node giveback before moving to the next node
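One way to avoid typos in a long command is to assemble it from named options and paste the result into the change record before the window. The sketch below only formats the flags shown in this runbook and does not validate values against ONTAP. UpdateOptions and build_update_command are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateOptions:
    """Options for the update command, defaulting to the values used in this runbook."""
    version: str
    pause_after: str = "none"
    ignore_validation_warning: bool = False
    skip_confirmation: bool = False
    stabilize_minutes: int = 4
    nodes: str = "*"


def build_update_command(opts: UpdateOptions) -> str:
    """Render the cluster image update command as a single line."""
    def flag(value: bool) -> str:
        # ONTAP boolean parameters are written as lowercase true/false.
        return "true" if value else "false"

    return (
        f"cluster image update -version {opts.version}"
        f" -pause-after {opts.pause_after}"
        f" -ignore-validation-warning {flag(opts.ignore_validation_warning)}"
        f" -skip-confirmation {flag(opts.skip_confirmation)}"
        f" -stabilize-minutes {opts.stabilize_minutes}"
        f" -nodes {opts.nodes}"
    )


if __name__ == "__main__":
    print(build_update_command(UpdateOptions(version="9.17.1P3")))
```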

Note: If your session disconnects during the upgrade, open a new SSH session and continue monitoring. The upgrade continues regardless of your session state.

Monitoring Progress

Open a separate SSH session to monitor progress without interrupting the upgrade session.

Live Progress

cluster image show-update-progress

Run this repeatedly or leave it open. It shows the current upgrade phase and per-node status.

Upgrade History

cluster image show-update-history

Shows completed phases and their timestamps — useful for evidence capture.

HA Status During Upgrade

storage failover show

During a rolling upgrade, one node of an HA pair is in takeover at a time. Verify that each node returns to Takeover possible: true before the upgrade moves to the next node.
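Repeated polling can double as evidence capture if each snapshot is timestamped. The sketch below makes no assumption about how commands reach the cluster or what the progress output looks like. run_command is a hypothetical callable you would supply (for example, a wrapper around your SSH tooling), and is_finished is a caller-supplied test on the returned text.

```python
import time
from datetime import datetime, timezone
from typing import Callable


def capture_progress(
    run_command: Callable[[str], str],
    is_finished: Callable[[str], bool],
    interval_seconds: int = 60,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[str, str]]:
    """Poll the progress command and return (UTC timestamp, output) snapshots."""
    snapshots = []
    for _ in range(max_polls):
        output = run_command("cluster image show-update-progress")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        snapshots.append((stamp, output))
        if is_finished(output):
            break
        # Wait before the next poll; the sleep function is injectable for testing.
        sleep(interval_seconds)
    return snapshots
```

The max_polls cap keeps the loop from running forever if the upgrade pauses. With the defaults it stops after roughly four hours of one-minute polls.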

Backout Plan

Warning: Only execute the following steps if the automated upgrade fails and cannot be resumed.

Engage NetApp Support before starting any backout actions. Revert and downgrade procedures are Support-led and must be scoped to your specific ONTAP version, platform, and failure state.

Step 1 — Set the Boot Image

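Tell ONTAP which image to boot on the next restart. The -image parameter takes an image slot name (image1 or image2), not a version string, so first identify the slot that holds 9.16.1P2, then make it the default:

system node image show
system image modify -node * -image <image1|image2> -isdefault true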

Step 2 — Verify Revert Preconditions

Switch to advanced privilege mode and run a check-only pass before committing:

set -privilege advanced

system node revert-to -node <node1> -check-only true -version 9.16.1P2

Resolve any issues reported before proceeding.

Step 3 — Execute the Revert

system node revert-to -node <node1> -version 9.16.1P2

Step 4 — Filesystem Revert (If Required)

Only needed if directed by NetApp Support for a specific failure state:

system node run -node <node1>
revert_to 9.16.1P2
boot_ontap

Step 5 — Repeat for the HA Partner Node

Perform Steps 2–4 on the partner node of each affected HA pair.

Step 6 — Restore HA Configuration

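Re-enable HA once every node has been reverted. The cluster ha command applies only to two-node clusters; run storage failover modify for each node.

cluster ha modify -configured true
storage failover modify -node <node1> -enabled true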

Post-Validation

Run the following commands after the upgrade completes. Attach all output to the change record as your after-state evidence.

1. Confirm Target Version

version -v

The output must show 9.17.1P3.

2. Confirm Upgrade Completion

cluster image show
cluster image show-update-history

All nodes must show the target version. The update history must show completed with no failures.
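The "all nodes on target" check can be sketched as a small parser over the saved cluster image show output. It assumes whitespace-separated columns with the node name first and the version second, which is an assumption about the output layout rather than a documented format. nodes_not_on_target is a hypothetical name.

```python
import re

# A version column is assumed to start with digits, a dot, and more digits (e.g. 9.17.1P3).
VERSION_PATTERN = re.compile(r"\d+\.\d+")


def nodes_not_on_target(image_show_output: str, target_version: str) -> dict[str, str]:
    """Return {node: version} for every node whose version differs from the target."""
    mismatched = {}
    for line in image_show_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, separators, and the trailing entry count.
        if len(fields) < 2 or not VERSION_PATTERN.match(fields[1]):
            continue
        node, version = fields[0], fields[1]
        if version != target_version:
            mismatched[node] = version
    return mismatched
```

An empty dictionary means every parsed node reports the target version. If the parser finds no node rows at all it also returns an empty dictionary, so confirm by eye that the capture actually contains the node table.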

3. Storage Failover Status

storage failover show

All HA pairs must be back to Connected and Takeover possible: true.

4. LIF Placement

net int show -is-home false

All LIFs should be back on their home ports. If any are not, revert them:

network interface revert -vserver <svm-name> -lif <lif-name>

5. System Health Alerts

system health alert show

Expected result: no new active alerts.
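"No new active alerts" implies a comparison with the pre-change capture. A minimal line-based sketch, assuming both outputs were saved as text, is shown below. new_alert_lines is a hypothetical name, and a line that differs only in a timestamp would be reported as new, so treat the result as a prompt for review rather than a verdict.

```python
def new_alert_lines(before: str, after: str) -> list[str]:
    """Return non-blank lines present in the post-change output but not in the pre-change one."""
    seen = {line.strip() for line in before.splitlines() if line.strip()}
    fresh = []
    for line in after.splitlines():
        text = line.strip()
        # Keep original order so the result reads like the command output.
        if text and text not in seen:
            fresh.append(text)
    return fresh
```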

6. Cluster and Node Health

cluster show
system node show -fields health

All nodes must be healthy.

7. Generate “End Maintenance” AutoSupport

system node autosupport invoke -node * -type all -message "MAINT=END"

Resume Monitoring

Disable AIQUM maintenance mode for the cluster once all post-validation checks pass. Confirm monitoring dashboards are showing green before calling the change complete.

Communications — Completing Maintenance

Email notification — Notify storage operations stakeholders that the maintenance is complete.
Chat notification — Post to your team channel: <CHANGE_ID> - <cluster-hostname> NetApp ONTAP Code Update - Completed
Change record — Attach all pre- and post-validation command output, then transition the ticket to Implemented.

Key Takeaways

Stage and validate the image at least 24 hours before the window — validation errors found during the change are avoidable.
Capture storage failover show, system health alert show, and cluster image show output before and after as your change evidence.
The cluster image update command handles failover and giveback automatically — avoid manual node intervention unless directed by NetApp Support.
Keep AIQUM maintenance mode active for the full upgrade window to suppress expected failover alerts.
Engage NetApp Support before starting any backout. Reverting ONTAP is not a self-service operation.

Comments

================================================================================
NetApp ONTAP NDU Code Update - Method of Procedure (MOP)
Plain-Text Template - Fill via Find and Replace
================================================================================

HOW TO USE THIS TEMPLATE
--------------------------------------------------------------------------------
This file is plain text. Do NOT print it and hand-write values. Instead:

1. Save a working copy named after your change
   (for example: CHG0012345-cluster01-ndu.txt).
2. Open the working copy in Notepad, Notepad++, VS Code, Sublime, or any text
   editor. Use "Find and Replace" (Ctrl+H in most editors) to substitute every
   token below with your real value. Replace ALL occurrences.
3. The remaining "[ ]" checkboxes are meant to be edited DURING the change:
   change "[ ]" to "[x]" once each step is verified. Add notes after the
   command line if helpful.
4. Save the completed copy and attach it to the change ticket as evidence.

--------------------------------------------------------------------------------
TOKEN MANIFEST - Replace each of these BEFORE the maintenance window
--------------------------------------------------------------------------------
<<CHANGE_ID>>                   Change/ticket ID (e.g. CHG0012345)
<<CLUSTER_HOSTNAME>>            ONTAP cluster name (e.g. cluster01)
<<CLUSTER_MGMT_LIF>>            Cluster management LIF or IP
<<CURRENT_ONTAP_VERSION>>       Currently installed ONTAP version
<<TARGET_ONTAP_VERSION>>        Version you are upgrading to (e.g. 9.17.1P3)
<<PREVIOUS_ONTAP_VERSION>>      Same as CURRENT, used by backout section
<<NODE1>>                       First node name in HA pair
<<NODE2>>                       Second node name in HA pair
<<SVM_NAME>>                    Example SVM used in commands (post-validation)
<<LIF_NAME>>                    Example LIF name (post-validation)
<<MAINT_DATE>>                  Maintenance date (YYYY-MM-DD)
<<MAINT_TIME>>                  Maintenance start time (HH:MM)
<<MAINT_WINDOW_HOURS>>          Length of maintenance window in hours (e.g. 4)
<<TIMEZONE>>                    Time zone (e.g. America/New_York)
<<ENGINEER_NAME>>               Engineer performing the change
<<APPROVER_NAME>>               Change approver / CAB rep
<<STAGING_SERVER>>              Hostname or IP serving the ONTAP image
<<STAGING_PATH>>                Full path or URL to the image file
<<IMAGE_FILE>>                  Image filename (e.g. 9.17.1P3_q_image.tgz)
<<IMAGE_MD5>>                   MD5 hash published on NetApp Support Site

DURING-CHANGE FIELDS (filled in as you go - leave token until then if you like):
<<MAINT_START_TIME>>            Actual start time recorded at SECTION 4
<<MAINT_END_TIME>>              Actual end time recorded at SECTION 10
<<NOTES_VALIDATION_WARNINGS>>   Free-form notes about any validation warnings
<<NOTES_HOME_LIFS>>             Notes if any LIFs needed to be reverted
<<SIGNOFF_ENGINEER_TIME>>       Engineer sign-off time
<<SIGNOFF_APPROVER_TIME>>       Approver sign-off time

================================================================================
HEADER
================================================================================
Change ID        : <<CHANGE_ID>>
Cluster          : <<CLUSTER_HOSTNAME>>
Current Version  : <<CURRENT_ONTAP_VERSION>>
Target Version   : <<TARGET_ONTAP_VERSION>>
Maintenance Date : <<MAINT_DATE>> <<MAINT_TIME>> <<TIMEZONE>>
Window Length    : <<MAINT_WINDOW_HOURS>> hours
Engineer         : <<ENGINEER_NAME>>
Approver         : <<APPROVER_NAME>>

================================================================================
SECTION 1 - PRE-CHANGE STAGING (T-7 to T-1 days)
================================================================================
1.1 Download the target ONTAP image from the NetApp Support Site.
    [ ] Image file: <<IMAGE_FILE>>
    [ ] Published MD5: <<IMAGE_MD5>>

1.2 Stage the image on <<STAGING_SERVER>> at <<STAGING_PATH>> so the cluster
    can reach it over HTTP/HTTPS.
    [ ] Image staged and reachable from <<CLUSTER_HOSTNAME>>

1.3 Pre-stage the image on the cluster (does NOT activate it):

      cluster image package get -url http://<<STAGING_SERVER>>/<<IMAGE_FILE>>
      cluster image package show-repository

    [ ] Package present in repository
    [ ] MD5 reported by ONTAP matches <<IMAGE_MD5>>

1.4 Confirm change ticket <<CHANGE_ID>> is approved and the maintenance window
    is communicated.
    [ ] Ticket approved
    [ ] Stakeholders notified (date/time/window/impact)

================================================================================
SECTION 2 - COMMUNICATIONS - STARTING MAINTENANCE
================================================================================
2.1 Send pre-maintenance notification.
    [ ] Email sent to storage operations stakeholders
    [ ] Chat notification posted:
        "<<CHANGE_ID>> - <<CLUSTER_HOSTNAME>> NetApp ONTAP Code Update -
        Starting now. Expected window: <<MAINT_WINDOW_HOURS>> hours."

2.2 Disable monitoring noise during the window.
    [ ] AIQUM cluster maintenance mode enabled for <<CLUSTER_HOSTNAME>>
    [ ] Pager / alerting suppressed for the maintenance window

================================================================================
SECTION 3 - PRE-CHANGE HEALTH CHECKS
================================================================================
Run from the cluster management LIF (<<CLUSTER_MGMT_LIF>>). Capture ALL output
and attach to <<CHANGE_ID>> as pre-state evidence.

3.1 version -v
    [ ] Current version matches <<CURRENT_ONTAP_VERSION>>

3.2 cluster image show
    cluster image show-update-history
    [ ] No prior upgrade in a failed/in-progress state

3.3 storage failover show
    [ ] All HA pairs: Connected, Takeover possible: true

3.4 storage aggregate show -state !online
    [ ] No aggregates in degraded/offline state

3.5 storage disk show -broken
    storage disk show -container-type spare
    [ ] No broken disks
    [ ] Sufficient spare coverage

3.6 cluster show
    system node show -fields health
    [ ] All nodes healthy, eligible, and in quorum

3.7 net int show -is-home false
    [ ] No output (all LIFs on home ports)

3.8 system health alert show
    [ ] No active alerts

3.9 Send pre-maintenance AutoSupport with maintenance window length:

      system node autosupport invoke -node * -type all -message "MAINT=<<MAINT_WINDOW_HOURS>>h"

    [ ] AutoSupport sent

================================================================================
SECTION 4 - VALIDATION
================================================================================
Maintenance start time recorded: <<MAINT_START_TIME>>

4.1 Run the cluster image validation against the staged target:

      cluster image validate -version <<TARGET_ONTAP_VERSION>>

4.2 Review the validation output carefully.
    [ ] No "blocking" issues reported
    [ ] All "required" actions completed prior to running update

    Notes / warnings observed: <<NOTES_VALIDATION_WARNINGS>>

================================================================================
SECTION 5 - EXECUTE THE NDU
================================================================================
5.1 Open a SECOND SSH session for monitoring before issuing the update.
    [ ] Second session connected to <<CLUSTER_MGMT_LIF>>

5.2 From the primary session, start the nondisruptive upgrade (enter as a
    single line):

      cluster image update -version <<TARGET_ONTAP_VERSION>> -pause-after none -ignore-validation-warning false -skip-confirmation false -stabilize-minutes 4 -nodes *

    [ ] Command accepted - upgrade running

    NOTE: The upgrade continues even if your SSH session disconnects.
          Do NOT manually fail over nodes during the upgrade.

================================================================================
SECTION 6 - MONITORING
================================================================================
6.1 Monitor progress from the second SSH session (run repeatedly):

      cluster image show-update-progress

6.2 Review completed phases and timestamps:

      cluster image show-update-history

6.3 Watch HA status during each node's upgrade cycle:

      storage failover show

    [ ] Each node returns to "Takeover possible: true" before the next node
        starts

    Expected sequence per HA pair:
      <<NODE1>>: takeover -> activate image -> boot -> giveback -> stabilize (4 min)
      <<NODE2>>: takeover -> activate image -> boot -> giveback -> stabilize (4 min)

================================================================================
SECTION 7 - BACKOUT PLAN
================================================================================
IMPORTANT: Only execute if the automated upgrade fails and cannot be resumed.
           Engage NetApp Support BEFORE starting any backout action.

7.1 Set boot image to the previous version. The -image parameter takes a slot
    name (image1 or image2), so first identify the slot that holds
    <<PREVIOUS_ONTAP_VERSION>>:

      system node image show
      system image modify -node * -image <image1|image2> -isdefault true

7.2 Verify revert preconditions (switch to advanced privilege first):

      set -privilege advanced
      system node revert-to -node <<NODE1>> -check-only true -version <<PREVIOUS_ONTAP_VERSION>>

    [ ] No blocking issues reported

7.3 Execute revert:

      system node revert-to -node <<NODE1>> -version <<PREVIOUS_ONTAP_VERSION>>

7.4 Filesystem revert (only if directed by NetApp Support):

      system node run -node <<NODE1>>
      revert_to <<PREVIOUS_ONTAP_VERSION>>
      boot_ontap

7.5 Repeat steps 7.2 - 7.4 for each remaining node (<<NODE2>>, ...).

7.6 Restore HA configuration after all nodes are reverted:

      cluster ha modify -configured true
      storage failover modify -node <<NODE1>> -enabled true

================================================================================
SECTION 8 - POST-VALIDATION
================================================================================
Attach all command output to <<CHANGE_ID>> as after-state evidence.

8.1 version -v
    [ ] Output shows: <<TARGET_ONTAP_VERSION>>

8.2 cluster image show
    cluster image show-update-history
    [ ] All nodes on <<TARGET_ONTAP_VERSION>>
    [ ] Update history shows completed with no failures

8.3 storage failover show
    [ ] All HA pairs: Connected, Takeover possible: true

8.4 net int show -is-home false
    [ ] No output (all LIFs on home ports)

    If any LIFs are off home ports:

      network interface revert -vserver <<SVM_NAME>> -lif <<LIF_NAME>>

    Notes: <<NOTES_HOME_LIFS>>

8.5 system health alert show
    [ ] No new active alerts

8.6 cluster show
    system node show -fields health
    [ ] All nodes healthy

8.7 Send end-maintenance AutoSupport:

      system node autosupport invoke -node * -type all -message "MAINT=END"

    [ ] AutoSupport sent

================================================================================
SECTION 9 - RESUME MONITORING
================================================================================
[ ] AIQUM maintenance mode disabled for <<CLUSTER_HOSTNAME>>
[ ] Monitoring dashboards showing green
[ ] No unexpected alerts after maintenance mode lifted

================================================================================
SECTION 10 - COMMUNICATIONS - COMPLETING MAINTENANCE
================================================================================
[ ] Email notification sent to storage operations stakeholders
[ ] Chat notification posted:
    "<<CHANGE_ID>> - <<CLUSTER_HOSTNAME>> NetApp ONTAP Code Update - Completed"
[ ] Pre/post validation output attached to <<CHANGE_ID>> as evidence
[ ] Change ticket <<CHANGE_ID>> transitioned to: Implemented
[ ] Maintenance end time recorded: <<MAINT_END_TIME>>

================================================================================
SIGN-OFF
================================================================================
Engineer: <<ENGINEER_NAME>>    Time: <<SIGNOFF_ENGINEER_TIME>>
Approver: <<APPROVER_NAME>>    Time: <<SIGNOFF_APPROVER_TIME>>

================================================================================
END OF MOP
================================================================================
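The find-and-replace step can also be scripted, which makes it easy to spot tokens that were missed. The sketch below assumes tokens always take the <<UPPER_CASE_NAME>> form used in the template. fill_template is a hypothetical helper, not something shipped with the template.

```python
import re

# Tokens look like <<CHANGE_ID>>: double angle brackets around capitals, digits, underscores.
TOKEN_PATTERN = re.compile(r"<<([A-Z0-9_]+)>>")


def fill_template(text: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known tokens and return (filled text, sorted names of tokens left unfilled)."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Leave unknown tokens in place so they stay visible in the output.
        return values.get(name, match.group(0))

    filled = TOKEN_PATTERN.sub(substitute, text)
    remaining = sorted(set(TOKEN_PATTERN.findall(filled)))
    return filled, remaining


if __name__ == "__main__":
    sample = "Change ID : <<CHANGE_ID>>\nCluster   : <<CLUSTER_HOSTNAME>>"
    filled, remaining = fill_template(sample, {"CHANGE_ID": "CHG0012345"})
    print(filled)
    print("Still to fill:", remaining)
```

Before the window, the remaining list should contain only the during-change fields such as MAINT_START_TIME.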