I was doing some housecleaning on one of my Microsoft Hyper-V clusters and noticed that five VMs had checkpoints from various times in the past. Typically you want to keep checkpoints (formerly known as "snapshots") around for only a short while: the differencing disks behind them keep growing, merging them back can degrade performance, and sometimes the merge just flat-out doesn't work. I was pretty sad to see these were years old. Sigh.
Well, these VMs were on a high-performance SAN and they were all non-critical, so I shut them down. I then took a SAN snapshot of the volumes so I could revert if the merge failed. Enough layers of snapshots yet? Anyway. I deleted the checkpoints on the first VM. Nothing seemed to be moving. I checked Resource Monitor for any disk activity on those volumes. Nothing. I gave it an hour, and it was just stuck at 0%, showing "Merge in Progress", but it was pretty clear no merge was in progress. This was what I was afraid of.
So next, I nuked the volume, reverted to the SAN snapshot, and tried the PowerShell Merge-VHD cmdlet instead. It also got stuck at 0%. I gave it 15 minutes, saw no activity, and nuked that one too. Then, thinking maybe the host was the problem, I duplicated the SAN snapshot to another host and tried Merge-VHD again. Nothing. Clearly the VHDX/AVHDX files themselves are the problem.
What can we do?
Well, there is actually another way to merge. When you export a checkpoint, Hyper-V flattens (merges) the disk files in the chain into a single VHDX at the export location. Could this work where the other methods failed? Shockingly, it did. Not only did it merge, it merged really fast: basically as fast as my SAN would go, which was a nice change.
So how do you do this? First you need a pristine copy of the Hyper-V VM. That means all the undisturbed VHDX and AVHDX files, the VM configuration files, and everything else. You can take advantage of snapshots from your storage system like I did, or restore from backup.
Once you have the files, import the VM (if you restored from backup, this may not be necessary). Go through the wizard; when it asks which VM you want to import, you will see several dates. Choose the oldest one. This is counterintuitive, because it seems like you should pick the date of the most recent checkpoint, but the wizard is really asking which disk comes first in the chain. Complete the wizard.
Now for another counterintuitive step: take a fresh checkpoint, even though the existing checkpoints are what's giving us fits. We are going to export this checkpoint, and we want it to capture the VM's current, completely up-to-date state; exporting one of the old checkpoints would only give us the VM as it was back then.
Once the checkpoint has been created, right-click it and select Export. Go through the wizard and pick a location with plenty of free disk space for the merged VHDX.
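Before starting the export, it can help to sanity-check the destination. The Python sketch below is a rough helper of my own, not part of Hyper-V: it adds up the sizes of every .vhdx and .avhdx file under the restored VM folder and compares that total with the free space reported for the destination volume. It assumes the flattened disk won't be larger than the whole chain added together, and the 10% `headroom` margin is an arbitrary, hypothetical choice; the function names are illustrative only.

```python
import os
import shutil

# Base disks and differencing (checkpoint) disks used by Hyper-V.
DISK_EXTENSIONS = (".vhdx", ".avhdx")


def chain_size_bytes(vm_dir):
    """Sum the sizes of every .vhdx/.avhdx file under vm_dir, recursively."""
    total = 0
    for root, _dirs, files in os.walk(vm_dir):
        for name in files:
            # Compare case-insensitively: Windows file names often mix case.
            if name.lower().endswith(DISK_EXTENSIONS):
                total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for_export(vm_dir, export_dir, headroom=1.1):
    """Return (ok, needed, free) for a planned export.

    Assumption: the merged VHDX is no larger than the whole chain combined,
    so 'needed' is the chain size times a hypothetical safety margin.
    """
    needed = int(chain_size_bytes(vm_dir) * headroom)
    free = shutil.disk_usage(export_dir).free
    return free >= needed, needed, free
```

With hypothetical paths, `has_room_for_export(r"D:\Restore\VM01", r"E:\Export")` returns a tuple of (enough space?, estimated bytes needed, bytes free). It only reads file sizes, so it is safe to run against the restored copy before you touch anything.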
After a while, you will have a fully exported VM with one flat VHDX file (or more, if the VM has more disks). From here you can either import the VM from that temporary location or simply replace all the old VHDX and AVHDX files with the merged one and edit the Hyper-V configuration to point to the new VHDX file.
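Before swapping any files, it's worth confirming that the export really is flat. This Python sketch (again an illustrative helper, not a Hyper-V tool) walks the export folder, separates base .vhdx files from differencing .avhdx files, and treats the export as flat only if no .avhdx files remain and the .vhdx count matches the number of disks you expect. The `expected_disks` value is something you supply yourself; the helper names are hypothetical.

```python
import os


def disk_files(export_dir):
    """Return (vhdx, avhdx): sorted paths of base and differencing disks."""
    vhdx, avhdx = [], []
    for root, _dirs, files in os.walk(export_dir):
        for name in files:
            lower = name.lower()
            path = os.path.join(root, name)
            if lower.endswith(".avhdx"):
                avhdx.append(path)  # leftover checkpoint disk: not flat
            elif lower.endswith(".vhdx"):
                vhdx.append(path)   # flattened (base) disk
    return sorted(vhdx), sorted(avhdx)


def export_is_flat(export_dir, expected_disks):
    """True if there are no .avhdx files and exactly expected_disks .vhdx files.

    expected_disks is a hypothetical input: the number of virtual disks
    you know the VM has.
    """
    vhdx, avhdx = disk_files(export_dir)
    return not avhdx and len(vhdx) == expected_disks
```

If `export_is_flat` returns False, printing `disk_files(...)` shows which files are left over, which is a quick way to see whether the chain was actually collapsed before you point the VM configuration at anything.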
Checkpoints should now work again: you should be able to create and delete them without issue. Until the next bug.