As is typical, things go wrong at exactly the worst time.
I have a work laptop that is strictly work and I have a home PC that is a mix of work and gaming. I'd also booked three weeks off for Christmas and was planning some serious gaming time as I don't get that much time in the working week.
A few days before my holiday started I upgraded Windows 10, let Steam push down all the updates it needed to and even cleaned out all the dust bunnies. Everything was looking good. It was then that the PC bluescreened with an "Unexpected Store Exception" - not an error I'd ever seen before.
Surely just one of those things, a reboot and......... Nope. The machine refused to boot. Operating system not found.
So, a few days before my holiday, my main computer had decided to throw a major wobbly because the SSD had failed. I don't have RAID1 or anything like that on my desktop because it's just a desktop and... well, maybe I should.
However, I am paranoid, so I have the Veeam Agent for Windows running on both the work laptop and the main PC. These machines back up to an area on a FreeNAS SMB share. It's all automatic and, to be fair, I very rarely check it because it just works. I was now in need of the backups I was hoping it had been taking.
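Rarely checking is the sort of habit that bites eventually, and a freshness check is easy enough to script. What follows is only a minimal sketch in Python, not anything Veeam provides: it assumes the share is mounted somewhere readable and that the backup files use the `.vbk` (full) and `.vib` (incremental) extensions. The function names and the 24 hour threshold are my own inventions for illustration.

```python
import time
from pathlib import Path

# Assumption: full backups end in .vbk and incrementals in .vib.
BACKUP_EXTENSIONS = {".vbk", ".vib"}


def newest_backup_age_hours(folder, now=None):
    """Return the age in hours of the newest backup file under folder.

    Returns None if no backup files are found at all.
    """
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in BACKUP_EXTENSIONS
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backups_look_fresh(folder, max_age_hours=24, now=None):
    """True if at least one backup file exists and is recent enough."""
    age = newest_backup_age_hours(folder, now)
    return age is not None and age <= max_age_hours
```

Pointed at the backup folder from a scheduled task, `backups_look_fresh` returning `False` would be the cue to go and look. It only proves that a file was written recently, not that what is inside it is any good.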
One nice feature of the Windows agent is that when it's installed or upgraded, it generates a bootable ISO file that can be used for a bare metal restore. I always put a copy of this into the backup folder where I point the Veeam agent for that machine's backups, just for convenience. I could grab the ISO file from a VBK of the machine and extract it that way, but it's just effort.
Once I'd swapped out the SSD for a replacement and used Rufus to write the Veeam bootable ISO to a USB stick, I kicked off the bare metal restore.
Fortunately, Veeam had multiple backup points and I simply restored from the last one, which was taken about an hour beforehand.
Because of the size of the data, the restore took just over 6 hours. Once done, I rebooted and...... nothing. Boot failure. At this point I will admit to being confused. The restore of the whole disk had worked with no errors, but still there was this boot failure error message.
I'll admit that it was a bit of desperation that led me to try something a little odd. I used Veeam Agent to restore the EFI partition again using a manual partition restore, but this time I picked a much older restore point just in case something odd was going on with that boot partition. Because the EFI partition is so small, this restore took just a few seconds, and one reboot later everything was working again. No data lost at all.
It seems that what happened was that the SSD failure was very different to a hard drive failure. Normally when a hard drive dies it's pretty catastrophic and there is a lot of noise. It's pretty obvious that a disk has died.
Here, it seems that the SSD failure was a lot more subtle. I'm not sure if the data on the SSD was lost because of the failure or if the failure just corrupted something on the boot sector. Either way, it seems pretty clear that restoring from the latest restore point was fine for the data but not fine for the boot partition. The SSD was already failing when the backup ran an hour before the PC crashed, so Veeam Agent backed up an EFI partition that was already corrupt.
I've learned a few things out of this. Firstly, Veeam Agent is clearly awesome. Perhaps more importantly, SSD failures are very different to hard drive failures: SSDs can fail in such a way that it may not be immediately obvious. This means that it's more important than ever to have backups of key data both onsite and remote, and more than one restore point to fall back on, because SSDs are becoming more and more prevalent in today's world and, like any new tech, they can fail in new and interesting ways.