I often use KVM-based virtual machines. One of my personal KVM Windows VMs runs Windows Server 2008R2 (mostly so I can fool around with Windows server system administration once in a while, and to test various interop scenarios).
On initial install, the guest was using emulated IDE and SATA drivers, for ease of installation. After install, with some fiddling around I switched to paravirtualised drivers for both the boot and data volumes.
These drivers give a faster path from the Windows guest’s requests to the physical I/O than occurs when going through the IDE or SATA emulation layers. Technically, they are ‘paravirtualised’ network- and block-device drivers. In my case, the increase in speed of network and disk I/O operations was quite easily perceptible, so I haven’t even bothered to benchmark the before/after scenarios.
This was all working great with version 0.1-49 of the Windows virtio drivers.
After a little while, I decided (perhaps foolishly) to update to the then newly-released version 0.1-52.
After the reboot, the Windows guest BSOD’d at boot every time.
This problem is difficult to root-cause. If there were solely Free Software in the mix here, I would have a hope of debugging this and fixing it myself; but with the commercial, closed-source Microsoft Windows in scope, even though (in my opinion) applicable legislation allows me to reverse-engineer Windows to root-cause this fault, realistically the effort required to do so is unreasonable.
Failed Attempts to Recover
The following recovery methods were attempted but didn’t prevent the BSOD at boot:
- Hammering away at F8 early in VM boot, then selecting ‘Last Known Good Configuration’ from the menu that appears;
- Hammering away at F8 early in VM boot, then selecting ‘Safe Mode’ from the menu that appears; and
- Modifying the VM configuration to use emulated SATA or IDE instead of virtio for all volumes, which resulted in another BSOD.
Hammering away at F8 early in VM boot then selecting ‘System Recovery Options’ did not result in a bluescreen at boot, but none of the virtual hard disks was visible, presumably because no paravirtualised storage drivers were loaded. Attempting to load any version of the paravirtualised storage drivers (even the older, known-good version) from a virtual CD-ROM caused an immediate bluescreen.
From all this, I concluded the following:

- The new version of the viostor drivers, in conjunction with other factor(s), causes the bluescreen, whereas the older version does not (confirmed by experiment);
- The upgrade to the newer version of the drivers has succeeded (confirmed by inspecting the NTFS filesystem in the KVM image from Linux), and they are being loaded at boot even in safe mode, causing the BSODs (confirmed by inspection of the BSOD from booting into safe mode);
- Even when instructed to load the older drivers in the recovery console, Windows somehow obtains access to the newer buggy drivers and performs some operation or other with them, which causes another BSOD (this is a complete stab-in-the-dark guess – but it would be just like Windows to disobey the sysadmin in this way);
- There is no way (visible to me) to specify which drivers to use, other than the obvious way attempted via the ‘System Recovery Options’ route above (Googling found nothing);
- Therefore, we need to forcibly deny Windows access to the newer version of the drivers.

What follows is the recovery procedure that worked for me:

- Shut down the guest.
- In the Linux VM host, examine the guest’s virtual machine image (which in my case is in RAW format and is stored in an LVM2 LV):
sudo fdisk /dev/vgdata/win2k8
It contains two partitions: the first for recovery, and the second containing the Windows installation itself. This two-partition configuration is the standard one for Windows Server 2008R2 – this is what one ends up with if one lets the Windows installer do the partitioning.
- Note the start offset of the second partition (the one that contains the Windows installation, which in turn contains the problem driver). In my case, this is 206848. This is in units of sectors in fdisk by default.
- Multiply the partition’s start offset in sectors (206848) by the size of a sector in bytes (512), yielding the start offset in bytes of the partition from the start of the VM image (105906176 == 206848 * 512).
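That multiplication can be done in the shell itself, which avoids fat-fingering the offset (the numbers here are the ones from my fdisk output):

```shell
# Compute the byte offset of the second partition from its start sector.
start_sector=206848   # second partition's start, from fdisk (in sectors)
sector_size=512       # bytes per sector
offset=$((start_sector * sector_size))
echo "$offset"        # prints 105906176
```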
- Mount the filesystem located on the second partition, by use of the ‘offset’ parameter in a loopback mount:
sudo mount -t ntfs -o loop,offset=105906176 /dev/vgdata/win2k8 /mnt
- Search the Windows filesystem for files named similarly to viostor.sys:
find /mnt/ -iname \*viostor.sys\* -print
- Renamed both of the other viostor.sys files that find turned up to viostor.sys.disabled (but left them in those same directories);
- But I left Windows/LastGood/system32/DRIVERS/viostor.sys alone, because I thought Windows might actually have got it right, so that ‘LastGood’ might really be the last known good version (despite the failure of the ‘Last Known Good Configuration’ boot option).
- Unmount the guest’s filesystem from the host (sudo umount /mnt);
- In the VM configuration, change all storage to emulated IDE or SATA (ie, no longer using the paravirtualised viostor drivers);
- Boot up the guest again and check that everything’s working, which for me it was. Recovery seemingly complete.
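For what it’s worth, my guests are libvirt-managed, so the bus change in the step above is an edit to the domain XML; a sketch (the domain name win2k8 and device names are from my setup, and illustrative only):

```shell
# Open the guest's domain XML in an editor (libvirt-managed guest assumed):
sudo virsh edit win2k8
# For each <disk> element, change the target bus away from virtio, e.g.
#     <target dev='vda' bus='virtio'/>
# becomes
#     <target dev='sda' bus='sata'/>
# (or dev='hda' bus='ide' for IDE emulation).
```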
However, I still want the paravirtualised disk and network drivers – just the working ones, not the ones that cause a BSOD!
So I needed to re-perform the switch from emulated to paravirtualised storage and network drivers.
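The usual trick for that switch is to let Windows bind the (known-good, 0.1-49) virtio driver to a throwaway non-boot disk first, since Windows won’t boot from a bus it has no driver bound to yet. A sketch, assuming a libvirt-managed guest; the domain name and paths are illustrative:

```shell
# 1. Create and attach a small scratch disk on the virtio bus, so Windows
#    can install the virtio storage driver against a non-boot device:
qemu-img create -f raw /var/lib/libvirt/images/scratch.img 1G
sudo virsh attach-disk win2k8 /var/lib/libvirt/images/scratch.img vdb --persistent
# 2. Boot the guest, point the new device at the 0.1-49 driver media in
#    Device Manager, and confirm the scratch disk shows up in Windows.
# 3. Shut down, switch the real volumes back to bus='virtio' in
#    'virsh edit win2k8', and remove the scratch disk:
sudo virsh detach-disk win2k8 vdb --persistent
```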