Tips and Tricks for Making VM Migration More Secure

881

A challenge for any cloud installation is the constant tradeoff of availability versus security. In general, the more fluid your cloud system (i.e., making virtualized resources available on demand more quickly and easily), the more your system becomes open to certain cyberattacks. This tradeoff is perhaps most acute during active virtual machine (VM) migration, when a VM is moved from one physical host to another transparently, without disruption of the VM’s operations. Live virtual machine migration is a crucial operation in the day-to-day management of modern cloud environment.

For the most part, VM migration executed by modern hypervisors, both commercial proprietary and open source, satisfy the security requirements of a typical private cloud. Certain cloud systems, however, may require additional security. For example, consider a system that must provide greater guarantees that the virtual resources and virtual machine operations on a single platform are isolated between different (and possible competing) organizations. For these more restrictive cloud installations, VM migration becomes a potential weak link in a company’s security profile. What do these advanced security threats look like?

Advanced cyberattacks that are specific to VM migration

  • Spoofing: Mimicking a server to gain unauthorized access. This is essentially a variation of the traditional man-in-the-middle (MITM) attack.

  • Thrashing: A sophisticated denial-of-service (DOS) attack. In this case, the attacker deliberately disrupts the migration process at calculated intervals, so the migration is continually restarted over and over, consuming extra compute and network resources as a result. An alternative attack, for systems with automatic migration as part of an orchestration-level load balancing strategy, the DOS attack attempts to force repeated VM migration to burden system resources.

  • Smash and Grab: Forcing a VM image, either at the source or destination server host, into a bad state for the purpose of disrupting operations or exfiltrating data.

  • Bait and Switch: Creating a deliberate failure at a precise moment in the migration process so that a valid copy of a virtual machine is present on both the source and destination host servers simultaneously. The intent of this attack is to exfiltrate information without detection.

Before addressing a possible remedy to each of these attacks, a few more details regarding VM migration are necessary. First, in this context we are only concerned about active VM migration, or migration that does not interrupt the operation of migrating VMs. Migration of a VM that has been halted or powered down is not considered here. Also, the approaches we describe here are limited to the hypervisor and its associated toolstack. Any security profile must also include the hardware platform and network infrastructure, of course, but for this discussion these facets of the system are out of scope.

Lastly, the type of storage employed by the cloud has a big impact on VM migration. Networked storage, via protocols such as iSCSI, NFS, or FibreChannel, are more complicated to configure and maintain than local server storage, but simplify (and speed up) migration because the VM image itself typically does not need to be copied across a network to a separate physical storage device.

Note, however, that the attacks listed above are not solved by using networked storage. While the risk is somewhat alleviated by reducing exposure of the VM image to the network, and the timing window to exploit the migration process is significantly shortened with shared storage, these attacks remain viable. State data and VM metadata still must be passed over the network between server hosts, and the attacks outlined above rely on the corruption of the migration process itself.

With the ground rules understood, let’s dive into a basic approach to address each of the migration cyberattacks.

How to address VM migration cyberattacks

Spoofing: Man-in-the-middle attacks are well studied, and modern hypervisors should already utilize the proper authentication protocols integrated within its migration process to prevent this class of attack. The most common variations of Xen, for example, include public key infrastructure support for mutual authentication via certificate authorities or shared keys to guard against MITM attacks. For any new installation, it is worth verifying that proper authentication is available and configured properly.

Thrashing: External DOS attacks are usually best addressed outside of the hypervisor, within the network infrastructure. Systems that use orchestration software to automate VM migration for load balancing, or even defensive purposes, should be configured to guard against DOS attacks as well. Automated migration requests should be throttled to prevent network contention and avoid overloading a single host.

Smash and Grab: This attack attempts to disrupt the migration process at an opportune moment so that the VM state data is corrupted or forced out of sync with the VM image at the source or destination server, rendering the VM either temporarily or permanently disabled. A smash-and-grab attack could behave like DOS attack over the network, or could be enacted by malware in the hypervisor. In both incarnations, this attack tests how well the migration process recovers from intermittent failures, and how well the migration process can be rolled back to a prior stable state.

The robustness of a migration recovery scheme varies greatly from one hypervisor toolstack to another. Regardless, there are a few steps that can minimize this type of threat:

  • First, before initiating a migration, you should regularly create snapshots of the important VM images, so that you always have a stable image to go back to in case disaster strikes. This is common practice but bears repeating because it is often forgotten.

  • Second, many hypervisor toolstacks support customization of the migration process via scripts. For example the XAPI toolstack has hooks for pre- and post-migration scripts, where an industrious system designer or admin can insert their own recovery support at both the source and destination hosts. Support can be added to validate that the migration is successful, for instance, and bolster recovery procedures if a failure has occurred. These scripts also provide a means of adding special configuration support for specific VMs after a successful migration, such as adjusting permissions, or updating runtime parameters.

  • Third, and perhaps most importantly, when a VM migration completes, either successfully or unsuccessfully, it is likely that an old footprint of the migrated image is still hanging around. For successful migrations, a footprint will remain on the source server, while failed migration attempts often result in a residual footprint remaining on the destination server. While the hypervisor toolstack may delete the old VM image file after completion, very few toolstacks actually erase the image from disk storage. The bits representing the VM image are still on disk, and are vulnerable to exfiltration by malware. For systems with high security requirements, it is therefore necessary to extend the toolstack to zeroize (i.e., fill with zeroes) the blocks on disk previously representing an old VM image, or fill the blocks with random values. Depending on the storage device, hardware support may exist for this as well.

Bait and Switch: We can approach the bait-and-switch attack as a variation of the smash-and-grab attack, and the mitigation of this threat is the same. For the bait-and-switch attack to succeed, a residual copy of the aborted VM migration attempt must remain on the destination server. If the VM footprint is immediately scrubbed from the disk, the risk of this attack is also greatly reduced.

In summary, it is important to understand the security implications of active migration. The system behavior under specific attack scenarios and failure conditions during the migration process must be taken into full account when designing and configuring a modern cloud environment. Luckily, if you follow the steps above, you maybe be able to avoid the security risks associated with VM migration.

John Shackleton is a principal research scientist at Adventium Labs, where he is the technical lead for a series of research and development projects focused on virtualization and system security, in both commercial and government computing environments.

Stay one step ahead of malicious hackers with The Linux Foundation’s Linux Security Fundamentals course. Download the sample chapter today!