Securing your Linux VMs in Azure with Azure Disk Encryption

This week I’ve been working with a set of Linux virtual machines hosted in Azure. They are all Ubuntu 18.04 LTS, so relatively new – but not the latest release (which at the time of writing this is 21.04).

When you provision a Linux VM, disk encryption is not enabled by default. The storage account where the disk is stored will have encryption at rest, though.

What’s the difference? Encryption at rest encrypts all blobs in storage, but the disk image itself is still unencrypted: if someone gets hold of the image, it can be mounted like any regular disk image in another VM. With disk encryption, you encrypt the internals of the disk while it’s in use (and mounted) in a VM. If anyone then gets their hands on the disk somehow, they can’t read any of its contents without the proper key.

On Windows VMs, this is handled with BitLocker. On Linux, it’s handled with DM-Crypt. Together, these are called Azure Disk Encryption. It isn’t enabled by default, which makes sense as it needs to tinker with the internals of your VM once enabled.

Enabling Azure Disk Encryption

I set out to test how Azure Disk Encryption (or just ADE for short) works. On paper, it seems very simple: Flip a switch within the VM (under Disks) and you’re good to go.

I provisioned a clean Ubuntu 18.04 LTS VM, using a cheap B2ms size (2 vCPUs, 8 GB of RAM) with Premium SSD. I figured the faster storage would speed up the encryption. Once provisioned, I verified that I could access the VM with SSH. Everything good so far.

Before enabling ADE, I took a snapshot of the OS disk (I didn’t have any data disks). This ensures that should the OS disk become borked, I can easily recover from the snapshot.
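I took the snapshot in the Azure Portal, but the same step can be scripted. The following is a minimal Python sketch that shells out to the Azure CLI, assuming az is installed and already logged in. The function names and the run parameter are my own illustration (run only exists so the logic can be exercised without touching Azure); they are not part of any Azure tooling.

```python
import subprocess


def os_disk_id_cmd(resource_group, vm_name):
    """Build the az command that prints the resource ID of the VM's managed OS disk."""
    return [
        "az", "vm", "show",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--query", "storageProfile.osDisk.managedDisk.id",
        "--output", "tsv",
    ]


def snapshot_cmd(resource_group, snapshot_name, disk_id):
    """Build the az command that snapshots the given managed disk."""
    return [
        "az", "snapshot", "create",
        "--resource-group", resource_group,
        "--name", snapshot_name,
        "--source", disk_id,
    ]


def snapshot_os_disk(resource_group, vm_name, snapshot_name, run=subprocess.run):
    """Look up the OS disk ID, snapshot it, and return the disk ID."""
    lookup = run(
        os_disk_id_cmd(resource_group, vm_name),
        check=True, capture_output=True, text=True,
    )
    disk_id = lookup.stdout.strip()
    run(snapshot_cmd(resource_group, snapshot_name, disk_id), check=True)
    return disk_id
```

Calling snapshot_os_disk("my-rg", "my-vm", "my-vm-os-before-ade") with your own (here hypothetical) names would run both commands in order. Because check=True is set, a failing az call raises an exception instead of silently continuing to the encryption step without a backup.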

I then enabled Azure Disk Encryption under Disks > Additional settings:

For this to work, you will need an Azure Key Vault. So I provisioned one of those, and added the encryption key there. Keep in mind that the Key Vault must have the special permission “Azure Disk Encryption for volume encryption” enabled in its access configuration, or the VM cannot use it.
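For reference, the two portal steps above (allowing the Key Vault to be used for disk encryption, then enabling encryption on the VM) have Azure CLI counterparts. The sketch below only builds and prints the commands rather than running them. The helper names and all resource names are hypothetical, and I am assuming an OS-only encryption like the one in this post.

```python
import shlex


def keyvault_enable_cmd(resource_group, vault_name):
    """az command that allows the vault to be used by Azure Disk Encryption."""
    return [
        "az", "keyvault", "update",
        "--resource-group", resource_group,
        "--name", vault_name,
        "--enabled-for-disk-encryption", "true",
    ]


def vm_encrypt_cmd(resource_group, vm_name, vault_name,
                   key_name=None, volume_type="OS"):
    """az command that enables Azure Disk Encryption on a VM.

    key_name is optional: when given, it is passed as the key encryption key
    stored in the same vault.
    """
    cmd = [
        "az", "vm", "encryption", "enable",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--disk-encryption-keyvault", vault_name,
        "--volume-type", volume_type,
    ]
    if key_name:
        cmd += ["--key-encryption-key", key_name]
    return cmd


# Hypothetical names, printed for review before anything is executed.
print(shlex.join(keyvault_enable_cmd("my-rg", "my-vault")))
print(shlex.join(vm_encrypt_cmd("my-rg", "my-vm", "my-vault", key_name="my-kek")))
```

Printing the commands first is deliberate: enabling encryption reboots the VM, so it is worth reviewing exactly what will run before committing.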

Once you commit the disk encryption change, the VM will reboot. This is so that the Azure Extension for Disk Encryption can be installed. It will fiddle with the mounted drives in order to ensure the OS and data disks can be encrypted.

I checked the extensions of the VM, and it deployed the extension properly:

Because I ran this same procedure on three VMs at the same time (to check that everything works identically between different VMs), I noticed that sometimes the extension wouldn’t show up in the Azure Portal. It could easily take up to 20 minutes after the initial reboot for the extension to become visible. Until then, the Azure Portal simply stated that the extensions couldn’t be loaded.

While the encryption is in progress, the terminal of the Linux VM looks worrying:

It’s in maintenance mode. Querying with PowerShell, you can check the status of the VM with Get-AzVMDiskEncryptionStatus:

You cannot get further details than this. While the encryption is in progress, the VM is running (thus incurring cost), but you cannot log in. Trying to is discouraged anyway, as you might inadvertently run a process that locks files and thus blocks the encryption process.
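Since the status query is the only window into the process, it can help to poll it with a deadline instead of waiting indefinitely. The sketch below is one way to do that in Python. The fetch_status callable is hypothetical (how you obtain the status, whether through PowerShell, the CLI or an SDK, is left open), and I am assuming it returns a dict with an OsVolumeEncrypted field that becomes "Encrypted" on success, mirroring what Get-AzVMDiskEncryptionStatus prints.

```python
import time


def wait_for_encryption(fetch_status, timeout_s=3600, interval_s=60,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the OS volume is encrypted or the deadline passes.

    fetch_status is a caller-supplied (hypothetical) function returning a dict
    such as {"OsVolumeEncrypted": "EncryptionInProgress"}.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("OsVolumeEncrypted") == "Encrypted":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"OS volume still not encrypted after {timeout_s} seconds; "
                f"last status: {status}"
            )
        sleep(interval_s)
```

A timeout here does not mean the encryption failed, only that it is time to look at the serial log and the extension status before deciding what to do. It is a signal to investigate, not to reboot.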

Once it completes, the progress messages should state, “Encryption succeeded for OS volume.” On my first Linux VM, I waited for more than three hours and the encryption never seemed to complete. I feared the VM was borked: the extension list didn’t even load, and the serial log was erroring out. So I rebooted the VM. This effectively killed the whole VM – it only booted into GRUB’s rescue mode.

I recovered from the snapshot – and suddenly, this first VM had a successful encryption status! The extension lit up, and the progress message stated that the encryption succeeded. Querying with lsblk, we can see that /dev/sdc1 is now mounted (and the root is encrypted), and there’s an encrypted partition visible.
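If you want to check for encrypted mappings without reading the lsblk tree by eye, lsblk can emit JSON, and DM-Crypt mappings show up with the type crypt. The following Python sketch walks that JSON and lists them. The sample data is made up for illustration (including the device and mapping names) and is not the output from my VM.

```python
import json

# Illustrative sample in the shape of: lsblk --json --output NAME,TYPE,MOUNTPOINT
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "type": "disk", "mountpoint": null, "children": [
    {"name": "sda1", "type": "part", "mountpoint": null, "children": [
      {"name": "osencrypt", "type": "crypt", "mountpoint": "/"}]},
    {"name": "sda2", "type": "part", "mountpoint": "/boot"}]},
  {"name": "sdb", "type": "disk", "mountpoint": null, "children": [
    {"name": "sdb1", "type": "part", "mountpoint": "/mnt"}]}
]}
"""


def find_crypt_devices(lsblk_json):
    """Return (mapping name, parent device, mountpoint) for every crypt device."""
    found = []

    def walk(device, parent):
        if device.get("type") == "crypt":
            found.append((device["name"], parent, device.get("mountpoint")))
        for child in device.get("children", []):
            walk(child, device["name"])

    for top in json.loads(lsblk_json).get("blockdevices", []):
        walk(top, None)
    return found


for name, parent, mountpoint in find_crypt_devices(SAMPLE):
    print(f"{name} on {parent} mounted at {mountpoint}")
```

On a real VM you would feed the function the output of lsblk --json --output NAME,TYPE,MOUNTPOINT instead of SAMPLE. An empty result means no DM-Crypt mapping is active, whatever the portal says.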

But how can this happen? If I recovered by replacing the half-encrypted disk with the fully non-encrypted snapshot, shouldn’t the VM restart the encryption process, as it detects that DM-Crypt never completed? In theory, yes. My guess is that the encryption process had actually completed, but the status shown in the Azure Portal (or returned by the API behind the PowerShell cmdlet used to query the status) was never updated. When I then replaced the disk, I think DM-Crypt quickly recovered and completed the process.

Querying yet again with Get-AzVMDiskEncryptionStatus for the first Linux VM (the one that failed initially), it’s looking better:

What about the second VM, which I did not interrupt? It managed to encrypt the OS disk in less than 10 minutes.

Lessons learned

Encryption on Linux is interesting, as it relies so heavily on running a process within the VM. Once that is running, you can only wait. After I had waited for hours, I felt the process was broken, especially as the VM was in maintenance mode. Keep in mind that the initial VM I had was a clean OS install. Encryption should have completed in 10 minutes, as it did for the two other VMs I had.

I’m glad it recovered, even with a disk restored from a snapshot.

The key lesson here is that encryption, in particular, has to be tested carefully. I did my tests with three VMs, and for all of them, encryption eventually completed successfully. But I wouldn’t wait for days for the encryption to complete, especially for smaller volumes like the ones I used (32 GB Premium SSD).