Building Windows Images for MaaS

MaaS (Metal-As-A-Service) is Canonical’s answer to managing a physical cloud. If you have a bunch of beefy machines and you want a pretty way to manage them, MaaS is one solution. In case the name ‘Canonical’ sounds familiar, they’re the guys behind Ubuntu.

MaaS provides a powerful API and a friendly (albeit slightly-less-powerful) Web UI giving it a cloud-like look-and-feel — a lot of the magic that goes into handling the network and hardware will be invisible to most users. Furthermore, a minimal MaaS setup requires very little effort to get started.

After you’ve got MaaS running and a few machines under its control, you can select a machine and an OS Image to install on it and, after some time, MaaS will tell you it’s deployed. If you went with a Linux Image, this would mean you can SSH in.

Of course, this requires that you give MaaS an OS Image to install. Sadly, this isn’t as simple as finding an ISO file and just handing it to MaaS. The OS Images in MaaS are actually specially-crafted variations of an OS that typically include something like ‘cloud-init’: an initialiser that helps the newly-deployed machine fetch some configuration and tell whatever cloud controller it sits under that it has finished installing. If you’ve ever played with images in OpenStack before, MaaS images work in a very similar way.

Canonical have nicely provided some ‘minimal’ OS Images for Ubuntu (LTS versions) and CentOS, but for Windows, you’ll need to go elsewhere.

Thankfully, Cloudbase have something to help you get started: Windows OpenStack Imaging Tools (on GitHub).

Without going into details, the Windows Imaging Tools functions will create a Windows VM (Virtual Machine), apply whatever unattended install operations you specify to that VM, Sysprep the VM and compress the (virtual) disk.

The Basics

For those unfamiliar with Sysprep, it’s a tool that captures a customised Windows installation. You can use this captured image as a means of duplicating whatever customisations you made across all the computers you apply the image to. Quick note: Sysprepping (applying the Sysprep tool) will also remove unique installation stuff (Windows Product Activation, event logs and security IDs), so don’t think all installations will be perfect clones.

Also of interest, if you apply a Sysprep-captured-image to another machine, it will act like it’s setting up Windows for the first time. So if you provide a custom script for Windows setup, it will be run; this is how Cloudbase run their cloud-init equivalent (Cloudbase-Init).

I should also mention that Sysprepping will shut down your machine, or in this case, the VM. The Windows Imaging Tools script actually waits for the VM to shut down to complete the image-creation process.

And for those unfamiliar with Windows unattended installations: it’s Microsoft’s equivalent of Ubuntu preseed or CentOS Kickstart. You provide an XML file containing a bunch of configuration options and sets of commands to run, then Windows will apply what it can without any human intervention. The Cloudbase GitHub repository includes a template suitable for MaaS machines, but you can find plenty of other samples online.

Now, the fun part starts — with the aforementioned repository (and the pre-requisites it asks for), you can create custom Windows images. For example, I decided to install Cygwin so that when MaaS deploys my image, I can SSH into it like a Linux machine. Another option is to disable SMB (assuming you don’t need it) for security reasons (looking at you, WannaCry).

While installing my preferred applications is nice, there’s a very important thing we should do whenever we start a Windows machine (virtual or physical): install updates. If I chose not to install them into the image, I’d need to do it for every deployment (and I’d much rather only do something time-consuming once). Plus, installing them in the image means that my newly-deployed machine won’t be insecure from the get-go (or, at least, more insecure due to known vulnerabilities).

The Challenge

Sadly, while the Cloudbase scripts give you the option to install Windows Updates, I have yet to see it do what I expect (there are already some GitHub issues open about this) — I either end up with no updates installed at all OR a Windows image that can’t even boot properly (more on that later).

Before I continue to whine about somebody else’s code, I should mention that these were problems I found in early 2017 — by the time you read this, everything could be sunshine and rainbows. Furthermore, this repository seems to be aimed at getting you started — I think Cloudbase want you to pay them for the full solution (you know, business and stuff).

I should also mention that I don’t fully understand how this GitHub repository tries to install updates and how it runs Sysprep. In fact, there are a great many things I don’t understand about Windows, its updates or Sysprep.

However, I genuinely believe that the way their script installs updates is suboptimal. Firstly, as many Windows users will be fully aware, some updates are installed in three phases (ignoring the download): the first happens immediately and (mostly) in the background while you’re doing work; the second happens before a reboot/shutdown (on a bright blue screen, with Windows discouraging you from turning the machine off); and the final phase runs during the startup after the reboot.

Sadly, the script that installs updates also does the Sysprepping and, since Sysprepping shuts down the machine, you’ll be missing that final phase of installing updates. Now, if the resulting image is deployable in MaaS, you’ll notice that the missing step will be completed during MaaS’s deployment. Assuming that works (in my experience, it probably won’t), the deployment time is ridiculous and, as Microsoft continue to supply further updates, likely to get longer every month.

Notice how I seem a little pessimistic about the image even working? — yeah… so installing updates at the same time as Sysprepping gets… weird. Sometimes the image that these scripts create might not successfully deploy in MaaS at all — if you have a console/screen (for the machine you’re deploying) you can see Windows struggle and hang (you can forcefully power cycle it and get more interesting results). Again — I don’t understand this stuff, so I’m not going to try explaining what’s happening — I suspect something to do with goat sacrifices and lunar phases.

Furthermore, the script that handles updates and Sysprepping is run in an asynchronous block in the Cloudbase-provided Windows unattended install template. At the same time that this script is running, there’s a synchronous block running other customisation operations. This actually brings two new problems: the processes can affect each other, and the Sysprep could start before your customisations finish, leaving out some operations entirely (I’m not a fan of this).

In summary, I want an image that has updates installed and can be deployed reliably. I want the MaaS-deployed machines to have all of my customisations installed and I would greatly appreciate the machines being ready within my lifetime. My problem is that installing updates doesn’t always work, and when it does it either breaks the image or causes deploy times to skyrocket.

The Solution

The first thing to do is get a full reboot done in the VM before I run the Sysprep. If we can do that, we can completely install the updates in the created image, with no need to finish installing them mid-deploy. Plus, we can install the updates independently of the Sysprep, and I expect much more reliable images as a result.

This, however, involves significant restructuring of the Cloudbase scripts. They’re fire-and-forget: the Windows unattended install/magic does everything between Windows configuration and calling Sysprep, and the scripts will quite happily wait an eternity for the VM to shut down (when the Sysprep completes). For me to squeeze a reboot in the middle is like trying to make an apple-juice sandwich.

I decided to renovate the Cloudbase-provided unattended install template: remove the asynchronous section that calls Sysprep (because I want finer control of when Syspreps happen), perform as many customisations as possible in the synchronous block, and end by writing a marker file in an obscure location (you’ll see why later).
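To make the shape of that synchronous block a bit more concrete, here is a rough sketch in Python (for brevity; the real thing is hand-edited XML). It builds an ordered list of commands and appends a final one that writes the marker file. The element names (RunSynchronous, RunSynchronousCommand, Order, Path) are the ones used by the Microsoft-Windows-Deployment unattend component, but the Cloudbase template may well use a different pass or component, so treat this as an illustration only. The marker path and script names are made up, and the namespace and wcm:action attributes a real answer file needs are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker path; the post only says "an obscure location".
MARKER_PATH = r"C:\Windows\Temp\image-build.done"


def build_run_synchronous(commands, marker_path=MARKER_PATH):
    """Return a <RunSynchronous> element that runs `commands` in order
    and finishes by writing the marker file."""
    # The marker command always goes last, so its existence implies that
    # every earlier command has already been run.
    all_commands = list(commands) + [f'cmd /c echo done > "{marker_path}"']
    root = ET.Element("RunSynchronous")
    for order, command in enumerate(all_commands, start=1):
        entry = ET.SubElement(root, "RunSynchronousCommand")
        ET.SubElement(entry, "Order").text = str(order)
        ET.SubElement(entry, "Path").text = command
    return root


# Script names below are placeholders, not files from the Cloudbase repo.
fragment = build_run_synchronous([
    r"powershell -File C:\setup\customise.ps1",
    r"powershell -File C:\setup\install-updates.ps1",
])
print(ET.tostring(fragment, encoding="unicode"))
```

Because the commands run one after another, the marker only appears once everything before it has been attempted, which is what makes it a usable "customisations done" signal.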

Meanwhile in Cloudbase-Imaging-Tools-PowerShell-script-land, I re-purposed the function that waits for the VM to power down to wait until it can find the marker file instead. This also meant I needed a way to run a command in the VM from the host, but that’s what “Invoke-Command -VMName” (or “Enter-PSSession”) is for. It does mean I need logon credentials, but I can set those in the unattended install template (note: Sysprep will reset the Administrator password, so it’s perfectly fine to set it to something static like “12345”).

With the power of Invoke-Command, I can check for the marker file, initiate a reboot and even call the Sysprep script. Since I can control all this with my own script, I can take full control of the VM’s lifecycle and, as a result, the OS image’s contents.
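The lifecycle described above can be sketched roughly as follows. This is Python rather than PowerShell, and the `vm` object is hypothetical: in a real script each of its methods would be a thin wrapper around something like Invoke-Command -VMName or a Hyper-V state query. The timeout and poll interval are arbitrary assumptions.

```python
import time


def wait_for(condition, timeout_s, poll_s=10,
             sleep=time.sleep, clock=time.monotonic):
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def build_image(vm, timeout_s=4 * 3600, sleep=time.sleep):
    """Drive the VM through: customise -> reboot -> Sysprep -> power off."""
    if not wait_for(vm.marker_exists, timeout_s, sleep=sleep):
        raise TimeoutError("marker file never appeared")
    vm.reboot()  # gives pending updates their post-reboot phase
    if not wait_for(vm.is_responsive, timeout_s, sleep=sleep):
        raise TimeoutError("VM did not come back after reboot")
    vm.run_sysprep()  # Sysprep shuts the VM down when it finishes
    if not wait_for(vm.is_powered_off, timeout_s, sleep=sleep):
        raise TimeoutError("VM did not shut down after Sysprep")


class FakeVM:
    """Stand-in for a real VM so the sketch runs anywhere."""

    def __init__(self, polls_until_marker=2):
        self.polls_left = polls_until_marker
        self.events = []

    def marker_exists(self):
        self.polls_left -= 1
        return self.polls_left <= 0

    def is_responsive(self):
        return True

    def is_powered_off(self):
        return "sysprep" in self.events

    def reboot(self):
        self.events.append("reboot")

    def run_sysprep(self):
        self.events.append("sysprep")


vm = FakeVM()
build_image(vm, sleep=lambda _seconds: None)  # skip real sleeping in the demo
print(vm.events)
```

One thing this sketch skips: a real "is the VM back yet?" check would need to confirm that the VM actually went down first, otherwise it could succeed before the reboot has even started.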

Since I’m already bringing a crowbar to these scripts, I decided to replace their windows-updates script. I really wasn’t a fan of its reliability within the VM and how it handled failures, so I built my own from fragments of PowerShell scripts from various online sources.

Instead of relying on the VM (and then the MaaS-deployed-machine) to download the updates, I decided to go with offline installs. This might seem weird; it’s slower and there’s a bigger window for new updates to be skipped, but there is method to my madness…

Firstly, downloading updates is slow. There are a lot of updates and, although Microsoft (thankfully) bundle them into ‘cumulative’ updates every month (so you usually get one big patch file and a number of smaller ones for this month so far), you should still expect gigabytes of data. Whilst I was developing and testing my image-building scripts, I was wasting time downloading the same updates a gajillion times. So I figured it would be faster to download them once and copy them into the VMs. Another related bonus: my VM does not require an internet connection, and I do not want to mess with virtual switches in Microsoft PowerShell.
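The "download once, reuse for every build" idea boils down to a small cache on the host. A minimal Python sketch, where the file name, cache directory and download function are all placeholders rather than anything from the real scripts:

```python
from pathlib import Path
import tempfile


def fetch_once(name, cache_dir, download):
    """Return the cached path for `name`, calling `download` only on a miss."""
    target = Path(cache_dir) / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(name))
    return target


calls = []


def fake_download(name):
    """Pretend to fetch a patch; records each call so we can count them."""
    calls.append(name)
    return b"patch bytes"


with tempfile.TemporaryDirectory() as cache:
    for _build in range(3):  # three image-build attempts
        fetch_once("patch-a.msu", cache, fake_download)

print(len(calls))  # how many times the download actually ran
```

The cached files are then what gets copied into each VM, so repeated build attempts cost a local file copy instead of another multi-gigabyte download.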

Secondly, update failures — they happen. When they do, I want to retry installing the patch as soon as I can. Sadly, the VM might simply be unable to apply a certain patch — again, complicated stuff beyond my feeble mind (I think it’s something to do with the number of pigeons you can see). In such cases, I actually want to keep the patch files in the image and try to re-apply them at deploy time. I realise that this contradicts an earlier statement I made, but the number of failed updates is usually small. Plus, a ‘failed’ update usually does something to make the next attempt a lot faster. Having the patch files readily available makes the deploy a lot faster and gives me the option to apply patches before I even start listening on the network.

This method of installing updates was faster (over the span of many image-build-attempts), more reliable and more flexible. You can disagree, you can shout that I’m missing the point, you can even give me death-glares — but here’s the thing; it works for me.

I decided to add a call to my own update script in the unattended install template (before creating the marker file) and then a second call during deploy time (via Windows Custom Setup) to retry any failures.
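The two-pass approach (install at build time, retry leftovers at deploy time) looks roughly like this. It is a Python sketch of the control flow only: the patch names are invented, and `install` stands in for whatever actually applies a patch file and reports success or failure.

```python
def apply_patches(patches, install, max_attempts=2):
    """Try each patch up to `max_attempts` times.

    Returns (installed, failed); failed patches are kept for a later retry.
    """
    installed, failed = [], []
    for patch in patches:
        for _attempt in range(max_attempts):
            if install(patch):
                installed.append(patch)
                break
        else:
            # Every attempt failed: keep the patch file around for deploy time.
            failed.append(patch)
    return installed, failed


# Build time: pretend "patch-b.msu" refuses to install inside the VM.
build_ok, leftover = apply_patches(
    ["patch-a.msu", "patch-b.msu", "patch-c.msu"],
    install=lambda patch: patch != "patch-b.msu",
)

# Deploy time: only the leftovers are retried, this time on the real machine.
deploy_ok, still_failed = apply_patches(leftover, install=lambda patch: True)

print(build_ok, leftover, deploy_ok, still_failed)
```

The point of returning the failures instead of raising is that a stubborn patch should not abort the image build; it just becomes part of the (hopefully short) deploy-time to-do list.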

The Result

With all that in place, I can now build a MaaS-friendly Windows image that has:

  • as many updates as possible installed
  • any number of customisations performed
  • a high probability of deploying in MaaS within a reasonable amount of time (less than half an hour on decent hardware)

Of course, installing updates in the image is only a start — as soon as you deploy a machine based on that image, there’ll likely be more updates available. Once you’ve deployed, you can probably just switch to the standard Windows updates service (even if the patches were applied without it, Windows will still understand how up-to-date it is). Naturally, the longer you use the same image, the more updates will need to be applied at deploy time.

However, as long as you re-build your Windows image regularly, you should be able to keep deploy times short and your newly-deployed machines up-to-date (mostly).

Closing Notes

I am glossing over some of the more interesting parts of the offline patching process (like cabinet files that aren’t actually valid cabinet files) and how you’ll need an extra reboot after you’ve deployed a machine for any patches that trickled through. Similarly, there are some other modifications necessary to the Cloudbase magic to add my own custom scripts at the end of the Windows setup (during deployment).

I should note that a lot of these problems didn’t need to be solved by us — if you want an easier time, you can always go to Canonical and/or Cloudbase and ask them to build your images for you (if you’re willing to spend a little green).

On the same note, whilst I found their scripts confusing in this case, I would still recommend Cloudbase’s stuff as a decent starting point.
