A quick adventure in MaaS (part 1)

Previously, I wrote an article about how to create a MaaS-friendly Windows OS Image that can be deployed in a MaaS cluster. This time, I’ll be talking about how we set up our first MaaS clusters — what challenges we saw and what decisions we made.

This is a multi-part feature; the second part should be available a couple of weeks after this one. For this part, we get started with MaaS: installing it and adding Rack Controllers (we’ll define these later). In part 2, we’ll explore the database and HTTPS proxy.

Before we get into the scary stuff, I want to mention that our architecture is just ‘an architecture that works’ — it meets the requirements, it scales well enough and we can handle a few disasters. Our solution is not the solution — in fact, it still has some problems, but it’s good enough.

Also, before I get started on any technical details, the time of writing this is Autumn 2017 and the version of MaaS we are playing with is 2.2.

So, let’s recap: MaaS is Metal-As-A-Service — it’s a Canonical project to manage a bunch of hardware like a cloud. It’s all super-complicated stuff that is nicely abstracted away behind a pretty web interface.

Because a lot of this stuff is super-complicated, I’m going to gloss over some of the more technical parts in the interests of brevity. On that note, here’s a quick definition: a “MaaS Cluster” is a collection of machines that are managed by, or run applications for, MaaS Controllers (we’ll talk more about Controllers later).

If we just want to get started with MaaS, then we don’t need to do much — simply grab an Ubuntu (Xenial) machine, open up a terminal and run:

sudo apt-get install maas

… and that’s it.

That one command installs all the necessary MaaS packages onto our machine, and we can then point a web browser at:

http://wherever.the.maas.machine.is.located:5240

…or not, as it turns out we’ll need to create an ‘admin’ user first. Thankfully, as it explains on the page, that’s just another command we need to run on the terminal:

sudo maas createadmin --username "admin" --password "swordfish" --email "maasadmin@some.mail.domain"

For those of you that hate passwords in commands, you can omit all the parameters and you’ll have an interactive session instead — the above is just the bare minimum to get going (it’s even missing some options about importing SSH keys).
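If you do want keys imported at creation time, there’s an --ssh-import option that can pull public keys from Launchpad or GitHub (if my memory of the flag is right). Here, lp:my-launchpad-user is a placeholder for your own Launchpad ID:

sudo maas createadmin --username "admin" --password "swordfish" --email "maasadmin@some.mail.domain" --ssh-import "lp:my-launchpad-user"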

Back to the web panel, we can log in as ‘admin’ (or whatever you decided to enter as the username — perhaps something a little harder to guess like ‘Smitty Werbenjagermanjensen’…).

After some very quick forms about setting up the MaaS Cluster (all the options can be skipped or left as their defaults), we can start adding nodes and deploying them.

That’s all you need to do to start with MaaS. Obviously, adding machines and setting up networks will depend massively on your environment, hardware and network.

At this point, I would like to take a moment and applaud Canonical for taking something that is actually quite complicated and simplifying it into just a couple of commands. Seriously — credit where credit is due, and it is totally due.

Just One Machine?

Right now, we have a single Ubuntu machine managing everything in the cluster. Pretty neat, but not incredible at scale — especially if some machines live in a different rack or subnet.

Deploying a machine through MaaS is a complicated process. Thankfully, most of the magic is hidden away from the user, but there is a step that we should observe: transferring the OS image. For Ubuntu and CentOS, the publicly-available images are about 0.5GiB at worst, so the network won’t be massively impacted by such a transfer.

Now, imagine deploying over a hundred machines at the same time. Alternatively, consider a custom image — it’s easy to create an image that’s 10GiB, and now imagine transferring that a few times.
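To put some rough numbers on that: a hundred machines each pulling a 10GiB image is about 1TiB of traffic. Over a single 1Gbit/s link (roughly 120MiB/s in practice), that works out to around two and a half hours of completely saturated network.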

As you can guess, there is going to be a sizable bottleneck on the poor Ubuntu machine we installed MaaS on. Even worse, deploying machines will affect the UI/API as the network interface is busy deploying things.

Furthermore, if your machines are arranged into racks, chassis or different rooms, the network link between these bunches of machines can get very busy very fast. So beyond just the MaaS Controller, you’ll likely find network bottlenecks as well.

Similarly, if everything important is on one machine, a hardware failure can put you out of action completely.

One solution would be to have a distinct MaaS Controller per section, but having to connect to many MaaS Clusters would be a hassle (to say nothing of how much fun it would be maintaining OS images).

Also — I’m going to quickly define ‘rack’ as a logical collection of machines (e.g. all machines in a single row or a room) where each machine in a ‘rack’ can communicate with another in the same ‘rack’ very quickly, but communication between machines in different racks might be slower. So, for the purposes of this document, ‘rack’ does not necessarily refer to a physical rack, but instead, something that is rack-like from a networking standpoint.

Thankfully, the wizards at Canonical offer us Rack Controllers. Basically, these are an extension of MaaS’s control to handle scaling. Quick note on the name ‘Rack Controller’: there doesn’t need to be a Rack Controller per rack (or per thing that is rack-like), but it is a good starting point when thinking of scale.

It’s perfectly acceptable to use one Rack Controller for many physical racks. Similarly, it’s fine to have many Rack Controllers for a single rack. Ultimately, how you arrange Rack Controllers into racks should be based on your deployment requirements.

For those of you following along with the commands thus far, you might have noticed that when you installed the ‘maas’ package, it depended on maas-region-controller and maas-rack-controller (and the Python Django web framework, but we'll ignore that for now). So, when we installed maas, we technically installed our first Rack Controller with it.
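If you want to check this for yourself, apt can list the declared dependencies of the maas meta-package:

apt-cache depends maas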

The other thing we installed was the Region Controller — that’s what was providing the web front-end we saw earlier. We’ll come back to this topic later, but for now know that our first MaaS box is both a Rack Controller and a Region Controller.

For the purposes of this document, let’s assume that we have three racks of 15 machines each. Right now, we are using one of those machines in one of those racks to run MaaS.

Let’s say at the ‘top’ of each of these racks is a network switch that connects everything in that rack to the other racks. This likely means communication between racks is slower, and we ideally want that inter-rack link used by the MaaS servers (the things we want MaaS to deploy) for doing actual work (as opposed to letting MaaS infrastructure eat it for deployments).

With the setup described, the most obvious (and probably correct) next step would be to install Rack Controllers in the other two racks. This means installing Ubuntu and then running:

sudo apt-get install maas-rack-controller

Unlike installing the maas package, we don't get a pretty web interface to let us know it worked or what to do next.
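If you want some reassurance that the install worked, you can at least check that the rack controller service is up and running; in this version of MaaS, it’s called maas-rackd:

sudo systemctl status maas-rackd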

However, if you go back to the Region Controller (the machine we installed the maas package on) and you navigate to Nodes and then 'Controllers', you'll see a button to 'Add rack controller' which gives you some more instructions. Thankfully, we just need to go back to our new Rack Controller and run:

sudo maas-rack register --url http://wherever.the.maas.machine.is.located:5240/MAAS --secret <secret>

The ‘secret’ will, by default, be a randomly-generated 32-character hexadecimal string. For those that hate using web browsers to fetch configuration details, you can find this secret in the contents of the following file on the Region Controller:

/var/lib/maas/secret
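For example, to read it straight off the terminal:

sudo cat /var/lib/maas/secret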

For those of you that dislike randomly-generated secrets, you can always set the secret yourself. To do this, simply overwrite the contents of the ‘secret’ file on the MaaS box and then restart the maas-regiond service. For example:

echo -n "0123456789abcdef0123456789abcdef" | sudo tee /var/lib/maas/secret
sudo systemctl restart maas-regiond

Personally, I usually set the secret for my MaaS clusters as it makes it easier to add new Rack Controllers since I do not need to examine the Region Controller. If you can’t decide on a good secret, consider using an MD5 checksum of your favourite song.
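Sticking with that suggestion, something like this spits out a usable 32-character hexadecimal string (my_favourite_song.mp3 being a placeholder for whatever you have lying around), which you can then write to the secret file exactly as shown above:

md5sum my_favourite_song.mp3 | awk '{ print $1 }'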

If you do want to set your own secret, I’d strongly encourage you to do it before adding more Rack Controllers — otherwise, you’ll have to re-register each Rack Controller.

After running the maas-rack register command, you should be able to see a new Rack Controller in the MaaS web panel (under the 'Controllers' section we visited before).
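For those of you that hate using web browsers to check on things, the MaaS CLI can confirm it too. Assuming you’ve logged in with a CLI profile named ‘admin’ (e.g. maas login admin http://wherever.the.maas.machine.is.located:5240/MAAS <api-key>), the following lists all known Rack Controllers:

maas admin rack-controllers read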

The Rack Controllers are what provide OS images during a server deployment, but the Rack Controllers get the images from the Region Controller. However, unlike before, where each deployment would stress the network between racks, each Rack Controller holds a snapshot of the OS images and can provide them without necessarily contacting the Region Controller. So, just by adding the Rack Controllers (and setting them up to provide deployments for the machines in the same rack), we’ve massively reduced cross-rack network utilisation (for machine deployments).

The snapshot is synchronised every once in a while — usually, if you add or update an image, a synchronisation will happen a few minutes later. Alternatively, you can disable automatic image synchronisation and kick off the process when it suits you. For example, we can synchronise images across Rack Controllers at 4AM — assuming the network won’t be busy because everybody is asleep and not interacting with our servers.
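As a sketch of that 4AM idea (reusing the ‘admin’ CLI profile from before, and assuming the rack-controllers import-boot-images operation behaves as I remember), a crontab entry on the Region Controller might look like this:

# minute hour day-of-month month day-of-week
0 4 * * * maas admin rack-controllers import-boot-images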

There are a lot of more complicated steps when adding a machine to ensure it is deployed by the correct Rack Controller, but these are influenced by the hardware and your network setup (VLANs, for example), so I’m ignoring them in the interests of brevity.

In summary, we have:

  • 3 racks
  • 15 machines in each rack
  • a machine acting as both a Region Controller and Rack Controller in one rack
  • a machine acting as a Rack Controller in each of the other racks

Snowflakes

So, we have 3 Rack Controllers, one of those Rack Controllers is also a Region Controller and that makes it special. The differences might seem minor, but they can make diagnostics and monitoring more difficult.

If I were monitoring CPU usage of all Rack Controllers, I would see one machine consistently higher than the others — this is because it’s doing extra work as a Region Controller. If I wanted to investigate why one of my Rack Controllers was broken, I would need a different process for the Region Controller (e.g. a reboot is no longer viable, as it would take the entire cluster down).

So, we have a Rack Controller that’s special. It needs to be treated differently. It needs to be observed differently. We call things like this ‘snowflakes’.

I dislike snowflakes. I work in DevOps — snowflakes make my job difficult.

To me, snowflakes are to be avoided like the miniature-ice-death-spikes that they are.

So, here’s what we’re gonna do — separate the Region Controller and the Rack Controller into distinct machines. This way, I’ll have 3 perfectly identical Rack Controllers — each of them can be deployed, monitored and repaired like any other.

This does mean I’ll have to install the Region Controller in a slightly different way:

sudo apt-get install maas-region-controller

(and you’ll still need to create a user with the maas createadmin command)
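One gotcha worth spelling out: each of the three Rack Controllers still needs registering against this dedicated Region Controller. It’s the same dance as before, with the URL and secret below being placeholders for your own:

sudo maas-rack register --url http://dedicated.region.controller:5240/MAAS --secret 0123456789abcdef0123456789abcdef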

Now we have:

  • 3 Rack Controllers (none of which are more special than the others), one in each rack
  • a Region Controller

Part 2

Sadly, our adventure today must come to a close. Stay tuned for the next exciting episode.
