A quick adventure in MaaS (part 2)

This is the second part of our adventure into MaaS. If you haven’t already, read the first part before continuing.

Catch up

In the previous part, we defined a potential environment of 3 racks, each having 15 machines. We started with a single MaaS Controller and added an administrator user. Then we added further Rack Controllers to improve performance. Afterwards, we separated the first MaaS Controller into its two components: the Region Controller and the Rack Controller.

In this part, we will talk about separating the database and adding HTTPS support.

The base of data

Even after removing the Rack Controller from the Region Controller, that machine is still a bit busy for my tastes. Besides providing the MaaS API and webpages, it’s running a sizable PostgreSQL database. This database holds all the data for the MaaS environment, including OS images, server details and user credentials.

Now, my world is one of paranoia — I have a machine that’s hosting a database and actively accepting connections from the public internet. The data in the database includes the hardware and power controls for our servers, so I wouldn’t be too happy if somebody took a copy of it. In general, I like to keep my sensitive data at least one hop away from the scary people that break into systems for a hobby.

I’m not saying MaaS is insecure — I haven’t seen any security breaches, but an exploit like Shellshock would have been the end of this cluster. Similarly, I’m not saying I’m completely invulnerable by simply moving the database to another machine, but it does help (at least a little).

Even if I ignored my security fears, there’s another reason to move the database: it’s a single point of failure (SPoF).

Essentially, if there was a hardware failure (or if one of those pesky humans ran a silly command), I would need to set up the MaaS cluster all over again, including re-uploading any custom images. Worse, I’d lose the entire history of events (who did what to which machines); if there was a strict audit process, I’d probably need a new job.

I should mention that the machines MaaS manages might survive a catastrophic database failure, but it depends on your setup: if you asked MaaS to provide DHCP to the servers, it won’t be able to renew IP leases, so your machines will forget where they live in IP-land and quickly become useless.

Ideally, I would like my data to survive a random small disaster, even if there are some manual steps involved.

One option is to ask somebody else to host a PostgreSQL database for me; a quick Google of “postgresql as a service” gives some interesting results. If you have the budget for it and lack PostgreSQL skills, this is what I recommend, but note that OS images will be stored in the database, so you’ll likely need something bigger than the cheapest option.

If you do have some PostgreSQL skills, you can go with a cluster, an HA pair or a master-slave pair, whichever you feel most comfortable with.

We went with a master-slave pair. This does mean manual intervention is needed to fail over in case of a disaster, but it gives us some advantages:

  • we can proceed with just one database machine if the environment is too small to warrant a ‘spare’ machine for disasters
  • we can set up a master independently; once it’s configured, we don’t need to revisit it
  • we can set up the slave at any time (as long as the master is available)
  • we can perform backups on the slave without ever impacting the MaaS cluster’s performance

This solution works for us. Canonical suggest an HA pair of PostgreSQL nodes. What you do with your cluster is up to you.

All data about a MaaS cluster is stored in the PostgreSQL database, so as long as we can create backups of it (from the slave, in our case), we should be able to handle physical disasters and accidental SQL mishaps (like a stray “DROP TABLE”). We can then send the backups off-site, just in case a meteor decides to hit the datacenter. Again, where you place the backups depends on what you want to do, but an obvious solution would be something like an AWS S3 bucket or Dropbox.
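As a rough illustration of that backup step, here is a minimal Python sketch that builds a pg_dump command aimed at the slave and names the custom-format dump after the database and the current UTC time. The hostname `postgres-slave.box`, the `/var/backups/maas` directory and the `backup_command` helper are hypothetical; shipping the file off-site is left out.

```python
"""Sketch: build a timestamped pg_dump command that backs up the MaaS
database from the PostgreSQL slave. Host and paths are hypothetical."""
from datetime import datetime, timezone
from pathlib import Path


def backup_command(host: str, dbname: str, user: str, outdir: str,
                   now: datetime, port: int = 5432) -> list[str]:
    """Return a pg_dump argument list writing a custom-format dump file
    named after the database and the UTC time of the backup."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    outfile = Path(outdir) / f"{dbname}-{stamp}.dump"
    return [
        "pg_dump",
        "-h", host,         # point this at the slave, not the master
        "-p", str(port),
        "-U", user,
        "-Fc",              # custom format, restorable with pg_restore
        "-f", str(outfile),
        dbname,
    ]


if __name__ == "__main__":
    cmd = backup_command("postgres-slave.box", "maasdb", "maas",
                         "/var/backups/maas", datetime.now(timezone.utc))
    print(" ".join(cmd))
    # To take the backup for real: subprocess.run(cmd, check=True)
```

pg_dump can read the password from a ~/.pgpass file or the PGPASSWORD environment variable, so nothing secret needs to appear in the command itself.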

At this point, we have a dedicated machine handling only PostgreSQL and a separate PostgreSQL slave that should be replicating everything happening in the main database. If the main database suffers a hardware failure, we can always promote the slave.

Simply installing PostgreSQL on a machine will not make MaaS use it instead of the one that comes installed with the maas-region-controller package. In fact, we need to perform a few steps to separate the database from the Region Controller.

I won’t go into too much detail about installing PostgreSQL here, as there is ample documentation on that already. However, note that you should use PostgreSQL 9.5, as this is the version that MaaS officially supports right now.

After starting the database service, you’ll want to create a ‘maas’ user and a ‘maasdb’ database. You can use the PostgreSQL utility commands createuser and createdb if you want, but I'll stick to psql.

If you went with a basic installation, you should be able to run:

sudo -u postgres psql

This will open an interactive session where we can enter SQL commands (or some bonus PostgreSQL shortcuts).

To create our ‘maas’ user, run (in psql):

CREATE USER maas WITH NOCREATEDB LOGIN ENCRYPTED PASSWORD 'notswordfish';

To create the ‘maasdb’ database, run:

CREATE DATABASE maasdb WITH OWNER = maas ENCODING = 'UTF8';

You are now finished with psql, so type \q to exit.

With the database created, you will need to ensure the PostgreSQL service is listening on an address that your MaaS Region Controller can reach. This will involve editing the file at:

/etc/postgresql/9.5/main/postgresql.conf

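and setting the listen_addresses parameter, for example to an address on the network your Region Controller uses, or to '*' to listen on all interfaces (Google is your friend if you're stuck).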
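After restarting PostgreSQL, you can confirm from the Region Controller that the port is actually reachable. Here is a minimal Python sketch using only the standard library; `can_reach` is a hypothetical helper, and `postgres.box` is the example hostname this post uses when configuring MaaS. It only proves that a TCP connection opens; whether the ‘maas’ user may log in is decided separately by pg_hba.conf.

```python
"""Sketch: check from the Region Controller that PostgreSQL accepts TCP
connections on its port."""
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name lookup failure or timeout
        return False
```

Calling `can_reach("postgres.box")` should return True once PostgreSQL listens on an address the Region Controller can route to.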

The next step is to allow the ‘maas’ user to connect to the ‘maasdb’ database from an external address. By default, PostgreSQL will not allow other machines to connect to the database (this is a good thing). You need to edit the file at:

/etc/postgresql/9.5/main/pg_hba.conf

and add the following line to the end:

host    maasdb          maas            0.0.0.0/0               md5

This will allow any machine to connect to the ‘maasdb’ database as ‘maas’ (as long as it gets the password right). Ideally, you should give a narrower address range than 0.0.0.0/0, such as your Region Controller's own address, but this will get you started.
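To make "more specific" concrete, here is a small Python sketch using the standard ipaddress module to show which client addresses a given pg_hba.conf address column admits. The Region Controller address 10.0.0.5, its 10.0.0.0/24 subnet and the outside address 203.0.113.50 are hypothetical; in a real setup, a line such as `host maasdb maas 10.0.0.5/32 md5` would admit only that one machine.

```python
"""Sketch: compare which client addresses a pg_hba.conf address column
admits. All addresses below are hypothetical examples."""
import ipaddress


def hba_allows(cidr: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside the pg_hba.conf CIDR."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


region_controller = "10.0.0.5"   # hypothetical Region Controller address
stranger = "203.0.113.50"        # documentation-range "internet" address

for cidr in ("0.0.0.0/0", "10.0.0.0/24", "10.0.0.5/32"):
    print(cidr, hba_allows(cidr, region_controller), hba_allows(cidr, stranger))
```

All three ranges admit the Region Controller, but only 0.0.0.0/0 also admits the outside address.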

You’ll need to restart the database service to pick up the configuration changes. Again, this depends on your installation, but on most systems you can run:

sudo systemctl restart postgresql

For this document, we mirrored the database and user names that the maas package sets up. You're welcome to choose different names, but I'm fairly comfortable with these (you definitely want to change the password, though).
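If you need a replacement for the example password, a minimal Python sketch using the standard secrets module (which draws from a cryptographically secure source) could look like this; `make_password` is a hypothetical helper. Sticking to letters and digits avoids quoting headaches in both the SQL statement and the shell command that points MaaS at the database.

```python
"""Sketch: generate a random password for the 'maas' database user to use
in place of the example 'notswordfish'."""
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def make_password(length: int = 32) -> str:
    """Return a random alphanumeric password drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(make_password())
```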

After configuring PostgreSQL, we now need to tell MaaS to use it. Thankfully, that’s just a few commands on the Region Controller:

sudo maas-region local_config_set --database-host "postgres.box" --database-port 5432 --database-name maasdb --database-user maas --database-pass "notswordfish"
sudo maas-region dbupgrade

That final command will create all the database tables that MaaS expects — if you’re changing to a slave or restoring from a backup, you’ll want to skip it if you want to keep the old data.
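Before restarting anything, it can help to test the credentials from the Region Controller with psql, which accepts a libpq connection URI. This minimal Python sketch builds such a URI from the same values passed to local_config_set, percent-encoding the user name and password in case they contain characters like @ or /; `pg_uri` is a hypothetical helper.

```python
"""Sketch: build a libpq connection URI from the same values passed to
local_config_set, so the credentials can be tested with psql first."""
from urllib.parse import quote


def pg_uri(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """Return a postgresql:// URI with user, password and db percent-encoded."""
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(dbname, safe='')}")


if __name__ == "__main__":
    uri = pg_uri("postgres.box", 5432, "maasdb", "maas", "notswordfish")
    print(f'psql "{uri}" -c "SELECT 1;"')
```

Bear in mind that a password typed on the command line ends up in your shell history; a ~/.pgpass file avoids that.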

For MaaS to actually start using the new database, you’ll need to restart the maas-regiond service (you shouldn't need to touch the Rack Controllers). Just in case you forgot from last time, you can restart the MaaS Region Controller with:

sudo systemctl restart maas-regiond

As for setting up a slave, you should ask Google — again, there’s plenty of other places that will do a much better job explaining it.

Right now, our MaaS cluster looks like this:

  • 3 Rack Controllers
  • one Region Controller (no longer running a database)
  • one PostgreSQL database ‘master’ node
  • one PostgreSQL slave

Dude, where’s my HTTPS?

Remember when I said MaaS wasn’t insecure? Well, maybe that’s true, but using HTTP on webpages that ask for passwords is not ideal. I want to use HTTPS, and I’d prefer a more sensible port for a website than 5240. Sadly, MaaS does not support HTTPS out of the box, so you’ll need a reverse proxy in front of it. The Region Controller package comes with the Apache 2 web server, but you’ll have to configure it yourself.

Personally, I prefer NGINX, mainly because we have more experience with it.

Here’s a sample (and rather cut-down) configuration:

# HTTP to HTTPS redirect
server {
    listen *:80;
    server_name wherever.the.maas.machine.is.located;
    return 301 https://$host$request_uri;
}

# HTTPS proxy
server {
    listen *:443 ssl;
    server_name wherever.the.maas.machine.is.located;
    ssl_certificate /some/directory/to/place/sslcerts/cert.pem;
    ssl_certificate_key /some/directory/to/place/sslcerts/key.pem;

    # web sockets (used by the MaaS web UI)
    location /MAAS/ws {
        proxy_pass http://localhost:5240;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Proxy "";
        proxy_set_header X-Forwarded-Protocol $scheme;
    }

    # API, webpage content
    location / {
        proxy_pass http://localhost:5240;
        proxy_redirect http://$host https://$host;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Proxy "";
        proxy_set_header X-Forwarded-Protocol $scheme;
    }
}

Sadly, adding NGINX doesn’t stop the Region Controller from listening for HTTP on port 5240. Even worse, there doesn’t seem to be an easy ‘only listen on these addresses’ option in MaaS, so we’ll have to handle this ourselves with iptables, a firewall or some fancy network Access Control Lists. For simplicity, here are the iptables commands to run:

sudo iptables -A INPUT -s 127.0.0.1/32 -p tcp --dport 5240 -m comment --comment "accept all traffic to region api port from localhost" -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5240 -m comment --comment "drop all packets to region api port from anywhere else" -j DROP

The above allows traffic from NGINX to the Region Controller but drops everything else. That includes all the traffic from the Rack Controllers, so you’ll need to update the maas-rack register command to use HTTPS (and the proper port). Alternatively, you can squeeze in an extra 'ACCEPT' iptables rule before the 'DROP' (remember that the order of iptables rules matters) for each of your Rack Controllers (or for an entire subnet if you know what you're doing).
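Because iptables walks a chain from top to bottom and applies the first rule that matches, a Rack Controller ACCEPT rule only helps if it sits above the DROP. The toy Python model below is not how iptables is implemented and matches only on source address and destination port; the 10.0.1.0/24 Rack Controller subnet is hypothetical.

```python
"""Sketch: a toy model of first-match evaluation of an iptables INPUT
chain. The 10.0.1.0/24 rack subnet is a hypothetical example."""
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    source: str   # CIDR the rule matches, like iptables -s
    dport: int    # destination port, like iptables --dport
    target: str   # "ACCEPT" or "DROP"


def verdict(chain: list[Rule], src_ip: str, dport: int,
            policy: str = "ACCEPT") -> str:
    """Return the target of the first matching rule, else the chain policy."""
    addr = ipaddress.ip_address(src_ip)
    for rule in chain:
        if dport == rule.dport and addr in ipaddress.ip_network(rule.source):
            return rule.target
    return policy


good_order = [
    Rule("127.0.0.1/32", 5240, "ACCEPT"),   # NGINX on localhost
    Rule("10.0.1.0/24", 5240, "ACCEPT"),    # Rack Controllers (hypothetical)
    Rule("0.0.0.0/0", 5240, "DROP"),        # everybody else
]
# Same rules, but the rack ACCEPT ended up after the DROP.
bad_order = [good_order[0], good_order[2], good_order[1]]

print(verdict(good_order, "10.0.1.7", 5240))  # ACCEPT
print(verdict(bad_order, "10.0.1.7", 5240))   # DROP
```

This is also why a rule added later with `iptables -A` (append) lands after the DROP and never matches; `iptables -I INPUT <position>` inserts it at a chosen position instead.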

So, NGINX is nicely handling HTTPS for us and if the MaaS webpage is working, everything is great, right?

Well, not quite. Anybody using the MaaS CLI (Command Line Interface), the Python MaaS client or any other API client will find that it no longer works.

Without going into too much detail, the MaaS API exposes a ‘describe’ endpoint. When queried, it returns a list of endpoints (URLs, HTTP methods and quick descriptions) that the API user can call (the answer depends on the API user's permissions). Right now, MaaS doesn’t know there’s an HTTPS proxy sitting in front of it, so when we ask for a list of API URLs, it gives HTTP as the protocol. As you might expect, the client then uses the URL that MaaS gave and promptly fails.

If you want to see what a MaaS client will see, you can go to the following URL:

https://wherever.the.maas.machine.is.located/MAAS/api/2.0/describe/

The response will be a sizable JSON object (again, depending on how much the user is allowed to do). What you (and the clients) will care about most is the ‘resources’ array. Here’s an example of an element of that array:

{
    "doc": "Manage a boot resource.",
    "params": [
        "id"
    ],
    "uri": "http://wherever.the.maas.machine.is.located/MAAS/api/2.0/boot-resources/{id}/",
    "actions": [
        ... (sorry, going to skip this for brevity)
    ],
    "name": "BootResourceHandler",
    "path": "/MAAS/api/2.0/boot-resources/{id}/"
}

That uri field is what causes problems for clients: even though we made an HTTPS call to get this list, MaaS thinks you should call it with HTTP.

To fix this, we need to tell MaaS to report that it's using HTTPS when requests come through NGINX. Thankfully, the MaaS front-end is built on the Django framework, so that’s as simple as adding the following line:

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTOCOL', 'https')

to the end of the file:

/usr/lib/python3/dist-packages/maasserver/djangosettings/settings.py

and restart the maas-regiond service for the change to take effect. Since this file belongs to the MaaS package, expect to reapply the change after upgrading MaaS.

Note: the setting above only works if you keep the X-Forwarded-Protocol header in the NGINX configuration. For more information, see the Django documentation for SECURE_PROXY_SSL_HEADER.
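To see why the header name matters, here is a simplified Python model (not Django's actual code) of the two steps involved: the WSGI server exposes an incoming `X-Forwarded-Protocol` header to Django as `HTTP_X_FORWARDED_PROTOCOL` in `request.META`, and Django then treats the request as secure when that value equals `'https'`. The helper names `meta_key` and `is_secure` are hypothetical.

```python
"""Sketch: a simplified model of how Django decides a request is secure
when SECURE_PROXY_SSL_HEADER is set. This is not Django's own code."""

SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTOCOL", "https")


def meta_key(header_name: str) -> str:
    """Map an HTTP header name to its WSGI environ / request.META key."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def is_secure(headers: dict[str, str]) -> bool:
    """Treat the request as HTTPS only if the proxy header says 'https';
    without it, the NGINX-to-MaaS hop itself is plain HTTP."""
    meta = {meta_key(name): value for name, value in headers.items()}
    key, expected = SECURE_PROXY_SSL_HEADER
    return meta.get(key) == expected


if __name__ == "__main__":
    print(is_secure({"X-Forwarded-Protocol": "https"}))  # True
    print(is_secure({"X-Forwarded-Proto": "https"}))     # False: different header
```

The second call shows that a proxy sending the more common X-Forwarded-Proto header would not satisfy this particular setting.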

Now, when MaaS API clients first connect to the Region Controller and ask for a list of available endpoints, the answer contains HTTPS URLs, so everybody and everything is happy with HTTPS.
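To confirm the change, you can fetch the describe/ URL again and check that no plain-HTTP URLs remain. A minimal Python sketch of that check follows; rather than assuming the exact layout of the response, it walks the whole decoded JSON, collects every `uri` value and reports any whose scheme is not `https`. The function names are hypothetical, and fetching the JSON (for example with a browser or curl) is left out.

```python
"""Sketch: report any 'uri' values in a describe/ response that are not
HTTPS. Walks the whole JSON document instead of assuming its layout."""
import json
from urllib.parse import urlsplit


def find_uris(node) -> list[str]:
    """Recursively collect every string stored under a 'uri' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uri" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_uris(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_uris(item))
    return found


def insecure_uris(describe_text: str) -> list[str]:
    """Return the URIs from describe/ output whose scheme is not https."""
    return [uri for uri in find_uris(json.loads(describe_text))
            if urlsplit(uri).scheme != "https"]


if __name__ == "__main__":
    sample = json.dumps({"resources": [{
        "name": "BootResourceHandler",
        "uri": "http://wherever.the.maas.machine.is.located"
               "/MAAS/api/2.0/boot-resources/{id}/",
    }]})
    print(insecure_uris(sample))  # the http:// URI is reported
```

With the Django setting in place, running the check against a fresh describe/ response should produce an empty list.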

What’s next?

We’ve got a total of 45 machines, and 6 of those are currently running MaaS infrastructure (or databases that the infrastructure relies on). We can handle some disasters, but there will be manual steps involved.

We could add an extra Region Controller in an HA setup, but that’s another machine that can’t do actual work. Since the database holds all the state of the MaaS cluster, we can simply bring up a new Region Controller if the first one fails (if the IP address and MaaS secret don’t change, the Rack Controllers will be fine).

On a similar note, we could bring up spare Rack Controllers in each rack so that they can automatically fail-over as well — but that’s an extra machine wasted per rack.

As mentioned before, you can ask somebody else to host your PostgreSQL database, which would, in our example, free up 2 machines. However, depending on where the database is located, this could affect cluster performance (deploys will likely be fine, but the webpages would feel slower).

What about logs? MaaS Controllers support rsyslog, so you can always run a local rsyslog server and point it at your centralised logging infrastructure. Alternatively, you can spare a server to run a miniature ELK (Elasticsearch, Logstash, Kibana) stack, or go with a 3-node Elasticsearch cluster if MaaS logs are important enough to you. Again, there are some Elasticsearch-as-a-service offerings, so if you can't be bothered, pay somebody else to do it for you.

Maybe you want to spare a machine to act as a VPN server?

Maybe you need a machine to monitor the others? MaaS does have the ability to do some hardware and power checks, but you might need something more specific to the applications you’ll be running. Alternatively, you can add your monitoring software on the Rack Controllers.

How your cluster is structured is entirely up to you. These blog posts were just the first few steps of our adventure into MaaS. We’ve made some improvements since then and tried some other things here and there, but this adventure is long enough already, so go build your own MaaS cluster and define your own architecture.
