Upgrading to Golang 1.9 for great justice

Golang 1.9 fixed a few issues for us that were ‘interesting’, to say the least. In particular, it fixed an issue resulting from the tickers in the golang time package not being monotonic, and one in the runtime netpoller that caused us some major pain at scale.

Let’s do the time warp again

When taking actions periodically, especially at fine granularity, you generally care about doing them in order, maintaining cause and effect. For our stats logging, we were using the golang time package Tickers to take snapshots of data and send it periodically. This all seems great until you remember we live in a world where time isn’t quite as fixed as it seems.

Time doesn’t always move forward in computers. Sometimes it stops, or goes backwards, or jumps into the future. Clocks go out of sync, time servers become unavailable, clock skew happens due to hardware or software issues, time zones change and daylight saving transitions happen. All of these things can affect what time your computer thinks it is while it is running. What happens when you are sending data every minute and suddenly the next minute arrives in 2 seconds instead of another 59 because the clock has changed?

Well, this is where monotonic time sources come in. Monotonic clocks are not subject to clock adjustment so they reliably increase as time goes on.

In some of our applications, we have things happening at regular intervals and upstream these get aggregated into measurement periods (for example, sending stuff at 1 minute intervals and aggregating things into 1 minute chunks). For these things, it doesn’t matter where in the aggregation period an event occurs, just that no more than one occurs in the same period — events must be guaranteed to happen exactly one interval length apart.

When time is flexible, this can no longer be true — which is where monotonic time saves the day. Upgrading to 1.9 resolved an issue where clock skew would sometimes have us seeing double entries in one period and no entries in the next one, and now this no longer occurs. Thanks 1.9!

Wake me up, before you go go

We discovered we were victims of a peculiar issue in the Go runtime that caused our TCP connections to error when talking to GCE. As it turns out, this was down to spurious wakeups, which meant the connections would think they were established when they actually weren’t. Consequently, we’d grab a connection and use it, but it was doomed to fail from the start. Some excellent analysis of the issue can be seen here: https://github.com/google/google-api-go-client/issues/220 and here: https://github.com/golang/go/issues/19289. Ultimately, the fix got merged into 1.9 here: https://github.com/golang/go/commit/bf0f69220255941196c684f235727fd6dc747b5c. Apparently the issue was quite rare, but for our use cases we were seeing it multiple times a day, and it made a couple of things quite unreliable.
