This is the second part of a three-part article. Start reading at part one, A Journey into Microservices.

From the start there were a number of guiding principles we wanted to instill into our new platform. One of these was taking a Cloud Native approach, something which had been pioneered by Netflix. Adrian Cockcroft has talked about the approach Netflix took, with their aim during this process being to:

“Construct a highly agile and highly available service from ephemeral and assumed broken components” – Adrian Cockcroft

While using a cloud provider isn’t necessarily a requirement, infrastructure as a service obviously makes scaling much easier. In Hailo’s case we use a wide range of Amazon’s products, allowing us to match customer demand and react quickly to changing markets, while keeping costs low.

However, regardless of hosting provider, any large distributed system will have components that are failing or degraded at any point in time. This is something we all continue to strive to fix–trying to achieve a utopia of perfect hardware systems, running perfect apps, connected by a perfect network. Unfortunately this isn’t really attainable, and instead we are stuck with a dystopia of buggy apps, running on hardware which often fails, or disappears. This isn’t necessarily a bad thing though–it forces us to face these problems head on, and design software which can benefit from a rapidly changing environment.

This concept of antifragility was popularised by Nassim Nicholas Taleb and is central to becoming cloud native. Most systems become weaker under stress; however, by deliberately introducing stressors into our systems (chaos, in Netflix’s case), we can identify these issues and design out the weaknesses–vastly improving the reliability of our service.

With these concepts in mind we tried out several different prototypes, and eventually settled on a service oriented architecture which supported both Go and Java services, communicating Protocol Buffer formatted messages over a RabbitMQ message bus. This allowed interesting routing patterns, such as routing to specific versions of a given service, or point to point messaging. Cassandra remained our primary data store given that it was working so well, and fitted in perfectly with our requirements.

Hailo's Microservice Platform based around RabbitMQ and Cassandra

So, why Go? Previously we had been using PHP and Java, and while we wanted to retain the ability to write Java on our platform, we were less interested in keeping PHP. Go is a small language, and is therefore easy to learn. Being compiled it is mind-bogglingly fast (especially when you are moving from a scripting language like PHP), and features such as its type system and fantastic interface support make writing code fast, giving us improvements both in development and compute time. Also Go has excellent concurrency primitives–something of particular importance for us as our infrastructure would require a lot of inter-service communication. Finally, it is fun to write, and there is an amazing community! So not only did our developers enjoy writing new services on our platform, but we could recruit from an incredibly talented pool of developers in London.
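To give a flavour of those concurrency primitives, here is a minimal sketch (not Hailo code–the service names and latency are invented) of fanning out several downstream calls in parallel using goroutines and a channel:

```go
package main

import (
	"fmt"
	"time"
)

// callService stands in for an RPC to another service.
func callService(name string, results chan<- string) {
	time.Sleep(10 * time.Millisecond) // simulate network latency
	results <- fmt.Sprintf("response from %s", name)
}

func main() {
	services := []string{"driver-tracking", "pricing", "geocoding"}
	results := make(chan string, len(services))

	// Fan out: one goroutine per downstream call.
	for _, s := range services {
		go callService(s, results)
	}

	// Fan in: gather all responses as they arrive.
	for range services {
		fmt.Println(<-results)
	}
}
```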

But, what even is a Service?

The definition of a service, microservice, macroservice, mediumservice or even a moderatelylargeservice seems to vary wildly depending on who you talk to. As we previously discussed in our Webapps as microservices post, Martin Fowler defines the microservice architecture as such:

The microservice architectural style is an approach to developing a single application as a suite of small services … these services are built around business capabilities and are independently deployable … there is a bare minimum of centralized management of these services, which may be written in different programming languages

In our case we defined a service as a small, distinct, unit of responsibility–something that did one job, and did it well. These ranged widely, from some very small services with maybe a hundred lines of logic (excluding libraries), to a few which were much larger, such as the system which tracks our drivers in realtime. Regardless, the guiding principle remained the same–that these services should each have clearly defined responsibilities.

Secondary to the responsibilities of the service was the interface it provided to the outside world. In our case we chose Protocol Buffers, which gave us well defined, strongly typed messages to be passed between our services. Each handler (or endpoint) on a service defined a Request and Response envelope message which it would accept and reply with. As Protobuf is extensible, these could be changed and added to during the lifetime of the service, while still supporting older versions of clients.
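To illustrate the envelope idea, here is a hedged sketch in Go. In reality the message types would be generated from .proto definitions; plain structs stand in for the generated code here, and the handler signature is illustrative rather than our actual API:

```go
package main

import "fmt"

// HelloRequest is a stand-in for a protobuf-generated request message.
type HelloRequest struct {
	Name string // in proto: optional string name = 1;
}

// HelloResponse is a stand-in for a protobuf-generated response message.
type HelloResponse struct {
	Greeting string // in proto: optional string greeting = 1;
}

// Each endpoint accepts exactly one request envelope and replies with
// exactly one response envelope; new fields can be added to either
// message later without breaking older clients.
func helloHandler(req *HelloRequest) (*HelloResponse, error) {
	return &HelloResponse{Greeting: "Hello, " + req.Name}, nil
}

func main() {
	resp, err := helloHandler(&HelloRequest{Name: "Hailo"})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Greeting)
}
```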

Finally there was the internal implementation of the service. Arguably this was the least important part of the service, as most services were small, and could easily be rewritten or replaced if necessary. In comparison, changing the interface or responsibilities of the service would usually require changes to other services, with the corresponding cross-team communication overheads.

Developers, Developers, Developers

Having established what a service should look like, we now needed to provide tooling to make it extremely easy for developers to build services, so we could get on with solving real-world problems!

Services were based on our ‘platform-layer’ library which abstracted away the message bus transport layer, and delivered messages to and from registered message handlers within the service. This also provided the framework for inter-service RPC calls, service discovery, monitoring information, authentication and authorisation, and A/B testing.
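As a rough illustration (with an invented, much-simplified API, not the real ‘platform-layer’ library), registering a handler might look something like this, with the transport hidden entirely from the handler code:

```go
package main

import "fmt"

// Handler is the shape of an endpoint: bytes in, bytes out. In the real
// system these would be protobuf envelopes; []byte keeps the sketch small.
type Handler func(req []byte) ([]byte, error)

// Service is a toy stand-in for a platform-layer service object. The real
// library would also wire up RabbitMQ, discovery, auth and monitoring.
type Service struct {
	name     string
	handlers map[string]Handler
}

func NewService(name string) *Service {
	return &Service{name: name, handlers: make(map[string]Handler)}
}

// Register associates an endpoint name with a handler; the (hidden)
// transport layer routes inbound messages to it.
func (s *Service) Register(endpoint string, h Handler) {
	s.handlers[endpoint] = h
}

// dispatch simulates a message arriving off the bus for an endpoint.
func (s *Service) dispatch(endpoint string, req []byte) ([]byte, error) {
	h, ok := s.handlers[endpoint]
	if !ok {
		return nil, fmt.Errorf("no handler for %q", endpoint)
	}
	return h(req)
}

func main() {
	svc := NewService("example-service")
	svc.Register("greet", func(req []byte) ([]byte, error) {
		return []byte("hello " + string(req)), nil
	})

	resp, _ := svc.dispatch("greet", []byte("world"))
	fmt.Println(string(resp))
}
```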

In addition to this, our ‘service-layer’ libraries provided abstraction layers over most of the third-party services we ran internally, such as Cassandra, Memcache, Zookeeper and NSQ, adding convenience methods, host discovery, automatic configuration, and logic to refresh configuration dynamically as it changed.

Components of a Service
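To sketch the dynamic configuration idea in isolation (the types and the source of updates here are hypothetical–the real service-layer was fed from our configuration system), a library can watch for new config and atomically swap it in, so callers always see the latest values without a restart:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Config is an example of settings a service-layer library might consume.
type Config struct {
	CassandraHosts []string
}

var current atomic.Value // holds *Config

// watch applies updates as they arrive, without restarting the service.
func watch(updates <-chan *Config) {
	for cfg := range updates {
		current.Store(cfg)
		fmt.Println("config refreshed:", cfg.CassandraHosts)
	}
}

func main() {
	updates := make(chan *Config)
	current.Store(&Config{CassandraHosts: []string{"10.0.0.1"}})
	go watch(updates)

	// Simulate a config change arriving from the config system.
	updates <- &Config{CassandraHosts: []string{"10.0.0.1", "10.0.0.2"}}
	time.Sleep(50 * time.Millisecond)
	fmt.Println("in use:", current.Load().(*Config).CassandraHosts)
}
```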

To increase productivity we created templates for services, and added Hubot integration so we could quickly provision a new project with a single command:

Services as a Service

Setting up Continuous Integration for a service was also as simple as asking Hubot. This used Janky to set up the project in Jenkins, and add post-receive hooks to our GitHub repositories.

Finally, the end result of a successful build was a deployable artifact stored on Amazon’s S3. This could then be provisioned to any of our environments using our internal deployment tools.

Always be Shipping

Once developers had built their services, the next step was deploying them. Docker was around version 0.4 when we started working on our platform, and although it was tempting to try using this in production, we didn’t want to spend our time debugging issues until it was more stable. We also wanted something fairly lightweight, so we could swap it out in the future, and it had to be simple to use.

Go made this very easy, as the output of our build was a statically linked binary which could be uploaded to S3 and then downloaded and executed on any machine. Couple this with our platform command line, and a snazzy web dashboard, and we had our deployment system:

The Provisioning dashboard for an environment

Provisioning a service through either the Provisioning dashboard or platform command line communicates with our provisioning manager service (a scheduler which itself runs on the platform). This decides where the service instances should be run. A provisioning service daemon, which runs locally on every machine, polls this and identifies when it needs to run a new service. This then pulls down the service from Amazon’s S3 and manages execution of the binary during its lifetime.

Simple Global Provisioning
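In rough, much-simplified Go, the daemon’s core loop might look like the following. Every name here (pollManager, Instance and so on) is invented for illustration, and the real daemon of course also handles the S3 download, restarts and teardown:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// Instance describes a service the manager wants running on this box.
type Instance struct {
	Name, Version, BinaryPath string
}

// pollManager would ask the provisioning manager what should run here;
// it is stubbed out with a fixed answer for this sketch.
func pollManager() []Instance {
	return []Instance{
		{Name: "example-service", Version: "42", BinaryPath: "/opt/hailo/example-service"},
	}
}

func main() {
	running := map[string]bool{}
	for {
		for _, inst := range pollManager() {
			key := inst.Name + "-" + inst.Version
			if running[key] {
				continue // already managing this instance
			}
			// The real daemon would first pull the binary down from
			// S3; here we assume it is already on disk.
			cmd := exec.Command(inst.BinaryPath)
			if err := cmd.Start(); err != nil {
				fmt.Println("start failed:", err)
				continue
			}
			running[key] = true
			fmt.Println("started", key, "pid", cmd.Process.Pid)
		}
		time.Sleep(30 * time.Second)
	}
}
```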

As Docker stabilised we added this into our build chain, so that services could be run within containers. This process is almost identical, with the exception that we pull down a Docker image, and request that the Docker daemon execute the container:

Simple Provisioning with Docker

Once a service starts running it automatically discovers and connects to RabbitMQ, and publishes its existence, registering with our service discovery system.

Services automatically discover RabbitMQ and publish their existence
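As a hedged example of what such a registration publish could look like using the streadway/amqp client (the exchange name, routing key and message format are all made up for illustration–our actual discovery protocol differs):

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/streadway/amqp"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal("connect: ", err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal("channel: ", err)
	}

	// Announce this instance so the discovery system can bind queues to it.
	body, _ := json.Marshal(map[string]string{
		"service":  "example-service",
		"version":  "20140801",
		"hostname": "ip-10-0-0-1",
	})
	err = ch.Publish(
		"discovery", // exchange (illustrative)
		"register",  // routing key (illustrative)
		false, false,
		amqp.Publishing{ContentType: "application/json", Body: body},
	)
	if err != nil {
		log.Fatal("publish: ", err)
	}
	log.Println("registered with discovery")
}
```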

The Binding Service then sets up the correct queue binding rules within RabbitMQ, connecting the delivery queues to this new service instance. This supports some advanced traffic routing features, including per-version weighting, so particular versions of a service can receive a weighted amount of traffic.

Once discovered, services are bound to RabbitMQ and receive messages
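The selection logic behind per-version weighting can be illustrated in a few lines of Go. The real system achieves this through RabbitMQ binding rules rather than in-process code, so this is purely a sketch of the idea:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickVersion chooses a version with probability proportional to its weight.
func pickVersion(weights map[string]int) string {
	total := 0
	for _, w := range weights {
		total += w
	}
	n := rand.Intn(total)
	for v, w := range weights {
		if n < w {
			return v
		}
		n -= w
	}
	return "" // unreachable while total > 0
}

func main() {
	// Send 10% of traffic to the new version, 90% to the stable one.
	weights := map[string]int{"20140801": 90, "20140802": 10}
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pickVersion(weights)]++
	}
	fmt.Println(counts)
}
```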

Now continue reading part three, A Journey into Microservices: Dealing with Complexity.