Latest Publications

Magnum with a Narrow Scope

The OpenStack Magnum project began in 2014 as a Containers-as-a-Service solution for OpenStack clouds. Since then it has grown considerably in popularity, and it soon became the place where people thought anything container related for OpenStack should go. Community members proposed a whole range of features, many of which duplicated areas of development that other projects like Docker and Kubernetes were already serving well. This led to a community discussion at our last design summit in Austin, where I was joined by a number of concerned community members, including several prominent leaders from the OpenStack technical committee. Together we redefined the scope of the Magnum project to be narrower, setting it up for success rather than letting it become the home of every container related idea. The new scope also allowed the creation of another project, named Zun, where those who want to add Docker-like and Kubernetes-like functionality to OpenStack itself can work on those pursuits. Although I don’t contribute to the Zun project, I did support its creation so the OpenStack community could collaborate on that work.

Along with Magnum’s new and focused scope, we adjusted the mission statement:

To provide a set of services for provisioning, scaling, and managing container orchestration engines.

A container orchestration engine, or COE, is a software system such as Docker Swarm, Kubernetes, or Apache Mesos that provides orchestration and clustering features for containerized applications. Magnum includes pluggable cluster drivers for each of these COEs. Perhaps someday Zun will have a community-contributed Magnum cluster driver as well.
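
To make that concrete, here is a rough sketch of what provisioning a Kubernetes cluster through the Magnum client can look like. Treat it as illustrative rather than authoritative: command and flag names changed between releases (Mitaka-era clients used baymodel and bay where later ones use cluster-template and cluster), and the image, keypair, flavor, and network names below are placeholders for whatever your cloud actually provides.

    # Define a reusable template recording the COE and infrastructure choices.
    # The --coe flag is what selects the cluster driver.
    magnum cluster-template-create --name k8s-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --keypair my-keypair \
        --external-network public \
        --flavor m1.small \
        --network-driver flannel

    # Provision an actual cluster from that template.
    magnum cluster-create --name my-k8s-cluster \
        --cluster-template k8s-template \
        --master-count 1 \
        --node-count 3

Swapping --coe kubernetes for swarm or mesos selects a different cluster driver; the rest of the workflow stays the same.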

In the Mitaka release cycle, Magnum’s contributors removed a considerable portion of the code from Magnum and our client that dealt with the lifecycle of individual containers, focusing instead on the management and scaling of the clusters in which your containers run. When presented with this news, many people ask the following question:

Why? Doesn’t this mean that in order to use containers on OpenStack you have to use two different clients?

The answer is definitely yes. That was our intent. We think the clients that come with each COE are better and more complete than any experience that wraps them. People actually like using the Docker and Kubernetes clients. So rather than putting a leaky abstraction around them, we help cloud users create clusters that they can grow and manage without assistance from their cloud operators, and we let them use the container tools they know and love. They get the best of OpenStack and the best of container management software from some of the most active communities in the world of open source software.

In all honesty, this is not too different from using Trove to create a MySQL server. You use Trove to get your server and to do management-related things like backing it up. Once you have your server, you use your favorite MySQL client (or client libraries) to connect to it. Magnum works the same way with the COEs it supports: we help you make your cluster, and we get out of your way.
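
As a sketch of that hand-off, assuming the my-k8s-cluster example from above and a client recent enough to have the cluster-config command, the only Magnum step left after creation is fetching credentials; everything that follows is the COE’s own tooling:

    # Fetch the cluster's TLS credentials and API endpoint from Magnum.
    # This emits environment exports that point kubectl at the new cluster.
    $(magnum cluster-config my-k8s-cluster)

    # From here on, Magnum is out of the picture: use the native client.
    kubectl get nodes
    kubectl create -f my-app.yaml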

Thanks to everyone in the community who worked together to redefine the Magnum project for success, and for doing the work to get us laser focused on a target we can definitely hit.


Scalability and Containers

During Container World 2016, I delivered a keynote at Open Container Night where I offered a high level overview of three prevailing Container Orchestration Engines (COEs). That presentation included this slide to illustrate that there are many factors that need to be considered when finding the right COE for you:

[Slide: a tongue-in-cheek flowchart for choosing a COE based on your scale and other factors]

This slide was innocently posted out of context on Twitter, and soon sparked a debate about the scalability of these various systems. I never intended for it to be used as a benchmark reference, and some recognized that the flowchart is supposed to be amusing. The truth is that practically nobody creates a single cluster of 10,000 physical servers (at 16 cores each, that’s 160,000 cores). I would never do that: serious network constraints would surface before reaching that scale. None of today’s COE systems is actually intended for this imaginary use case. Large systems are broken down into smaller zones using federation or shards, and tend to be balanced among different geographical regions for fault tolerance and disaster recovery. Giant flat networks simply don’t work well.

An unfortunate side effect of this discussion was the misguided idea that Kubernetes cannot scale beyond 200 hosts. In fact, version 1.2 of Kubernetes is known to scale to 1,000 kubelets, and it keeps getting better. Once you get into the large-cluster range of hundreds of hosts, you need a system to manage automation, data replication, and a scheme for gracefully handling node failures. These are all bases that Kubernetes has covered.

The Numbers, Explained

The internet loves numbers, and we all seem to love a debate. The numbers on my slide were not intended to start one, nor were they meant to indicate absolute limits on the suitability of any of these options. I’m trying to explain that if your IT shop has a few racks full of servers (<200), you’re probably going to like Kubernetes because of its declarative style of defining the desired state of your apps. If you own millions of dollars worth of servers (>1,000 servers), or a whole data center full of hardware approaching a cost of hundreds of millions of dollars (>10,000 servers), then you probably have the skills and resources to do whatever you want, regardless of which COE you pick. I know a few shops like this, and when they prefer Mesos, it’s usually because frameworks like Hadoop can run concurrently alongside Marathon and others, each taking on different aspects of their big data workloads on a common set of physical infrastructure. Kubernetes offers an elegant approach for a variety of workloads. It’s modeled after Borg at Google, which runs enormous workloads. If you need your cluster to scale to more than 1,000 nodes on a single master, then you’re probably doing it wrong. When used properly, any of the COEs I mentioned can work at this scale.
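
To show what that declarative style means in practice, here is a minimal Kubernetes 1.2-era sketch; the names and the nginx image are arbitrary placeholders. You declare a desired state (three replicas of a web container) and the system converges on it, restarting or rescheduling containers when nodes fail, rather than you scripting the steps to get there.

    # web-rc.yaml: declare the desired state, not the steps to reach it.
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx
            ports:
            - containerPort: 80

    # Submit the declaration, then change scale by declaring a new desired state.
    kubectl create -f web-rc.yaml
    kubectl scale rc web --replicas=10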

In my talk I did not offer any guidance about what to use when you only have a few servers. I would definitely not attempt to use Mesos at a scale that small. Kubernetes offers features like replication and fault recovery that can restart containers upon failure. If you like being able to keep a mental model of where everything is running, you might prefer a more manual approach with Docker Swarm. If you have other ways to deal with what happens when things go wrong, or want total control over everything, you might decide not to use any COE at all. Each of these COEs is constantly evolving, so I can’t say for certain which one will remain the prevailing popular choice over time.

Application Scalability

Your selected COE is not the factor that determines how your app will scale; your application architecture and implementation determine that. Every app is different, and each has its own constraints and bottlenecks that make scaling it challenging. Using containers and COE systems like Docker Swarm, Kubernetes, or Apache Mesos helps simplify a microservices application architecture, which can allow many applications to reach a scale they can’t when their components are grouped more coarsely.

Note that a microservices architecture is not the best fit for all applications. Let’s say, for example, that you have an application that acts on huge volumes of data that need to move between various services. In that case, the load on your network will be considerable, and throughput between your various microservices will quickly surface as a bottleneck. If your working set of data exceeds your memory capacity, chances are that the I/O throughput of your storage systems will be critically important to your application’s performance.

True scalability comes from breaking down your problem so that each server works on a small part of it, and work can be distributed among many workers that advance through the problem in parallel. It may also come from accessing your data in place rather than constantly moving it back and forth across your network. At the end of the day, it comes down to understanding where the work happens inside your software, and making that work as efficient as possible. If your application is not designed to scale well, then you can’t really expect it to, no matter what COE you task with managing it.

SCaLE 14x: Docker, Kubernetes, and Mesos: Compared


I had the pleasure of giving this talk at SCaLE 14x, and I was impressed with the audience: over 200 in attendance for this breakout session. It’s clear that containers are the new hotness, and we all want the inside track on how to make the best use of them. This talk guides you through which of the container orchestration systems is best for you based on various criteria about you, your organization, and your workloads. The slides are available, and the talk was recorded. As soon as the video is posted I will link to it here as well. I expect the post-processing to take a couple of weeks, so stay tuned.