Are Cloud Operating Systems the Next Big Thing?


You may have heard the new buzzword “Cloud Operating System” a few times in the last few months. The term gained prominence when Cloudius Systems launched OSv at LinuxCon in September. Many of the people working on OSv, namely Glauber Costa, Pekka Enberg, Avi Kivity and Christoph Hellwig, are well known in the Linux community for their role in creating KVM. But the concept of a cloud operating system isn’t new. There are many cloud OSes to choose from, including our own forthcoming MirageOS.

Contrary to popular belief, cloud operating systems do not threaten system security or reduce the importance of Linux in the cloud. They also provide many advantages over today’s application stack, including portability, low latency and simplified management. In fact, they may just be the next big thing.

Cloud Operating Systems: A new incarnation of an older idea

The approach taken by OSv (and by others before it) revisits an old technique for operating system construction, the Library OS, and puts it in the context of cloud computing within a virtual machine. The basic premise is to significantly simplify the application stack in the cloud by removing layers of abstraction, offering the promise of less complexity, increased system security and simpler management of application stacks in the cloud. Figure 1 shows this approach in more detail.

Figure 1: Cloud OS diagram

As you can see, Cloud Operating Systems are designed to run a single application within a single virtual machine, so much of the functionality of a general purpose operating system is simply removed. Of course, this could be done directly on hardware, but that would in effect require every language runtime to be ported to many different hardware environments. This is an expensive proposition, which has limited the adoption of Cloud Operating Systems until now. Hypervisors, however, expose an idealized and tightly controlled hardware environment within the VM that a language runtime can use directly. Add to this the fact that a typical cloud application only needs access to a disk and a network (graphics, sound and other functionality are implemented on top of network protocols), and suddenly the cost of porting a language runtime to a hypervisor becomes manageable.
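
To make the “only a disk and a network” point concrete, here is a minimal Python sketch of how small the device surface targeted by a ported runtime could be. The interface and class names (BlockDevice, NetDevice, RamDisk, QueueNet, log_frames) are hypothetical and invented for illustration; they are not the API of any real Cloud OS, and a real port would drive Xen or KVM virtual devices rather than these in-memory stand-ins.

```python
from collections import deque
from typing import Optional, Protocol

SECTOR_SIZE = 512  # assumption: fixed-size sectors, as on most virtual disks


class BlockDevice(Protocol):
    """Hypothetical minimal disk interface a ported runtime might target."""

    def read_sector(self, index: int) -> bytes: ...
    def write_sector(self, index: int, data: bytes) -> None: ...


class NetDevice(Protocol):
    """Hypothetical minimal network interface: send and receive raw frames."""

    def send(self, frame: bytes) -> None: ...
    def recv(self) -> Optional[bytes]: ...


class RamDisk:
    """In-memory stand-in for a virtual block device."""

    def __init__(self, sectors: int) -> None:
        self._sectors = [bytes(SECTOR_SIZE) for _ in range(sectors)]

    def read_sector(self, index: int) -> bytes:
        return self._sectors[index]

    def write_sector(self, index: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector long")
        self._sectors[index] = bytes(data)


class QueueNet:
    """In-memory stand-in for a virtual NIC: frames sent are queued for recv."""

    def __init__(self) -> None:
        self._frames: deque = deque()

    def send(self, frame: bytes) -> None:
        self._frames.append(bytes(frame))

    def recv(self) -> Optional[bytes]:
        return self._frames.popleft() if self._frames else None


def log_frames(net: NetDevice, disk: BlockDevice) -> int:
    """Hypothetical application: store each incoming frame in its own sector."""
    stored = 0
    frame = net.recv()
    while frame is not None:
        # Truncate or zero-pad the frame to exactly one sector.
        disk.write_sector(stored, frame[:SECTOR_SIZE].ljust(SECTOR_SIZE, b"\0"))
        stored += 1
        frame = net.recv()
    return stored
```

The application only ever sees the two small interfaces, so moving it to another environment means reimplementing those few methods rather than porting a whole general purpose OS.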

Because only one application runs per VM, the runtime and application can share a single address space, simplifying things even more. In other words, the need for TLB flushes is removed, and the code and overhead that go with them are eliminated. For some languages, such as OCaml and Haskell, the runtime and application can be statically linked, which reduces the footprint, eliminates dead code and increases security even more. This single kernel-and-application image is being termed a “unikernel” to reflect its highly specialised nature.
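
Static linking lets the toolchain keep only the code that is reachable from the application’s entry point. The following is a minimal sketch of that idea, assuming a whole-program call graph is available: dead-code elimination as a breadth-first reachability walk. The function names in the graph are hypothetical, and real toolchains typically work at the granularity of object files, sections or modules rather than a toy dictionary like this one.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(call_graph: Dict[str, List[str]], entry: str) -> Set[str]:
    """Return every function reachable from `entry` (breadth-first search)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph of an application plus the library code it links against.
CALL_GRAPH = {
    "main": ["http_serve"],
    "http_serve": ["parse_request", "tcp_send"],
    "parse_request": [],
    "tcp_send": ["net_driver"],
    "net_driver": [],
    "usb_driver": ["usb_core"],  # never called by this application
    "usb_core": [],
    "sound_driver": [],          # never called by this application
}

linked = reachable(CALL_GRAPH, "main")   # code that ends up in the image
dropped = set(CALL_GRAPH) - linked       # code the linker can leave out
print(sorted(linked))
print(sorted(dropped))
```

Everything that is never reached, such as the USB and sound drivers here, simply does not appear in the final image, which is where the smaller footprint and reduced attack surface come from.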

Examples of Cloud Operating Systems

As stated earlier, OSv is not the first Cloud Operating System on the market, although its creators deserve credit for putting the technology on the map by generating lots of buzz. The table below shows what is available.

Cloud OS | Targeted language | Available for
ClickOS | C++ | Xen
Drawbridge | C | Windows “picoprocess”
ErlangOnXen | Erlang | Xen
HalVM | Haskell | Xen
GUK11 | Java | Xen
MiniOS | C | Xen
MirageOS | OCaml | Xen, kFreeBSD, POSIX, WWW/js
NetBSD “rump” | C | NetBSD, Xen, Linux kernel, POSIX
OSv | Java | KVM, Xen

Of course, many of these projects are still under development, and their success will come down to how easily existing applications can be ported to a Cloud Operating System for a specific language. This depends to a large degree on how rich and well-defined the language runtimes are.

Are Cloud Operating Systems bad for Linux and security?

When OSv was launched in September, the two dominant reactions from the Linux community were that OSv (and, by extension, Cloud Operating Systems in general) is bad for Linux and bad for security.

Let’s look at Linux first: if you run a Cloud Operating System on top of LXC, KVM or Xen, you are still running Linux. In some sense, the Cloud Operating System approach is only made possible by the wide hardware support that Linux provides and the dominance of Linux-based technologies in cloud computing. Admittedly, fewer Linux kernels may be running in guests, but is this really an issue?

What about security? How secure a system is depends on two factors:

1. The amount of code in the system. More code means more potential exploits and therefore less security; conversely, less code means fewer potential exploits and more security. Removing layers of code is thus a good thing from a security perspective.

2. The damage an exploit can do. In a typical cloud application, an attack starts with malicious input that targets either the underlying OS or the application, and then tries to reach user data or to take over the OS, another application or other resources in the system. In the Cloud Operating System case, there is only one application and a language runtime, so an attacker would gain no more than access to the already running application. And because the language runtime of a Cloud Operating System is much smaller than a general purpose OS, an attack through the runtime is harder and less likely.

You may argue that such an attacker could then jump into another VM. However, that possibility already exists today: if you trust an application stack running in a public cloud today, you would actually be better off in the new model.
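
The two factors can be captured in a deliberately crude toy model: factor 1 as the total amount of code in the VM image, and factor 2 as the set of other components an attacker who controls the application could try to take over next. The component names and sizes below are invented for illustration, not measurements of any real system, and the model ignores the hypervisor boundary entirely.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class VMImage:
    """Toy model of a VM image for the two security factors."""

    components: Dict[str, int]  # component name -> code size (made-up units)
    app: str = "app"

    def code_size(self) -> int:
        # Factor 1: total amount of code shipped in the image.
        return sum(self.components.values())

    def other_targets(self) -> Set[str]:
        # Factor 2: everything else in the VM that an attacker who controls
        # the application could try to take over next.
        return set(self.components) - {self.app}


# Purely hypothetical sizes, chosen only to show the shape of the argument.
general_purpose = VMImage({"kernel": 150, "libc": 10, "shell": 5, "sshd": 5,
                           "cron": 1, "runtime": 20, "app": 2})
unikernel = VMImage({"runtime_and_library_os": 25, "app": 2})

print(general_purpose.code_size(), sorted(general_purpose.other_targets()))
print(unikernel.code_size(), sorted(unikernel.other_targets()))
```

Under these made-up numbers the unikernel ships far less code, and the only thing left to pivot to is the application’s own runtime, which mirrors the argument in the two points above.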

So what about the Xen Project?

As I work for the Xen Project, I want to say a little about Xen and Cloud Operating Systems. The first observation is that the majority of Cloud Operating Systems run on Xen, for two primary reasons. The first is Xen’s footprint in the cloud: with AWS, Rackspace Public Cloud and many others running Xen, supporting Xen first makes sense. The second reason is technical: Xen paravirtualization provides a very simple and idealized I/O interface to the guest. In contrast, the KVM virtio interface looks much more like the underlying hardware. As a consequence, it is easier to port a language runtime to Xen. It is also worth noting that Xen itself has been using minimal operating systems within its core functionality to implement advanced security features: it is possible to run device drivers, QEMU and other services in their own VMs on top of MiniOS. Conceptually, the approach Xen takes to increase security is very similar to that used by Cloud Operating Systems. If you want to know more, watch George Dunlap’s LinuxCon presentation Securing Your Xen-Based Cloud.
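
Much of that simplicity comes from Xen’s split-driver model, in which a guest’s frontend driver and a backend exchange requests and responses through rings in shared memory. The toy Python class below sketches only the core ring-index logic, loosely modelled on those rings and under simplifying assumptions: a single producer and a single consumer in one process, with no shared memory page, no memory barriers and no event-channel notifications. The class and method names are hypothetical and are not Xen’s actual ring interface.

```python
from typing import Any, List, Optional


class SharedRing:
    """Toy single-producer/single-consumer ring with free-running indices."""

    def __init__(self, size: int) -> None:
        # A power-of-two size lets us map an index to a slot with a bit mask.
        if size <= 0 or size & (size - 1):
            raise ValueError("ring size must be a power of two")
        self.size = size
        self.slots: List[Any] = [None] * size
        self.prod = 0  # total items ever produced (e.g. requests from a frontend)
        self.cons = 0  # total items ever consumed (e.g. by a backend)

    def free_slots(self) -> int:
        return self.size - (self.prod - self.cons)

    def push(self, item: Any) -> bool:
        if self.free_slots() == 0:
            return False  # ring full: the producer has to wait for the consumer
        self.slots[self.prod & (self.size - 1)] = item
        self.prod += 1
        return True

    def pop(self) -> Optional[Any]:
        if self.cons == self.prod:
            return None  # ring empty
        item = self.slots[self.cons & (self.size - 1)]
        self.cons += 1
        return item


ring = SharedRing(4)
print([ring.push(i) for i in range(5)])  # the fifth push finds the ring full
print(ring.pop())
```

Free-running indices masked with `size - 1` avoid a separate “full” flag; in C the indices would be unsigned integers that simply wrap around, which still works because the ring size is a power of two.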

The Xen Project has also been developing its own Cloud Operating System, MirageOS, for some time. MirageOS will have its first release shortly and is worth watching out for.

Are Cloud Operating Systems going to be the next Big Thing?

Certainly ErlangOnXen, HalVM, MirageOS and OSv show great potential. Thanks to their very small footprint and low latency, Cloud Operating Systems are also particularly well suited to running on microservers. There is also great potential for OCaml (via MirageOS) and other languages that compile to non-x86 targets such as ARM or even JavaScript, since the same libraries can work on embedded systems as well as on web servers and in the cloud. For applications written in these languages, portability is less of an issue, as they can be written in a portable way from the outset.

We are also seeing a new breed of Cloud Operating Systems that target very specific use cases. One example is ClickOS, which is designed to make developing middlebox appliances such as NATs, firewalls and SDN appliances very easy. This is an interesting development, which fits well with the growing interest in virtualization beyond servers and the cloud: we are starting to see potential applications emerge in automotive, set-top boxes, mobile, networking and many embedded use cases. The Cloud OS approach has big potential for these types of applications (although for embedded use cases it is probably more accurate to talk about Library Operating Systems).

Remaining challenges

One technical challenge, which could be resolved through cross-project collaboration, is removing duplicated bootloading code (across open source hypervisors as well as CPU architectures) and possibly duplication in some other areas.

The biggest challenge for Cloud Operating Systems, however, is that most cloud providers today do not support very small, high-density VM deployments. Two things need to happen to resolve this. First, hypervisors need to be able to run thousands of VMs on large hosts: for Xen, this is being addressed by increasing the number of virtual machines that can run on a host (see David Vrabel’s talk on Unlimited Event Channels). Second, cloud billing resolution would need to adapt to allow charging for much smaller virtual machines that run for very short times (less RAM, less disk space, and VM lifespans measured in seconds rather than hours). Whether this will happen depends on whether it makes economic sense for cloud providers. On the other hand, being able to push resource utilization beyond where it is today may be a very attractive option for private clouds.
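
Both requirements are easy to quantify with a little arithmetic. The sketch below uses entirely hypothetical numbers, not any provider’s actual pricing or any real host’s limits, to show how hour-granularity billing overcharges a short-lived unikernel and how many small VMs a large host could hold if the hypervisor scaled that far. The helper names are invented for illustration.

```python
import math


def billed_seconds(lifetime_s: int, granularity_s: int, minimum_s: int) -> int:
    """Round a VM's lifetime up to the billing granularity, then apply a minimum charge."""
    rounded = math.ceil(lifetime_s / granularity_s) * granularity_s
    return max(minimum_s, rounded)


def vms_per_host(host_ram_mib: int, reserved_mib: int, vm_ram_mib: int) -> int:
    """How many VMs fit in RAM, ignoring per-VM hypervisor overhead."""
    return (host_ram_mib - reserved_mib) // vm_ram_mib


# Hypothetical scenario: a unikernel that lives for 30 seconds.
hourly = billed_seconds(30, granularity_s=3600, minimum_s=3600)
per_second = billed_seconds(30, granularity_s=1, minimum_s=0)
print(hourly, per_second, hourly // per_second)

# Hypothetical host: 256 GiB of RAM, 4 GiB reserved for the control domain,
# and 32 MiB per unikernel.
print(vms_per_host(256 * 1024, 4 * 1024, 32))
```

With these assumptions, a 30-second VM is billed for 120 times its actual runtime under hourly billing, and the host could fit over eight thousand 32 MiB guests, well beyond what most deployments run today.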