GPL Linux virtual machines and virtual machine clustering

Author: JT Smith

By Grant Gross

Last week I wrote a roundup of the virtual machine-like technology available for Linux, and an alert reader pointed out the User-mode Linux project.

Think of User-mode Linux (UML), a modification to the Linux kernel that’s released under the GNU General Public License, as a cross between the VMware workstation product that allows users to run Linux and Windows side-by-side and larger virtual machine-type products that allow dozens of Linux copies to run on one server. Jeff Dike, leader of the project, says users have reported running as many as 50 virtual machines on one piece of hardware.

Here’s Dike’s description of a virtual machine, probably better than I can explain it, from an article published in Linux Magazine: “(Virtual machines) offer the ability to partition the resources of a large machine between a large number of users in such a way that those users can’t interfere with one another.
Each user gets a virtual machine running a separate operating system with a certain amount of resources assigned to it. Getting more memory, disks, or processors is a matter of changing a configuration, which is far easier than buying and physically installing the equivalent hardware.”

UML isn’t the only Open Source project working on virtual machines or related technology. There are a couple of other projects released under the GNU GPL, unlike the mostly commercial and proprietary VM technologies I featured in the first article. Among the Open Source alternatives:

  • The FreeVSD project and its commercial counterpart Idaya market a Web-hosting platform that allows multiple virtual servers to be created on a single hosting server. The FreeVSD project’s goals include this one: “To establish and support FreeVSD as the standard for global Web hosting whilst keeping it free from the constrictions and limitations of closed source software.” Idaya offers ProVSD, while version 1.4.9 of FreeVSD is available for download here.

  • The Plex86 project has the goal of creating “an extensible open source PC virtualization software program which will allow PC and workstation users to run multiple operating systems concurrently on the same machine.” This allows users to run Windows software in Linux, much like VMware’s workstation product. It’s being developed under the LGPL.

  • The vserver project works within the Linux kernel to allow users to “run general purpose virtual servers on one box, full speed,” according to project leader Jacques Gelinas. Vserver is also released under the GNU GPL.

    I asked UML’s Dike about his progress on the project and what’s next for it. One interesting idea he has is to use UML for clustering, a concept it took me a while to get my head around. Our email conversation follows. For more information, check out the project’s extensive Web site, which includes case studies of UML being used in the real world, a list of uses for UML and screen shots of UML in action.

    NewsForge: How long have you been working on the project?

    Dike: It depends on when you believe the project started. I started
    thinking about the feasibility of a userspace port in late ’98. I decided that there were probably no fundamental problems with the idea, and started writing code in early February ’99. The first public sign of UML was my announcement on the kernel list in the first week of that June.

    NewsForge: What part of the world are you in, and do you have another job besides this project?

    Dike: New Hampshire, USA. I’m the CTO of a startup (addtoit.com) … I’m doing some contracting on the side.

    NewsForge: Any idea of how many users or downloads your project has?

    Dike: No idea. Here are some random numbers though. 🙂 Uml-user has (as of Tuesday, Jan. 15) 275 subscribers, uml-devel has (as of Tuesday) 171 subscribers.

    SourceForge has just over 70,000 downloads listed for me. However,
    they have lost track of downloads at points in the past. Also,
    UML is available from other mirrors, several other projects are
    distributing UML, and it was in the 2.4.x-ac pool, which can be
    downloaded from everywhere. So, 70,000 is probably a gross
    underestimate, and I have no idea what would be more accurate.

    For some reason, there are now hundreds of downloads from SF
    every day, which is up drastically from a month or so ago. The
    interesting thing is that page views have not increased similarly.

    NewsForge: How many other developers are working on UML?

    Dike: Basically, the project is me. Of course, I’ve had important contributions
    from other people and I don’t want to downplay them, but I’m the
    only person doing work in the core UML code.

    NewsForge: Explain how UML works — it looks like it works kind of like VMware’s workstation product, in that you can run different distributions side
    by side. Is that a fair comparison?

    Dike: The overall effect is the same as VMware. You can boot up multiple
    Linux virtual machines on a single host.

    The design is radically different. VMware is a hardware x86 emulator which
    can (in principle) boot any x86 OS kernel. UML is a port of Linux to Linux.
    So, UML can only be a Linux guest. However, UML can run on any platform that
    Linux runs on (such a port needs some work, and UML somewhat runs on ppc), in
    contrast to VMware being restricted to x86.

    NewsForge: How many VMs can you run at once?

    Dike: I frequently run three to four copies of UML on my laptop (256M, 750 MHz PIII). There’s a case study on the UML site
    (http://user-mode-linux.sourceforge.net/case-studies.html) describing a 20-node virtual UML network running on a fairly modest PC. I’ve heard from other people who have run dozens of copies of UML on a single host — the highest I’ve heard of is around 50.

    NewsForge: Doesn’t running four or five instances of UML on a laptop leave precious little RAM/other resources for each?

    Dike: The default “physical” memory size for UML is 32M, which will run a fairly decent virtual system. My laptop has 256M in it. 32M * (4 or 5) = 128M or 160M. That leaves plenty of room for other things. I’ve never noticed multiple instances of UML causing a resource drag. Given the fact that other people have run dozens of UML instances on machines not too different from my laptop, I’d say that I’m not close to pushing any limits when I run four or five.
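
Dike's arithmetic can be checked with a short sketch. The helper name and its parameters are hypothetical and are not part of UML. The sketch simply assumes each instance is given the default 32M and uses all of it, which overstates real usage if an instance does not touch its full allotment.

```python
# Hypothetical helper for illustration only; not part of UML.
def uml_memory_budget(host_mb, instances, per_instance_mb=32):
    """Return (MB assigned to UML instances, MB left on the host).

    Assumes every instance consumes its full "physical" memory size,
    which is a worst-case simplification.
    """
    used = instances * per_instance_mb
    return used, host_mb - used


# Figures from the interview: a 256M laptop running four or five UMLs.
for n in (4, 5):
    used, free = uml_memory_budget(256, n)
    print(f"{n} instances: {used}M assigned, {free}M left for the host")
```

Under these assumptions, four or five instances take at most 128M or 160M of the laptop's 256M, matching the figures Dike gives.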

    NewsForge: What’s next for the project?

    Dike: I’m currently concentrating on killing bugs and adding little bits of missing functionality so I can consider it stable and functional enough to say that
    it has reached version 1.0. That will be a stable, robust, functionally
    complete virtual machine.

    After that, there are a number of very interesting clustering possibilities
    for UML. There are a number of Linux clustering projects happening now,
    and they will probably end up using UML as their development base, just
    as many kernel hackers are using UML for development now. These clusters
    are ultimately intended to be implemented as clusters of physical machines.
    However, virtual clusters would be interesting in their own right. A UML
    cluster running on multiple hosts running different OSes could provide its
    processes transparent access to the combined resources of its hosts. Imagine
    Apache inside a UML cluster having access to a MySQL database on its Linux
    host available as a filesystem inside UML, to the database engine on its
    OS/400 host, and to apps on its Windows host.

    NewsForge: So you have one machine with several VMs on it connected to a cluster
    — and so the cluster can have access to any of the virtual machines on that machine? What’s the advantage to this?

    Dike: No, you’d spread a virtual cluster over multiple hosts. It would look
    like a single UML, but it would have multiple virtual processors and each
    of them would be running on a different host.

    In the example I gave, there would be three hosts, running Linux, OS/400, and
    Windows, and there would be a single UML running on all of them, just as a
    cluster of physical machines runs a single kernel on multiple boxes.

    NewsForge: Wouldn’t a virtual cluster in essence just have the computing
    resources of that one machine?

    Dike: No, because it would be a single UML instance spread over multiple hosts. So it would have access to the combined resources of those machines.

    NewsForge: So people are actually using UML for more than applications testing? It sounds like people are using UML to do the mainframe style of VM things, running multiple copies of Linux on one machine doing different functions.

    Dike: Kernel development is probably the biggest use of UML right now. A number of people are using it to build virtual networks, for educational purposes
    and for testing (e.g. the FreeS/WAN people are using UML as their testbed).

    Others are using it to jail services like bind and sendmail. That adds an
    extra layer of security for services that have a history of exploits.

    I’ve heard from a number of ISPs who are interested in using UML to offer
    virtual colocation. I don’t know of any that have put it into production
    yet.

    In addition, there are lots of people who find it convenient to be able
    to fire up another Linux box whenever they want. They have all kinds of
    different reasons, e.g.

    • playing with new kernels
    • playing with new distributions
    • setting up and testing new services
    • maintaining packages that require a whole system to test (e.g. RPM)

    NewsForge: What is the ultimate goal for the project?

    Dike: I’m not sure it has an ultimate goal. Obviously, I’d like to see virtual
    machines be a standard fixture in server rooms everywhere, not just server
    rooms that have S/390s in them. And obviously, I’d like those virtual
    machines to be UMLs.

    After that happens, I’m going to start looking for UML clusters to start
    taking over the world …
