[Repost] VMware vs VirtualBox vs KVM vs Xen: virtual machines performance comparison

http://www.ilsistemista.net/index.php/virtualization/1-virtual-machines-performance-comparison.html?limitstart=0

Today, “virtual machine” seems to be a magic phrase in the computer industry. Why? Simply stated, this technology promises better server utilization, better server management, better power efficiency and, oh yes, a random pick of other better things! The obvious question is whether virtual machine technology really provides this better experience. In short: yes. While it has its own set of problems and complications, when used correctly this technology can really please you with some great advantages over the one-operating-system-per-server paradigm widely used in the x86 arena.

But, assuming that virtual machines make sense in your environment, what is the best virtualization software to choose? There are many virtualizers and paravirtualizers available today, and some once-commercial virtualization products are now freely released (for example, think of VMware Server and Citrix XenServer). In the end, the choice can be very hard. As the remaining commercial, non-free virtualizers are designed for the upper end of the market (datacenters or large corporations), this article will focus on the available, free virtual machine software. So, be prepared for a furious battle: VMware vs VirtualBox vs KVM vs Xen!

To simplify the situation, let's define three requirements that good virtual machine software must satisfy, in order of relevance:

it should meet the required feature level
it should meet the required performance level
it should be simple to manage

Speaking about the feature level, the choice is very straightforward: simply discard the virtualizers that do not meet your requirements. For example, do you need the capability to take many snapshots? Good, simply remove from the list the virtualizers that don't have it (did anyone say VMware Server?). Or do you need live migration? Remove the incapable virtualizers from the list.

This is simply a matter of checking the feature lists and, as virtual machine software is very actively developed and the supported features can change rapidly, it should really be done against the latest released version of each virtualizer.

Regarding point 3, the simplicity of management, it should be noted that for most environments most, if not all, of the free virtualizers are equivalent: they very often provide a comfortable GUI and a well-done CLI. One thing that not all of them provide, however, is a web-based GUI: this can be a really useful feature, as it enables basic guest management without the need to install a proprietary GUI client and/or mess around with the CLI. As above, this feature should be checked against the latest released software version.

The whole point of this article, however, is not to provide a detailed feature-driven analysis of the various virtual machine products: features change very rapidly and their importance is always relative to the needs of the specific environment.

This article will focus on point 2, the performance level. This often overlooked point is crucial for selecting the right virtualizer from the list of software that meets your feature requirements. However, please keep in mind that the order in which the virtualizers rank can vary considerably based on the workload type. For example, one virtualizer can be faster than another in a network-bound benchmark, while the latter can be faster in a CPU-heavy job.

So, this article does not claim to elect the absolute best virtual machine software. No, it only presents a glimpse of the performance provided by some of the best-developed virtualizers. Only you, the final customer, can make the right choice by selecting the software best suited to your specific needs. I hope that this article can help you with that choice.

UPDATE: a recent article comparing KVM vs VirtualBox can be found here: http://www.ilsistemista.net/index.php/virtualization/12-kvm-vs-virtualbox-40-on-rhel-6.html

Testbed and Methods

The contenders in this test round are four free virtual machine products:

VMware Server, version 2.0.2
Xen, version 3.0.3-94
KVM, version 0.83
VirtualBox, version 3.1.2

All these virtualizers were tested on a self-assembled PC on which only one guest instance was run. Below you can find the hardware configuration and guest settings:

As I want to evaluate the out-of-the-box experience and the auto-configuration capabilities of the various hypervisors, the virtual machines were created with default settings. The only exception to this rule was enabling the nested page table feature on VirtualBox (as the other virtualizers can auto-enable this feature and were run with nested page tables enabled, I felt it would be unfair not to enable it on VirtualBox too).

VMware and VirtualBox offer the possibility to install in the guest OS some additional packages which should improve performance and/or integration with the host system. For this review, I installed the VMware guest additions but not the VirtualBox ones. Why? Simply put, while VMware's guest tools provide some very important performance-enhancing drivers (such as a paravirtualized network driver), the VirtualBox additions seem to focus only on providing better mouse/keyboard support and on enabling more resolutions for the virtual video adapter. This is in line with my previous tests, where I noted that VirtualBox's performance level was not affected at all by its guest additions (the video emulation was even a bit slower with the guest tools installed).

A note about Xen: while the other hypervisors examined today run on top of a completely “standard” and unmodified host operating system, Xen uses another paradigm. At the lowest level, near the hardware, it has a small “bare-metal” hypervisor. On top of that first layer runs the hosting operating system (in Xen terminology, it runs in Dom0), which has the special privileges needed to talk to the hardware (using the hardware drivers) and to start other, unprivileged guest OSes (in the so-called DomU).

All the benchmark data collected relate to this single guest OS instance. While it is not very realistic to run a single guest OS instance, I think that the data collected can be very indicative of the virtualizers' relative performance and of their overhead.

The collected data cover two types of workload:

a synthetic one, made of very specific tests (e.g. CPU test, memory test, etc.)
a real-world one, made by recording the performance exhibited by some real server applications (most notably, Apache 2.2.14 and MySQL 5.1.41)

The synthetic tests should give us some insight into what kinds of operations slow down the virtual machine and, therefore, where a virtualizer is better (or worse) than the others. However, keep in mind that synthetic tests can describe only half of the truth: to have a more complete understanding of the situation, we need to examine “real-world” results as well. In this article, we will evaluate the performance of two very important services: Apache and MySQL. All the tests below (with the exception of IOmeter) were run 3 times and the results were averaged.

Also, please keep in mind that, while proper hardware benchmarking is itself a very difficult task, benchmarking a guest operating system, which runs on top of another, hosting operating system, is even more difficult. Trying to isolate the pure guest performance can be very tricky, especially regarding I/O performance. The reason is clear: the benchmark method must account not only for guest-side I/O caching, but also for host-side I/O caching. To alleviate the host-side cache effects (which can really alter the collected data), I ran a host-side script that synchronized the host-side write cache and then dropped both the write and read caches before the execution of any I/O-sensitive benchmark. While you can argue that caching is a very important source of performance and a virtual machine can use it very efficiently (and you are right), this article aims to isolate the hypervisors' performance on some very specific tasks and, to present reproducible results, I had to take the route of dropping the host-side cache between test runs.
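The original host-side script is not shown, so the following is only a minimal sketch of the procedure described above, assuming a Linux host and root privileges. The function names and the choice of writing "3" to the kernel's drop_caches control file are assumptions for illustration; the averaging over three runs follows the method stated earlier.

```python
import os
import time
from statistics import mean

# Linux-specific control file; writing "3" drops page cache, dentries and inodes.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def flush_host_caches(control_file=DROP_CACHES):
    """Write dirty data back to disk, then ask the kernel to drop its caches."""
    os.sync()  # dirty pages cannot be dropped, so write them back first
    with open(control_file, "w") as f:
        f.write("3\n")


def timed_runs(benchmark, runs=3, flush=flush_host_caches):
    """Run `benchmark` `runs` times, flushing caches before each run.

    Returns (average_seconds, list_of_individual_timings).
    """
    timings = []
    for _ in range(runs):
        flush()
        start = time.perf_counter()
        benchmark()
        timings.append(time.perf_counter() - start)
    return mean(timings), timings
```

A call such as `timed_runs(my_io_test)` (where `my_io_test` is a hypothetical callable that launches the benchmark) would then return the averaged time together with the individual timings, which makes it easy to spot a run that deviates from the others.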

Let me repeat it once more: this article does not aim to elect the best-of-all-and-in-all virtual machine software. A different benchmark methodology can give different results and, in the future, I will do many other tests in many other environments. For example, the task of running many different guest OSes will be the subject of a following article.

So... it's time for some numbers. Let's see some synthetic benchmark results first.

Synthetic benchmarks considerations

The synthetic benchmarks help us to examine the various subsystems one by one.

We are going to examine the subsystems in the following order:

CPU speed
cache and memory speed
I/O disk access speed
network speed

While synthetic tests are invaluable for an in-depth performance analysis, remember that they describe only half of the truth and, sometimes, can even be misleading: for example, software that runs a CPU benchmark very fast can be quite slow in real-world situations, where many factors are at play at the very same time.

Sandra CPU Benchmark

To evaluate pure CPU speed, I used the proven Sandra CPU benchmark, which shows both integer and floating-point performance.

Sandra CPU benchmark

As you can see, VMware takes the lead in both ALU and FPU performance, but it is the integer speed which shows the larger advantage. Xen also shows slightly higher results in the integer test than the other contenders. The key point here is to identify which CPU and ISA version are presented to the guest OS by the virtualizer:

You are right: only VMware and Xen present the SSSE3 and newer extensions to the guest OS. As Sandra's ALU test can run in an optimized SSE4 mode, it gives higher results on the virtualizers that export these instructions.
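The guests in this article ran Windows, where Sandra reports the exposed instruction sets. For readers who want to repeat the check on a Linux guest, here is a rough sketch that parses text in the format of /proc/cpuinfo. The helper names and the sample text are hypothetical; the flag names ("pni" for SSE3, "ssse3", "sse4_1", "sse4_2") are the ones the Linux kernel uses.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


# Linux flag names: "pni" is SSE3, followed by SSSE3, SSE4.1 and SSE4.2.
WANTED = ["pni", "ssse3", "sse4_1", "sse4_2"]


def missing_extensions(cpuinfo_text, wanted=WANTED):
    """List the wanted extensions that the (virtual) CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


# Hypothetical sample, not taken from the article's test machines.
sample = "model name : Example CPU\nflags : fpu sse sse2 pni"
print(missing_extensions(sample))
```

Inside a real Linux guest, the text would come from `open("/proc/cpuinfo").read()` instead of the sample string.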

The FPU test, on the other hand, runs using SSE3, and all the virtualizers examined today export it to the guest OS. The net result is that VirtualBox and KVM are on par with VMware, while Xen shows some problems, giving a considerably lower result. This is a trend that you will see in most CPU-heavy tests: the first three products show somewhat similar results, while Xen is slower.

Cache and Memory subsystems

Current systems can be limited by memory performance in a number of situations, so it is crucial to show the performance of the cache and memory subsystems in terms of both bandwidth and latency.

To collect these data, I used the Sandra Cache and Memory tests.

First, let's see the L1/L2/L3 cache bandwidth results:

Cache bandwidth

While VMware, VirtualBox and KVM give us quite similar results, Xen is behind the competitors. This can be the result of a heavier hypervisor or, more probably, of the “double encapsulation” of the guest system.

Now, it's time for memory bandwidth results:

Memory bandwidth

The story is reversed now: Xen is slightly faster, followed by KVM, VMware and, finally, VirtualBox (which lags considerably behind the others). It seems that Xen makes very good use of the nested page table feature.

Besides bandwidth, a very important parameter is latency. Let's see the latency data for the caches first:

Cache latency

The results are quite similar, but VMware seems a bit slower when fetching data from the L2 cache.

Now the memory latency results:

Memory latency

The results are very close, with VMware at the slowest end and Xen at the fastest end.

Please consider that, repeating the test many times, I often obtained quite different results. While it is interesting to analyze these data, many factors are at play here. For example, the slower showing of VMware can simply be due to some host-side work that the system was doing at that precise moment. On the other hand, for the same reason, it may be that Xen has the potential to be faster than it showed here.

However, all in all, Xen seems to be the fastest hypervisor in these memory-related tests.

Mixed CPU / Memory performance data

To show the aggregate CPU / memory performance, I used a number of synthetic and semi-synthetic benchmarks. The first is SuperPI:

SuperPI benchmark

Apart from Xen, which is considerably slower, all the other virtualizers run neck and neck.

Then, I ran some cryptography-related benchmarks using OpenSSL. Let's begin with the AES-256 encryption benchmark:

AES-256 encryption

Same story here: Xen is slower, while the others are quite on par.

Now it is RSA-2048's turn. Key signing speed first...

RSA-2048 sign speed

...and then verify speed:

RSA-2048 verify speed

The results are clear: while VMware, VirtualBox and KVM offer very similar speed, Xen is noticeably slower.

I/O benchmarks: Windows 2008 installation time

A critical parameter for virtual machines is I/O performance: while a 10% loss in CPU speed can be a minor problem (as CPU performance is almost always greater than needed), a 10% loss in I/O performance is hardly ever a non-issue. So, it is crucial that each hypervisor does its best to cause the smallest possible overhead on I/O operations.

To offer you a 360-degree view of the problem, I ran several very different I/O benchmarks.

The first is Windows 2008 install time: for this test, I measured the time needed for a full Windows 2008 installation. The timer was started at the initial file copy operation (right after the partition definition) and stopped at the end of the first installation phase (right before the system asks to be restarted):

Windows 2008 installation time

As you can see, VirtualBox is the clear winner: it took less than 6 minutes to complete the operation.

VMware and Xen are very closely matched, while KVM is the real loser: it took over 30 minutes to complete, a 5-fold increase compared to VirtualBox! What can be the problem here? Is it a problem related to the vdisk image format, or is it an indicator of serious I/O overhead?

To answer the above questions, I ran the same test under different conditions:

Windows 2008 installation time with QCOW2 and RAW

In the above graph, I show the results of the Windows 2008 install using three different disk image setups:

a normal, dynamic QCOW2 image
a normal QCOW2 image with 10 GB of preallocated space
a raw image

Using a raw image lets us bypass any possible problem related to the cache type and the QCOW2 block driver (see the Qemu documentation for more information), while using a preallocated QCOW2 image lets us measure the cost of the dynamic growth feature.

The records speak for themselves: while the install time remains quite high, using either a RAW image or a preallocated QCOW2 image brings us an interesting boost. In other words, it seems that KVM has a very high I/O overhead, so stripping away some of the I/O operations by using a preallocated image or a RAW image (which is, by definition, preallocated) gives us a noticeable speed increase.
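The article does not show how the three images were created. As an illustration only, the sketch below builds qemu-img command lines for the three setups, assuming a reasonably recent qemu-img. The file names, the 10G size used for all three images and the choice of `preallocation=metadata` are assumptions, not the author's actual settings.

```python
import subprocess


def create_image_cmd(kind, path, size="10G"):
    """Build a qemu-img command line for one of the three image setups."""
    if kind == "qcow2-dynamic":
        opts = ["-f", "qcow2"]
    elif kind == "qcow2-prealloc":
        # Preallocating metadata is one possible mode; the article does not
        # say which preallocation mode was actually used.
        opts = ["-f", "qcow2", "-o", "preallocation=metadata"]
    elif kind == "raw":
        opts = ["-f", "raw"]
    else:
        raise ValueError(f"unknown image kind: {kind}")
    return ["qemu-img", "create", *opts, path, size]


def create_image(kind, path, size="10G"):
    """Run qemu-img; requires the qemu-img tool to be installed."""
    subprocess.run(create_image_cmd(kind, path, size), check=True)


# Print the three command lines without executing them.
for kind in ("qcow2-dynamic", "qcow2-prealloc", "raw"):
    print(" ".join(create_image_cmd(kind, f"win2008-{kind}.img")))
```

One caveat worth keeping in mind: a raw file created this way may still be sparse on the host filesystem, so whether it is fully allocated up front depends on how it was created.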

This theory is also supported by direct observation of the time needed to load the Windows 2008 installer from the CD: KVM was the slowest, indicating slow I/O performance not only on disks but also on removable devices such as CD-ROMs.

So now we know why KVM was so slow, but why was VirtualBox so fast? Probably, VirtualBox configures its write cache as a write-back cache, which is faster than a write-through cache but is also more prone, in certain circumstances, to data loss. This is a perfect example of how the different “roots” of the analyzed hypervisors emerge: VirtualBox was created as a desktop product, where speed is often more important than correctness. The other hypervisors use the opposite approach: they sacrifice speed on the altar of safety; however, they can be configured to behave like VirtualBox (using a write-back policy).

I/O benchmarks: HDTune benchmark

The second test is the HD Tune Pro benchmark, which lets us examine the bandwidth and access time of the primary disk when reading 512 byte, 4 KB, 64 KB and 1024 KB chunks. Generally, the small-chunk benchmarks are access-time bound, while the 1024 KB benchmark is peak-bandwidth bound.

Let's see the results (please keep in mind that the HDTune benchmarks are read-only tests):

HDTune read bandwidth

Can you even see the 512 byte and 4 KB results? This graph is typical for a server equipped with mechanical disks: the small-chunk benchmarks are dominated by seek time (which is a constant time, not related to chunk size). This is the main reason why flash-based SSDs are so fast ;)

However, we can isolate the virtualizer speed, and we can see that Xen and KVM are considerably slower than VMware, which in turn is slightly slower than VirtualBox across the board. Another point of view on the same data is the following:

HDTune read IOPS

This time, we are not measuring performance in KB/s, but in IOPS (I/O operations per second). As you can see, VirtualBox delivers the greatest number of IOPS in the three small-chunk benchmarks, while it is only slightly behind the leader (VMware) in the 1024 KB test.
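Bandwidth and IOPS are two views of the same measurement, linked by the chunk size. The short sketch below shows the conversion, along with the per-operation time it implies when only one request is in flight. The sample figures are hypothetical values typical of a mechanical disk, not the article's measurements.

```python
def iops(kb_per_s, chunk_kb):
    """I/O operations per second for a given bandwidth and chunk size."""
    return kb_per_s / chunk_kb


def access_time_ms(ops_per_s):
    """Average time per operation, assuming one request in flight at a time."""
    return 1000.0 / ops_per_s


# Hypothetical figures, chosen only to illustrate the relationship.
samples = {0.5: 60.0, 4: 480.0, 64: 6400.0, 1024: 61440.0}  # chunk KB -> KB/s
for chunk_kb, bandwidth in samples.items():
    rate = iops(bandwidth, chunk_kb)
    print(f"{chunk_kb:>6} KB: {bandwidth:>8.0f} KB/s = {rate:.0f} IOPS, "
          f"{access_time_ms(rate):.1f} ms per operation")
```

With seek-dominated access, the small chunks complete a similar number of operations per second but move very little data, which is why their bars nearly vanish on a bandwidth graph while standing out on an IOPS graph.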

Finally, we want to see the disk's total access time (seek time + read latency):

HDTune access time

This graph is yet another view of the same data: VirtualBox is the leader, closely followed by VMware, while KVM and especially Xen are significantly slower.

Do you remember that I ran that test invalidating the host-side cache each time? What would the I/O results be if we skipped that step and, instead, let the host-side cache do its work? If you are curious, these graphs are for you...

Bandwidth results:

HDTune cached bandwidth

I/O operations per second:

HDTune cached IOPS

Access time:

HDTune cached access time

Wow, if a real physical disk could really guarantee that sort of results, it would be a best buy ;)

Seriously speaking, these results do not really come from the physical disk subsystem: they are a cache-to-cache copy or, if you prefer, a host-memory-to-guest-memory copy. In other words, these results really show the hypervisor's I/O overhead.

VMware is the indisputable leader, with VirtualBox in second place and Xen, far behind, in third. The most interesting thing, however, is the incredibly slow KVM showing: its results are only a little better than the non-cached version. What can be the culprit here? Was the cache disabled? I think not, because the results are better than the non-cached version, just not by enough. Can it be the slow block subsystem? Yes, it can be, but the 1024 KB result seems to suggest slow host-to-guest memory copy performance as well. Whatever the cause, KVM is simply a full light-year behind in the cached tests.

I/O benchmarks: IOMeter

The last synthetic test is IOMeter. It is a very interesting test because it can vary not only the chunk size (like HD Tune), but also the I/O queue depth (the number of outstanding requests per worker). Normally, varying the queue depth is used to test the NCQ capability of the chipset / disk combo; this time, we use it to test the various hypervisors. The block size is fixed at 4 KB.

Let's see the IOPS numbers first:

IOMeter IOPS

The first half of the above graph is the read test, the second half is the write test.

The very first thing to note is that KVM is, by far, the slowest hypervisor in the read test; VMware is at the other end of the spectrum with very good performance. What can be the cause of KVM's poor showing? The problem cannot depend on the dynamic allocation of the QCOW2 image, because IOMeter allocates all the needed disk space before the test execution. To me, the KVM results are due to very poor caching and high I/O overhead. Xen is considerably faster than KVM, but it lags by a great margin behind VirtualBox, which is, in turn, about halfway between Xen and VMware. However, it is interesting to note that Xen and VMware are the only hypervisors that seem to benefit a little from the increasing queue depth.

Looking at the write test, we have a very different picture: the fastest virtual machine is VirtualBox (with decreasing performance as queue depth rises), while the others seem to be equally slow. Don't be fooled by the graph, however: there are very interesting differences between the other virtualizers, only the graph's scale is not fine enough to show them. To give you a more precise view, I reported the raw write test data in a table:

As you can see, KVM, Xen and VMware behave in very different manners, but they are all far behind VirtualBox. Why? Remember the Windows 2008 installation time? The whole point can be a different write caching policy. To me, it seems that VirtualBox uses a write-back cache algorithm, while the others use a write-through policy. The net result is greater speed for VirtualBox, but also a greater risk of data loss in case of power failure and/or guest/host crash. However, if you prefer speed over safety, the other virtualizers can be configured to use a write-back policy too.
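The difference between the two policies can be felt even without a hypervisor. The sketch below is only an analogy at the operating system level, not how any of the tested virtualizers is implemented: it writes 4 KB chunks either through the normal page cache (write-back-like) or with the O_SYNC flag, which makes every write wait for stable storage (write-through-like). The function name and file name are hypothetical.

```python
import os
import time


def write_chunks(path, count=256, chunk_size=4096, write_through=False):
    """Write `count` chunks to `path` and return the elapsed time in seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if write_through:
        # O_SYNC: each write returns only once the data is on stable storage.
        # Falls back to 0 (no effect) on platforms that lack the flag.
        flags |= getattr(os, "O_SYNC", 0)
    chunk = b"\0" * chunk_size
    start = time.perf_counter()
    fd = os.open(path, flags, 0o644)
    try:
        for _ in range(count):
            os.write(fd, chunk)
    finally:
        os.close(fd)
    return time.perf_counter() - start


demo_file = "cache_policy_demo.bin"
cached = write_chunks(demo_file, write_through=False)
synced = write_chunks(demo_file, write_through=True)
os.remove(demo_file)
print(f"cached writes: {cached:.4f} s, synchronous writes: {synced:.4f} s")
```

On a mechanical disk the synchronous variant is usually much slower, and the cached variant can lose the last writes if the machine loses power before the cache is flushed: the same trade-off described above.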

It is worth noting that, in the write test, VMware is the only virtual machine that gets a good boost from increased queue depth.

What about CPU load under the IOMeter tests?

IOMeter CPU load

As before, the left half is about the read test, the right half about the write test.

That graph shows that in the read test VirtualBox and VMware are the most CPU-hungry hypervisors, but they have much better I/O results than the others. In the write test, VirtualBox and KVM have the highest CPU cost, but they perform in very different manners.

To give you a fairer point of view, I created a graph that shows the CPU load cost of each I/O operation:

IOMeter normalized CPU load

We can see a clear pattern now: VirtualBox and VMware are the most I/O-efficient virtualizers, with Xen not too far behind. KVM is the clear loser here.
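The exact formula behind the normalized graph is not given; a plausible reading is CPU load divided by the number of I/O operations completed. Here is a minimal sketch under that assumption, with hypothetical numbers that are not the article's data.

```python
def cpu_cost_per_1000_io(cpu_load_pct, iops):
    """CPU load (in percent) spent for every 1000 I/O operations per second."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return cpu_load_pct / iops * 1000


# Hypothetical measurements: (CPU load in %, IOPS achieved).
measurements = {"guest A": (20.0, 4000.0), "guest B": (10.0, 500.0)}
for name, (cpu, rate) in measurements.items():
    print(f"{name}: {cpu_cost_per_1000_io(cpu, rate):.1f}% CPU per 1000 IOPS")
```

In this made-up example, guest A uses twice the CPU of guest B in absolute terms yet is four times cheaper per operation, which is the kind of reversal a normalized graph is meant to expose.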

Network subsystem

To benchmark the pure network speed, I flooded each guest machine with many pings, using the command “ping -f 192.168.122.166 -c 25000 -s 1024” from a Linux client. The Linux client and the host server machine were connected to the same 10/100 switch.

What was the best-performing guest?

Ping flood test

The results are quite close, with Xen only slightly slower.

What about CPU load?

Ping flood CPU load

With this test we sent and received 25000 packets of 1 KB each, for a total of 25.6 MB sent and 25.6 MB received. As you can see from the previous graph, the CPU load is not approaching 100% for any guest machine, so why do we need 25 seconds to transfer a total of 51.2 MB? While it is true that the virtualizer adds a considerable CPU overhead and the IP and Ethernet encapsulation adds some extra bytes on the LAN link, I think that in this case I nearly saturated the packet-forwarding rate of the little switch used to connect the two machines.
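The arithmetic behind these figures can be checked in a few lines. The sketch below counts payload bytes only (protocol headers are ignored) and takes the 25 seconds mentioned above as the elapsed time; the function name is hypothetical.

```python
def ping_flood_stats(packets=25_000, payload_bytes=1024, elapsed_s=25.0):
    """Payload volume and rates for a flood ping (requests plus replies)."""
    one_way_bytes = packets * payload_bytes
    total_bytes = 2 * one_way_bytes  # echo requests + echo replies
    return {
        "one_way_mb": one_way_bytes / 1e6,
        "total_mb": total_bytes / 1e6,
        "mbit_per_s": total_bytes * 8 / elapsed_s / 1e6,
        "packets_per_s": 2 * packets / elapsed_s,
    }


for name, value in ping_flood_stats().items():
    print(f"{name}: {value:.1f}")
```

With these inputs the total comes to about 16 Mbit/s and 2000 packets per second, well below the nominal capacity of a 100 Mbit/s link, which fits the observation that raw bandwidth was not the limiting factor.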

Anyway, the test remains very interesting as, in this case, I recorded interrupt and privileged time. To really understand this graph, you need to know that the former is really a subcomponent of privileged time, so you don't need to add it to the latter to get CPU time. In other words, the privileged time is the sum of IRQ time + syscall time.

By recording the total privileged time and the IRQ time, we can easily get an idea of the syscall time (privileged time - IRQ time).
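As a small worked example of this subtraction, here is a sketch with two hypothetical guests (the percentages are invented for illustration and are not taken from the graph):

```python
def syscall_time(privileged_pct, irq_pct):
    """Syscall share of CPU time: privileged time minus its IRQ component."""
    if irq_pct > privileged_pct:
        raise ValueError("IRQ time is part of privileged time and cannot exceed it")
    return privileged_pct - irq_pct


# Hypothetical samples: (privileged %, IRQ %).
samples = {"guest A": (12.0, 9.0), "guest B": (30.0, 6.0)}
for name, (privileged, irq) in samples.items():
    print(f"{name}: {syscall_time(privileged, irq):.1f}% of CPU time in syscalls")
```

Guest A spends most of its privileged time servicing interrupts, while guest B spends most of it in syscalls, even though its IRQ time is lower.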

We can see that KVM was the best performing virtual machine, with a privileged time dominated by IRQ servicing time: this means that the network-related syscalls executed quite fast on KVM.

On the contrary, Xen has a very high privileged time but a lower IRQ time: the network-related syscalls executed slowly. VMware has an extraordinarily low IRQ time, probably courtesy of its paravirtualized network driver, while VirtualBox is overall a decent performer.

Real World tests

The synthetic tests above should paint quite a clear picture of the various hypervisors' subsystems: mainly CPU, memory and I/O. But how do these data correlate with real-world application performance? To investigate this question, I ran some interesting application benchmarks. The applications used are:

Apache 2.2.14 w/PHP 5.2.11 (32 bit version)
MySQL 5.1.41 (64 bit version)
Sysbench 0.4.10 (64 bit version)
FileZilla 3.3.0.1 (32 bit version)
7-Zip 4.65 (64 bit version)

Web server benchmark: simple, static content

To benchmark the web server performance with simple, static content, I used Apache's default test page, the “It works!” page. While you can argue that this test is unrealistically simple (and you are right), it should give us a view of pure Apache and HTTP performance. Let's see the results:

Apache static benchmark

While Xen is really slow, the other hypervisors perform quite similarly here. Why is Xen so slow? While it seems to have a lower CPU efficiency, this alone is not sufficient to justify this bad showing. As Apache creates a new thread for each new connection, it is entirely possible that the real culprit here is a very low speed in creating new threads. If this is true, Xen should exhibit very low performance in massively multithreaded environments. Later, in the MySQL benchmarks (another multithreaded process), we will check this supposition.
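The thread-creation hypothesis could be probed directly inside each guest with a tiny micro-benchmark. This was not part of the original test suite; the sketch below is only an assumption about how such a check might look, and it measures Python's threading overhead on top of the operating system's, so it is useful for comparing guests with each other rather than as an absolute figure.

```python
import threading
import time


def thread_spawn_rate(count=1000):
    """Create, start and join `count` do-nothing threads; return threads/second."""
    start = time.perf_counter()
    for _ in range(count):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    elapsed = time.perf_counter() - start
    return count / elapsed


print(f"{thread_spawn_rate():.0f} threads created per second")
```

Running the same script in each guest and comparing the rates would show whether one hypervisor is markedly slower at spawning threads.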

What about CPU load?

Apache static CPU load

VMware and VirtualBox have the lowest CPU usage, but KVM and Xen are not too far behind; however, remember that Xen's results are the lowest by a large factor.

What is intriguing here is the different distribution of user time and kernel time (in which syscall time and IRQ time are also included). VirtualBox seems to be the best user-level hypervisor, but it lags behind in kernel-level CPU time. At the other end, VMware has great kernel-level CPU time but the worst user-level time.

What can be the cause? Generally, kernel time is dominated by syscall time, but in some cases it can also be greatly influenced by IRQ servicing time. We have already noted that VMware has a very low IRQ servicing time, so it is not surprising that it was the fastest machine. All in all, these results map quite nicely onto the ping flood results: Xen is the slowest machine, while VMware is the fastest.

What about disk load?

Apache static HD load

Not surprisingly, KVM causes the highest disk load. Xen causes a low load, but it performs at a fraction of the speed of the other hypervisors. All in all, the most efficient virtual machine is VirtualBox, followed by VMware.

Web server benchmark: complex, dynamic page created with PHP and MySQL

For this benchmark, I used a default Joomla installation (complete with example pages).

Here are the data:

Apache dynamic test

Wow, these results are very low... It is possible that Apache and PHP on Windows are not such a great choice, but we want to concentrate our attention on the virtualizers' results. KVM is finally on top, with VirtualBox very close; then we have Xen and, in last place, VMware. As Xen is quite good now, does this benchmark contradict the previous Apache static benchmark? No. This time, Apache is capable of spawning only 3 or 4 threads each second. At this rate, Xen has no great problems with threading.

Now, let's see the CPU load:

Apache dynamic test CPU load

We have 100% CPU load for each contender.

Disk stats:

Apache dynamic test HD load

All in all, the disk utilization is quite low, but VMware is the loser here. VirtualBox and KVM are the winners, outclassing Xen by a small margin.

MySQL performance: Sysbench prepare

In the previous Apache test, the backend database was MySQL. What about pure MySQL performance? To test it, I used the Sysbench database benchmark module running on a Linux client machine connected to the host machine through a 10/100 switch. Let's see the results of the first step, the “prepare” step, in which I populate a test database with 1000000 rows:

Sysbench prepare time

Xen was the slowest machine, while VirtualBox was the fastest, followed by KVM and VMware.

Now, CPU load:

VirtualBox was not only the fastest machine, but also the one with the lowest CPU load. VMware has the highest CPU load, with a strong dominance of user time, but the real loser is Xen: its CPU load is similar to that of KVM, but the latter gives us noticeably better performance.

Finally, hard disk time:

Sysbench prepare HD load

All the virtual machines seem limited by HD access (the disk time always exceeds 100%), but VMware seems to use the disks less heavily.

MySQL performance: Sysbench simple test

In this test, 16 threads execute a total of 100000 select statements on the previously created database and table.

Note that this is a read-only test.

Here are the “pure speed” results, in transactions per second:

Sysbench simple TPS

Xen is again the slowest hypervisor, with a very large gap from the others. Do you remember the issues with massively multithreaded programs that we hypothesized in the Apache benchmark? Well, it seems that we were right: MySQL is another heavily threaded program, and Xen's results are very low. On the other hand, KVM is a little faster than the others.

CPU load records:

Sysbench simple CPU load

We see a very strong dominance of privileged (kernel) time. It means that the system spends most of its time in syscalls or IRQ servicing routines.

Now the hard disk load test:

Sysbench simple HD load

The most HD-hungry machine, VMware, loads the primary disk a bit more than the others. All in all, it seems that the combined host-side and guest-side caching is doing a good job now, even for KVM.

MySQL performance: Sysbench complex test

Now it is time to go ahead with a heavier SQL test. The Sysbench complex test is a test with, err, complex statements and mixed read and write requests. I ran this test with 16 threads asking for a total of 10000 requests (1/10 of the simple test).

Some numbers, in transactions per second:
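The exact Sysbench invocations are not listed in the article. As a rough sketch only, the three steps (prepare, simple, complex) might map to command lines like the ones built below, assuming the option names of the sysbench 0.4.x OLTP test. The host address, user and database name are placeholders, not the author's values.

```python
def sysbench_cmd(step, mode="complex", requests=10_000, threads=16,
                 host="192.0.2.10", rows=1_000_000):
    """Build a sysbench 0.4-style OLTP command line (as a list of arguments)."""
    base = [
        "sysbench", "--test=oltp",
        f"--mysql-host={host}",   # placeholder address of the guest under test
        "--mysql-user=sbtest",    # placeholder credentials and database
        "--mysql-db=sbtest",
        f"--oltp-table-size={rows}",
    ]
    if step == "prepare":
        return base + ["prepare"]
    if step == "run":
        return base + [
            f"--oltp-test-mode={mode}",   # "simple" = selects only
            f"--num-threads={threads}",
            f"--max-requests={requests}",
            "run",
        ]
    raise ValueError(f"unknown step: {step}")


print(" ".join(sysbench_cmd("prepare")))
print(" ".join(sysbench_cmd("run", mode="simple", requests=100_000)))
print(" ".join(sysbench_cmd("run", mode="complex", requests=10_000)))
```

The three printed lines correspond to populating 1000000 rows, the 100000-request read-only test and the 10000-request mixed test, each of the latter two with 16 threads.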

Sysbench complex TPS

In the heavily threaded MySQL program, Xen is again the slowest machine by a very wide margin, while the others are more or less on par.

What about CPU load?

Sysbench complex CPU load

VMware is the “lightest” hypervisor, probably thanks to its paravirtualized network driver and low disk write CPU load (see the IOMeter results above).

Speaking of disks, the next graph is very interesting:

Sysbench complex HD load

KVM seems to be the hypervisor that is lightest on the disk subsystem. Don't be fooled by the relatively good Xen results: its low load is probably only a result of its poor number of transactions per second.

FTP server test

To evaluate FTP performance, I installed FileZilla on each guest and then uploaded to it, and downloaded from it, the ISO image of the Ubuntu 9.10 x86_64 live CD.

Here are the results expressed in KB/s:

FTP transfer speed

All in all, I think that all the virtualizers tested are in the same league here.

Can CPU usage paint a different picture?

FTP transfer CPU load

Please note that the bottom half shows the CPU load during upload, while the top half shows it during download. The results are not so homogeneous now: Xen is the least efficient hypervisor here, followed at some distance by KVM (remember that interrupt time is already included in privileged time). It's a shame, because Xen's IRQ service time seems to be very low, but its total privileged time shows that the syscall routines execute very slowly.

Speaking of KVM, we can see that it seems to be quite a bit better than Xen, but it remains over 2X heavier than VMware and VirtualBox, which are the true leaders here. VMware's very low interrupt time is again a gift of its paravirtualized network driver, but VirtualBox is very good too: it is only in the download test that VMware is capable of pulling a bit ahead.

What about the hard disk load?

FTP transfer HD load

The first thing to note is that the download disk activity is so small thanks to guest-side caching (I ran the download tests after the upload ones and, while I purged the host-side cache, the guest cache was left intact).

Upload disk activity is a very different beast: we can see that VMware is the most efficient machine, Xen is the least, and VirtualBox and KVM are somewhere in the middle.

File compression / decompression test

Last but not least, I ran some file compression and decompression benchmarks. For this purpose, I used the integrated 7-Zip benchmark and another little benchmark that is really a .bat script with the sole purpose of compressing and decompressing a small zip file (about 6 MB) containing thousands of smaller icon files.

First, let's see the 7-Zip results:
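The .bat script itself is not reproduced here. As a rough idea of what such a compress/decompress timing test looks like, here is a minimal Python sketch. It uses the standard zipfile module instead of the original script, and all function names, the generated sample files and their count and size are assumptions made for illustration only.

```python
import tempfile
import time
import zipfile
from pathlib import Path


def compress(src_dir: Path, archive: Path) -> float:
    """Zip every file under src_dir into archive; return elapsed seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src_dir).as_posix())
    return time.perf_counter() - start


def decompress(archive: Path, dest_dir: Path) -> float:
    """Extract archive into dest_dir; return elapsed seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return time.perf_counter() - start


def make_sample_files(directory: Path, count: int = 200, size: int = 512) -> None:
    """Create many small dummy files standing in for the icon files."""
    for i in range(count):
        (directory / f"icon_{i:04d}.ico").write_bytes(bytes([i % 256]) * size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, out = root / "src", root / "out"
        src.mkdir()
        make_sample_files(src)
        print(f"compress:   {compress(src, root / 'icons.zip'):.3f} s")
        print(f"decompress: {decompress(root / 'icons.zip', out):.3f} s")
```

With thousands of tiny files, extraction creates one file per archive entry, so it stresses the disk and filesystem far more than the CPU, while compression spends comparatively more time in the deflate routine.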

7-Zip benchmark

Apart from Xen, which is slower, VMware, VirtualBox and KVM are closely matched. Please consider that the 7-Zip benchmark runs entirely in RAM and does not use disk storage.

What about a more real-world, commonly faced situation such as compressing and decompressing a .zip file?

Zip compressing and decompressing

This time the primary disk is heavily loaded by the decompress operations. Quite surprisingly, KVM (with its apparently slow disk access time) is very fast at both compressing and decompressing. VirtualBox is also very fast at compressing but not so great at decompressing, while VMware and Xen behave in exactly the opposite manner and are overall the slowest machines.

What can be at play here? It is difficult to give a very precise interpretation, but it seems to me that in the compression test, which is more CPU intensive, KVM and VirtualBox both have a quite strong advantage. In the decompression test, which is generally disk bound, we see that KVM loses the crown, probably as a result of its not-so-quick disk subsystem, while Xen and VMware are slightly better. The bad VirtualBox performance surprises me, as in theoretical disk tests it shows excellent results.

Conclusions

Ok, after 20 pages of tests, it's time to draw some conclusions.

To me, it seems that VMware and VirtualBox are the fastest virtual machines across the board. They have good CPU/memory performance, good disk access time and good network layer speed.

KVM is, instead, a mixed beast: it has quite good CPU/memory and network speed, but it fails in the crucial I/O subsystem performance more often than not.

Xen is at the opposite end of the spectrum: it has respectable I/O access time but quite bad CPU/memory performance that, in turn, can badly influence network speed and CPU load as well.

Speaking about interfaces, VirtualBox has an excellent Qt-based interface and a well-developed CLI. Xen and KVM use the virt-manager interface (GTK+ based), which is very well done and also provides a robust CLI to play with. VMware is the only one that does not provide a standalone graphical interface, substituting it with a Web-based UI. While this is a very good idea in principle, we must underline a very annoying fact about the web interface: it is quite unresponsive and very unstable. On CentOS/RedHat 5.4 you must manually install a libc dynamic library file to let the WebUI "survive" more than just a few clicks from the user (read this bug report for details: http://bugs.centos.org/view.php?id=3884 ). On the other hand, VMware probably has the best-developed CLI, with Perl and C bindings.

So, who is the winner of the day? Let me repeat: each of the examined virtualizers has its reasons to exist, with its strong points but also with some drawbacks.

However, if I were forced to pick just one of these software packages for a dedicated server machine, I would probably go with VMware: while VirtualBox is roughly on par, I think that VMware's paravirtualized network driver gives it a slight advantage over the others. Note however that VMware Server has some important handicaps: it can manage only 2 snapshots and, as stated above, the WebUI has some problems running on CentOS 5.4. So, if you plan to use the snapshot features heavily, you must use the latest CentOS/RedHat release (5.4), and/or you want a desktop-oriented virtualizer, go with VirtualBox: it has excellent performance and an easy-to-use interface.

If you, instead, love the Linux-standard virt-manager interface, you can go with KVM or Xen. All in all, I would prefer KVM more often than Xen because the latter seems to be very slow in the CPU and memory subsystems. Moreover, Xen seems to have some serious problems with massively multithreaded programs, which is not so good for a server machine.