Author: Jem Matzan
I tested Gentoo Linux 2005.0 for x86 and AMD64 because it is customizable enough for benchmarking and can be reduced to only the software and services that I needed for performance testing. Other 64-bit OSes are available and would probably work well, but by using Gentoo I knew that I could compile (or recompile) the entire operating environment with the most appropriate options. Previously I did some performance testing on FreeBSD for AMD64, but used different hardware and test criteria.
I ran benchmark tests covering database performance with MySQL and Super Smack; encryption performance with OpenSSL; 3D rendering performance with Unreal Tournament 2004; and compiler speed with a timed compile of X.org.
My test system hardware was a workstation equipped with:
- An Athlon 64 4000+ processor
- MSI K8T Neo2-FIR motherboard
- Corsair TwinX LL 1024MB set (two tested 512MB modules)
- Seagate SATA-V 160GB hard drive connected to the VIA SATA RAID chip
- Albatron Nvidia GeForce FX5700 Ultra3 (128MB DDR3 video RAM)
The software was Gentoo Linux 2005.0, installed from the Universal ISOs. I performed a stage 3 installation with no USE flags, the compiler options set to -pipe -O2 -fomit-frame-pointer, and the Pentium 4 and K8 -march options. For the 32-bit build I used the Pentium 4 option with the Athlon 64 because the processor supports the same instruction set extensions (SSE, SSE2, MMX) that the Pentium 4 option enables. This could enhance performance in some tests, as the AMD-specific architectures below the K8 do not include SSE2. I also set the MAKEOPTS variable to -j2, which increases processor load by running two make jobs in parallel when compiling.
I disabled CPU frequency scaling in the kernel for both setups. The drivers for all of the system’s hardware were built into the kernel, with the exception of the video driver. I installed the Nvidia driver version 1.0.6629-r4 from Portage for both architectures.
I ran all the benchmark tests from the command line except for Unreal Tournament 2004, which required an X server. For testing this, I installed Fluxbox as a low-overhead window manager to work from.
It’s important to note that this benchmarking project measures the performance of the software, not the hardware, so the software setup has to differ between the two test cases. In a hardware performance comparison, the software in each test case must remain the same, or as similar as possible, to eliminate any software variables. When comparing 32-bit and 64-bit performance on the same hardware, the situation is just the opposite: the hardware must be the same and the software must change. The operating system might be the same distribution and the software might be the same versions in both test cases, but the compiler will behave differently and compile in different options and features for each architecture. In some cases, the 64-bit tests will not produce results that come close to the theoretical limits of the hardware, because the AMD64 architecture has been available for only a short time compared with 32-bit x86, which has had more than a dozen years of performance optimization.
OpenSSL speed test
OpenSSL is responsible for the bulk of the Internet’s daily data encryption and decryption. It implements many different algorithms for a variety of data types and applications. Some algorithms are more CPU- and memory-intensive than others, and, doubtless, some have been tweaked for better performance on the x86 architecture.
My benchmark command was openssl speed > openssl.txt. You may notice in the configuration options at the top of the results that OpenSSL has been compiled slightly differently for each architecture. This does not invalidate the results, as both tests were run using the default, unadorned configuration that Gentoo Linux provides. Here are the results, with 32-bit listed first:
[Table: OpenSSL 0.9.7e (25 Oct 2004) speed results, 32-bit]
And for 64-bit:
[Table: OpenSSL 0.9.7e (25 Oct 2004) speed results, 64-bit]
32-bit OpenSSL seems to pull ahead on several of the tests, but 64-bit blows it away by factors of two and three in the AES, RSA, and DSA tests. The top set of tests measures the throughput of the listed digest and cipher algorithms. The second set, which may be more important to some, measures how quickly RSA and DSA keys can sign and verify.
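One way to put numbers on "factors of two and three" is to divide the 64-bit throughput by the 32-bit throughput for each algorithm. The sketch below assumes the usual layout of the `openssl speed` summary, where each row is an algorithm name followed by throughput cells ending in `k`. The function names and the two sample strings are hypothetical, and the sample figures are made-up placeholders, not the results from this test.

```python
import re

# Matches one throughput cell such as "1841.78k" (thousands of bytes per second).
CELL = re.compile(r"\d+(?:\.\d+)?k")


def parse_speed_table(text):
    """Return {algorithm name: [throughput per block size]} from a speed summary."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Count how many trailing fields look like throughput cells.
        count = 0
        while count < len(fields) and CELL.fullmatch(fields[-1 - count]):
            count += 1
        # Everything before the cells is the algorithm name (it may contain spaces).
        name = " ".join(fields[:len(fields) - count])
        if count and name:
            cells = fields[len(fields) - count:]
            table[name] = [float(cell[:-1]) for cell in cells]
    return table


def speedup(table_32, table_64):
    """Return 64-bit throughput divided by 32-bit throughput, per block size."""
    return {
        name: [b / a for a, b in zip(table_32[name], table_64[name])]
        for name in table_32
        if name in table_64
    }


# Placeholder input only: these numbers are invented to exercise the parser.
PLACEHOLDER_32 = """type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 100.00k 100.00k 100.00k 100.00k 100.00k
md5 200.00k 200.00k 200.00k 200.00k 200.00k"""

PLACEHOLDER_64 = """type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 250.00k 250.00k 250.00k 250.00k 250.00k
md5 150.00k 150.00k 150.00k 150.00k 150.00k"""

if __name__ == "__main__":
    ratios = speedup(parse_speed_table(PLACEHOLDER_32),
                     parse_speed_table(PLACEHOLDER_64))
    for name, values in ratios.items():
        print(name, ["%.2fx" % v for v in values])
```

A ratio above 1.0 means the 64-bit build was faster for that block size. Feeding the two saved openssl.txt files into `parse_speed_table` in place of the placeholders would give the real comparison.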
Super Smack and MySQL
Super Smack is a database benchmarking utility that works with either MySQL or PostgreSQL. I chose to use MySQL, since it is more common among Web-based applications and has a larger installed base. Both systems ran MySQL 4.0.24 (the client reports itself as 12.22 Distrib 4.0.24).
The benchmark command was super-smack -d mysql (that’s 10 clients with 10,000 queries), and the results are given in queries per second.
Test | Super Smack 32-bit | Super Smack 64-bit |
Select-key | 17374.58 q/s | 18148.10 q/s |
Select_index | 9333.48 q/s | 9717.08 q/s |
Update_index | 9333.48 q/s | 9717.08 q/s |
The test parameters generate 200,000 MyISAM table queries, which is sufficient for testing, and identical to the settings in Tony Bourke’s database benchmarking article.
The 64-bit edition was faster, but only by about 4 percent in each test, which is not a significant margin.
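The size of the gap can be worked out directly from the table. This is a minimal sketch that uses the queries-per-second figures exactly as listed above; the names `RESULTS` and `percent_gain` are only illustrative.

```python
# Queries per second from the Super Smack table: (32-bit, 64-bit).
RESULTS = {
    "Select-key": (17374.58, 18148.10),
    "Select_index": (9333.48, 9717.08),
    "Update_index": (9333.48, 9717.08),
}


def percent_gain(qps_32, qps_64):
    """Return how much faster the 64-bit run was, as a percentage of the 32-bit rate."""
    return (qps_64 - qps_32) / qps_32 * 100.0


if __name__ == "__main__":
    for test, (qps_32, qps_64) in RESULTS.items():
        print("%-13s %+.2f%%" % (test, percent_gain(qps_32, qps_64)))
```

All three rows come out between 4 and 5 percent in favor of the 64-bit build.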
Unreal Tournament 2004
Unreal Tournament 2004 comes with both 32-bit and 64-bit binaries, making it an ideal OpenGL 3D rendering and gaming performance benchmark. It works in conjunction with Nvidia’s 64-bit Linux driver. The driver version for both 32-bit and 64-bit was the latest stable version at the time of testing, which was 1.0.6629-r4. UT2004 was patched to version 3339, which is also the latest at the time of testing.
I followed Andreas ‘GlaDiaC’ Schneider’s benchmarking guide (scroll down to section 7 for the benchmarking procedure) for the UT2004 tests. I changed the number of bots to 16 to increase the CPU load, raised the test time to 120 seconds to increase the accuracy of the data, and turned the detail settings to their maximums. Screen resolution was 1024×768 and color depth was 24-bit. The results, which are the log files from the tests, are listed in frames per second. The lowest recorded framerate is the first number, the second is the average frame rate, and the third number in the series is the maximum recorded framerate.
UT2004 Build UT2004_Build_[2004-11-11_10.48]
x86 Linux
AuthenticAMD Unknown processor @ 2400 MHz
GeForce FX 5700 Ultra/AGP/SSE2/3DNOW!
ons-primeval?spectatoronly=1?numbots=16?quickstart=1?attractcam=1 -benchmark -seconds=120 -ini=default.ini -exec=../Benchmark/Stuff/botmatchexec.txt
7.255768 / 60.568241 / 141.303101 fps rand[1543912059]
Score = 58.373878
And for 64-bit:
UT2004 Build UT2004_Build_[2004-11-11_10.48]
x86-64 Linux
Unknown processor @ 2400 MHz
GeForce FX 5700 Ultra/AGP/SSE2
ons-primeval?spectatoronly=1?numbots=16?quickstart=1?attractcam=1 -benchmark -seconds=120 -ini=default.ini -exec=../Benchmark/Stuff/botmatchexec.txt
7.701819 / 43.102283 / 93.125053 fps rand[707662726]
Score = 43.058800
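The three framerate figures can be pulled out of each log and compared with a few lines of code. The sketch below assumes the "minimum / average / maximum fps" layout seen in the two logs above; the regular expression and the function names are my own and not part of UT2004 or the benchmarking guide.

```python
import re

# Matches "min / avg / max fps" as it appears in the benchmark logs.
FPS_LINE = re.compile(r"(\d+\.\d+) / (\d+\.\d+) / (\d+\.\d+) fps")

# The framerate lines copied from the two logs above.
LOG_32 = "7.255768 / 60.568241 / 141.303101 fps rand[1543912059]"
LOG_64 = "7.701819 / 43.102283 / 93.125053 fps rand[707662726]"


def parse_fps(line):
    """Return (minimum, average, maximum) frames per second from a log line."""
    match = FPS_LINE.search(line)
    if match is None:
        raise ValueError("no framerate figures found in: %r" % line)
    low, average, high = (float(value) for value in match.groups())
    return low, average, high


def percent_drop(reference, other):
    """Return how far `other` falls below `reference`, as a percentage."""
    return (reference - other) / reference * 100.0


if __name__ == "__main__":
    _, avg_32, _ = parse_fps(LOG_32)
    _, avg_64, _ = parse_fps(LOG_64)
    print("average fps: 32-bit %.1f, 64-bit %.1f" % (avg_32, avg_64))
    print("64-bit average is %.1f%% lower" % percent_drop(avg_32, avg_64))
```

On these logs the 64-bit average framerate is a little under 29 percent lower than the 32-bit average, while the minimum framerates are close to each other.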
The 32-bit version got considerably better framerates, which provides a smoother game experience. But do you notice something missing in the system information block? The 3DNOW! multimedia extensions are not being used on the 64-bit system. I tried recompiling the entire operating environment with the 3dnow USE flag, but it still didn’t register in Unreal Tournament. Hopping over to an Opteron system with a different Nvidia card, I found the same results — no 3DNOW! extensions in the 64-bit version of UT2004.
I don’t know if the absence of AMD’s multimedia extensions is the cause of the lower framerates, and I don’t know what part of the equation is to blame for this, but it’s safe to assume that something is not as it should be with the 64-bit software, since 3DNOW! is part of the AMD64 instruction set architecture.
X.org compile time
You do a lot of compiling on a Gentoo Linux system, and the same can be said of FreeBSD and other operating systems that are source-based or have a Ports-like infrastructure. To test compiler speed, I ran emerge --fetchonly xorg-x11, which retrieves all of the X.org source code (a total of nine files). When it finished, I ran time emerge xorg-x11 and recorded the compile time. The first number in the table is the total wall-clock time the compile took to complete; the second is the CPU time the build spent running in user mode (the compiler and build tools themselves); and the third is the CPU time the kernel spent working on the build's behalf, which is the system overhead.
32-bit | 64-bit |
26min 39sec real | 32min 16sec real |
21min 39sec user | 22min 7sec user |
4min 10sec system | 9min 23sec system |
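For readers who want to see where the three figures come from, here is a rough equivalent of what time reports, written as a sketch for Unix-like systems. It runs a child process, then reads the wall-clock time and the user and system CPU time charged to that child. The function name `time_command` is hypothetical, and the demonstration command is a stand-in, not the emerge run from this test.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return its real, user, and system time in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                                # wall-clock time
        "user": after.ru_utime - before.ru_utime,    # CPU time in user mode
        "system": after.ru_stime - before.ru_stime,  # CPU time in the kernel
    }


if __name__ == "__main__":
    # Stand-in workload: a short Python one-liner instead of a real build.
    timings = time_command([sys.executable, "-c", "sum(range(10**6))"])
    for label in ("real", "user", "system"):
        print("%-6s %.3fs" % (label, timings[label]))
```

Real time also includes time spent waiting on disk and other processes, which is why it can exceed the sum of user and system time on a single-processor machine.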
The 32-bit system compiled X.org faster than its 64-bit counterpart. The real time-killer looks like system overhead. Both systems used GCC 3.4.3-r1 and Linux kernel 2.6.11-gentoo-r7. Again, I don’t know if any single factor is to blame, or if there are several contributors to the inferior 64-bit performance.
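A quick calculation shows how much of the gap each figure accounts for. This sketch converts the times from the table into seconds and subtracts the 32-bit value from the 64-bit value; the names used are illustrative only.

```python
import re

# Times from the X.org compile table: (32-bit, 64-bit).
TIMES = {
    "real": ("26min 39sec", "32min 16sec"),
    "user": ("21min 39sec", "22min 7sec"),
    "system": ("4min 10sec", "9min 23sec"),
}

DURATION = re.compile(r"(\d+)min (\d+)sec")


def to_seconds(text):
    """Convert a string such as '26min 39sec' into a number of seconds."""
    match = DURATION.fullmatch(text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def extra_seconds(label):
    """Return how many more seconds the 64-bit build took for one figure."""
    time_32, time_64 = TIMES[label]
    return to_seconds(time_64) - to_seconds(time_32)


if __name__ == "__main__":
    for label in TIMES:
        print("%-6s +%d seconds on 64-bit" % (label, extra_seconds(label)))
```

The 64-bit build took 337 seconds longer in real time. System time accounts for 313 of those seconds, while user time grew by only 28 seconds.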
According to a benchmark test performed last year, GCC can have a profound effect on the speed of generated code, especially for AMD64 systems. FreeBSD’s David O’Brien pointed out in this email regarding a previous benchmarking project that compile time is not truly the focus of GCC development; the performance of the compiled binary is all that matters. In most of our Gentoo benchmarks, we’re in a sense testing the speed of the code that GCC generates. In the X.org compile test, we’re also testing the speed of the compiler itself, which is not so much an indication of GCC’s quality as it is of the time you will spend compiling programs. In this case, it seems that AMD64 users will endure longer compile times with this version of GCC. None of this accounts for the architecture-specific, hand-coded assembly optimizations that no doubt benefit one or both architectures in the above tests.
Conclusions
While everyday “Internet and email” desktop performance of a 64-bit operating system may not be much different from that of a 32-bit platform, CPU- and memory-intensive applications see significantly enhanced performance. 3D gaming performance could suffer from not-yet-perfect Nvidia drivers (there were two newer “unstable” versions of the Nvidia driver in Portage at the time of this writing) and 64-bit game binaries that are still experimental. 64-bit gaming is, after all, a new thing to PC game developers.
64-bit operating systems may not be practical for simple desktop use at this point, partially because of some of the hassles in setting them up, and partially because they offer little performance increase for most desktop applications. But the advantage of running a 64-bit operating system on a Web or email server is obvious when you look at the OpenSSL and MySQL results, assuming you use those technologies.
Sometimes the purpose of a benchmarking project is to show which squeaky wheels need the grease. This benchmarking project has shown that there’s still a long way to go for AMD64-specific optimizations in the GNU/Linux world.