The audio system options in Linux can be a bit confusing. The projects (ALSA, OSS, ESD, aRts, JACK, and GStreamer, to name a few) all describe themselves in broad, similar terms, and the panoply of packages reads like a circular mix-and-match game: alsaplayer-esd, libesd-alsa, alsa-oss, alsaplayer-jack, gstreamer-alsa, gstreamer-esd, and so on. It can be difficult to tell how all the pieces fit together. Consider the package description for libasound2-plugins in Ubuntu, which says: “The ALSA library plugin ‘jack’ allows the ALSA library to play or capture via JACK (This should not be confused with the jackd output driver ‘alsa’, included in the jackd package, which allows the JACK daemon to play back via the ALSA library.)” Yeah, how could anybody ever get those confused?
Sorting it out
This potential for confusion stems from multiple sources, including a shortage of user-friendly (as opposed to developer-friendly) documentation, and overlapping goals among several of the projects.
For instance, the Advanced Linux Sound Architecture (ALSA) project includes several distinct components. The set of kernel hardware drivers for audio cards is one, and the library that exposes the ALSA application programming interface (API) to software is another. You need the hardware drivers to get your sound card to produce audio, but any particular application may or may not use the library.
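One way to confirm that the kernel-level ALSA drivers have found your sound card, independent of whether any application uses the ALSA library, is to read /proc/asound/cards. The Python sketch below parses that file. It assumes the usual layout, in which each card's first line starts with its index and a bracketed ID and continuation lines are indented; the function name is hypothetical.

```python
import re
from pathlib import Path

# Matches the first line of a card entry, e.g.
# " 0 [Intel          ]: HDA-Intel - HDA Intel"
# (assumed layout; indented continuation lines do not match)
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]]*)\]:\s*(.*)$")


def parse_asound_cards(text):
    """Return a list of (index, card_id, description) tuples."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, card_id, description = match.groups()
            cards.append((int(index), card_id.strip(), description.strip()))
    return cards


if __name__ == "__main__":
    path = Path("/proc/asound/cards")
    if path.exists():
        for index, card_id, description in parse_asound_cards(path.read_text()):
            print(f"card {index}: {card_id} ({description})")
    else:
        print("No /proc/asound/cards; the ALSA kernel drivers may not be loaded.")
```

An empty result (or a "no soundcards" line) points at the kernel driver, not at the userspace library or any sound server.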
Or consider the KDE sound library aRts. It includes a sound server — the low-level daemon that accepts audio from apps and feeds it to the hardware driver — and higher-level functions like encoding and decoding various file and stream formats. Things are a little cleaner on the GNOME side. The Enlightened Sound Daemon (ESD) is the sound server, and a separate library (GStreamer) handles the codecs.
But the biggest source of confusion is that so many audio projects offer their own APIs, including the userspace ALSA library, aRts, ESD, and GStreamer. And as we have seen, their functions overlap with one another in different ways. In addition, there are SDL and OpenAL for games, Open Sound System (OSS) for older general-purpose applications, and JACK for professional, low-latency work.
Finally, some applications don’t rely on any external libraries for audio functionality, but do it all internally. Most notable among these are the full-featured media playback apps, such as Xine and MPlayer. They often handle everything from format decoding to multichannel demuxing within the app itself.
It’s easiest to explain where PulseAudio fits into the GNOME system because of that desktop environment’s separation between individual tasks. An application like Rhythmbox relies on GStreamer to decode sound files from compressed form into raw audio. GStreamer in turn passes the audio down to ESD, and ESD delivers it to the ALSA hardware driver.
In this situation, PulseAudio replaces ESD without affecting the rest of the pipeline. But another player might rely on the ALSA userspace library, which is not part of the previous example. Here you can insert PulseAudio into the pipeline, again right above the kernel-level hardware drivers. It adds an extra layer, but with it you enjoy the benefit of all of your audio passing through the same sound server.
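The two pipelines just described can be modeled as simple lists of layers. The Python sketch below is a toy illustration of the article's simplified picture, not anything PulseAudio itself runs; the layer labels and the helper name are made up for this example.

```python
# Toy model of the playback chains described above. Each chain lists the
# layers audio passes through, from the application down to the kernel.
# The labels are descriptive strings, not real package or API names.
KERNEL = "ALSA kernel driver"

before = {
    "Rhythmbox": ["Rhythmbox", "GStreamer", "ESD", KERNEL],
    "ALSA-only player": ["ALSA-only player", "ALSA userspace library", KERNEL],
}


def insert_pulseaudio(chain):
    """Return a new chain: ESD is swapped for PulseAudio if present;
    otherwise PulseAudio is inserted directly above the kernel driver."""
    if "ESD" in chain:
        return ["PulseAudio" if layer == "ESD" else layer for layer in chain]
    return chain[:-1] + ["PulseAudio", chain[-1]]


after = {app: insert_pulseaudio(chain) for app, chain in before.items()}

for app, chain in after.items():
    print(" -> ".join(chain))
```

In both resulting chains, PulseAudio is the last layer before the kernel driver, which is what lets one sound server see all of the audio.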
And that’s the point: some apps are written to use the userspace ALSA API, some aRts, some JACK, some handle the audio internally. If you can reroute all of the audio through one handler, you get more control, fewer conflicts, and fewer surprises.
Sounds simple
The fastest way to get started with PulseAudio on your system is to see if your distro provides packages, because many do. Although the next releases of Fedora and Ubuntu will use PulseAudio by default, they and other distros make packages available for the current releases, too. The PulseAudio wiki includes a list of which distros provide packages, including the development versions of Mandriva and openSUSE.
If your distro doesn’t provide pre-packaged binaries, you can download and compile PulseAudio yourself. But for most users, getting started is a matter of installing the packages and setting up the configuration options in your audio apps.
The PulseAudio wiki page entitled “The Perfect Setup” offers basic step-by-step instructions, but you may want to seek out distro-specific guides if your distro supplies the packages. Ubuntu does not yet provide its own instructions, so on my Ubuntu Feisty box, I had the most success following the Debian HOWTO posted at forum.debian.net.
I started by installing all of the PulseAudio-related packages available through Synaptic. There are separate packages for connecting PulseAudio to audio frameworks such as GStreamer, ALSA, and JACK. The most important one is pulseaudio-esound-compat; it replaces the ESD package esound entirely, creating a dummy /usr/bin/esd that is actually a symbolic link to PulseAudio. With this in place, all applications that expect ESD are fooled into using PulseAudio automatically.
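To confirm that the compatibility package really did replace ESD, you can check whether /usr/bin/esd is a symbolic link and where it points. The Python sketch below does that. It assumes the symlink arrangement described above; the exact target name may differ between distributions, and the helper name is hypothetical.

```python
import os
from pathlib import Path


def describe_esd(path="/usr/bin/esd"):
    """Report whether `path` is a symbolic link and, if so, its target.

    Assumes the ESD compatibility package installs esd as a symlink;
    the target name may vary by distribution.
    """
    p = Path(path)
    if p.is_symlink():
        target = os.readlink(p)   # the raw link text
        resolved = p.resolve()    # the final file it points to
        return f"{path} -> {target} (resolves to {resolved})"
    if p.exists():
        return f"{path} is a regular file; ESD itself may still be installed"
    return f"{path} not found"


if __name__ == "__main__":
    print(describe_esd())
```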
An important caveat is that PulseAudio creates its own groups, and all users who want to use PulseAudio must be members of them. Since most desktop users don’t spend much time thinking about groups or group permissions, this step is easy to overlook. Under Ubuntu, you change group membership by visiting System -> Administration -> Users and Groups. This launches the Users settings manager; click the Manage Groups button, then bring up the Properties window for the pulse, pulse-access, and pulse-rt groups one at a time to add yourself to each.
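Because this step is easy to miss, a quick check helps. The Python sketch below uses the standard grp and pwd modules to list which of the three groups named above exist but do not include a given user. It assumes those group names; some systems may create only a subset of them, and the function name is hypothetical.

```python
import grp
import os
import pwd

# Group names from the text; groups that do not exist are skipped.
PULSE_GROUPS = ("pulse", "pulse-access", "pulse-rt")


def missing_pulse_groups(user=None, groups=PULSE_GROUPS):
    """Return the groups in `groups` that exist but do not include `user`."""
    if user is None:
        user = pwd.getpwuid(os.getuid()).pw_name
    primary_gid = pwd.getpwnam(user).pw_gid
    missing = []
    for name in groups:
        try:
            entry = grp.getgrnam(name)
        except KeyError:
            continue  # group not present on this system
        # A user belongs to a group if listed as a member or if it is
        # the user's primary group.
        if user not in entry.gr_mem and entry.gr_gid != primary_gid:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_pulse_groups():
        print(f"not a member of {name}")
```

On Debian-derived systems, `adduser USER GROUP` makes the same change from a shell. Either way, new group memberships take effect at your next login.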
That done, I copied a sample /etc/asound.conf file from the Debian tutorial. This is an ALSA setup file, but it is not created for you because in most cases ALSA apps will run fine without it. But you need to create it in order to specify that ALSA apps should use PulseAudio as their default output.
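A commonly used version of this file points both the default PCM device and the default control device at the ALSA PulseAudio plugin. The Python sketch below writes that configuration to a path you choose. It assumes the `pulse` plugin from the ALSA plugins package is installed; the helper name is hypothetical, and writing /etc/asound.conf itself needs root (a per-user ~/.asoundrc is the usual alternative).

```python
from pathlib import Path

# ALSA configuration that routes the default PCM and control devices
# through the PulseAudio plugin ("type pulse").
ASOUND_CONF = """\
pcm.pulse {
    type pulse
}

ctl.pulse {
    type pulse
}

pcm.!default {
    type pulse
}

ctl.!default {
    type pulse
}
"""


def write_asound_conf(path):
    """Write the configuration to `path`, refusing to overwrite a file."""
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"{target} already exists; back it up first")
    target.write_text(ASOUND_CONF)
    return target
```

The `!` in `pcm.!default` tells ALSA to replace any existing definition of `default` instead of merging with it.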
Feel the beat
Using PulseAudio as a drop-in replacement for ESD, and setting up ALSA to use PulseAudio through /etc/asound.conf, covers roughly 90% of the audio needs of a typical desktop Linux session.
For the remaining apps — including those oddballs that handle all audio internally — the specific instructions at the PulseAudio wiki serve as an invaluable resource. For instance, most KDE apps use aRts, but you can set up aRts to use ESD, which is then rerouted automatically to PulseAudio. Amarok, XMMS, and some other media players let you choose a back-end engine from within their application preferences.
The only major holdouts at present are the audio editor Audacity and the video player MPlayer. To work with PulseAudio, both require development-version code fetched from their respective project pages. More importantly, because of the way Audacity controls the sound device, you must shut down PulseAudio before you use it. Changes are in the works but have not yet made it into the stable code base.
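Later PulseAudio releases include a `pasuspender` utility, which suspends PulseAudio's hold on the sound devices while a wrapped program runs and resumes it when the program exits, instead of shutting the daemon down. Whether that is enough for a particular Audacity version is an assumption you should verify. The Python sketch below builds and optionally runs such a command; the helper names are hypothetical.

```python
import shutil
import subprocess


def suspended_command(argv):
    """Build a command line that runs `argv` under pasuspender.

    Everything after "--" is the program to run and its arguments.
    """
    return ["pasuspender", "--", *argv]


def run_suspended(argv):
    """Run `argv` with PulseAudio suspended; return its exit status."""
    if shutil.which("pasuspender") is None:
        raise FileNotFoundError("pasuspender not found; is PulseAudio installed?")
    return subprocess.run(suspended_command(argv)).returncode


if __name__ == "__main__":
    print(" ".join(suspended_command(["audacity"])))
```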
Basic service is well and good, but PulseAudio earned a place in Linux distros’ default audio stacks by doing considerably more. PulseAudio can route audio from multiple sources to multiple sinks, both locally and over the network. You can use it to combine multiple soundcards into a single virtual device, to forward music from one PC to another, or to share a single microphone as an input between multiple PCs.
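These features are provided by loadable modules, which the `pactl load-module` command can load into a running server. The Python sketch below builds the commands for the three scenarios above and prints them by default. It assumes module names from current PulseAudio releases (older releases called the combining module `module-combine`), and network setups also need matching authentication (for example a shared cookie) on both machines; the helper names and addresses are illustrative.

```python
import subprocess


def combine_all_sinks():
    # With no arguments, module-combine-sink combines the available sinks
    # into one virtual output device.
    return ["pactl", "load-module", "module-combine-sink"]


def accept_network_clients(acl):
    # Let clients from the given addresses (semicolon-separated) connect
    # to this server over TCP.
    return ["pactl", "load-module", "module-native-protocol-tcp", f"auth-ip-acl={acl}"]


def tunnel_to(server):
    # Create a local sink that forwards playback to another PulseAudio server.
    return ["pactl", "load-module", "module-tunnel-sink", f"server={server}"]


def tunnel_from(server):
    # Create a local source that captures from another server, such as
    # a microphone attached to a different PC.
    return ["pactl", "load-module", "module-tunnel-source", f"server={server}"]


def run(cmd, dry_run=True):
    """Print the command (the default), or execute it if dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode
```

Passing the arguments as a list avoids shell quoting; typed at a shell prompt, the semicolon in `auth-ip-acl` would need quotes.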
The best place to start learning is the PulseAudio wiki. The latter half of the FAQ page is devoted to specific examples showing how to take advantage of PulseAudio’s advanced features. You specify most of the setup options in text configuration files, but you can test them without rebooting: at most, relaunching the pulseaudio executable itself will let you try out your new configuration.
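Relaunching comes down to two commands: `pulseaudio -k` asks the running per-user daemon to exit, and `pulseaudio --start` starts a fresh one, which rereads files such as default.pa and daemon.conf. The Python sketch below runs them in order; the command runner is injectable so the sequence can be checked without touching a live daemon, and the helper names are hypothetical.

```python
import subprocess


def relaunch_commands():
    # Stop the running daemon, then start a new one that rereads its
    # configuration files.
    return [["pulseaudio", "-k"], ["pulseaudio", "--start"]]


def relaunch(runner=subprocess.run):
    """Run each relaunch command in turn.

    check=False because "pulseaudio -k" fails harmlessly when no daemon
    is running.
    """
    for cmd in relaunch_commands():
        runner(cmd, check=False)
```

Single modules can often be tried without a relaunch at all, by loading them into the running server with `pactl load-module`.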
The project also recommends looking into Lennart Poettering’s GUI tools. They include a PulseAudio-specific volume control, volume meter, and device chooser, much like the default tools from GNOME or KDE, all of which can be accessed through a panel applet. There is also a more feature-filled PulseAudio Manager app, which allows you to connect and disconnect devices and modules on the fly.
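If you prefer the command line, current PulseAudio releases let `pactl list short sinks` (or `modules`, `sources`, and so on) print one tab-separated line per object. The Python sketch below parses that output. It assumes only that the first two columns are the object's index and name, since the remaining columns vary by object type and version; the helper names are hypothetical.

```python
import subprocess


def parse_short_list(text):
    """Split tab-separated `pactl list short ...` output into rows.

    Assumes column 1 is the index and column 2 the name; other columns
    are kept unparsed in "rest".
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        rows.append({"index": int(fields[0]), "name": fields[1], "rest": fields[2:]})
    return rows


def list_sinks():
    """Return the parsed sink list from the running server."""
    out = subprocess.run(
        ["pactl", "list", "short", "sinks"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_short_list(out)
```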
There is no need to feel intimidated by all of the complexity built into PulseAudio. The project has done a good job of making the common tasks simple enough that they don’t demand any expertise. But in traditional open source fashion, the power to do more is there, too, right in front of your ears.