Keep Internet junk at bay with content filters


Author: Shashank Sharma

Each day, I come across someone in the blogosphere complaining about the design of a Web site. Some don’t like screaming text, others don’t like banners, and still others hate ads. My pet peeves include pop-ups and unwanted JavaScript and cookies. Removing such junk can also speed up your browsing, since you’re no longer wasting bandwidth downloading data you find useless. Here are some tools you can use to filter the content a Web site sends to your browser.

Privoxy

Privoxy is a standalone application full of impressive features. It’s a breeze to install, and its default settings are ideal for most users. Fedora users can install it with the command yum install privoxy, and Ubuntu users with sudo apt-get install privoxy. Alternatively, you can grab the source tarball and build it with the commands ./configure, make, and make install. Once installed, Privoxy binds to localhost (127.0.0.1) at port 8118. You can choose a different port and network interface during a manual installation, or specify them later under section 4.1 of the /etc/privoxy/config file.
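To confirm which address and port Privoxy is set to use, you can read the setting back out of the config file. The following is a minimal sketch, assuming the setting appears as a listen-address line of the form host:port, as it does in stock Privoxy config files. The function name parse_listen_address is made up for this example.

```python
def parse_listen_address(config_text):
    """Return (host, port) from the first active listen-address line, or None.

    Assumes the value has the form host:port; an empty host (":8118")
    is returned as an empty string.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "listen-address":
            host, _, port = parts[1].strip().rpartition(":")
            return host, int(port)
    return None


def read_privoxy_listen_address(path="/etc/privoxy/config"):
    """Read the config file from disk and parse its listen-address setting."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_listen_address(handle.read())
```

Calling read_privoxy_listen_address() on a machine with Privoxy installed returns the host and port to enter in your browser's proxy settings, or None if no active listen-address line is found.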

You need to tell your browser about Privoxy before you begin using it. In the Firefox preferences dialog box, click Advanced, select the Network tab, and click the Settings button under Connection. Choose Manual proxy configuration and enter 127.0.0.1 as the HTTP proxy and 8118 as the port. Make sure neither localhost nor 127.0.0.1 is listed in the “No proxy for” field. You can now access the Privoxy browser interface at http://p.p/.
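The same proxy settings can be used outside the browser, for example to test the setup from a script. Here is a small sketch using Python's standard urllib module, assuming Privoxy is listening on its default 127.0.0.1:8118. The helper names are invented for this example.

```python
import urllib.request

# Assumed values: Privoxy's default listen address mentioned above.
PRIVOXY_HOST = "127.0.0.1"
PRIVOXY_PORT = 8118


def privoxy_proxy_map(host=PRIVOXY_HOST, port=PRIVOXY_PORT):
    """Return the proxy mapping urllib expects for plain HTTP traffic."""
    return {"http": f"http://{host}:{port}"}


def build_privoxy_opener(host=PRIVOXY_HOST, port=PRIVOXY_PORT):
    """Build an opener that sends HTTP requests through the local proxy."""
    handler = urllib.request.ProxyHandler(privoxy_proxy_map(host, port))
    return urllib.request.build_opener(handler)


def fetch_status(url, timeout=5):
    """Fetch url through the proxy and return the HTTP status code."""
    opener = build_privoxy_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.status
```

Since http://p.p/ is served by Privoxy itself, a successful fetch_status("http://p.p/") is a reasonable sign that requests are really passing through the proxy.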

Privoxy has two sets of configuration files. Filter files define how HTML, CSS, JavaScript, and other content is rewritten; the default filter rules are in the /etc/privoxy/default.filter file. Action files define what action Privoxy should take for each Web site it encounters, including the rules for cookies, ads, and other objects; the default action rules are in the /etc/privoxy/default.action file. On a default installation, Privoxy blocks banner ads based on size, all pop-up windows, and Google ads. All configuration files rely on regular expressions, so unless you are confident about what you are doing, do not edit the default.filter or default.action files. Unfortunately, you can’t edit the files through the browser interface.
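To get a feel for what regular-expression-based content filtering does, here is an illustrative sketch in Python. It does not use Privoxy's own filter file syntax, and the two rules are hypothetical examples rather than rules copied from default.filter.

```python
import re

# Hypothetical rules for illustration only; not taken from default.filter.
FILTER_RULES = [
    # Remove script blocks. The non-greedy .*? stops at the first closing tag.
    (re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL), ""),
    # Neutralize window.open( calls, a common way of opening pop-ups.
    (re.compile(r"window\.open\s*\(", re.IGNORECASE), "void("),
]


def apply_filters(html, rules=FILTER_RULES):
    """Apply each (pattern, replacement) rule to the page source in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html
```

The first rule also shows why careless edits are risky: with a greedy .* in place of .*?, a page containing two script blocks would lose everything between the first opening tag and the last closing tag, including the text in between.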

Webcleaner

Webcleaner can do everything Privoxy can, and it also has an antivirus filter and can reduce images on a Web site to low-bandwidth JPEGs. Unlike Privoxy, Webcleaner can be configured through its browser interface. However, Webcleaner makes users suffer through a demanding installation. It requires Python 2.4 and Runit (a replacement for the init system). The image compression feature requires the Python Imaging Library (PIL), and the virus filter requires ClamAV. To apply content filtering to SSL-encrypted Web pages, you also need to install the OpenSSL and python-openssl packages. Although the software’s requirements don’t list it, I also had to install the python-devel package.

If you follow the installation instructions, you should be able to bring up the browser interface. Make sure your browser proxy settings point to 127.0.0.1 and port 8080 instead of the 8118 used for Privoxy. Open a terminal window, start Webcleaner with the command webcleaner, then point your browser to http://127.0.0.1:8080. You will see two passwords on your screen. One is an MD5 password generated by Webcleaner: edit the /usr/share/webcleaner/config/webcleaner.conf file, look for the adminpass="" line, and fill in the MD5 password there. The other password is the one you’ll use to log in to Webcleaner. Now restart Webcleaner. This time, visiting http://127.0.0.1:8080 will bring up a login screen.

I find the Webcleaner interface a little complicated. While it’s well-categorized and moving around is easy, when it comes to adding your own filter settings, Privoxy wins hands down. Like Privoxy, Webcleaner blocks ads and deanimates GIFs out of the box. But on a few sites, I found it had corrupted the text flow by removing the ads.

DansGuardian and Squid

A third alternative, and one of the most popular content filter setups, involves two applications. The actual filtering is performed by DansGuardian. It lets you filter Web pages based on exact phrase matching, and it also supports the Platform for Internet Content Selection (PICS), which means you can filter pages labeled as containing possibly objectionable content. You can even configure DansGuardian to use third-party blacklists or to maintain one of your own.
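The idea behind exact phrase matching can be sketched in a few lines of Python. This is only an illustration of the concept, assuming a plain list of banned phrases; it does not reproduce DansGuardian's phrase list format or any of its other matching features, and the function names are invented.

```python
def _normalize(text):
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def find_banned_phrases(page_text, banned_phrases):
    """Return the banned phrases that occur verbatim in the page text."""
    haystack = _normalize(page_text)
    return [phrase for phrase in banned_phrases if _normalize(phrase) in haystack]


def should_block(page_text, banned_phrases):
    """Block the page if at least one banned phrase is present."""
    return bool(find_banned_phrases(page_text, banned_phrases))
```

For example, should_block("Win   FREE money now!", ["free money"]) returns True, while a page that merely contains the word "freedom" is left alone.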

Squid is a proxy server that supports HTTP and FTP, but it has limited support for other protocols, such as TLS and SSL. In this setup, Squid fetches Web pages and feeds them to DansGuardian. (You can configure DansGuardian to work with any proxy server.)

Once installed, Squid binds to port 3128, while DansGuardian listens on port 8080. This means you need to use iptables to redirect Web traffic to port 8080, as we’ve discussed in the past, so that requests pass through DansGuardian instead of going out unfiltered.
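Before setting up the redirect, it can help to check that both services are actually listening. The following sketch uses Python's standard socket module, assuming both daemons run on the local machine with the ports mentioned above. The function names are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_filter_chain(host="127.0.0.1"):
    """Report whether the assumed DansGuardian and Squid ports accept connections."""
    return {
        "dansguardian:8080": port_is_open(host, 8080),
        "squid:3128": port_is_open(host, 3128),
    }
```

If check_filter_chain() reports False for either entry, the corresponding service is probably not running or is bound to a different address or port.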

Final verdict

Both Webcleaner and DansGuardian are useful if you’re prone to virus attacks. Each has a downside, though: Webcleaner has an uncomfortable configuration system, and DansGuardian is not a standalone application. Privoxy, with its ease of use and impressive features, should suffice for any home user on most networks.
