The Debian Universe



19: Package Caching: Updating Multiple Machines

One of the areas where Debian is very strong is installations involving multiple machines, such as computer labs, server farms, and compute clusters. Many of these installations have the same or similar packages installed on every machine, and when a package is upgraded on one machine it's also upgraded on the others.

In situations where you need to manage many machines Apt can be a real lifesaver since it makes the task of upgrading or applying security patches to each machine very simple. And since Debian machines can generally just go right on being upgraded year after year without re-installing from scratch with every upgrade, most of the packages you install won't come off a CD - they come from a Debian package server via the Internet.

That sounds good until you start adding up all the bandwidth required. Doing an update on one machine may require a lot of packages to be downloaded, and doing it on many machines means downloading them once per machine.

Wouldn't it be nice if a package installed on one machine could also be installed on other local machines without Apt downloading it all over again?

Package Storage Options

Of course I wouldn't be writing this if there weren't a solution, and in typical Linux fashion there are many ways to solve the problem.

Running A Local Mirror

If you run a truly large number of Debian machines it may be worthwhile running a local software mirror of your own. A mirror takes a fair bit of work to set up and maintain though, and unless you are careful it could actually use more bandwidth than it saves, because a mirror fetches packages whether or not any of your machines ever install them.

NFS Mounting /var/cache/apt

When Apt downloads a package on a Debian machine, the package file is stored locally under '/var/cache/apt' (in the 'archives' subdirectory). When a package is requested Apt looks first in this directory to see if it's already cached. As a result one primitive approach to sharing packages locally is to have one computer share its cache directory on the network, and have all the other computers mount that directory. There can be problems with this approach though, such as file locking issues when two machines run Apt at the same time. It's not a widely used solution.

Moving Packages

Rather than sharing a common '/var/cache/apt' directory, another approach is to leave each machine running its own local cache directory but prime them all with packages by copying them from one machine to another. Tools to make this easier include 'apt-move', but the process is not transparent to the end user and can end up using a lot of disk space unnecessarily since all packages are duplicated on all machines.
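As a rough illustration of priming one cache from another by hand (this is not how 'apt-move' works, just a minimal sketch), the following copies only the package files the destination doesn't already have. It assumes the other machine's package directory is reachable as an ordinary path, for example over a network mount; the function name and the scratch directories are made up for the example.

```shell
#!/bin/sh
# Copy .deb files from one cache directory to another, skipping
# any file the destination already holds.
# prime_cache is a hypothetical helper name, not part of Apt.
prime_cache() {
    src_dir="$1"
    dst_dir="$2"
    for deb in "$src_dir"/*.deb; do
        [ -e "$deb" ] || continue              # source has no .deb files
        name="$(basename "$deb")"
        [ -e "$dst_dir/$name" ] || cp "$deb" "$dst_dir/$name"
    done
}

# Try it on scratch directories rather than a live cache.
src="$(mktemp -d)"
dst="$(mktemp -d)"
: > "$src/foo_1.0_all.deb"
: > "$src/bar_2.1_i386.deb"
: > "$dst/foo_1.0_all.deb"                     # already present on destination

prime_cache "$src" "$dst"
ls "$dst"
```

On a real system the two arguments would be something like the '/var/cache/apt/archives' directory on each machine, and the disk space cost mentioned above still applies.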

Traditional HTTP Proxy

Apt generally uses HTTP to fetch packages from package servers, so it's pretty easy to use a normal HTTP proxy like Squid to cache packages locally. However, Squid is designed to cache lots of small items while software packages are usually a few large items. You may find Squid drops large packages from its cache, while those are the very packages most important to store for re-use. To make Apt use a proxy you can configure the option permanently in the config file (see 'man apt.conf' for details) or just export the 'http_proxy' environment variable by doing something like 'export http_proxy=http://proxy.example.com:8080' prior to running Apt.
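For the permanent option, a minimal sketch might look like the following. 'Acquire::http::Proxy' is the setting documented in 'man apt.conf'; the proxy address is the same placeholder used above, and the helper name and the '01proxy' file name are assumptions for illustration only. The example writes to a scratch file so it can be tried without touching the live configuration.

```shell
#!/bin/sh
# Write an Apt proxy setting into a config file.
# write_apt_proxy_conf is a hypothetical helper name.
write_apt_proxy_conf() {
    proxy_url="$1"      # e.g. http://proxy.example.com:8080/ (placeholder)
    target_file="$2"    # e.g. /etc/apt/apt.conf.d/01proxy (assumed file name)
    printf 'Acquire::http::Proxy "%s";\n' "$proxy_url" > "$target_file"
}

# Demonstrate against a scratch file.
demo_file="$(mktemp)"
write_apt_proxy_conf "http://proxy.example.com:8080/" "$demo_file"
cat "$demo_file"
```

Pointing the second argument at a file Apt actually reads would make the setting apply to every Apt run on that machine, with no need to export anything first.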

Dedicated Caching Systems

The most elegant solution is local package caching with a dedicated caching system. Running a local package cache is very simple and can even provide benefits if you only run a couple of machines. There are a number of systems designed specifically to work with Apt, including apt-cacher, apt-proxy, and apt-cached.

The one I'll cover in detail is apt-cacher.

Apt-cacher Background

Apt-cacher was originally written by Nick Andrew to maintain two Debian boxes on a slow modem connection. He got sick of having to download all the packages twice and none of the available options seemed to do quite what he wanted, so he decided to write a system specifically designed as a cache for Apt.

Since then Apt-cacher has seen major development and is now a very comprehensive package caching system. It even works with Fink on Mac OS X, and with Apt on Red Hat and other distributions.

Apt-cacher Structure

Apt-cacher is different to many other caching systems because rather than being a stand-alone program it runs as a CGI under Apache. That has a number of advantages. It doesn't need its own protocol-handling code, which keeps it small, simple, and therefore more robust. It's also very flexible, because you can use Apache's built-in access control mechanism if you only want to let certain machines use your cache.

Apt-cacher itself only needs to be set up on one machine, the one you decide to use as your local cache. Then all computers on your local network have a setting modified to tell them to direct all package requests to your cache machine rather than directly to the package server.

Apt-cacher works by intercepting requests for packages and fetching them on behalf of local machines, while simultaneously storing them on disk in case other machines later ask for the same package. Once set up there is no need to do anything differently to install packages: just install a package on one machine with Apt or Synaptic and it comes off the Internet, then when you install it on other machines it comes from the local cache. Easy!

Installing Apt-cacher

Getting Apt-cacher working involves two parts: setting up the cache server itself, and then telling your local machines to use it.

Server Setup

First select a machine to use as your cache server. Apt-cacher puts very little load on the system so you can safely run it on just about any machine you have available, even one that's normally used as a workstation. Probably the most critical things are to make sure your cache server has a fixed IP address so other computers on your network can find it, and that there is plenty of disk space because the cache itself can become quite large. Disk usage depends on how many packages you have cached, so the greater variety of software you run the more space you will need. A few hundred megabytes is common while large caches may need several gigabytes.

On the machine nominated to be your cache server issue the command

apt-get install apt-cacher

as root, and Apt-cacher will be installed and set up for you. It will also install Apache plus a couple of other packages unless they were already in place. Then just restart Apache by typing

/etc/init.d/apache restart

and you're done. You can test that the installation worked properly by opening a web browser and going to the address 'http://[cache.example.com]/apt-cacher', where [cache.example.com] is the hostname or IP address of your cache server. If all went well you'll see an information page generated by Apt-cacher.

Client Setup

Client machines don't need to have anything installed to use Apt-cacher: they just need to have their list of package sources modified so they send their package requests to the cache server.

The list of package sources is stored in a file called '/etc/apt/sources.list'. If you open this file in a text editor such as Vim or Anjuta you'll see a number of lines that look something like this:

deb http://ftp.au.debian.org/debian unstable main contrib non-free

Each HTTP entry needs to have the address of your cache server prepended, so the example above becomes something like this:

deb http://cache.example.com/apt-cacher/ftp.au.debian.org/debian unstable main contrib non-free

Once you've done that, do

apt-get update

to tell your machine to update its package list, and you're set. Any packages you install from then on will come via the cache server.
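If you have a lot of entries or a lot of client machines, the edit to the sources list can be scripted. The following is only a sketch: it assumes plain 'deb http://...' and 'deb-src http://...' lines like the example above, uses the placeholder hostname 'cache.example.com', and the function name is made up. It should only be run once on an unmodified file, since running it again would prepend the cache address a second time.

```shell
#!/bin/sh
# Read sources.list lines on stdin and prefix every HTTP source with
# the cache server address. Other lines pass through untouched.
# add_cache_prefix is a hypothetical helper name.
add_cache_prefix() {
    cache_host="$1"
    sed -E "s#^(deb(-src)?[[:space:]]+)http://#\1http://${cache_host}/apt-cacher/#"
}

# Demonstrate on the example line from this chapter.
demo_line='deb http://ftp.au.debian.org/debian unstable main contrib non-free'
rewritten="$(printf '%s\n' "$demo_line" | add_cache_prefix cache.example.com)"
printf '%s\n' "$rewritten"

# To apply it to a real file, work from a backup copy, for example:
#   cp /etc/apt/sources.list /etc/apt/sources.list.bak
#   add_cache_prefix cache.example.com \
#       < /etc/apt/sources.list.bak > /etc/apt/sources.list
```

The demonstration prints the same rewritten line shown earlier, and you would still need to run 'apt-get update' afterwards.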

Configuration Options

At this point you'll probably have a working installation of Apt-cacher without touching a single config setting on the cache server. However, Apt-cacher has a number of options you can set by editing the file '/etc/apt-cacher/apt-cacher.conf'. You don't need to restart anything after editing the file: all changes take effect immediately.

The config file is very well commented so for all the gory details just read the file itself, but for reference the items you can set include:

admin_email

The email address displayed in traffic reports and when problems occur.

generate_reports

If this option is set to 1 Apt-cacher will generate a daily traffic report. If set to 0 it won't.

cache_dir

Specifies the location on disk to use for storing cached packages and package lists.

logfile

Specifies the location of the log file used to record package requests. The log file is used by Apt-cacher to generate traffic reports.

errorfile

Specifies the location of the error log. Very useful when debugging problems.

expire_hours

How many hours to keep package lists around before they are deleted from the cache. Note that this doesn't directly affect package expiry: packages are expired by a separate algorithm that checks whether they are still listed in the available package lists.

http_proxy

The address and port of an HTTP proxy to use. This allows you to tell Apt-cacher to use an external proxy server if you've got one available. This probably won't save you much bandwidth, but some ISPs give you discounted data rates if you use their proxies.

use_proxy

Specifies whether to activate the http_proxy setting above: 1 is active, 0 is inactive. Make sure you set this to 1 if you want to use http_proxy.

debug

Turns debug mode on or off. When this is turned on Apt-cacher will store heaps of internal information in the error log for every package that's requested. This is really only useful for developers, and in normal use should be left at 0 (off).
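To check what a setting is currently set to without reading through all the comments, something like the following could be used. It's a sketch that assumes the config file uses simple 'key=value' lines with '#' comments; the function name is made up, and the values in the scratch file are illustrative only, not the package defaults.

```shell
#!/bin/sh
# Print the value of one setting from a key=value config file.
# Comment lines are ignored because they don't start with the key.
# get_setting is a hypothetical helper name.
get_setting() {
    key="$1"
    conf_file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$conf_file" | tail -n 1
}

# Scratch file with illustrative values (not the real defaults).
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
# illustrative values only
admin_email=root@localhost
generate_reports=1
expire_hours=24
use_proxy=0
debug=0
EOF

get_setting expire_hours "$demo_conf"
```

On the cache server the second argument would be '/etc/apt-cacher/apt-cacher.conf'.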

Reporting and Cleaning

Apt-cacher does scheduled maintenance on itself to generate traffic reports and clean old packages out of the cache.

Traffic Reports

Apt-cacher can be configured to generate traffic reports daily. Generating reports is extremely fast even with a high-traffic cache and only happens once per day, so this option can safely be turned on without impacting performance. To access the report just point your browser at 'http://[cache.example.com]/apt-cacher/report'.

Cache Cleaning

Over time your cache will start to fill up with old packages that aren't required anymore. Apt-cacher runs a cleanup script every 24 hours to find old packages that are no longer referenced by package lists and delete them from the cache.

While I've covered a lot of options here, you generally won't need to care about any of them: in most situations you can install Apt-cacher and it will just work.

Copyright 2003-2004 Jonathan Oxer. All rights reserved.