BitTorrent for Clusters
Introduction
When staging large input files to the many nodes of a cluster, the
traditional network copy approach (scp, rcp, NFS) quickly overwhelms
the machine holding the original copy of the input file. BitTorrent, a
peer-to-peer file distribution protocol, is naturally suited to file
distribution when the file is large and many clients want it
simultaneously, which is precisely the scenario when staging large input
files for a distributed cluster job.
This infrastructure makes it possible to simultaneously distribute 40
copies of a 4 GB file to the nodes of a Linux cluster without bringing
down the local network or the submit machine.
Installation
- Download the BitTorrent source from the SourceForge CVS
repository. The patch assumes version 3.4.2 (CVS tip as of December 2004).
cvs -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent login
cvs -z3 -d:pserver:anonymous@bittorrent.cvs.sourceforge.net:/cvsroot/bittorrent co BitTorrent
- Download the patch and support files from http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz.
cd BitTorrent
wget http://www.umiacs.umd.edu/~nedwards/research/downloads/script.tar.gz
- Unpack the patch and support files, and apply the patch.
gunzip -c script.tar.gz | tar xovf -
patch -p0 < script.patch
- Install in a suitable location. I use $HOME as the prefix argument,
which installs files in $HOME/bin and $HOME/lib/python2.4/site-packages.
python setup.py install --prefix=$HOME
- Make sure your PATH contains the binary install location
($HOME/bin in my case) and that the environment variable
PYTHONPATH contains the library install location
($HOME/lib/python2.4/site-packages in my case).
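For a Bourne-style shell, the two settings in the last step might look like the following sketch. It assumes the $HOME prefix and the Python 2.4 layout used above; the variable name BT_PREFIX is only illustrative, so adjust it if you installed elsewhere.

```shell
# Assumes the install prefix was $HOME and Python 2.4, as in the steps above.
# BT_PREFIX is an illustrative name, not something the scripts read.
BT_PREFIX="$HOME"

# Put the installed scripts first on the search path.
PATH="$BT_PREFIX/bin:$PATH"

# Add the library directory, keeping any existing PYTHONPATH entries.
PYTHONPATH="$BT_PREFIX/lib/python2.4/site-packages${PYTHONPATH:+:$PYTHONPATH}"

export PATH PYTHONPATH
```

Placing these lines in a shell startup file would make them apply to jobs started on the cluster nodes as well, provided the nodes share the same home directory.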
Wrapper Scripts
The wrapper scripts are very simple. They make distributing files to a
cluster as simple as using scp.
- btinit.sh [ -D n [ -E ] ] file1 ... fileN
- Starts the BitTorrent tracker server, if necessary, and a single
seeding client per file. A 4-tuple of information is output for each
input file: filename, server, port, and info-hash. Usually run on the
cluster job submission machine. Computes a checksum of the file to
avoid creating multiple torrents for the same file. Will also handle
directories, in which case the checksum is not computed: if a seeder
serving a torrent with the same name is found, it is killed, and if
a torrent with the same name is found, the torrent is recomputed.
The option -D instructs the seeding client to terminate after n
clients have downloaded the file or directory. The -E option instructs
the tracker to terminate when no clients have contacted the tracker in
the last 10 minutes. Use these options with caution - they have not been
particularly well tested. The -E option, in particular, may terminate
the tracker if only the seeding client is running, since the seeding
client has a complete copy of the file or directory, and doesn't need
to contact the tracker to find out where to get torrent chunks.
- bt.sh filename server port info-hash
- Downloads the file tracked by the server server on port
port with hash info-hash to the local file
filename. The client waits until it has been idle for 1 minute before
exiting, to give its
peers a chance to ask for pieces of the file. Usually run by each node
before running its job to download the required input files. Failure
to download the file is indicated by non-zero exit status.
- btfini.sh file1 ... fileN
- btfini.sh -a
- Kills the seeding client for each listed file (or all files with
-a) and removes the torrent file from the tracker's directory. If no
torrent files remain to be tracked, the tracker is killed and all
tracker files are removed from /tmp. Run this on the same machine as
btinit.sh.
- btstatus.sh
- Prints a table showing various statistics for each of the torrents
being tracked by the tracker. Run this on the same machine as
btinit.sh.
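As a rough sketch of how a cluster job might use these scripts, the function below reads 4-tuples in the format printed by btinit.sh from a file and runs bt.sh on each one, stopping at the first non-zero exit status. The function name fetch_inputs, the manifest file, and the BT_CMD override are illustrative assumptions and are not part of the distributed scripts.

```shell
# Hypothetical helper: download every input listed in a manifest file.
# Each manifest line is one 4-tuple in the format printed by btinit.sh:
#   filename server port info-hash
# BT_CMD is an assumed override for dry runs; it defaults to bt.sh.
fetch_inputs() {
    manifest=$1
    while read -r name server port hash; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        # Redirect stdin so the download command cannot consume the manifest.
        if ! "${BT_CMD:-bt.sh}" "$name" "$server" "$port" "$hash" < /dev/null; then
            echo "download of $name failed" >&2
            return 1
        fi
    done < "$manifest"
}
```

A node could then run something like "fetch_inputs inputs.txt && ./run_job", where inputs.txt holds the tuples and run_job stands in for the actual job.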
Example
- Initiate the torrent on genesub00 for the files nraa.fasta, nrdb.fasta and msdb.fasta.
genesub00> ls -l nraa.fasta nrdb.fasta msdb.fasta
-rw-rw-r-- 1 nedwards nedwards 651374714 Nov 25 19:41 msdb.fasta
-rw-rw-r-- 1 nedwards nedwards 1100528784 Nov 25 19:42 nraa.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Nov 25 19:45 nrdb.fasta
genesub00> btinit.sh nraa.fasta nrdb.fasta
Started torrent tracker server...
Make torrent file for nraa.fasta...
Make torrent file for nrdb.fasta...
Started torrent uploader for nraa.fasta...
Started torrent uploader for nrdb.fasta...
nraa.fasta genesub00.umiacs.umd.edu 7878 3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta genesub00.umiacs.umd.edu 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca
genesub00> btstatus.sh
name size complete dling dled transferr
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 0 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 1.73GB 1/2 0/0 0/0 0B
genesub00> btinit.sh msdb.fasta nrdb.fasta
Torrent tracker server already running...
Make torrent file for msdb.fasta...
File nrdb.fasta torrent already made and file hasn't changed
Started torrent uploader for msdb.fasta...
Torrent uploader already running for nrdb.fasta...
msdb.fasta genesub00 7878 26a9afd2c998e94bc25f8798394e4b8eb3845225
nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 0 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 1/3 0/0 0/0 0B
- Download the file nrdb.fasta on genesub01 and genesub02.
genesub01> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &
genesub02> bt.sh nrdb.fasta genesub00 7878 538b7e44f5c53ef6806502589d0f291b9aa8efca &
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 1 2 0 0B
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 1/3 2/2 0/0 0B
- Wait for completion.
genesub01> ls -l nrdb.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta
genesub02> ls -l nrdb.fasta
-rw-rw-r-- 1 nedwards nedwards 760915396 Dec 21 16:10 nrdb.fasta
genesub00> btstatus.sh
name size complete dling dled transferr
msdb.fasta 621MB 1 0 0 0B
26a9afd2c998e94bc25f8798394e4b8eb3845225
nraa.fasta 1.02GB 1 0 0 0B
3b89f9eb2b8ba2e464e7d2184b523f73896e5568
nrdb.fasta 725MB 3 0 2 1.41GB
538b7e44f5c53ef6806502589d0f291b9aa8efca
Total 2.34GB 3/5 0/0 2/2 1.41GB
- Cleanup.
genesub00> btfini.sh -a
genesub00> btstatus.sh
Notes
There are a few things going on behind the scenes that are important to know about.
Temporary Files
The btinit.sh script creates torrents and the various other files the
tracker server uses in /tmp. All files used by a
particular tracker match the template /tmp/bt*.UUUU.PPPP,
where UUUU is the username of the user running the tracker and PPPP is
the port number it is bound to.
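To inspect these files by hand, the template can be turned into a shell glob. The helper name bt_tmp_glob below is an assumption made for illustration; only the /tmp/bt*.UUUU.PPPP pattern comes from the scripts.

```shell
# Hypothetical helper: print the glob matching one tracker's files in /tmp,
# following the /tmp/bt*.UUUU.PPPP template (UUUU = user, PPPP = port).
bt_tmp_glob() {
    user=$1
    port=$2
    printf '/tmp/bt*.%s.%s\n' "$user" "$port"
}

# List the files for the current user's tracker on port 7878 (the port
# shown in the Example section); prints nothing if no such files exist.
# The substitution is left unquoted on purpose so the shell expands the glob.
ls -d $(bt_tmp_glob "$(whoami)" 7878) 2>/dev/null || true
```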
Tracker Server Port
The tracker server needs to bind to some port to listen for requests
from clients. The temporary files in /tmp are used to
indicate whether or not a port is currently in use by some user. The
btinit.sh script will search for an unused port before starting
the tracker server. The port used by the tracker server can be
set explicitly by setting the environment variable BTTPORT.
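The idea of using the temporary files as port markers can be sketched as follows. This is not the code btinit.sh uses: the function names, the BT_TMPDIR override, and the upward search from a starting port are all assumptions made for illustration.

```shell
# Sketch only; btinit.sh may search differently.
# A port counts as taken if any file matching bt*.USER.PORT exists.
# BT_TMPDIR is an assumed override for experiments; it defaults to /tmp.
port_in_use() {
    dir=${BT_TMPDIR:-/tmp}
    for f in "$dir"/bt*.*."$1"; do
        # An unmatched glob stays literal, so test for a real file.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print the first port at or above the given one that has no marker files.
first_free_port() {
    port=$1
    while port_in_use "$port"; do
        port=$((port + 1))
    done
    echo "$port"
}
```

To skip the search entirely, set BTTPORT before running the script, for example "BTTPORT=7979 btinit.sh nrdb.fasta", where 7979 is an arbitrary port chosen for this example.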
Permitted IP Addresses
The tracker server and BitTorrent clients take a regular expression
argument which constrains the IP addresses that will be permitted to
contact the tracker server or client. Currently, the scripts use a
regular expression that allows any valid IP address to connect to the tracker and to the downloading and seeding clients.
The regular expression can be
set explicitly in the bt.sh and btinit.sh scripts or by setting the environment variable
BTIPRE. Note, however, that this security measure is only
as good as the firewalls' ability to block IP spoofing attacks.
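As an example of tightening the default, the sketch below builds an expression that matches only addresses in a single private network and exports it as BTIPRE. The helper name subnet_regex and the 192.168.1 network are placeholders. The exact regular expression dialect the scripts expect is not documented here, so compare the result with the default pattern in bt.sh before relying on it.

```shell
# Hypothetical helper: build a regular expression that matches only
# dotted-quad addresses sharing the given three-octet prefix, e.g. 192.168.1.x.
subnet_regex() {
    # Escape the dots so they match literally.
    prefix=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Anchor both ends so longer strings cannot slip through.
    printf '^%s\\.[0-9]+$\n' "$prefix"
}

# 192.168.1 is a placeholder; substitute your cluster's private network.
BTIPRE=$(subnet_regex 192.168.1)
export BTIPRE
```

The same value has to be visible to btinit.sh on the submit machine and to bt.sh on every node for the restriction to apply at both ends.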
Other Assumptions
The scripts assume the following typical Unix/Linux programs are
available in the user's path: python, sh, ls, wc, whoami, uname, ps, cat,
test (as '['), echo, sed, kill, sleep, rm, wget, mkdir, cksum, expr,
awk, fgrep; and that they behave in relatively standard ways.
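A quick way to confirm that the programs are present is sketched below. The function name check_programs is illustrative, and the check only covers availability, not whether each program behaves in the standard way.

```shell
# Report any program from the list above that is missing from PATH.
# Prints the missing names, one per line, and returns non-zero if any are missing.
check_programs() {
    missing=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            missing=1
        fi
    done
    return "$missing"
}

check_programs python sh ls wc whoami uname ps cat '[' echo sed kill sleep \
    rm wget mkdir cksum expr awk fgrep ||
    echo "some required programs are missing" >&2
```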