JLdL 06Mar05.

In this directory you will find examples of disks partitioned for clusters
of compute nodes or of X11 terminals, as well as the corresponding output
of the command df, showing the actual occupation of each partition. These
examples are taken from actual production systems, and their main technical
characteristics will also be given. There are 4 examples. In all cases the
disk shown contains both the system of the cluster server and the systems
of all the nodes of the cluster, and the root, var and usr filesystems of
the nodes are hard-linked.
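
As an aside, the effect of this hard-linking can be reproduced with GNU
cp; the tree and file names below are only illustrative, not taken from
the actual cluster systems:

```shell
# Illustrative sketch of the hard-link scheme using GNU cp; the tree and
# file names are hypothetical, not those of the actual cluster systems.
mkdir -p node1/usr/bin
echo "some binary" > node1/usr/bin/prog

# cp -al replicates the tree with hard links instead of copying data,
# so the second node tree consumes almost no extra disk space.
cp -al node1 node2

# Both paths report the same inode number, i.e. the same disk blocks:
stat -c %i node1/usr/bin/prog node2/usr/bin/prog
```

Because the data blocks are shared, df reports essentially the same
occupation after the copy as before it.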

These 4 examples will give you a good idea of the actual disk space needed
for clusters of various sizes, either of compute nodes or of X11 terminals.
Since there are 2 examples of clusters of compute nodes and 2 examples of
clusters of X11 terminals, and since in either case the nodes all have the
same content in terms of packages, each pair of examples lets you estimate,
by linear extrapolation, the disk occupation for clusters of other sizes
of each type.
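
A minimal sketch of such an extrapolation, assuming two hypothetical df
readings in KB (the figures below are placeholders, not the numbers from
the .df files; substitute your own measurements):

```shell
# Linear extrapolation of partition occupation from two measured cluster
# sizes; all KB figures here are hypothetical placeholders.
N1=18; USED1=2400000   # occupation measured on the 18-node cluster (KB)
N2=38; USED2=4400000   # occupation measured on the 38-node cluster (KB)
N=30                   # cluster size we want an estimate for

awk -v n1=$N1 -v u1=$USED1 -v n2=$N2 -v u2=$USED2 -v n=$N 'BEGIN {
    slope = (u2 - u1) / (n2 - n1)   # KB of disk per additional node
    base  = u1 - slope * n1         # fixed overhead (server side)
    printf "estimated occupation for %d nodes: %.0f KB\n", n, base + slope*n
}'
```

With these placeholder numbers the per-node slope is 100000 KB, so the
30-node estimate comes out to 3600000 KB.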

It is important to point out that disk space is not usually a problem; the
main limitation is the physical I/O throughput of a single disk, especially
in the case of X11 terminals. Therefore you should use SCSI disks, not IDE
disks, unless the cluster is very small. The performance of Serial ATA for
this purpose has not yet been measured. The I/O performance estimates that
are mentioned here are based on the 2.4 Linux kernels. Changing to the new
2.6 kernels will very probably improve I/O performance significantly, and
the use of the newer 15000 RPM SCSI disks would also help a lot.

Keep the systems of the server and of the nodes on separate disks if it is
at all possible. This will give you more effective I/O throughput than
putting everything on a single disk. Using several disks in a RAID0 array
for the /var partition of the nodes is even better: it can take you to the
throughput limit of the SCSI bus, improving performance proportionally to
the number of disks used, by a factor of up to 10.
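
Such an array can be assembled with mdadm; the following is only a rough
sketch of the configuration, assuming two spare SCSI disks at /dev/sdb
and /dev/sdc (the device names, partition numbers and the choice of ext3
are all assumptions, and the commands require root):

```shell
# Hypothetical RAID0 array for the nodes' /var; device names are assumed.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mke2fs -j /dev/md0      # ext3 filesystem on the striped array
mount /dev/md0 /var     # serve the nodes' var trees from the array
```

The stripes spread the nodes' var traffic across both spindles, which is
where the throughput gain comes from.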

However, since disks are getting so large so fast, it may become difficult
to justify this approach financially, because there are no cheap and small
disks available that you can assemble in a RAID. If you are going to use a
mirror disk for backup purposes, consider swapping the original and backup
filesystems of the nodes on the two disks, so that the nodes will mostly
use the backup disk rather than the system disk. This may double your
effective disk I/O throughput and thus increase the maximum number of
nodes you can install.

Here is a short description of each of the files containing the examples:

server_pmc_18+1.cfdisk: partition table of an 18 GB disk for a cluster of
18 real plus one virtual compute nodes.

server_pmc_38+1.cfdisk: partition table of an 18 GB disk for a cluster of
38 real plus one virtual compute nodes.

server_trm_17+2.cfdisk: partition table of a 36 GB disk for a cluster of
17 real plus two virtual X11 terminals.

server_trm_30+1.cfdisk: partition table of a 36 GB disk for a cluster of
30 real plus one virtual X11 terminals.

servers_pmc.df: the output of the command df showing the disk occupation
of each one of the partitions of the two clusters of compute nodes.

servers_trm.df: the output of the command df showing the disk occupation
of each one of the partitions of the two clusters of X11 terminals.

It will be noted in these files that the usr filesystem of the nodes may
seem a bit too large in the case of X11 terminals, with a relatively small
percentage occupation. This is necessary because a large amount of
maneuvering space is needed when a security upgrade of a large package or
set of packages is made in a cluster with many nodes. Consider the largest
package set, which currently is the Open Office suite: upgrading it will
unlink files requiring 160 MB of disk space for each node that is upgraded
within the cluster. If there are 30 nodes, this will require about 4.8 GB
of extra space during the transient period, while the upgrade is under way
and before the tools that hard-link the files back together are run. These
transient requirements are much smaller in the other filesystems, and also
in the usr filesystem of compute nodes.
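
The arithmetic behind that 4.8 GB figure can be checked directly, taking
1000 MB per GB as the text rounds:

```shell
# 160 MB of unlinked files per node, times 30 nodes, of transient space:
awk 'BEGIN { printf "%.1f GB\n", 160 * 30 / 1000 }'   # prints 4.8 GB
```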

