Installing A Cluster

From New IAC Wiki

Network configuration

The head node works as a NAT gateway for the slave nodes.

eth0 connects to the outside world; eth1 is on the internal network.

The internal network is 10.200.0.0/255.255.0.0 (10.200.0.0/16).
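A minimal sketch of the NAT setup, assuming IPv4 forwarding plus an iptables MASQUERADE rule. This is one common way to do it, not necessarily how this cluster was configured. The names run, setup_nat and DRY_RUN are hypothetical. The rules are not saved across reboots, and the FORWARD chain is assumed to accept traffic, which is its default policy.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_nat EXT_IF INT_NET (run as root)
# Turn on packet forwarding and masquerade traffic from the internal
# network as it leaves through the external interface.
setup_nat() {
    ext_if=$1   # outside-world interface, e.g. eth0
    int_net=$2  # internal network, e.g. 10.200.0.0/16
    run sysctl -w net.ipv4.ip_forward=1
    run iptables -t nat -A POSTROUTING -s "$int_net" -o "$ext_if" -j MASQUERADE
}
```

For example, DRY_RUN=1 setup_nat eth0 10.200.0.0/16 only prints the two commands; running setup_nat eth0 10.200.0.0/16 as root applies them.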

OS installation

Do a normal Linux server install, then add the following packages:

  • openssh-server
  • tftpd-hpa
  • dhcp3-server
  • nfs-kernel-server
  • debootstrap
  • libpmi
  • mpich2
  • slurm-llnl
  • slurm-llnl-slurmdbd
  • syslinux
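On an apt-based system the packages above can be installed in one command. This is a sketch: the helper name and the DRY_RUN switch are hypothetical, and the package names are copied unchanged from the list, so some may differ on other releases.

```shell
#!/bin/sh
# Head-node packages, exactly as listed above.
HEAD_PACKAGES="openssh-server tftpd-hpa dhcp3-server nfs-kernel-server debootstrap
libpmi mpich2 slurm-llnl slurm-llnl-slurmdbd syslinux"

# install_head_packages (run as root): install the whole list with apt-get.
# When DRY_RUN is non-empty, only print the command.
install_head_packages() {
    # HEAD_PACKAGES is deliberately unquoted so it splits into one word per package.
    if [ -n "$DRY_RUN" ]; then
        echo apt-get install -y $HEAD_PACKAGES
    else
        apt-get install -y $HEAD_PACKAGES
    fi
}
```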

Setting up a chroot for the node root

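  1. Boot to a live CD and mount the root drive as /new
  2. Copy the root filesystem into the chroot. rsync only creates the last directory of the destination path, so create /new/nodes first: mkdir -p /new/nodes; rsync -av /new/ /new/nodes/lucid/ --exclude /nodes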

NFS

On the head node, add the following to /etc/exports

/nodes/lucid    10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home           10.200.0.0/24(rw,async,root_squash,no_subtree_check)

See man 5 exports for more information. Further NFS settings are in:

/etc/default/nfs-common
/etc/default/nfs-kernel-server
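After /etc/exports is edited, the NFS server has to re-read it, which exportfs -ra does. A minimal sketch of an idempotent way to add the export lines, using a hypothetical helper add_line that appends a line only if it is not already present, so the step can be repeated safely.

```shell
#!/bin/sh
# add_line FILE LINE: append LINE to FILE unless FILE already contains
# exactly that line (-x whole line, -F fixed string, -q quiet).
add_line() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, as root: add_line /etc/exports '/home 10.200.0.0/24(rw,async,root_squash,no_subtree_check)', the same for the /nodes/lucid line, then exportfs -ra.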

Netbooting

Setting up dhcp

Edit /etc/dhcp3/dhcpd.conf as follows:

 ddns-update-style none;
 option domain-name "iac.isu.edu";
 option domain-name-servers 134.50.254.5, 134.50.57.57;
 authoritative;

 default-lease-time 600;
 max-lease-time 1200;

 subnet 10.200.0.0 netmask 255.255.0.0
 {
  range 10.200.34.1 10.200.34.200; #we really don't even need a range
  option routers 10.200.0.1;
  #address of the TFTP server, optional if the same as the dhcp server
  next-server 10.200.0.1;
  filename "pxelinux.0";
 }
 #eth0 on brems2
 host brems2_0
 {
  hardware ethernet 00:50:45:5C:10:54;
  fixed-address 10.200.0.2;
  option host-name "brems2";
 }
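Each node gets its own host block like brems2_0 above. A sketch of a hypothetical helper that prints one in the same layout; the _0 suffix simply copies the brems2_0 naming.

```shell
#!/bin/sh
# dhcp_host_entry NAME MAC IP
# Print a dhcpd.conf host block giving NAME a fixed address.
dhcp_host_entry() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s_0\n{\n hardware ethernet %s;\n fixed-address %s;\n option host-name "%s";\n}\n' \
        "$name" "$mac" "$ip" "$name"
}
```

For example: dhcp_host_entry newhost <mac> <address> >> /etc/dhcp3/dhcpd.conf, where <mac> is the node's Ethernet address and <address> a free address inside the subnet.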


Edit /etc/default/dhcp3-server and set:

INTERFACES=eth1

This keeps the DHCP server from answering requests on the outside network. Then restart it:

service dhcp3-server restart

Setting up tftp

Edit /etc/default/tftpd-hpa

RUN_DAEMON="yes" #had problems with inetd in the past
OPTIONS="-l -a 10.200.0.1 -s /var/lib/tftpboot"

Set up the directory tree served over PXE/TFTP:

mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/

Create /var/lib/tftpboot/boot.msg with a message similar to the following:

Booting Brems!!

Put the following in /var/lib/tftpboot/pxelinux.cfg/default. File names in it are relative to /var/lib/tftpboot/, and TIMEOUT is given in tenths of a second.

TIMEOUT 5
DISPLAY boot.msg
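The default file above sets a timeout and a message but does not name a kernel or its arguments. A sketch of a hypothetical helper that writes boot.msg and a complete default file. It assumes the kernel and initrd were copied into /var/lib/tftpboot as vmlinuz and initrd.img, and that the initrd can mount an NFS root using the standard root=/dev/nfs, nfsroot= and ip=dhcp kernel arguments (Ubuntu's initramfs-tools may also need boot=nfs).

```shell
#!/bin/sh
# write_pxe_config TFTPROOT SERVER_IP NFS_ROOT
# Writes TFTPROOT/boot.msg and TFTPROOT/pxelinux.cfg/default.
# Assumes vmlinuz and initrd.img already sit in TFTPROOT.
write_pxe_config() {
    tftproot=$1
    server=$2
    nfsroot=$3
    mkdir -p "$tftproot/pxelinux.cfg"
    # Single quotes keep an interactive bash from expanding "!!".
    echo 'Booting Brems!!' > "$tftproot/boot.msg"
    # File names are relative to TFTPROOT; TIMEOUT is in tenths of a second.
    cat > "$tftproot/pxelinux.cfg/default" <<EOF
DEFAULT linux
TIMEOUT 5
DISPLAY boot.msg
LABEL linux
  KERNEL vmlinuz
  APPEND initrd=initrd.img root=/dev/nfs nfsroot=$server:$nfsroot ip=dhcp ro
EOF
}
```

For example: write_pxe_config /var/lib/tftpboot 10.200.0.1 /nodes/lucid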


Setting up pxelinux

Testing

Scheduler installation

The SLURM Quick Start Administrator Guide is very helpful.

  1. Install the packages:
       • slurm-llnl
       • slurm-llnl-slurmdbd
       • slurm-llnl-doc
  2. Create the run directory: mkdir /var/run/slurm-llnl
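Nodes are declared in slurm.conf (/etc/slurm-llnl/slurm.conf for the slurm-llnl packages) with NodeName lines and grouped into partitions with PartitionName lines. A sketch of a hypothetical helper that prints those lines, assuming every node has the same CPU count; real entries often carry more parameters, and the Quick Start guide covers the rest of the file.

```shell
#!/bin/sh
# slurm_node_lines PARTITION CPUS NODE...
# Print one NodeName line per node, then a PartitionName line listing them all.
slurm_node_lines() {
    partition=$1
    cpus=$2
    shift 2
    nodes=""
    for n in "$@"; do
        echo "NodeName=$n CPUs=$cpus State=UNKNOWN"
        # Build a comma-separated node list without a leading comma.
        nodes="${nodes:+$nodes,}$n"
    done
    echo "PartitionName=$partition Nodes=$nodes Default=YES MaxTime=INFINITE State=UP"
}
```

The same slurm.conf has to be present on every node (for example in /nodes/lucid/etc/slurm-llnl/), and the SLURM daemons need restarting after nodes are added.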

Munge

Munge is an authentication service recommended by SLURM. All the configuration it needs is to generate a key and start the daemon:

root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start
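The MUNGE key has to be identical on every host, and the nodes boot from the /nodes/lucid tree, so the key created above also has to be copied into it. A minimal sketch, assuming the key is at /etc/munge/munge.key and that it runs as root so cp -p keeps the munge owner; the helper name and the KEY override are hypothetical.

```shell
#!/bin/sh
# copy_munge_key NODE_ROOT (run as root)
# Copy the head node's MUNGE key into the node root filesystem.
# KEY is a hypothetical override; by default the usual key path is used.
copy_munge_key() {
    node_root=$1
    key=${KEY:-/etc/munge/munge.key}
    dest="$node_root/etc/munge/munge.key"
    mkdir -p "$node_root/etc/munge"
    # -p preserves mode, timestamps and (when run as root) the munge owner.
    cp -p "$key" "$dest"
    # The key should be readable only by its owner.
    chmod 0400 "$dest"
}
```

For example: copy_munge_key /nodes/lucid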

Adding a node to the cluster

  1. Set the node to PXE boot
  2. Add a new host entry to /etc/dhcp3/dhcpd.conf
  3. Restart dhcpd: service dhcp3-server restart
  4. Add the new host to /etc/hosts
  5. Copy /etc/hosts to /nodes/lucid/etc/hosts
  6. Create a /var for the new node from the template (cp -a keeps file ownership): cp -a /nodes/lucidvar/template /nodes/lucidvar/newhost
  7. Boot the node
  8. Add the node's SSH host key to the system-wide known hosts: ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
  9. Add the host to cluster ssh
  10. Add the node to the SLURM config
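Steps 4 to 6 only touch files on the head node, so they can be scripted. A sketch, where add_node_files and the ROOT prefix are hypothetical (leave ROOT empty on the head node, or point it at a scratch directory to try it out). It uses cp -a rather than cp -r so that files in the template keep their owners, which daemons writing under /var rely on.

```shell
#!/bin/sh
# add_node_files NAME IP (run as root)
# Sketch of steps 4-6 above, under the optional ROOT prefix.
add_node_files() {
    name=$1
    ip=$2
    root=${ROOT:-}
    entry=$(printf '%s\t%s' "$ip" "$name")
    # Step 4: add "IP<tab>NAME" to /etc/hosts unless it is already there.
    grep -qxF -- "$entry" "$root/etc/hosts" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$root/etc/hosts"
    # Step 5: the nodes read their hosts file from the shared root.
    cp "$root/etc/hosts" "$root/nodes/lucid/etc/hosts"
    # Step 6: give the node its own /var copied from the template.
    if [ ! -e "$root/nodes/lucidvar/$name" ]; then
        cp -a "$root/nodes/lucidvar/template" "$root/nodes/lucidvar/$name"
    fi
}
```

For example, as root with ROOT unset: add_node_files newhost <address>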

Customising initrd

Sometimes a customised initrd is necessary.

Extracting

Create a directory to extract into:

mkdir init_test; cd init_test

Extract an existing initrd (cpio needs -d to create the directories stored in the archive):

gzip -d < /var/lib/tftpboot/initrd.img | cpio -idv

Edit the file tree as needed. You can add files (kernel modules, for instance) or edit the boot script (/init is run by default).

Then package up the directory into a new initrd:

find ./ | cpio -ov -H newc | gzip > ../initrd.new
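The extract and repack steps can be wrapped as two functions. A sketch with hypothetical names, assuming a gzip-compressed initrd as in the commands above; paths are made absolute because each function changes directory.

```shell
#!/bin/sh
# unpack_initrd IMAGE DIR: extract a gzip-compressed initrd into DIR.
# pack_initrd DIR IMAGE: package DIR as a gzip-compressed newc cpio archive.

# abspath FILE: absolute path of FILE, whose directory must exist.
abspath() {
    printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
}

unpack_initrd() {
    image=$(abspath "$1")
    mkdir -p "$2"
    # -i extract, -d create directories as needed
    (cd "$2" && gzip -dc "$image" | cpio -id)
}

pack_initrd() {
    out=$(abspath "$2")
    # newc is the cpio format the kernel expects for an initramfs
    (cd "$1" && find . | cpio -o -H newc | gzip -9 > "$out")
}
```

For example: unpack_initrd /var/lib/tftpboot/initrd.img init_test, edit the tree, then pack_initrd init_test initrd.new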