Installing A Cluster

Network configuration

The head node acts as a NAT gateway for the slave nodes.

eth0 connects to the outside world; eth1 is the internal interface.

The internal network is 10.200.0.0/255.255.0.0.
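
A minimal sketch of a typical iptables masquerading setup for this, assuming eth0 is external and eth1 internal as above (not taken from the original page):

# allow the kernel to forward packets between the internal and external interfaces
echo 1 > /proc/sys/net/ipv4/ip_forward
# masquerade traffic from the internal network out of the external interface
iptables -t nat -A POSTROUTING -o eth0 -s 10.200.0.0/16 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT

These rules are not persistent across reboots; they would normally go in an init script or /etc/rc.local.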

OS installation

A normal Linux server install with the following packages:

  • ssh server
  • tftpd-hpa
  • dhcp3-server
  • nfs-kernel-server
  • debootstrap
  • libpmi
  • mpich2
  • slurm-llnl
  • syslinux (provides /usr/lib/syslinux/pxelinux.0)

If the syslinux package does not provide it, grab pxelinux from the web.

Setting up a chroot for the node root

The chroot is a directory on the head node that hosts the root filesystem the slave nodes will use. It is installed with debootstrap and can be chrooted into, after which it behaves like a standard install.

mkdir -p /nodes/lucid
debootstrap lucid /nodes/lucid/ http://backup.iac.isu.edu:9999/ubuntu/
cp /etc/apt/sources.list /nodes/lucid/etc/apt/
cp /etc/resolv.conf /nodes/lucid/etc/resolv.conf
chroot /nodes/lucid/
apt-get update
locale-gen en_US.UTF-8
dpkg-reconfigure tzdata

See the distribution's chroot documentation for more information about chroots, especially the locale setup.
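
Package scripts inside the chroot often expect /proc and /dev/pts to be mounted; a minimal sketch, assuming the chroot lives at /nodes/lucid as above:

mount -t proc proc /nodes/lucid/proc
mount -t devpts devpts /nodes/lucid/dev/pts

Unmount them again before exporting the tree over NFS.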

In the chroot we want to install the following packages (a command sketch follows the list):

  • slurm-llnl
  • munge
  • nfs-common
  • openssh-client
  • openssh-server
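
One way to do this from the head node (package names as listed above; exact names can vary between releases):

chroot /nodes/lucid/
apt-get install slurm-llnl munge nfs-common openssh-client openssh-server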

NFS

On the head node, add the following to /etc/exports

/nodes/lucid    10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home           10.200.0.0/24(rw,async,root_squash,no_subtree_check)

See man 5 exports for more information. There are also more NFS settings in

/etc/default/nfs-common 
/etc/default/nfs-kernel-server
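
After editing /etc/exports the export table has to be reloaded; standard exportfs usage for that is:

exportfs -ra    # re-read /etc/exports and re-export everything
exportfs -v     # list what is currently exported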

Netbooting

Setting up dhcp

Edit /etc/dhcp3/dhcpd.conf as follows:

 # The ddns-updates-style parameter controls whether or not the server will
 # attempt to do a DNS update when a lease is confirmed. We default to the
 # behavior of the version 2 packages ('none', since DHCP v2 didn't
 # have support for DDNS.)
 ddns-update-style none;

 # option definitions common to all supported networks...
 option domain-name "iac.isu.edu";
 #note the comma between dns servers
 option domain-name-servers 134.50.254.5, 134.50.57.57;

 #these short times are for testing only
 default-lease-time 60;
 max-lease-time 120;

 # If this DHCP server is the official DHCP server for the local
 # network, the authoritative directive should be uncommented.
 authoritative;

 # This is a very basic subnet declaration.
 subnet 10.200.0.0 netmask 255.255.255.0
 {
  range 10.200.0.100 10.200.0.200; #we really don't even need a range
  option routers 10.200.0.1;
  filename "pxelinux.0";
 }
 #eth0 on brems2
 host brems2_0
 {
  hardware ethernet 00:50:45:5C:10:54;
  fixed-address 10.200.0.2;
  option host-name "brems2";
 }


Edit /etc/default/dhcp3-server

INTERFACES="eth1"

This keeps the DHCP server from answering requests on the outside network.

service dhcp3-server start

Setting up tftp

Edit /etc/default/tftpd-hpa

RUN_DAEMON="yes" #had problems with inetd in the past
OPTIONS="-l -a 10.200.0.1 -s /var/lib/tftpboot"

Set up the filesystem to serve over pxe/tftpd:

mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/

Make a file similar to the following as /var/lib/tftpboot/boot.msg

Booting Brems!!

Put the following in /var/lib/tftpboot/pxelinux.cfg/default. The files are referenced relative to /var/lib/tftpboot/

TIMEOUT 5
DISPLAY boot.msg


Setting up pxelinux
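
A rough sketch of a pxelinux.cfg/default that boots the NFS root exported above; the filenames vmlinuz and initrd.img and the label name are placeholders, while the nfsroot path and server address follow the NFS and tftp settings earlier on this page:

DEFAULT lucid
TIMEOUT 5
DISPLAY boot.msg

LABEL lucid
 KERNEL vmlinuz
 APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.200.0.1:/nodes/lucid ip=dhcp ro

The kernel and initrd would typically be copied out of the chroot's /boot (after installing a linux-image package in the chroot) into /var/lib/tftpboot/ under those names.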

Testing
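
A couple of quick checks, assuming the tftp-hpa client is installed on the head node and the boot files were placed as above:

tftp 10.200.0.1 -c get pxelinux.0   # the bootloader should download without errors
tail -f /var/log/syslog             # watch for DHCPDISCOVER/DHCPOFFER and tftp requests while a node boots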

Scheduler installation

The SLURM Quick Start Administrator Guide is very helpful; a rough slurm.conf sketch is given after the list below.

  1. Install the following packages:
  • slurm-llnl
  • slurm-llnl-slurmdbd
  • slurm-llnl-doc
  2. mkdir /var/run/slurm-llnl
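
A very rough sketch of the cluster-specific lines in /etc/slurm-llnl/slurm.conf; the hostnames follow the examples on this page, while the CPU count and partition name are placeholders:

ControlMachine=brems
AuthType=auth/munge
NodeName=brems2 CPUs=4 State=UNKNOWN
PartitionName=batch Nodes=brems2 Default=YES MaxTime=INFINITE State=UP

The same slurm.conf normally has to be visible to the slave nodes as well, e.g. by copying it into /nodes/lucid/etc/slurm-llnl/.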

Munge

Munge is the authentication framework recommended by SLURM. All the configuration it needs is:

root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start
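
The slave nodes have to share the head node's munge key for authentication to work; copying it into the chroot (standard munge paths assumed) is usually enough:

cp /etc/munge/munge.key /nodes/lucid/etc/munge/munge.key
chroot /nodes/lucid chown munge:munge /etc/munge/munge.key
chroot /nodes/lucid chmod 400 /etc/munge/munge.key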

Adding a node to the cluster

  1. Set node to PXE boot
  2. Add a new entry to /etc/dhcp3/dhcpd.conf
  3. Reload dhcpd
  4. Add new host to /etc/hosts file
  5. Copy /etc/hosts to /nodes/lucid/etc/hosts
  6. Create a new var directory: cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost
  7. Boot the node
  8. Add the ssh key to the system-wide known hosts: ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
  9. Add the host to cluster ssh
  10. Add the node to slurm config

Customising initrd

Sometimes a customised initrd is necessary.

Extracting

Create a directory to extract into:

mkdir init_test; cd init_test

Extract an existing initrd

gzip -d < /var/lib/tftpboot/initrd.img | cpio -iv

Edit the file tree as needed. You can add files (modules, for instance) or edit the boot script (/init is run by default).

Then package up the directory into a new initrd:

find ./ | cpio -ov -H newc | gzip > ../initrd.new
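
The rebuilt image then has to go back into the tftp root under whatever name the pxelinux configuration references (initrd.img in the sketch above):

cp ../initrd.new /var/lib/tftpboot/initrd.img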