Difference between revisions of "Installing A Cluster"

From New IAC Wiki
Line 9:
  ==OS installation==
  Normal server Linux install with the following packages:
- * ssh server
+ * openssh-server
  * tftpd-hpa
  * dhcp3-server
  * nfs-kernel-server
  * debootstrap

  * libpmi
  * mpich2
  * slurm-llnl
- * syslinux?
+ * slurm-llnl-slurmdbd
+ * syslinux
- Grab pxelinux from the web

  ===Setting up a chroot for the node root===
- The chroot is a directory on the head nodes that hosts the root filesystem that the slave nodes will use.
+ # Boot to a live CD and mount the root drive as /new
- A chroot file system is installed with '''debootstrap''' and can be chrooted into so it acts like a standard install.
+ # Copy the root fs to a chroot '''rsync -av /new/ /new/nodes/precise/ --exclude /nodes'''
- mkdir -p /nodes/lucid
- debootstrap lucid /nodes/lucid/ http://backup.iac.isu.edu:9999/ubuntu/
- cp /etc/apt/sources.list /nodes/lucid/etc/apt/
- cp /etc/resolv.conf /nodes/lucid/resolv.conf
- chroot /nodes/lucid/
- apt-get update
- locale-gen en_US.UTF-8
- dpkg-reconfigure tzdata
-
- More information about [https://wiki.ubuntu.com/DebootstrapChroot chroots], especially helpful locale setup.
-
- In the chroot we want to install the following packages:
- * slurm-llnl
- * munge
- * nfs-common
- * openssh-client
- * openssh-server

  ===NFS===

Revision as of 20:42, 26 April 2012

Network configuration

The head node works as a NAT gateway for the slave nodes.

eth0 connects to the outside world; eth1 is internal.

The internal network is 10.200.0.0/255.255.0.0.
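
The page does not say how NAT is configured on the head node. One common approach, assumed here and not taken from this page, is iptables masquerading on eth0. The following is a minimal sketch: the variable names and the `RUN` dry-run switch are hypothetical, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: NAT via iptables is an assumption, not taken from this page.
# By default the commands are printed; run with RUN= (empty) as root to apply them.
RUN=${RUN-echo}

EXT_IF=eth0              # outside world
INT_IF=eth1              # internal cluster network
INT_NET=10.200.0.0/16    # 10.200.0.0/255.255.0.0

nat_rules() {
    # Let the kernel forward packets between interfaces.
    $RUN sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source address of outbound node traffic.
    $RUN iptables -t nat -A POSTROUTING -s "$INT_NET" -o "$EXT_IF" -j MASQUERADE
    # Allow node traffic out, and only reply traffic back in.
    $RUN iptables -A FORWARD -i "$INT_IF" -o "$EXT_IF" -j ACCEPT
    $RUN iptables -A FORWARD -i "$EXT_IF" -o "$INT_IF" -m state --state RELATED,ESTABLISHED -j ACCEPT
}

nat_rules
```

Rules added this way do not survive a reboot, so they would also need to be saved or re-applied at boot.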

OS installation

Normal server Linux install with the following packages:

  • openssh-server
  • tftpd-hpa
  • dhcp3-server
  • nfs-kernel-server
  • debootstrap
  • libpmi
  • mpich2
  • slurm-llnl
  • slurm-llnl-slurmdbd
  • syslinux

Setting up a chroot for the node root

  1. Boot to a live CD and mount the root drive as /new
  2. Copy the root fs to a chroot: rsync -av /new/ /new/nodes/precise/ --exclude /nodes

NFS

On the head node, add the following to /etc/exports:

/nodes/lucid    10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home           10.200.0.0/24(rw,async,root_squash,no_subtree_check)

See man 5 exports for more information. Further NFS settings are in:

/etc/default/nfs-common 
/etc/default/nfs-kernel-server
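
The exports above admit clients from 10.200.0.0/24 only, while the internal network is described as 10.200.0.0/255.255.0.0, so a node addressed outside 10.200.0.x would be refused. The helper below is a small sketch for checking whether an address falls inside an export's network. The function names and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split a.b.c.d into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside network $2 (given as a.b.c.d/prefix).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Example with made-up node addresses:
for ip in 10.200.0.5 10.200.34.1; do
    if in_cidr "$ip" 10.200.0.0/24; then
        echo "$ip can mount the exports"
    else
        echo "$ip is outside 10.200.0.0/24"
    fi
done
```

After editing /etc/exports, the export table is reloaded with exportfs -ra.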

Netbooting

Setting up dhcp

Edit /etc/dhcp3/dhcpd.conf as follows:

 # The ddns-update-style parameter controls whether or not the server will
 # attempt to do a DNS update when a lease is confirmed. We default to the
 # behavior of the version 2 packages ('none', since DHCP v2 didn't
 # have support for DDNS.)
 ddns-update-style none;

 # option definitions common to all supported networks...
 option domain-name "iac.isu.edu";
 #note the comma between dns servers
 option domain-name-servers 134.50.254.5, 134.50.57.57;

 #these short times are for testing only
 default-lease-time 60;
 max-lease-time 120;

 # If this DHCP server is the official DHCP server for the local
 # network, the authoritative directive should be uncommented.
 authoritative;

 # This is a very basic subnet declaration.
 subnet 10.200.0.0 netmask 255.255.0.0
 {
  range 10.200.34.1 10.200.34.200; #we really don't even need a range
  option routers 10.200.0.1;
  filename "pxelinux.0";
 }
 #eth0 on brems2
 host brems2_0
 {
  hardware ethernet 00:50:45:5C:10:54;
  fixed-address 10.200.0.2;
  option host-name "brems2";
 }
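
Every node needs a host declaration like the one for brems2. As a rough sketch, a small shell function can generate the stanza so that new entries stay consistent. The function name and the sample host name, MAC address and IP address are made up for illustration.

```shell
#!/bin/sh
# Print a dhcpd.conf host declaration.
# Usage: dhcp_host_entry <hostname> <mac> <ip>
dhcp_host_entry() {
    cat <<EOF
host $1
{
 hardware ethernet $2;
 fixed-address $3;
 option host-name "$1";
}
EOF
}

# Example with placeholder values; append the output to /etc/dhcp3/dhcpd.conf.
dhcp_host_entry node3 00:11:22:33:44:55 10.200.0.3
```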


Edit /etc/default/dhcp3-server

INTERFACES=eth1

This keeps dhcpd from answering requests on the outside network!

service dhcp3-server start

Setting up tftp

Edit /etc/default/tftpd-hpa

RUN_DAEMON="yes" #had problems with inetd in the past
OPTIONS="-l -a 10.200.0.1 -s /var/lib/tftpboot"

Set up the directory tree that will be served over PXE/TFTP:

mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/

Make a file similar to the following as /var/lib/tftpboot/boot.msg:

Booting Brems!!

Put the following in /var/lib/tftpboot/pxelinux.cfg/default. File names in it are resolved relative to /var/lib/tftpboot/.

TIMEOUT 5
DISPLAY boot.msg
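
The configuration above has no boot entry yet. As a hedged sketch, a complete file for nodes that mount their root over NFS might look like what the function below writes. Several things are assumptions: the kernel is assumed to have been copied to the TFTP root as vmlinuz (a hypothetical name), the initrd is assumed to be /var/lib/tftpboot/initrd.img, and the NFS server is assumed to be 10.200.0.1 exporting /nodes/lucid. The function name is also made up.

```shell
#!/bin/sh
# Write a pxelinux.cfg/default that boots nodes with an NFS root.
# Usage: write_pxe_default <tftp-root>
write_pxe_default() {
    mkdir -p "$1/pxelinux.cfg"
    # TIMEOUT is given in tenths of a second.
    cat > "$1/pxelinux.cfg/default" <<EOF
DEFAULT node
TIMEOUT 5
DISPLAY boot.msg

LABEL node
  KERNEL vmlinuz
  APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.200.0.1:/nodes/lucid ip=dhcp ro
EOF
}
```

On the head node this would be called as write_pxe_default /var/lib/tftpboot. The initrd must be able to bring up the network and mount NFS for this to work.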


Setting up pxelinux

Testing

Scheduler installation

The Quick Start Administrator Guide is very helpful.

  1. Install the following packages:
    • slurm-llnl
    • slurm-llnl-slurmdbd
    • slurm-llnl-doc
  2. Create the runtime directory: mkdir /var/run/slurm-llnl

Munge

Munge is an authentication framework recommended by slurm. All the configuration it needs is:

root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start
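
Once the daemon is running, a quick way to confirm that it works is to encode a credential and decode it again on the same machine. The wrapper below is a small sketch around that round trip. The function name and its return codes are hypothetical; munge -n and unmunge are the tools shipped with the munge package.

```shell
#!/bin/sh
# Encode an empty credential and decode it again on the local machine.
# Returns 0 on success, 1 if the round trip fails, 2 if the tools are missing.
check_munge() {
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge tools not found"
        return 2
    fi
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge OK"
    else
        echo "munge credential round trip failed"
        return 1
    fi
}
```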

Adding a node to the cluster

  1. Set node to PXE boot
  2. Add a new entry to /etc/dhcp3/dhcpd.conf
  3. Reload dhcpd
  4. Add new host to /etc/hosts file
  5. Copy /etc/hosts to /nodes/lucid/etc/hosts
  6. Create a new var directory for the node: cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost
  7. Boot the node
  8. Add the node's ssh host key to the system-wide known hosts file: ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
  9. Add the host to cluster ssh
  10. Add the node to slurm config
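
Steps 4 to 6 are plain file operations and can be scripted. The following is a minimal sketch, not a tested provisioning tool. The function name, the `PREFIX` variable (a hypothetical knob for trying the script in a scratch directory) and the sample address are assumptions. It does not check whether the host is already listed.

```shell
#!/bin/sh
# Sketch of steps 4-6 for a new node.
# PREFIX is a hypothetical knob for dry runs; leave it empty on the real head node.
PREFIX=${PREFIX-}

# Usage: add_node_files <hostname> <ip>
add_node_files() {
    host=$1
    ip=$2
    # Step 4: register the node on the head node.
    printf '%s\t%s\n' "$ip" "$host" >> "$PREFIX/etc/hosts"
    # Step 5: give the node root the same hosts file.
    cp "$PREFIX/etc/hosts" "$PREFIX/nodes/lucid/etc/hosts"
    # Step 6: create the node's private var tree from the template.
    cp -r "$PREFIX/nodes/lucidvar/template" "$PREFIX/nodes/lucidvar/$host"
}
```

On the head node this would be run as root, for example add_node_files newhost 10.200.0.3, where the address is a placeholder.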

Customising initrd

Sometimes a customised initrd is necessary.

Extracting

Create a directory to extract into:

mkdir init_test; cd init_test

Extract an existing initrd

gzip -d < /var/lib/tftpboot/initrd.img | cpio -idv

Edit the file tree as needed. You can add files (modules, for instance) or edit the boot script (/init is run by default).

Then package up the directory into a new initrd:

find ./ | cpio -ov -H newc | gzip > ../initrd.new
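
The extract and repack steps can be wrapped in two small functions, which makes it easy to check a new image by unpacking it again. This is a sketch under the assumption that the initrd is a gzip-compressed cpio archive, as in the commands above. The function names are made up, and --quiet is assumed to be supported by the installed cpio (GNU cpio supports it).

```shell
#!/bin/sh
# Usage: unpack_initrd <initrd.img> <target-dir>
unpack_initrd() {
    mkdir -p "$2"
    # -d creates leading directories that the archive does not list explicitly.
    gzip -dc "$1" | ( cd "$2" && cpio -id --quiet )
}

# Usage: repack_initrd <dir> <new-initrd>
repack_initrd() {
    # newc is the cpio format the kernel expects for an initramfs.
    ( cd "$1" && find . | cpio -o -H newc --quiet ) | gzip > "$2"
}
```

The contents of a new image can be listed without unpacking it with gzip -dc initrd.new | cpio -t.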