Difference between revisions of "Installing A Cluster"
Revision as of 20:52, 26 April 2012
Network configuration
The head node works as a NAT gateway for the slave nodes
eth0 connects to the outside world; eth1 is internal
Internal network is 10.200.0.0/255.255.0.0
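The page does not say how the NAT itself was set up. One common approach is IP forwarding plus an iptables masquerade rule; the sketch below assumes that approach, and it is not taken from the original configuration. The interface names and network are the ones listed above (255.255.0.0 is a /16); the `run` helper and `DRY_RUN` switch are hypothetical and only print the commands by default.

```shell
#!/bin/sh
# Sketch: NAT for the internal network using iptables masquerading.
# Assumption: iptables is the firewall tool; the original page does not specify one.
WAN_IF="${WAN_IF:-eth0}"               # outside world
LAN_IF="${LAN_IF:-eth1}"               # internal network
LAN_NET="${LAN_NET:-10.200.0.0/16}"    # 10.200.0.0/255.255.0.0
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_nat() {
    # Let the kernel route packets between interfaces
    run sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source address of traffic leaving through the outside interface
    run iptables -t nat -A POSTROUTING -s "$LAN_NET" -o "$WAN_IF" -j MASQUERADE
    # Allow internal -> outside, and only replies in the other direction
    run iptables -A FORWARD -i "$LAN_IF" -o "$WAN_IF" -j ACCEPT
    run iptables -A FORWARD -i "$WAN_IF" -o "$LAN_IF" -m state --state RELATED,ESTABLISHED -j ACCEPT
}

# Usage (as root on the head node):
#   setup_nat              # show what would be run
#   DRY_RUN=0 setup_nat    # apply the rules
```

Rules added this way do not survive a reboot, so they would still need to be saved or re-applied at boot.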
OS installation
A normal Linux server install with the following packages:
- openssh-server
- tftpd-hpa
- dhcp3-server
- nfs-kernel-server
- debootstrap
- libpmi
- mpich2
- slurm-llnl
- slurm-llnl-slurmdbd
- syslinux
Setting up a chroot for the node root
- Boot to a live CD and mount the root drive as /new
- Copy the root fs to a chroot:
  rsync -av /new/ /new/nodes/precise/ --exclude /nodes
NFS
On the head node, add the following to /etc/exports:
/nodes/lucid 10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home 10.200.0.0/24(rw,async,root_squash,no_subtree_check)
See man 5 exports for more information. There are more NFS settings in /etc/default/nfs-common and /etc/default/nfs-kernel-server.
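After editing /etc/exports, the export table has to be reloaded, typically with `exportfs -ra`. A minimal sketch of a sanity check to run first is shown here; the helper names `exports_paths` and `check_exports` are hypothetical, and the check only confirms that every exported directory exists.

```shell
#!/bin/sh
# Sketch: sanity-check an exports file before re-exporting.

# Print the exported directory (first field) of every line that is
# neither empty nor a comment.
exports_paths() {
    awk '!/^[ \t]*(#|$)/ { print $1 }' "$1"
}

# Return non-zero if any exported directory is missing.
check_exports() {
    status=0
    for dir in $(exports_paths "$1"); do
        if [ ! -d "$dir" ]; then
            echo "missing export directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (as root on the head node):
#   check_exports /etc/exports && exportfs -ra
```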
Netbooting
Setting up dhcp
Edit /etc/dhcp3/dhcpd.conf as follows:
ddns-update-style none;
option domain-name "iac.isu.edu";
option domain-name-servers 134.50.254.5, 134.50.57.57;
authoritative;
default-lease-time 600;
max-lease-time 1200;

subnet 10.0.200.0 netmask 255.255.255.0
{
    range 10.0.34.1 10.0.34.200; #we really don't even need a range
    option routers 10.0.200.1;
    #address of the TFTP server, optional if the same as the dhcp server
    next-server 10.0.200.1;
    filename "pxelinux.0";
}

#eth0 on brems2
host brems2_0 {
    hardware ethernet 00:50:45:5C:10:54;
    fixed-address 10.0.200.2;
    option host-name "brems2";
}
Edit /etc/default/dhcp3-server
INTERFACES=eth1
This keeps dhcpd from serving on the outside network! Restart the server to apply the change:
service dhcp3-server restart
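Each node needs its own host block in dhcpd.conf. As a rough sketch, a small helper can print one in the same shape as the brems2 entry; the function name `dhcp_host_entry` and the example host name, MAC address, and IP in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print a dhcpd.conf host block shaped like the brems2 entry.
# Arguments: NAME MAC IP
dhcp_host_entry() {
    name="$1"; mac="$2"; ip="$3"
    cat <<EOF
#eth0 on ${name}
host ${name}_0 {
    hardware ethernet ${mac};
    fixed-address ${ip};
    option host-name "${name}";
}
EOF
}

# Usage (hypothetical node values):
#   dhcp_host_entry brems3 00:11:22:33:44:55 10.0.200.3 >> /etc/dhcp3/dhcpd.conf
#   service dhcp3-server restart
```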
Setting up tftp
Edit /etc/default/tftpd-hpa
RUN_DAEMON="yes" #had problems with inetd in the past
OPTIONS="-l -a 10.200.0.1 -s /var/lib/tftpboot"
Set up the filesystem to serve over PXE/tftpd:
mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/
Make a file similar to the following as /var/lib/tftpboot/boot.msg
Booting Brems!!
Put the following in /var/lib/tftpboot/pxelinux.cfg/default. The files are referenced relative to /var/lib/tftpboot/
TIMEOUT 5
DISPLAY boot.msg
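The page shows only the TIMEOUT and DISPLAY directives. For illustration only, a complete entry for an NFS-root node might look like the sketch below. Everything beyond those two directives is an assumption: the label name `node`, the kernel file name `vmlinuz`, and the kernel command line are hypothetical and not taken from the original setup. The helper writes into whatever TFTP root it is given, so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Sketch: write a pxelinux.cfg/default for an NFS-root node.
# Arguments: TFTP_ROOT NFS_SERVER NFS_ROOT
# Assumptions: kernel and initrd are named vmlinuz and initrd.img and sit
# directly in TFTP_ROOT; the nodes mount their root read-only over NFS.
write_pxe_default() {
    tftp_root="$1"; nfs_server="$2"; nfs_root="$3"
    mkdir -p "$tftp_root/pxelinux.cfg"
    cat > "$tftp_root/pxelinux.cfg/default" <<EOF
TIMEOUT 5
DISPLAY boot.msg
DEFAULT node
LABEL node
    KERNEL vmlinuz
    APPEND initrd=initrd.img root=/dev/nfs nfsroot=${nfs_server}:${nfs_root} ip=dhcp ro
EOF
}

# Usage (scratch directory first, then the real TFTP root):
#   write_pxe_default /tmp/tftptest 10.200.0.1 /nodes/lucid
#   cat /tmp/tftptest/pxelinux.cfg/default
```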
Setting up pxelinux
Testing
Scheduler installation
The SLURM Quick Start Administrator Guide is very helpful.
- Install
- slurm-llnl
- slurm-llnl-slurmdbd
- slurm-llnl-doc
- mkdir /var/run/slurm-llnl
Munge
Munge is an authentication framework recommended by slurm. All the configuration it needs is:
root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start
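If munged refuses to start, the key file's permissions are a common culprit, since the daemon expects the key to be unreadable by group and others. A minimal sketch of a check, assuming the key lives at /etc/munge/munge.key and GNU `stat` is available (the function name `check_munge_key` is hypothetical):

```shell
#!/bin/sh
# Sketch: verify that a munge key exists and is private to its owner.
# Argument: path to the key (defaults to the assumed /etc/munge/munge.key).
check_munge_key() {
    key="${1:-/etc/munge/munge.key}"
    if [ ! -f "$key" ]; then
        echo "no key at $key" >&2
        return 1
    fi
    mode=$(stat -c %a "$key") || return 1
    case "$mode" in
        400|600) return 0 ;;
        *) echo "key $key has mode $mode, expected 400 or 600" >&2
           return 1 ;;
    esac
}

# Usage (as root):
#   check_munge_key
# Once the daemon is running, a credential round trip can be tested with:
#   munge -n | unmunge
```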
Adding a node to the cluster
- Set node to PXE boot
- Add a new entry to /etc/dhcp3/dhcpd.conf
- Reload dhcpd
- Add new host to /etc/hosts file
- Copy /etc/hosts to /nodes/lucid/etc/hosts
- Create a new var directory:
  cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost
- Boot the node
- Add the ssh key to the system-wide known hosts:
  ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
- Add the host to cluster ssh
- Add the node to slurm config
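The file-copying steps in this list lend themselves to a small script. The following is a sketch only: the function name `add_node_files` and its ROOT argument are hypothetical (ROOT makes it possible to rehearse in a scratch tree; on the head node it would be empty). The DHCP, ssh, and slurm steps are left out because they depend on the local configuration.

```shell
#!/bin/sh
# Sketch: the plain file operations for adding a node.
# Arguments: ROOT NAME IP   (ROOT is "" on the real head node)
add_node_files() {
    root="$1"; name="$2"; ip="$3"

    # Refuse to overwrite the var tree of an existing node
    if [ -e "$root/nodes/lucidvar/$name" ]; then
        echo "var directory for $name already exists" >&2
        return 1
    fi

    # Add the new host to /etc/hosts, then copy the file into the node root
    printf '%s\t%s\n' "$ip" "$name" >> "$root/etc/hosts" || return 1
    cp "$root/etc/hosts" "$root/nodes/lucid/etc/hosts" || return 1

    # Create the node's var directory from the template
    cp -r "$root/nodes/lucidvar/template" "$root/nodes/lucidvar/$name"
}

# Usage (hypothetical node values, as root on the head node):
#   add_node_files "" brems3 10.0.200.3
```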
Customising initrd
Sometimes a customised initrd is necessary.
Extracting
Create a directory to extract into:
mkdir init_test; cd init_test
Extract an existing initrd:
gzip -d < /var/lib/tftpboot/initrd.img | cpio -iv
Edit the file tree as needed. You can add files (modules, for instance) or edit the boot script (/init is run by default).
Then package up the directory into a new initrd:
find ./ | cpio -ov -H newc | gzip > ../initrd.new
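The extract and repack steps can be wrapped in helpers so the round trip is easy to repeat and check. This is a sketch assuming a gzip-compressed cpio image, as in the commands above; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: unpack, repack, and list a gzip-compressed cpio initrd.

# initrd_unpack IMAGE DIR: extract IMAGE into DIR (created if missing)
initrd_unpack() {
    mkdir -p "$2" || return 1
    # The redirection is opened before the cd, so IMAGE may be a relative path
    ( cd "$2" && gzip -dc | cpio -id ) < "$1"
}

# initrd_repack DIR IMAGE: package DIR into a new image in newc format
initrd_repack() {
    ( cd "$1" && find . | cpio -o -H newc | gzip ) > "$2"
}

# initrd_list IMAGE: print the file names stored in IMAGE
initrd_list() {
    gzip -dc < "$1" | cpio -it
}

# Usage:
#   initrd_unpack /var/lib/tftpboot/initrd.img init_test
#   ...edit init_test as needed...
#   initrd_repack init_test initrd.new
#   initrd_list initrd.new | head
```

Listing the new image before copying it into /var/lib/tftpboot is a cheap way to confirm that the edited files made it in.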