Installing A Cluster
Latest revision as of 21:09, 6 August 2012
Network configuration
The head node works as a NAT router for the slave nodes.
eth0 connects to the outside world.
eth1 is internal.
Internal network is 10.200.0.0/255.255.0.0
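The NAT role described above is usually set up with IP forwarding plus a masquerade rule. The following is a minimal sketch, assuming iptables is the firewall in use and that eth0 and eth1 play the roles listed above. The function name nat_commands is hypothetical, and it only prints the commands so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Print (not run) the commands for a basic NAT router.
# $1 = outside interface, $2 = internal interface.
nat_commands() {
    ext_if="$1"
    int_if="$2"
    # Let the kernel forward packets between interfaces.
    echo "sysctl -w net.ipv4.ip_forward=1"
    # Rewrite the source address of traffic leaving via the outside interface.
    echo "iptables -t nat -A POSTROUTING -o $ext_if -j MASQUERADE"
    # Let the nodes out, and only let replies back in.
    echo "iptables -A FORWARD -i $int_if -o $ext_if -j ACCEPT"
    echo "iptables -A FORWARD -i $ext_if -o $int_if -m state --state RELATED,ESTABLISHED -j ACCEPT"
}

nat_commands eth0 eth1
```

Rules added this way do not survive a reboot, so they would still need to be made persistent by whatever mechanism the head node uses.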
OS installation
Normal server Linux install with the following packages:
- openssh-server
- tftpd-hpa
- dhcp3-server
- nfs-kernel-server
- debootstrap
- libpmi
- mpich2
- slurm-llnl
- slurm-llnl-slurmdbd
- syslinux
Setting up a chroot for the node root
- Boot to a live CD and mount the root drive as /new
- Copy the root fs to a chroot: rsync -av /new/ /new/nodes/precise/ --exclude /nodes
- Enter the chroot: chroot /new/nodes/precise/ (the path is /nodes/precise/ once the installed system is booted)
- Change the following lines in /etc/initramfs-tools/initramfs.conf:
MODULES=netboot
BOOT=nfs
The DEVICE= and NFSROOT= lines may also be of use if the tftpboot configuration isn't working.
- Update the initramfs: update-initramfs -c -k all
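The initramfs.conf change above can be scripted instead of edited by hand. This is a minimal sketch, assuming GNU sed (for the -i option); set_conf is a hypothetical helper, and the demonstration runs on a scratch file whose starting values are only illustrative:

```shell
#!/bin/sh
# set_conf FILE KEY VALUE: replace an existing KEY= line, or append one.
set_conf() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^$key=" "$file"; then
        sed -i "s|^$key=.*|$key=$value|" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Demonstrate on a scratch file rather than the real initramfs.conf.
conf=$(mktemp)
printf 'MODULES=most\nBOOT=local\n' > "$conf"
set_conf "$conf" MODULES netboot
set_conf "$conf" BOOT nfs
cat "$conf"
rm -f "$conf"
```

Inside the chroot the same two calls would be pointed at /etc/initramfs-tools/initramfs.conf, followed by the update-initramfs step.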
NFS
On the head node, add the following to /etc/exports
/nodes/lucid 10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home 10.200.0.0/24(rw,async,root_squash,no_subtree_check)
See man 5 exports for more information. Further NFS settings are in:
/etc/default/nfs-common
/etc/default/nfs-kernel-server
Reload the NFS server settings
/etc/init.d/nfs-kernel-server force-reload
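When the exports are added from a script, it helps to avoid appending the same line twice. A minimal sketch follows; add_export is a hypothetical helper, and the demonstration uses a scratch file in place of /etc/exports:

```shell
#!/bin/sh
# add_export FILE LINE: append LINE to FILE unless that exact line is already there.
add_export() {
    file="$1"; line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demonstrate on a scratch file instead of the real /etc/exports.
exports=$(mktemp)
add_export "$exports" '/home 10.200.0.0/24(rw,async,root_squash,no_subtree_check)'
# The second call is a no-op because the line already exists.
add_export "$exports" '/home 10.200.0.0/24(rw,async,root_squash,no_subtree_check)'
cat "$exports"
rm -f "$exports"
```

After the reload, showmount -e localhost should list what the head node is actually exporting.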
Netbooting
Setting up dhcp
Edit /etc/dhcp3/dhcpd.conf as follows:
ddns-update-style none;
option domain-name "iac.isu.edu";
option domain-name-servers 134.50.254.5, 134.50.57.57;
authoritative;
default-lease-time 600;
max-lease-time 1200;
subnet 10.0.200.0 netmask 255.255.255.0
{
range 10.0.200.100 10.0.200.200; #must lie inside the subnet; we really don't even need a range
option routers 10.0.200.1;
#address of the TFTP server, optional if the same as the dhcp server
next-server 10.0.200.1;
filename "pxelinux.0";
}
#eth0 on brems2
host brems2_0
{
hardware ethernet 00:50:45:5C:10:54;
fixed-address 10.0.200.2;
option host-name "brems2";
}
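Each node needs its own host block like the one for brems2, so generating it from a name, MAC address and IP saves retyping. This is a rough sketch; dhcp_host is a hypothetical helper, and the _0 suffix simply follows the brems2_0 naming used above:

```shell
#!/bin/sh
# dhcp_host NAME MAC IP: print a host block for dhcpd.conf.
dhcp_host() {
    name="$1"; mac="$2"; ip="$3"
    cat <<EOF
host ${name}_0
{
  hardware ethernet $mac;
  fixed-address $ip;
  option host-name "$name";
}
EOF
}

# Reproduces the brems2 entry shown above.
dhcp_host brems2 00:50:45:5C:10:54 10.0.200.2
```

The output can be appended to /etc/dhcp3/dhcpd.conf. ISC dhcpd has a -t flag that checks the configuration file for syntax errors, which is worth running before the restart.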
Edit /etc/default/dhcp3-server
INTERFACES=eth1
This keeps DHCP from being served on the outside network! Restart the server to apply the change:
service dhcp3-server restart
Setting up tftp
Edit /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/var/lib/tftpboot"
#Keep TFTP on the inside network
TFTP_ADDRESS="10.0.200.1:69"
TFTP_OPTIONS="--secure"
Set up the filesystem to boot using pxe and tftpd:
mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/
Make a file similar to the following as /var/lib/tftpboot/boot.msg:
Booting Brems!!
Put the following in /var/lib/tftpboot/pxelinux.cfg/default. The files are referenced relative to /var/lib/tftpboot/
TIMEOUT 5
DISPLAY boot.msg
DEFAULT vmlinuz
APPEND root=/dev/nfs initrd=initrd.img netboot=nfs nfsroot=10.0.200.1:/nodes/precise,nolock,ro nomodeset
Set up a kernel and initrd in tftpboot:
cp /nodes/precise/boot/vmlinuz-3.2.0-24-generic /var/lib/tftpboot/vmlinuz
chmod a+r /var/lib/tftpboot/vmlinuz
cp /nodes/precise/boot/initrd.img-3.2.0-24-generic /var/lib/tftpboot/initrd.img
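Before booting a node it is worth confirming that the TFTP directory holds every file named in the steps above. A minimal sketch, where check_tftp_tree is a hypothetical helper; note that it tests readability for the user running it, which is not necessarily the tftp user:

```shell
#!/bin/sh
# check_tftp_tree DIR: report any file the PXE boot needs but DIR lacks.
# Returns non-zero if something is missing or unreadable.
check_tftp_tree() {
    dir="$1"
    status=0
    for f in pxelinux.0 boot.msg pxelinux.cfg/default vmlinuz initrd.img; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            status=1
        fi
    done
    return $status
}

check_tftp_tree /var/lib/tftpboot || echo "TFTP tree is incomplete"
```

The file list mirrors the names used in pxelinux.cfg/default above, so it needs updating if the kernel or initrd is stored under a different name.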
Setting up pxelinux
Testing
Scheduler installation
The Quick Start Administrator Guide is very helpful.
- Install:
  - slurm-llnl
  - slurm-llnl-slurmdbd
  - slurm-llnl-doc
- mkdir /var/run/slurm-llnl
Munge
Munge is an authentication framework recommended by Slurm. All the configuration it needs is:
root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start
Adding a node to the cluster
- Set node to PXE boot
- Add a new entry to /etc/dhcp3/dhcpd.conf
- Reload dhcpd
- Add new host to /etc/hosts file
- Copy /etc/hosts to /nodes/lucid/etc/hosts
- Create a new var: cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost
- Boot the node
- Add the ssh key to system-wide known-hosts: ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
- Add the host to cluster ssh
- Add the node to slurm config
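The file-copying steps in the list above (the hosts entry, the copy into the node root, and the new var directory) can be grouped into one helper. This is only a sketch: add_node_files and its ROOT prefix argument are hypothetical, the host name brems3 and the addresses are made up for the demonstration, and the DHCP, ssh and slurm steps are left out:

```shell
#!/bin/sh
# add_node_files ROOT NAME IP
# ROOT is a prefix for every path: "" on the real head node,
# a scratch directory for a dry run.
add_node_files() {
    root="$1"; name="$2"; ip="$3"
    # Add the host to /etc/hosts unless it is already listed.
    grep -qw -- "$name" "$root/etc/hosts" || echo "$ip $name" >> "$root/etc/hosts"
    # Give the nodes the same hosts file.
    cp "$root/etc/hosts" "$root/nodes/lucid/etc/hosts"
    # Create the node's own var from the template, once.
    [ -e "$root/nodes/lucidvar/$name" ] ||
        cp -r "$root/nodes/lucidvar/template" "$root/nodes/lucidvar/$name"
}

# Dry run in a scratch tree that imitates the head node layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc" "$scratch/nodes/lucid/etc" "$scratch/nodes/lucidvar/template/log"
echo "10.0.200.1 brems" > "$scratch/etc/hosts"
add_node_files "$scratch" brems3 10.0.200.3
cat "$scratch/nodes/lucid/etc/hosts"
rm -rf "$scratch"
```

Running it with an empty ROOT on the head node would touch the real files, so trying it in a scratch tree first is the safer order.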
Customising initrd
Sometimes a customised initrd is necessary.
Extracting
Create a directory to extract into:
mkdir init_test; cd init_test
Extract an existing initrd:
gzip -d < /var/lib/tftpboot/initrd.img | cpio -idv
Edit the file tree as needed. You can add files (modules, for instance) or edit the boot script (/init is run by default).
Then package up the directory into a new initrd:
find ./ | cpio -ov -H newc | gzip > ../initrd.new
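The extract and repackage commands above can be wrapped in two small functions so that a round trip is repeatable. A minimal sketch, assuming a gzip-compressed cpio image as in the commands above; unpack_initrd and pack_initrd are hypothetical names:

```shell
#!/bin/sh
# unpack_initrd IMAGE DIR: extract a gzip-compressed cpio initrd into DIR.
unpack_initrd() {
    image="$1"; dir="$2"
    mkdir -p "$dir"
    # -i extracts, -d creates leading directories where needed.
    gzip -dc "$image" | (cd "$dir" && cpio -id --quiet)
}

# pack_initrd DIR IMAGE: build a new initrd from the tree in DIR.
pack_initrd() {
    dir="$1"; image="$2"
    # newc is the cpio format used for initramfs images.
    (cd "$dir" && find . | cpio -o -H newc --quiet) | gzip > "$image"
}
```

For example, unpack_initrd /var/lib/tftpboot/initrd.img init_test followed, after editing, by pack_initrd init_test initrd.new leaves the original image untouched until the new one has been tested.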