Difference between revisions of "Installing A Cluster"

From New IAC Wiki
 
(23 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
==Network configuration==
 
==Network configuration==
Head nodes works as NAT for slave nodes
+
Head nodes works as NAT router for slave nodes
  
 
eth0 connects to outside world
 
eth0 connects to outside world
 
eth1 is internal
 
eth1 is internal
  
Internal network is 10.0.200.0/255.0.0.0
+
Internal network is 10.200.0.0/255.255.0.0
  
 
==OS installation==
 
==OS installation==
 
Normal server Linux install with the following packages:
 
Normal server Linux install with the following packages:
* ssh server
+
* openssh-server
 
* tftpd-hpa
 
* tftpd-hpa
 
* dhcp3-server
 
* dhcp3-server
 
* nfs-kernel-server
 
* nfs-kernel-server
 
* debootstrap
 
* debootstrap
 +
* libpmi
 +
* mpich2
 +
* slurm-llnl
 +
* slurm-llnl-slurmdbd
 +
* syslinux
 +
 +
===Setting up a chroot for the node root===
 +
* Boot to a live CD and mount the root drive as /new
 +
* Copy the root fs to a chroot '''rsync -av /new/ /new/nodes/precise/ --exclude /nodes'''
 +
* Enter the chroot '''chroot /nodes/precise/'''
 +
* Change the following lines in ''/etc/initramfs-tools/initramfs.conf''
 +
<pre>
 +
MODULES=netboot
 +
BOOT=nfs
 +
</pre>
 +
The DEVICE= and NFSROOT= lines may also be of use if the tftpboot configuration isn't working
 +
* Update the initramfs '''update-initramfs -c -k all'''
 +
 +
===NFS===
 +
 +
On the head node, add the following to /etc/exports
 +
/nodes/lucid    10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
 +
/home          10.200.0.0/24(rw,async,root_squash,no_subtree_check)
 +
see '''man 5 exports''' for more information. There are also more nfs settings in
 +
 +
/etc/default/nfs-common
 +
/etc/default/nfs-kernel-server
  
Grab pxelinux from the web
+
Reload the NFS server settings
 +
/etc/init.d/nfs-kernel-server force-reload
  
 
===Netbooting===
 
===Netbooting===
Line 21: Line 49:
 
====Setting up dhcp====
 
====Setting up dhcp====
 
Edit /etc/dhcp3/dhcpd.conf as follows:
 
Edit /etc/dhcp3/dhcpd.conf as follows:
  dhcpd.conf - note comma between dns servers
+
<pre>
 +
  ddns-update-style none;
 +
option domain-name "iac.isu.edu";
 +
option domain-name-servers 134.50.254.5, 134.50.57.57;
 +
authoritative;
 +
 
 +
default-lease-time 600;
 +
max-lease-time 1200;
 +
 
 +
subnet 10.0.200.0 netmask 255.255.255.0 
 +
{
 +
  range 10.0.34.1 10.0.34.200; #we really don't even need a range
 +
  option routers 10.0.200.1;
 +
  #address of the TFTP server, optional if the same as the dhcp server
 +
  next-server 10.0.200.1;
 +
  filename "pxelinux.0";
 +
}
 +
#eth0 on brems2
 +
host brems2_0
 +
{
 +
  hardware ethernet 00:50:45:5C:10:54;
 +
  fixed-address 10.0.200.2;
 +
  option host-name "brems2";
 +
}
 +
</pre>
 +
 
 +
 
 
Edit /etc/default/dhcp3-server
 
Edit /etc/default/dhcp3-server
 
  INTERFACES=eth1
 
  INTERFACES=eth1
 
This will avoid dhcp serving on the outside network!
 
This will avoid dhcp serving on the outside network!
  
  service dhcp3-server start
+
  service dhcp3-server restart
 
 
  
 
====Setting up tftp====
 
====Setting up tftp====
 
Edit /etc/default/tftpd-hpa
 
Edit /etc/default/tftpd-hpa
  RUN_DAEMON="yes" #had problems with inetd in the past
+
  TFTP_USERNAME="tftp"
  OPTIONS="-l -a 10.0.200.1 -s /var/lib/tftpboot"
+
TFTP_DIRECTORY="/var/lib/tftpboot"
 +
#Keep TFTP on the inside network
 +
  TFTP_ADDRESS="10.0.200.1:69"
 +
TFTP_OPTIONS="--secure"
  
 +
Set up the filesystem to boot using pxe and tftpd:
 +
mkdir -p /var/lib/tftpboot/pxelinux.cfg
 +
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/
  
====Setting up nfs====
+
Make a file similiar to the following as /var/lib/tftpboot/boot.msg
 +
Booting Brems!!
  
====Testing====
+
Put the following in '''/var/lib/tftpboot/pxelinux.cfg/default'''.
 +
The files are referenced relative to /var/lib/tftpboot/
 +
TIMEOUT 5
 +
DISPLAY boot.msg
 +
DEFAULT vmlinuz
 +
APPEND root=/dev/nfs initrd=initrd.img netboot=nfs nfsroot=10.0.200.1:/nodes/precise,nolock,ro nomodeset
  
 +
Set up a kernel and initrd in tftpboot
 +
cp /nodes/precise/boot/vmlinuz-3.2.0-24-generic /var/lib/tftpboot/vmlinuz
 +
chmod a+r /var/lib/tftpboot/vmlinuz
 +
cp /nodes/precise/boot/initrd.img-3.2.0-24-generic /var/lib/tftpboot/initrd.img
  
 +
====Setting up pxelinux====
 +
 +
====Testing====
  
 
==Scheduler installation==
 
==Scheduler installation==
Install
+
The [https://computing.llnl.gov/linux/slurm/quickstart_admin.html Quick Start Administrator Guide] is very helpful.
 +
 
 +
# Install
 
* slurm-llnl
 
* slurm-llnl
 
* slurm-llnl-slurmdbd
 
* slurm-llnl-slurmdbd
 
* slurm-llnl-doc
 
* slurm-llnl-doc
 
** mkdir /var/run/slurm-llnl
 
** mkdir /var/run/slurm-llnl
 +
 +
===Munge===
 +
[http://code.google.com/p/munge/ Munge] is an authentication framework recommended by slurm. All the configuration it needs is:
 +
root@brems:# /usr/sbin/create-munge-key
 +
Generating a pseudo-random key using /dev/urandom completed.
 +
root@brems:# /etc/init.d/munge start
 +
 +
==Adding a node to the cluster==
 +
#Set node to PXE boot
 +
#Add a new entry to /etc/dhcp3/dhcpd.conf
 +
#Reload dhcpd
 +
#Add new host to /etc/hosts file
 +
#Copy /etc/hosts to /nodes/lucid/etc/hosts
 +
#Create a new var '''cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost'''
 +
#Boot the node
 +
#Add the ssh key to system-wide known-hosts '''ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts'''
 +
#Add the host to cluster ssh
 +
#Add the node to slurm config
 +
 +
==Customising initrd==
 +
Sometimes a customised initrd is necessary.
 +
====Extracting====
 +
Create a directory to extract into:
 +
mkdir init_test; cd init_test
 +
Extract an existing initrd
 +
gzip -d < /var/lib/tftpboot/initrd.img | cpio -iv
 +
Edit the file tree as needed. You can add files (modules for instance),
 +
or editing boot script (/init is run by default).
 +
 +
Then package up the directory into a new initrd:
 +
find ./ | cpio -ov -H newc | gzip > ../initrd.new

Latest revision as of 21:09, 6 August 2012

Network configuration

The head node works as a NAT router for the slave nodes.

eth0 connects to the outside world; eth1 is internal.

Internal network is 10.200.0.0/255.255.0.0
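The page does not say how NAT was enabled on the head node. One common approach, shown here only as a sketch, is iptables masquerading. The function below prints the commands by default so they can be reviewed; setting APPLY=1 (a hypothetical switch for this example) runs them, which needs root.

```shell
# Print (or, with APPLY=1, run as root) commands that make the head node a NAT router.
# Assumption: eth0 is the outside interface and eth1 the internal one, as stated above.
nat_setup() {
    ext_if=${1:-eth0}
    int_if=${2:-eth1}
    run=echo
    [ "${APPLY:-0}" = 1 ] && run=
    # Let the kernel forward packets between interfaces
    $run sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source address of traffic leaving through the outside interface
    $run iptables -t nat -A POSTROUTING -o "$ext_if" -j MASQUERADE
    # Allow inside-to-outside traffic and the replies to it
    $run iptables -A FORWARD -i "$int_if" -o "$ext_if" -j ACCEPT
    $run iptables -A FORWARD -i "$ext_if" -o "$int_if" -m state --state RELATED,ESTABLISHED -j ACCEPT
}

nat_setup eth0 eth1
```

These rules do not survive a reboot on their own; how to persist them depends on the distribution.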

OS installation

Normal server Linux install with the following packages:

  • openssh-server
  • tftpd-hpa
  • dhcp3-server
  • nfs-kernel-server
  • debootstrap
  • libpmi
  • mpich2
  • slurm-llnl
  • slurm-llnl-slurmdbd
  • syslinux
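
The package list above can be installed in one step. This is a minimal sketch that assumes an apt-based system (the package names suggest Debian or Ubuntu); it only prints the command so it can be checked before being run as root.

```shell
# Package names are taken from the list above; apt-get is an assumption.
HEAD_NODE_PACKAGES="openssh-server tftpd-hpa dhcp3-server nfs-kernel-server debootstrap libpmi mpich2 slurm-llnl slurm-llnl-slurmdbd syslinux"

# Print the install command for review instead of running it directly.
head_node_install_cmd() {
    printf 'apt-get install -y %s\n' "$HEAD_NODE_PACKAGES"
}

head_node_install_cmd
```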

Setting up a chroot for the node root

  • Boot to a live CD and mount the root drive as /new
  • Copy the root fs to a chroot: rsync -av /new/ /new/nodes/precise/ --exclude /nodes
  • Enter the chroot (the copy lives under /new while booted from the live CD): chroot /new/nodes/precise/
  • Change the following lines in /etc/initramfs-tools/initramfs.conf:
 MODULES=netboot
 BOOT=nfs

The DEVICE= and NFSROOT= lines may also be of use if the tftpboot configuration isn't working.

  • Update the initramfs: update-initramfs -c -k all
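
The two initramfs.conf changes above can also be scripted. The helper below is a sketch: set_conf is a hypothetical name, and it relies on GNU sed's -i option. It replaces an existing KEY= line or appends one if the key is missing.

```shell
# Set KEY=VALUE in a shell-style config file, replacing an existing line or appending one.
set_conf() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage inside the chroot (as root):
#   set_conf /etc/initramfs-tools/initramfs.conf MODULES netboot
#   set_conf /etc/initramfs-tools/initramfs.conf BOOT nfs
#   update-initramfs -c -k all
```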

NFS

On the head node, add the following to /etc/exports

/nodes/lucid    10.200.0.0/24(ro,async,no_root_squash,no_subtree_check)
/home           10.200.0.0/24(rw,async,root_squash,no_subtree_check)

See man 5 exports for more information. There are more NFS settings in:

/etc/default/nfs-common 
/etc/default/nfs-kernel-server

Reload the NFS server settings

/etc/init.d/nfs-kernel-server force-reload
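
When the exports are managed by a script, it helps to avoid adding the same directory twice. The sketch below uses a hypothetical helper, add_export, that appends a line only if the directory is not already listed.

```shell
# Append an export line to an exports file unless that directory is already exported.
add_export() {
    exports_file=$1 dir=$2 spec=$3
    if grep -q "^${dir}[[:space:]]" "$exports_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$dir" "$spec" >> "$exports_file"
}

# Usage on the head node (as root), followed by a re-export:
#   add_export /etc/exports /home '10.200.0.0/24(rw,async,root_squash,no_subtree_check)'
#   exportfs -ra
```

exportfs -ra re-reads /etc/exports without restarting the server, as an alternative to the force-reload above.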

Netbooting

Setting up dhcp

Edit /etc/dhcp3/dhcpd.conf as follows:

 ddns-update-style none;
 option domain-name "iac.isu.edu";
 option domain-name-servers 134.50.254.5, 134.50.57.57;
 authoritative;

 default-lease-time 600;
 max-lease-time 1200;

 subnet 10.0.200.0 netmask 255.255.255.0  
 {
  range 10.0.34.1 10.0.34.200; #we really don't even need a range
  option routers 10.0.200.1;
  #address of the TFTP server, optional if the same as the dhcp server
  next-server 10.0.200.1;
  filename "pxelinux.0";
 }
 #eth0 on brems2
 host brems2_0
 {
  hardware ethernet 00:50:45:5C:10:54;
  fixed-address 10.0.200.2;
  option host-name "brems2";
 }
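
Each node needs its own host block like the one for brems2. The sketch below generates such a block from a name, MAC address and IP address. dhcp_host is a hypothetical helper, and the brems3 values in the usage line are made up for illustration.

```shell
# Print a dhcpd.conf host block in the same layout as the brems2 entry above.
dhcp_host() {
    name=$1 mac=$2 ip=$3
    cat <<EOF
host ${name}_0
{
  hardware ethernet ${mac};
  fixed-address ${ip};
  option host-name "${name}";
}
EOF
}

# Hypothetical example node; append the output to /etc/dhcp3/dhcpd.conf as root.
dhcp_host brems3 00:50:45:5C:10:55 10.0.200.3
```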


Edit /etc/default/dhcp3-server so that dhcpd only listens on the internal interface:

INTERFACES=eth1

This keeps dhcpd from serving on the outside network! Then restart the server:

service dhcp3-server restart

Setting up tftp

Edit /etc/default/tftpd-hpa

TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/var/lib/tftpboot"
#Keep TFTP on the inside network
TFTP_ADDRESS="10.0.200.1:69"
TFTP_OPTIONS="--secure"

Set up the filesystem to boot using pxe and tftpd:

mkdir -p /var/lib/tftpboot/pxelinux.cfg
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/

Make a file similar to the following as /var/lib/tftpboot/boot.msg:

Booting Brems!!

Put the following in /var/lib/tftpboot/pxelinux.cfg/default. The files are referenced relative to /var/lib/tftpboot/

TIMEOUT 5
DISPLAY boot.msg
DEFAULT vmlinuz
APPEND root=/dev/nfs initrd=initrd.img netboot=nfs nfsroot=10.0.200.1:/nodes/precise,nolock,ro nomodeset
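
The setup above uses only pxelinux.cfg/default. PXELINUX also looks for per-node files in pxelinux.cfg before falling back to default: first one named after the MAC address (prefixed with 01- for Ethernet), then one named after the IP address in uppercase hex. This is not used on this page, but if one node ever needs different boot options, the sketch below computes both file names. The helper names are hypothetical.

```shell
# File name PXELINUX tries for a given MAC: "01-" plus lowercase hex, dash-separated.
pxe_mac_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# File name PXELINUX tries for a given IPv4 address: eight uppercase hex digits.
pxe_hex_ip() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into four positional parameters
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

pxe_mac_name 00:50:45:5C:10:54
pxe_hex_ip 10.0.200.2
```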

Set up a kernel and initrd in tftpboot

cp /nodes/precise/boot/vmlinuz-3.2.0-24-generic /var/lib/tftpboot/vmlinuz
chmod a+r /var/lib/tftpboot/vmlinuz
cp /nodes/precise/boot/initrd.img-3.2.0-24-generic /var/lib/tftpboot/initrd.img
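
The commands above hard-code kernel version 3.2.0-24-generic. After a kernel update in the node root, the copy has to be repeated with the new version. The sketch below picks the highest installed version automatically. It assumes GNU sort (for -V), and the function names are hypothetical.

```shell
# Print the highest kernel version that has a vmlinuz-* file in the given boot directory.
latest_kernel_version() {
    boot_dir=$1
    ls "$boot_dir" | sed -n 's/^vmlinuz-//p' | sort -V | tail -n 1
}

# Copy the newest kernel and its initrd into the tftp directory under fixed names.
install_boot_files() {
    boot_dir=$1 tftp_dir=$2
    version=$(latest_kernel_version "$boot_dir")
    [ -n "$version" ] || return 1
    cp "$boot_dir/vmlinuz-$version" "$tftp_dir/vmlinuz"
    cp "$boot_dir/initrd.img-$version" "$tftp_dir/initrd.img"
    chmod a+r "$tftp_dir/vmlinuz" "$tftp_dir/initrd.img"
}

# Usage on the head node (as root):
#   install_boot_files /nodes/precise/boot /var/lib/tftpboot
```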

Setting up pxelinux

Testing

Scheduler installation

The Quick Start Administrator Guide is very helpful.

  1. Install the following packages:
  • slurm-llnl
  • slurm-llnl-slurmdbd
  • slurm-llnl-doc
    • Then create the run directory: mkdir /var/run/slurm-llnl

Munge

Munge is an authentication framework recommended by Slurm. All the configuration it needs is:

root@brems:# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
root@brems:# /etc/init.d/munge start

Adding a node to the cluster

  1. Set the node to PXE boot
  2. Add a new entry to /etc/dhcp3/dhcpd.conf
  3. Reload dhcpd
  4. Add the new host to the /etc/hosts file
  5. Copy /etc/hosts to /nodes/lucid/etc/hosts
  6. Create a new var directory from the template: cp -r /nodes/lucidvar/template /nodes/lucidvar/newhost
  7. Boot the node
  8. Add the node's ssh key to the system-wide known hosts: ssh-keyscan newhost >> /etc/ssh/ssh_known_hosts
  9. Add the host to cluster ssh
  10. Add the node to the slurm config
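
Steps 4 to 6 touch only files on the head node, so they are easy to script. The sketch below uses a hypothetical helper, add_node_files, whose first argument is a root prefix: pass an empty string on the real head node, or a scratch directory to try it out safely. It is safe to run twice for the same node.

```shell
# Steps 4-6: register the host in /etc/hosts, copy hosts into the node root,
# and create the node's var directory from the template.
add_node_files() {
    root=$1 name=$2 ip=$3
    hosts=$root/etc/hosts
    # Step 4: add the host only if no line already ends with its name
    grep -q "[[:space:]]${name}\$" "$hosts" || printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
    # Step 5: keep the node root's hosts file in sync
    cp "$hosts" "$root/nodes/lucid/etc/hosts"
    # Step 6: create the per-node var directory once
    [ -d "$root/nodes/lucidvar/$name" ] || cp -r "$root/nodes/lucidvar/template" "$root/nodes/lucidvar/$name"
}

# Usage on the head node (as root), with a made-up node name and address:
#   add_node_files "" brems3 10.0.200.3
```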

Customising initrd

Sometimes a customised initrd is necessary.

Extracting

Create a directory to extract into:

mkdir init_test; cd init_test

Extract an existing initrd:

gzip -d < /var/lib/tftpboot/initrd.img | cpio -iv

Edit the file tree as needed. You can add files (modules, for instance) or edit the boot script (/init is run by default).

Then package up the directory into a new initrd:

find ./ | cpio -ov -H newc | gzip > ../initrd.new
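
The extract and repackage commands above can be wrapped in two small functions so that the working directory and image paths are explicit. This is a sketch with hypothetical function names, and it assumes the initrd is a single gzip-compressed cpio archive, as the commands above do.

```shell
# Unpack a gzip-compressed cpio initrd into a directory (created if needed).
initrd_unpack() {
    image=$1 dest=$2
    mkdir -p "$dest"
    gzip -dc "$image" | (cd "$dest" && cpio -id)
}

# Pack a directory tree into a new gzip-compressed initrd in newc format.
initrd_pack() {
    src=$1 image=$2
    (cd "$src" && find . | cpio -o -H newc) | gzip > "$image"
}

# Usage:
#   initrd_unpack /var/lib/tftpboot/initrd.img init_test
#   ... edit files under init_test ...
#   initrd_pack init_test initrd.new
```

Both functions run cpio in a subshell, so the caller's working directory is unchanged and relative image paths resolve from where the function was called.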