Brems
Specifications
Brems is made of 21 nodes and 60 CPU cores.
- 12 nodes have two 2.0 GHz Opteron CPUs, 2 GB of ECC RAM, and a 40 GB hard drive
- 9 nodes have two dual-core 2.0 GHz Opteron CPUs, 4 GB of ECC RAM, and an 80 GB hard drive
- The head node also has 1.75 TB of redundant RAID5 storage.
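The node and core totals can be checked with a little shell arithmetic. This is only a sanity check, and it assumes the CPUs in the first group of nodes are single-core, which the list implies but does not state outright.

```shell
# Node counts taken from the list above.
# Assumption: the first group's CPUs are single-core.
single_core_nodes=12   # two single-core CPUs per node
dual_core_nodes=9      # two dual-core CPUs per node

nodes=$(( single_core_nodes + dual_core_nodes ))
cores=$(( single_core_nodes * 2 * 1 + dual_core_nodes * 2 * 2 ))

echo "$nodes nodes, $cores cores"
```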
Running on Brems
MPI
Querying Brems usage
This is typical output from the cps2 command:
brian@brems:~$ cps2
************************* brems *************************
--------- brems1---------
pctcpu: 26.00 fname: process.pl time: 260000 uname: brian
pctcpu: 2.50 fname: python time: 50000 uname: brian
pctcpu: 1.00 fname: ssh time: 20000 uname: brian
--------- brems2---------
pctcpu: inf fname: process.pl time: 210000 uname: brian
pctcpu: 1.31 fname: mcnpx-mpi-e2 time: 16761650000 uname: makavakh
pctcpu: 1.08 fname: mcnpx-mpi-e2 time: 25739210000 uname: makavakh
--------- brems3---------
pctcpu: 23.00 fname: process.pl time: 230000 uname: brian
pctcpu: 0.01 fname: rpciod/1 time: 517950000 uname: root
pctcpu: 0.00 fname: sshd time: 0 uname: brian
...
--------- brems21---------
pctcpu: 20.00 fname: process.pl time: 200000 uname: brian
pctcpu: 0.00 fname: sshd time: 0 uname: brian
pctcpu: 0.00 fname: sshd time: 0 uname: root
pctcpu: 0.00 fname: lockd time: 0 uname: root
pctcpu: 0.00 fname: rpciod/3 time: 20000 uname: root
pctcpu: 0.00 fname: rpciod/2 time: 0 uname: root
The cps2 command shows every process that is using CPU time on each of the cluster nodes. If someone else is already running jobs, you will see output similar to the following:
Paste busy cps2 here
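To spot busy nodes quickly, the cps2 listing can be filtered down to processes above a CPU threshold. The following is a minimal sketch, not part of cps2 itself. It assumes cps2 prints one process per line in the form "pctcpu: ... fname: ... time: ... uname: ..." under dashed node header lines. The function name busy_procs and its default threshold of 5 are hypothetical. A pctcpu value of "inf" is treated as busy, which is also an assumption.

```shell
# busy_procs [MIN]: read cps2-style output on stdin and print
# "node user command pctcpu" for each process at or above MIN percent CPU.
# Assumes one process per line and node headers like "--------- brems1---------".
busy_procs() {
    awk -v min="${1:-5}" '
        # Node header: remember the node name with dashes and spaces stripped.
        /^-+ *[A-Za-z0-9]+ *-+$/ { node = $0; gsub(/[- ]/, "", node); next }
        # Process line: $2 is pctcpu, $4 is the command, $8 is the user.
        $1 == "pctcpu:" && ($2 == "inf" || $2 + 0 >= min) { print node, $8, $4, $2 }
    '
}

# Possible usage on the head node:
#   cps2 | busy_procs 10
```

Run against the first two nodes of the sample listing with a threshold of 5, this would report only the process.pl entries on brems1 and brems2.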
Screen
Screen is currently the preferred way to run processes on Brems. It allows you to run programs in such a way that they won't terminate when you log off, and you can control the application and see the output in real time.
Basic screen usage
- Use screen -R to reconnect to an existing screen session, or to start a new one if none exists.
- Ctrl-a followed by d detaches from within a screen session, leaving its programs running.
Examples
Start a screen session:
brian@brems:~$ screen -R
Do some work in the screen session:
brian@brems:~/work$ /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
mcnpx ver=2.5e ld=Mon Feb 23 09:00:00 MST 2004 06/26/08 15:34:56
*****************************************************
*                                                   *
* Copyright Notice for MCNPX                        *
Pressing Ctrl-a and then d detaches from the screen session, and ps shows MCNPX still running:
brian@brems:~$ screen -R
[detached]
brian@brems:~$ ps ux | grep mcnpx
brian 30339 0.5 0.0 9132 1844 pts/10 S+ 15:40 0:00 /bin/sh /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
brian 30467 9.5 0.4 43300 9400 pts/10 S+ 15:40 0:00 /brems/bin/mcnpx-mpi-e2 i=test1.i -p4pg /home/brian/work/PI30339 -p4wd /home/brian/work
brian 30468 0.0 0.0 21476 544 pts/10 S+ 15:40 0:00 /brems/bin/mcnpx-mpi-e2 i=test1.i -p4pg /home/brian/work/PI30339 -p4wd /home/brian/work
brian 30469 0.3 0.1 21344 2380 pts/10 S+ 15:40 0:00 /usr/bin/rsh brems2 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems2 \-p4rmrank 1
brian 30470 0.0 0.1 21340 2376 pts/10 S+ 15:40 0:00 /usr/bin/rsh brems3 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems3 \-p4rmrank 2
brian 30471 0.3 0.1 21340 2376 pts/10 S+ 15:40 0:00 /usr/bin/rsh brems4 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems4 \-p4rmrank 3
brian 30473 0.0 0.0 4372 656 pts/0 S+ 15:40 0:00 grep mcnpx
And we can connect back to the screen session to see the MCNPX output:
brian@brems:~$ screen -R
run terminated when 100000 particle histories were done.
warning. tally 11 tfc bin did not pass 1 of 10 statistical checks.
warning. 1 of 7 tallies did not pass all 10 statistical checks.
warning. 2 of 7 tallies had bins with large relative errors.
dump 11 on file runtpr nps = 100000 coll = 31827944 ctm = 3.75 nrn = 514286097
mcrun is done
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
brian@brems:~/work$
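The same kind of job can also be launched without attaching to the session first. The sketch below uses GNU Screen's -d -m options (start a session detached) and -S (give it a name) in a small wrapper. The wrapper name start_job and the DRY_RUN variable are hypothetical, added only so the command can be previewed before it is run. This is one possible approach, not an established Brems procedure.

```shell
# start_job NAME COMMAND...: run COMMAND inside a new, detached screen
# session called NAME. With DRY_RUN set, only print the screen command.
start_job() {
    name=$1
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "screen -dmS $name $*"
    else
        screen -dmS "$name" "$@"
    fi
}

# Possible usage, from the directory holding the input file:
#   start_job test1 /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
#   screen -r test1    # attach later to watch the output
```

Naming the session makes it easier to pick out when several sessions are running, though a session started this way closes as soon as its command exits.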
Advanced screen tricks
- Using screen without -r or -R allows you to start a new screen session even if one already exists
- Using screen -list shows you the current screen sessions you have
brian@brems:~$ screen
[detached]
brian@brems:~$ screen -list
There are screens on:
        29819.pts-0.brems       (Detached)
        30502.pts-0.brems       (Detached)
2 Sockets in /var/run/screen/S-brian.
brian@brems:~$
You can choose which one to reattach to by specifying a unique part of the name:
brian@brems:~$ screen -r 298
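When many sessions accumulate, the names that screen -r accepts can be pulled out of the screen -list output. The following is a small sketch that assumes the listing format shown in this section, with one session per line marked "(Detached)". The function name detached_sessions is hypothetical.

```shell
# detached_sessions: read `screen -list` output on stdin and print the
# name of each detached session, one per line.
detached_sessions() {
    awk '/\(Detached\)/ { print $1 }'
}

# Possible usage:
#   screen -list | detached_sessions
#   screen -r "$(screen -list | detached_sessions | head -n 1)"
```

The second usage line reattaches to the first detached session listed, which is not necessarily the most recent one.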