Difference between revisions of "Brems"

From New IAC Wiki

Revision as of 15:55, 26 June 2008

Brems

Specifications

Brems is made of 21 nodes and 60 CPU cores.

  • 12 nodes have two 2.0 GHz Opteron CPUs, 2 GB of ECC RAM, and a 40 GB hard drive
  • 9 nodes have two dual-core 2.0 GHz Opteron CPUs, 4 GB of ECC RAM, and an 80 GB hard drive
  • The head node also has 1.75 TB of redundant RAID5 storage.
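
As a quick check of the totals quoted above, the node list can be added up with shell arithmetic. This sketch assumes the CPUs in the first group of nodes are single-core, which the list implies but does not state; the variable names are only illustrative.

```shell
# Node and core totals implied by the list above.
# Assumption: the CPUs in the first group of nodes are single-core.
single_core_nodes=12   # two single-core CPUs per node
dual_core_nodes=9      # two dual-core CPUs per node

total_nodes=$((single_core_nodes + dual_core_nodes))
total_cores=$((single_core_nodes * 2 + dual_core_nodes * 2 * 2))

echo "nodes: $total_nodes  cores: $total_cores"
```

Under that assumption the sum comes to 21 nodes and 60 cores, matching the figures given.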

Running on Brems

MPI

Querying Brems usage

This is typical output from the cps2 command:

brian@brems:~$ cps2
************************* brems *************************
--------- brems1---------
pctcpu: 26.00    fname: process.pl    time: 260000    uname: brian    
pctcpu: 2.50    fname: python    time: 50000    uname: brian    
pctcpu: 1.00    fname: ssh    time: 20000    uname: brian    
--------- brems2---------
pctcpu: inf    fname: process.pl    time: 210000    uname: brian    
pctcpu: 1.31    fname: mcnpx-mpi-e2    time: 16761650000    uname: makavakh    
pctcpu: 1.08    fname: mcnpx-mpi-e2    time: 25739210000    uname: makavakh    
--------- brems3---------
pctcpu: 23.00    fname: process.pl    time: 230000    uname: brian    
pctcpu: 0.01    fname: rpciod/1    time: 517950000    uname: root    
pctcpu: 0.00    fname: sshd    time: 0    uname: brian    

...

--------- brems21---------
pctcpu: 20.00    fname: process.pl    time: 200000    uname: brian    
pctcpu: 0.00    fname: sshd    time: 0    uname: brian    
pctcpu: 0.00    fname: sshd    time: 0    uname: root    
pctcpu: 0.00    fname: lockd    time: 0    uname: root    
pctcpu: 0.00    fname: rpciod/3    time: 20000    uname: root    
pctcpu: 0.00    fname: rpciod/2    time: 0    uname: root    

The cps2 command shows all the processes using CPU time on every cluster node. If someone is already running jobs, you will see output similar to the following:

Paste busy cps2 here
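
With 21 nodes the full listing gets long, so it can help to filter it. The following is a minimal sketch that assumes cps2 prints exactly the format shown in the typical output above (a dashed header per node, then pctcpu/fname/time/uname lines). The function name busy_procs and the threshold argument are hypothetical and not part of cps2.

```shell
# busy_procs THRESHOLD
# Read cps2-style text on stdin and print "node user command pctcpu"
# for every process at or above THRESHOLD percent CPU.
# Non-numeric pctcpu values (such as "inf") are skipped.
busy_procs() {
    awk -v min="$1" '
        /^-+ *brems[0-9]+-+ *$/ {        # node header, e.g. "--------- brems2---------"
            node = $0
            gsub(/[- ]/, "", node)       # strip dashes and spaces, leaving "brems2"
            next
        }
        $1 == "pctcpu:" && $2 ~ /^[0-9.]+$/ && $2 + 0 >= min + 0 {
            print node, $8, $4, $2       # node, uname, fname, pctcpu
        }
    '
}

# Usage on the head node (commented out so the block has no side effects):
# cps2 | busy_procs 1.0
```

Applied to the brems2 section above with a threshold of 1.0, this would list the two mcnpx-mpi-e2 processes owned by makavakh and skip the line reporting inf.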

Screen

Screen is currently the preferred way to run processes on Brems. It keeps your programs running after you log off, and lets you reattach later to control the application and watch its output in real time.

Basic screen usage

  • Use screen -R to reattach to an existing screen session, or to start a new one if none exists.
  • Press Ctrl-a followed by d to detach from within a screen session.

Examples

Start a screen session:

brian@brems:~$ screen -R

Do some work in the screen session:

brian@brems:~/work$ /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
 mcnpx    ver=2.5e  ld=Mon Feb 23 09:00:00 MST 2004   06/26/08 15:34:56

 *****************************************************
 *                                                   *
 *            Copyright Notice for MCNPX             *

Pressing Ctrl-a followed by d detaches from the screen session, and ps shows that MCNPX is still running:

brian@brems:~$ screen -R
[detached]
brian@brems:~$ ps ux | grep mcnpx
brian    30339  0.5  0.0   9132  1844 pts/10   S+   15:40   0:00 /bin/sh /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
brian    30467  9.5  0.4  43300  9400 pts/10   S+   15:40   0:00 /brems/bin/mcnpx-mpi-e2 i=test1.i -p4pg /home/brian/work/PI30339 -p4wd /home/brian/work
brian    30468  0.0  0.0  21476   544 pts/10   S+   15:40   0:00 /brems/bin/mcnpx-mpi-e2 i=test1.i -p4pg /home/brian/work/PI30339 -p4wd /home/brian/work
brian    30469  0.3  0.1  21344  2380 pts/10   S+   15:40   0:00 /usr/bin/rsh brems2 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems2 \-p4rmrank 1
brian    30470  0.0  0.1  21340  2376 pts/10   S+   15:40   0:00 /usr/bin/rsh brems3 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems3 \-p4rmrank 2
brian    30471  0.3  0.1  21340  2376 pts/10   S+   15:40   0:00 /usr/bin/rsh brems4 -l brian -n /brems/bin/mcnpx-mpi-e2 brems.iac.isu.edu 45735 \-p4amslave \-p4yourname brems4 \-p4rmrank 3
brian    30473  0.0  0.0   4372   656 pts/0    S+   15:40   0:00 grep mcnpx

And we can connect back to the screen session to see the MCNPX output:

brian@brems:~$ screen -R
 run terminated when    100000 particle histories were done.
 warning.  tally  11 tfc bin did not pass  1 of 10 statistical checks.
 warning.     1 of   7 tallies did not pass all 10 statistical checks.
 warning.     2 of   7 tallies had bins with large relative errors.
 dump   11 on file runtpr     nps =      100000  coll =       31827944
                              ctm =      3.75     nrn =      514286097
 mcrun  is done
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
FORTRAN STOP
brian@brems:~/work$ 
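
The attach, run, detach sequence above can also be collapsed into one step by asking screen to start the job in a session that is already detached. This is a sketch rather than an established Brems procedure: it uses screen's -d -m -S options (start detached, with a session name), and the helper name run_detached, the DRY_RUN switch, and the session name test1 are all hypothetical.

```shell
# run_detached NAME CMD...
# Start CMD in a new, already-detached screen session called NAME.
# Set DRY_RUN=1 to print the screen command instead of running it.
run_detached() {
    name=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "screen -dmS $name $*"
    else
        screen -dmS "$name" "$@"
    fi
}

# Example (commented out so that pasting this block does not start a job):
# cd ~/work
# run_detached test1 /brems/bin/mpirun -np 4 /brems/bin/mcnpx-mpi-e2 i=test1.i
# screen -r test1    # reattach later to watch the MCNPX output
```

Note that a session started this way ends when the command exits, so the final MCNPX messages are only visible if you reattach while the job is still running.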

Advanced screen tricks

  • Running screen without -r or -R starts a new screen session even if one already exists.
  • Running screen -list shows your current screen sessions.

brian@brems:~$ screen   
[detached]
brian@brems:~$ screen -list
There are screens on:
	29819.pts-0.brems	(Detached)
	30502.pts-0.brems	(Detached)
2 Sockets in /var/run/screen/S-brian.

brian@brems:~$ 

You can choose which one to reattach to by specifying a unique part of the name:

brian@brems:~$ screen -r 298
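
When several sessions are open, it can be easier to pull the session names out of the listing than to read them by eye. The sketch below assumes screen -list output in the format shown above; the function name detached_sessions is hypothetical.

```shell
# detached_sessions
# Read `screen -list` output on stdin and print the name of each
# detached session, one per line.
detached_sessions() {
    awk '/\(Detached\)/ { print $1 }'
}

# Usage (commented out so the block has no side effects):
# screen -list | detached_sessions
```

Any unique leading part of a printed name can then be passed to screen -r, as in the example above.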


MCNPX

Compiling on Brems

gcc

Portland Group