Running With Slurm

From New IAC Wiki
Revision as of 22:26, 2 August 2010 by Oborn (talk | contribs)

Slurm (https://computing.llnl.gov/linux/slurm/) is the queuing system used on Brems. It lets multiple users place jobs in a queue, and the scheduler starts each job as resources become available.

The following instructions are for running MCNPX on Brems.

Adding your run to the queue

Method 1: Easy

Use the queuemcnpx script to submit multiple input files. Each file is submitted as its own batch job, which runs MCNPX in parallel with the n=inputfile option.

brian@brems:~/work$ queuemcnpx 14MeV.i 18MeV.i 22MeV.i 9MeV.i 
adding mcnpx n=14MeV.i | sbatch: Submitted batch job 119
adding mcnpx n=18MeV.i | sbatch: Submitted batch job 120
adding mcnpx n=22MeV.i | sbatch: Submitted batch job 121
adding mcnpx n=9MeV.i | sbatch: Submitted batch job 122

The new jobs are now in the queue:

brian@brems:~/work$ squeue
 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(RE
   119     brems  14MeV.i    brian  PD       0:00      1 (Resources)
   120     brems  18MeV.i    brian  PD       0:00      1 (Resources)
   121     brems  22MeV.i    brian  PD       0:00      1 (Resources)
   122     brems   9MeV.i    brian  PD       0:00      1 (Resources)
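
The source of queuemcnpx is not shown on this page, so the following is only a minimal sketch of how such a wrapper could work, not the actual script. It assumes the MCNPX path from the Method 2 example, uses the input file name as the job name (matching the NAME column above), and assumes DATAPATH is already exported in your shell. The function names submit_cmd and queue_inputs are hypothetical. The functions only print the sbatch commands, so you can review them before submitting.

```shell
#!/bin/bash
# Hypothetical stand-in for queuemcnpx; the real script on Brems may differ.
# MCNPX path copied from the Method 2 example on this page.
MCNPX=/opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx

# Print the sbatch command that would submit one input file.
# --wrap makes sbatch wrap the quoted command in a simple shell script.
submit_cmd() {
    local input="$1"
    printf 'sbatch -J %s --wrap "srun %s n=%s"\n' "$input" "$MCNPX" "$input"
}

# Print one sbatch command per input file given as an argument.
queue_inputs() {
    local f
    for f in "$@"; do
        submit_cmd "$f"
    done
}
```

After loading these functions, running queue_inputs 14MeV.i 18MeV.i prints one command per file; piping that output to bash would submit the jobs. File names containing spaces or quotes are not handled by this sketch.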

Method 2: Allows custom options

Create a script file like this one (saved as runmcnpx in the example that follows), substituting your MCNPX parameters at the end of the last line:

#!/bin/bash
#number of processes to run:
#SBATCH -n 8 
export DATAPATH=/opt/mcnpx/data/
srun /opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx i=14MeV.i o=14MeV.o

Add your run to the queue using sbatch. The -J option is optional and lets you name your job so you can keep track of it in the queue. Options must come before the script name; anything after the script name is passed to the script itself.

brian@brems:~/work$ sbatch -J myjobname ./runmcnpx
sbatch: Submitted batch job 3
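
If you have several input files that each need custom options, writing the batch script by hand gets repetitive. As a rough sketch, the helper below generates a Method 2 style script for a given input file and process count. The name make_job_script is hypothetical, the paths are copied from the example script above, and it assumes input files end in .i. It also sets the job name inside the script with an #SBATCH -J line, which is an alternative to passing -J on the sbatch command line.

```shell
#!/bin/bash
# make_job_script INPUT [NPROCS]
# Hypothetical helper: print a batch script for one MCNPX input file.
# NPROCS defaults to 8, as in the example script on this page.
make_job_script() {
    local input="$1" nprocs="${2:-8}"
    local base="${input%.i}"    # strip a trailing .i, e.g. 14MeV.i -> 14MeV
    cat <<EOF
#!/bin/bash
#number of processes to run:
#SBATCH -n ${nprocs}
#job name shown by squeue:
#SBATCH -J ${base}
export DATAPATH=/opt/mcnpx/data/
srun /opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx i=${input} o=${base}.o
EOF
}
```

For example, make_job_script 14MeV.i 8 > run14MeV writes the script to a file, which can then be submitted with sbatch ./run14MeV.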

Checking runs currently in the queue

brian@brems:~/work$ squeue
 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   119     brems  14MeV.i    brian  PD       0:00      1 (Resources)
   120     brems  18MeV.i    brian  PD       0:00      1 (Resources)
   121     brems  22MeV.i    brian  PD       0:00      1 (Resources)
   122     brems   9MeV.i    brian  PD       0:00      1 (Resources)
   106     brems    1.0mm     neba   R    1:01:30      1 brems
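
In the ST column, PD means the job is pending and R means it is running. When the queue is long, a quick count per state can be easier to read than the full listing. The following is a small sketch; count_state is a hypothetical helper name, and it assumes the default squeue column layout shown above (state in the fifth column) and job names without spaces.

```shell
#!/bin/bash
# count_state STATE
# Read squeue output on stdin and print how many jobs are in STATE
# (for example PD or R). Skips the header line; assumes ST is column 5.
count_state() {
    awk -v st="$1" 'NR > 1 && $5 == st { n++ } END { print n + 0 }'
}
```

Typical use would be squeue | count_state PD, or squeue -u brian | count_state R to look only at one user's jobs.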

Check the output of your run

Replace 33 with the JOBID of your run:

brian@brems:~/work$ less slurm-33.out
mcnpx    ver=27b   ld=Tue Aug 18 08:00:00 MST 2009   03/31/10 19:03:50
*************************************************************
*                                                           *
*                   MCNPX                                   *
*                                                           *
* Copyright 2007. Los Alamos National Security, LLC.        *
* All rights reserved.                                      *
*                                                           *
* This material was produced under U.S. Government contract *
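
For a long run you usually care about the most recent lines rather than the banner at the top. A minimal sketch, assuming the default slurm-JOBID.out file name shown above and that you are in the directory the job was submitted from; show_job_output is a hypothetical helper name.

```shell
#!/bin/bash
# show_job_output JOBID [LINES]
# Print the last LINES lines (default 20) of slurm-JOBID.out
# from the current directory, or fail if the file does not exist.
show_job_output() {
    local out="slurm-$1.out"
    if [ ! -f "$out" ]; then
        echo "no output file $out in $(pwd)" >&2
        return 1
    fi
    tail -n "${2:-20}" "$out"
}
```

For example, show_job_output 33 5 prints the last five lines of slurm-33.out. To keep watching a job that is still running, tail -f slurm-33.out follows the file as it grows.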

Cancel a queued or running job

Use the scancel command, replacing 33 with your JOBID:

brian@brems:~/work$ scancel 33
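
If you submitted many input files by mistake, cancelling them one at a time is tedious. The sketch below lists the JOBIDs of one user's pending jobs so they can be reviewed and then passed to scancel. The name pending_ids is hypothetical, and it assumes the default squeue column layout shown earlier on this page (JOBID, USER, and ST in columns 1, 4, and 5).

```shell
#!/bin/bash
# pending_ids USER
# Read squeue output on stdin and print the JOBID of each job
# that belongs to USER and is still pending (state PD).
pending_ids() {
    awk -v user="$1" 'NR > 1 && $4 == user && $5 == "PD" { print $1 }'
}
```

Running squeue | pending_ids brian prints the matching JOBIDs, one per line. Once the list looks right, squeue | pending_ids brian | xargs -r scancel cancels them (the -r flag is GNU xargs and skips the call when the list is empty). Running jobs are left alone.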


Attach to a currently running job

Use the sattach command, replacing 106 with your JOBID but keeping the .0 suffix, which identifies the job step:

brian@brems:~/work$ sattach 106.0
mcnpx    ver=27b   ld=Tue Aug 18 08:00:00 MST 2009   04/14/10 15:32:29
*************************************************************
*                                                           *
*                   MCNPX                                   *
*                                                           *
* Copyright 2007. Los Alamos National Security, LLC.        *
* All rights reserved.                                      *