Running With Slurm

From New IAC Wiki
Latest revision as of 22:26, 2 August 2010

Slurm (https://computing.llnl.gov/linux/slurm/) is the queuing system used on Brems. It lets multiple users submit jobs to a shared queue, and the scheduler decides when and where each job runs based on the resources available.

The following instructions are for running MCNPX on Brems.

Adding your run to the queue

Method 1: Easy

Use the queuemcnpx script to submit multiple input files. It submits each input file as its own batch job, which runs MCNPX with the n=inputfile option, so the jobs can run in parallel as resources become available.

brian@brems:~/work$ queuemcnpx 14MeV.i 18MeV.i 22MeV.i 9MeV.i 
adding mcnpx n=14MeV.i | sbatch: Submitted batch job 119
adding mcnpx n=18MeV.i | sbatch: Submitted batch job 120
adding mcnpx n=22MeV.i | sbatch: Submitted batch job 121
adding mcnpx n=9MeV.i | sbatch: Submitted batch job 122

The new jobs are now in the queue:

brian@brems:~/work$ squeue
 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   119     brems  14MeV.i    brian  PD       0:00      1 (Resources)
   120     brems  18MeV.i    brian  PD       0:00      1 (Resources)
   121     brems  22MeV.i    brian  PD       0:00      1 (Resources)
   122     brems   9MeV.i    brian  PD       0:00      1 (Resources)
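The queuemcnpx helper is a local script on Brems and its source is not shown on this page. The following is only a hypothetical sketch of what such a helper might do, assuming it generates one batch script per input file and pipes it into sbatch (sbatch reads the script from standard input when no file name is given). The function names make_job_script and queue_mcnpx, and the default of 8 tasks borrowed from the Method 2 script, are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of a queuemcnpx-style helper; the real script on Brems may differ.
# Paths are copied from the Method 2 script; the default task count (8) is an assumption.
MCNPX_BIN=/opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx
MCNPX_DATA=/opt/mcnpx/data/

# Print a batch script for one MCNPX input file on stdout.
make_job_script() {
    local input=$1
    local ntasks=${2:-8}
    cat <<EOF
#!/bin/bash
#SBATCH -n ${ntasks}
#SBATCH -J ${input}
export DATAPATH=${MCNPX_DATA}
srun ${MCNPX_BIN} n=${input}
EOF
}

# Submit one job per existing input file; sbatch reads each script from stdin.
queue_mcnpx() {
    local input
    for input in "$@"; do
        if [[ ! -f "$input" ]]; then
            echo "skipping ${input}: file not found" >&2
            continue
        fi
        echo -n "adding mcnpx n=${input} | "
        make_job_script "$input" | sbatch
    done
}
```

After sourcing the file, queue_mcnpx 14MeV.i 18MeV.i would submit two jobs named after their input files, which is consistent with the NAME column in the squeue listing above.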

Method 2: Allows custom options

Create a batch script file like this one (named runmcnpx in the sbatch example below), substituting your MCNPX parameters at the end of the last line:

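#!/bin/bash
#number of processes (MPI tasks) to run:
#SBATCH -n 8
#tell MCNPX where to find its cross-section data:
export DATAPATH=/opt/mcnpx/data/
#run the MPI build of MCNPX; i= names the input file, o= the output file:
srun /opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx i=14MeV.i o=14MeV.o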
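If you run many inputs this way, one option is to pass the input file name to the script instead of editing the last line each time: sbatch hands any arguments that follow the script name on to the script. This is a hypothetical variant of runmcnpx, not part of the Brems setup; the output_name function and the SLURM_JOB_ID guard (Slurm sets that variable inside a job) are assumptions added for illustration.

```shell
#!/bin/bash
# Hypothetical variant of runmcnpx that takes the MCNPX input file as its first argument.
#SBATCH -n 8

export DATAPATH=/opt/mcnpx/data/
MCNPX_BIN=/opt/mcnpx/v27b_64_mpi_i8_slurm/bin/mcnpx

# Derive the output file name from the input name, e.g. 14MeV.i -> 14MeV.o
output_name() {
    echo "${1%.i}.o"
}

# Slurm sets SLURM_JOB_ID inside a job; outside one, only define the helper above.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    input=${1:?usage: sbatch runmcnpx INPUT.i}
    srun "$MCNPX_BIN" i="$input" o="$(output_name "$input")"
else
    echo "not inside a Slurm job; submit this script with sbatch" >&2
fi
```

With this variant, a submission would look like sbatch -J 18MeV ./runmcnpx 18MeV.i, with the sbatch options placed before the script name and the input file after it.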

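Add your run to the queue using sbatch. The -J option is optional and sets a name for the job, which makes it easier to keep track of in the queue. Put sbatch options before the script name; anything after the script name is passed to the script as arguments.

brian@brems:~/work$ sbatch -J myjobname ./runmcnpx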
sbatch: Submitted batch job 3
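When scripting submissions it helps to keep the job ID that sbatch reports, since reading the output file, scancel and sattach all need it. A minimal sketch, assuming the confirmation line contains "Submitted batch job N" as in the example above; parse_job_id is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's confirmation message,
# e.g. "sbatch: Submitted batch job 3" -> 3. Returns 1 if no job ID is found.
parse_job_id() {
    local re='Submitted batch job ([0-9]+)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

A possible usage: jobid=$(parse_job_id "$(sbatch -J myjobname ./runmcnpx 2>&1)") && echo "submitted job $jobid". Capturing stderr as well covers sbatch versions that print the message with the "sbatch:" prefix on stderr.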

Checking runs currently in the queue

brian@brems:~/work$ squeue
 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   119     brems  14MeV.i    brian  PD       0:00      1 (Resources)
   120     brems  18MeV.i    brian  PD       0:00      1 (Resources)
   121     brems  22MeV.i    brian  PD       0:00      1 (Resources)
   122     brems   9MeV.i    brian  PD       0:00      1 (Resources)
   106     brems    1.0mm     neba   R    1:01:30      1 brems
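On a shared machine the queue also lists other users' jobs, such as neba's job 106 above. squeue's -u option restricts the listing to one user, and -h with -o '%t' prints only each job's compact state (PD, R, ...). The sketch below, with the hypothetical name summarize_states, counts jobs per state from that output.

```shell
#!/bin/bash
# Hypothetical helper: read one job state per line (e.g. PD, R) on stdin and
# print "STATE COUNT" lines, sorted by state. Blank lines are ignored.
summarize_states() {
    awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage on Brems would be squeue -h -u "$USER" -o '%t' | summarize_states, which for the four queued MCNPX jobs above would report them all as PD.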

Check the output of your run

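By default Slurm writes the output of your run to slurm-JOBID.out in the directory you submitted it from. Replace 33 with the JOBID of your run: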

brian@brems:~/work$ less slurm-33.out
mcnpx    ver=27b   ld=Tue Aug 18 08:00:00 MST 2009   03/31/10 19:03:50
*************************************************************
*                                                           *
*                   MCNPX                                   *
*                                                           *
* Copyright 2007. Los Alamos National Security, LLC.        *
* All rights reserved.                                      *
*                                                           *
* This material was produced under U.S. Government contract *
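Because the default output file name follows the slurm-JOBID.out pattern, a small helper can show the most recent lines of a run's output without opening the whole file. show_output is a hypothetical name, and the default of 20 lines is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: print the last lines of a job's default output file.
# Usage: show_output JOBID [LINES]
show_output() {
    local jobid=$1
    local lines=${2:-20}
    local file="slurm-${jobid}.out"
    if [[ ! -f "$file" ]]; then
        echo "no output file ${file} in $(pwd)" >&2
        return 1
    fi
    tail -n "$lines" "$file"
}
```

For example, show_output 33 50 prints the last 50 lines of slurm-33.out; to follow a running job as it writes, tail -f slurm-33.out does the same job interactively.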

Cancel a queued or running job

Use the scancel command, replacing 33 with your JOBID:

brian@brems:~/work$ scancel 33
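scancel accepts several job IDs at once, and scancel -u "$USER" cancels every job you own, so a typo can cancel more than intended. The sketch below wraps scancel so that malformed IDs are rejected before anything is cancelled; cancel_jobs and the SCANCEL override (set it to echo for a dry run) are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around scancel. Set SCANCEL=echo for a dry run.
SCANCEL=${SCANCEL:-scancel}

# Cancel the given jobs (JOBID or JOBID.STEP) only if every argument looks valid.
cancel_jobs() {
    local id
    local re='^[0-9]+(\.[0-9]+)?$'
    if [[ $# -eq 0 ]]; then
        echo "usage: cancel_jobs JOBID..." >&2
        return 1
    fi
    for id in "$@"; do
        if [[ ! $id =~ $re ]]; then
            echo "not a job ID: ${id}" >&2
            return 1
        fi
    done
    "$SCANCEL" "$@"
}
```

For example, cancel_jobs 119 120 121 122 would cancel the four jobs queued by queuemcnpx above in one call, while cancel_jobs 119 l20 would stop at the mistyped second ID without cancelling either.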


Attach to a currently running job

Use the sattach command, replacing 106 with your JOBID and keeping the .0 suffix, which selects job step 0 (the step started by srun in your batch script):

brian@brems:~/work$ sattach 106.0
mcnpx    ver=27b   ld=Tue Aug 18 08:00:00 MST 2009   04/14/10 15:32:29
*************************************************************
*                                                           *
*                   MCNPX                                   *
*                                                           *
* Copyright 2007. Los Alamos National Security, LLC.        *
* All rights reserved.                                      *