SLURM

SLURM, formerly known as the Simple Linux Utility for Resource Management, is a type of program called a workload manager.1 On large, multi-user systems it can be advantageous and equitable to have a program control the allocation of computational resources, and SLURM does just that.

When you want to run a job on the CHEM cluster, you have to ask the SLURM daemon for resources to allocate to your job. It takes your script, figures out how much compute power you want, and, if the nodes/memory are available, runs your script on them. If not, it places your job in a queue until the requested resources become available to you.

SLURM has its own set of commands, and its full documentation can be found here,2 but we’ll go over only the most important ones: sinfo, pestat, squeue, sbatch, and scancel.
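
Each of these commands also has a --help flag, and on most SLURM installations a manual page as well, so you can always pull up the full list of options right from the terminal:

nml64@as-chm-cluster | ~ $ man sbatch
nml64@as-chm-cluster | ~ $ squeue --help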

Gathering information

sinfo gives us information about the status of the cluster’s computing nodes (a node is a single computer in the cluster).

nml64@as-chm-cluster | ~ $ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
chemq        up   infinite      1  alloc chem001 
chemq        up   infinite      5   idle chem[002-006] 
collumq      up   infinite      4  down* dbc[001-003,005] 
collumq      up   infinite      1    mix dbc009 
collumq      up   infinite      1  alloc dbc007 
collumq      up   infinite      4   idle dbc[004,006,008,010] 
widomq       up   infinite      1  drain bw001 
widomq       up   infinite      1    mix bw007 
widomq       up   infinite      5   idle bw[002-006]

By default, sinfo lists the nodes by their partition, the highest level of cluster organization (a partition is a set of compute nodes). On our shared system, nodes are partitioned by ownership, but other systems may have partitions based on usage (e.g. large jobs, small jobs, post-processing, data visualization, etc.), allowing the admins to install different programs on different partitions.

By default, SLURM commands only show us nodes we have access to (more on how to change that below). So, for example, in the above snippet we have 3 partitions: chemq, collumq, and widomq. collumq has 10 total nodes, 4 of which (dbc001-003 and dbc005) are currently down, 1 of which (dbc007) is fully allocated to a running job, 1 of which (dbc009) is mixed, meaning it still has some resources available, and 4 of which (dbc004, 006, 008, and 010) are idle. sinfo doesn’t provide the most readable output, so sometimes it’s easier to use pestat.

Notice how the terminal’s command prompt has changed from NathanLui@Local to nml64@as-chm-cluster. This is because I’m now connected to the cluster, instead of working locally on my own computer (more on how to do this in the next section).

pestat is quite similar to sinfo -N (-N provides a node-oriented view of the cluster), but I find the layout much easier to read. pestat also shows the CPU and memory capacity of each node, which will be helpful later.

nml64@as-chm-cluster | ~ $ pestat
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                            State Use/Tot              (MB)     (MB)  JobId User ...
   bw001          widomq    drain*  0  12    0.00     48277    37640   
   bw002          widomq     idle   0  12    0.00     64382    62151   
   bw003          widomq     idle   0  12    0.00     64382    62139   
   bw004          widomq     idle   0  12    0.00     64382    62144   
   bw005          widomq     idle   0  12    0.00     64382    62135   
   bw006          widomq     idle   0  12    0.00     64382    62135   
   bw007          widomq      mix   2  12    1.00*    64382    14752  8609 m----  
 chem001           chemq    alloc  16  16   15.98     31935    24991  8691 j-----  
 chem002           chemq     idle   0  16    0.00     31935    29712   
 chem003           chemq     idle   0  16    0.00     31935    29711   
 chem004           chemq     idle   0  16    0.00     31935    29721   
 chem005           chemq     idle   0  16    0.00     31935    29728   
 chem006           chemq     idle   0  16    0.00     31935    29730   
  dbc001         collumq    down*   0   8    0.00*    16032        0   
  dbc002         collumq    down*   0   8    0.00*     7968        0   
  dbc003         collumq    down*   0   8    0.00*    16032        0   
  dbc004         collumq     idle   0  16    0.00     24085    21791   
  dbc005         collumq    down*   0  16    0.00*    24085        0   
  dbc006         collumq     idle   0  16    0.00     24085    21805   
  dbc007         collumq    alloc  12  12   11.96     32126    26506  8652 nml64  
  dbc008         collumq     idle   0  12    0.00     32126    29959   
  dbc009         collumq      mix  12  40   11.82    192049   182642  8679 nml64  
  dbc010         collumq     idle   0  40    0.00    192049   189551   
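
If pestat isn’t installed on your cluster, sinfo’s node-oriented view gives similar (if less tidy) information; something like the following should work (output omitted):

nml64@as-chm-cluster | ~ $ sinfo -N -l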

squeue displays the current job queue:

nml64@as-chm-cluster | ~ $ squeue
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST 
     8609     chemq matlab_t    m----  R 5-13:38:51      1 bw007 
     8652   collumq trans-Na    nml64  R 3-20:58:56      1 dbc007 
     8679   collumq cis-NaTB    nml64  R 1-03:59:54      1 dbc009 
     8691     chemq A2HMPA3_   j-----  R      39:33      1 chem001 

Notice how squeue gives us a lot of information about the running jobs: it tells us the job ID, who’s running each job, its partition, how many and which node(s) it is using, and how long it has been running.
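
If you want different columns, squeue also accepts a --format string; here’s a quick sketch using some of the standard format codes (%i is the job ID, %u the user, %t the state, %M the elapsed time, and %R the node list or pending reason):

nml64@as-chm-cluster | ~ $ squeue --format="%.8i %.10P %.20j %.10u %.2t %.12M %R"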

By default, squeue and sinfo only give us data on the nodes we have permission to use, but if we want to check on other nodes we can use the --all switch (-a for short; in the snippets below it’s bundled with the long-format -l flag as -all).

nml64@as-chm-cluster | ~ $ sinfo -all
Tue Dec 21 15:29:47 2021
PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST 
chemq        up   infinite 1-infinite   no       NO chemit,col      1   allocated chem001 
chemq        up   infinite 1-infinite   no       NO chemit,col      5        idle chem[002-006] 
slinq        up   infinite 1-infinite   no       NO slin,chemi      1   allocated sl001 
slinq        up   infinite 1-infinite   no       NO slin,chemi      1        idle sl002 
wilsonq      up   infinite 1-infinite   no       NO wilson,che      2        idle jjw[001-002] 
chenq        up   infinite 1-infinite   no       NO chen,chemi      1       mixed pc002 
chenq        up   infinite 1-infinite   no       NO chen,chemi      1        idle pc001 
loringq      up   infinite 1-infinite   no       NO loring,che      2       down* rl[001,003] 
loringq      up   infinite 1-infinite   no       NO loring,che      1     drained rl004 
loringq      up   infinite 1-infinite   no       NO loring,che      1        idle rl002 
collumq      up   infinite 1-infinite   no       NO collum,che      4       down* dbc[001-003,005] 
collumq      up   infinite 1-infinite   no       NO collum,che      1       mixed dbc009 
collumq      up   infinite 1-infinite   no       NO collum,che      1   allocated dbc007 
collumq      up   infinite 1-infinite   no       NO collum,che      4        idle dbc[004,006,008,010] 
widomq       up   infinite 1-infinite   no       NO chemit,col      1     drained bw001 
widomq       up   infinite 1-infinite   no       NO chemit,col      1       mixed bw007 
widomq       up   infinite 1-infinite   no       NO chemit,col      5        idle bw[002-006] 
lambertq     up   infinite 1-infinite   no       NO lambert,ch      2   allocated tl[001-002] 
nml64@as-chm-cluster | ~ $ squeue -all
Tue Dec 21 15:30:50 2021
JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMIT   NODES NODELIST(REASON) 
 8590     slinq B3LYP-D3    y----  RUNNING 6-01:40:46 UNLIMITED        1 sl001 
 8608     chenq matlab_t    m----  RUNNING 5-13:51:21 37-12:00:00      1 pc002 
 8609    widomq matlab_t    m----  RUNNING 5-13:48:40 37-12:00:00      1 bw007 
 8614     chenq matlab_t    m----  RUNNING 5-01:41:36 37-12:00:00      1 pc002 
 8652   collumq trans-Na    nml64  RUNNING 3-21:08:45 17-12:00:00      1 dbc007 
 8679   collumq cis-NaTB    nml64  RUNNING 1-04:09:43 17-12:00:00      1 dbc009 
 8686     slinq B3LYP-D3    y----  RUNNING    3:18:46 UNLIMITED        1 sl001 
 8688     slinq B3LYP-D3    y----  RUNNING      52:00 UNLIMITED        1 sl001 
 8691     chemq A2HMPA3_   j-----  RUNNING      49:22 17-12:00:00      1 chem001 
 8692  lambertq Dimer-6m     k---  RUNNING      19:07 5-00:00:00       1 tl001 
 8693  lambertq Dimer-6m     k---  RUNNING      19:07 5-00:00:00       1 tl002 

There are many switches you can use to filter the output of squeue and sinfo, for example by user (--user), partition (--partition), or node state (--states).
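
For instance, to see only your own jobs, a single partition, or just the idle nodes (the username and partition name here are simply the ones from the snippets above):

nml64@as-chm-cluster | ~ $ squeue --user=nml64
nml64@as-chm-cluster | ~ $ sinfo --partition=collumq
nml64@as-chm-cluster | ~ $ sinfo --states=idle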

These are some of the most important commands we’ll use in this tutorial. A short cheat sheet can be found here.3

Submitting jobs

sbatch and scancel are mirror commands. sbatch <script> submits a job script to the SLURM daemon for resource allocation and returns a job ID number. scancel <job ID> cancels a job at any point after submission, i.e., whether it is still waiting in the queue or already running. Any files the job has already written are preserved as they are at the moment scancel is executed (keep this in mind if you choose to write any large scratch files to your job directory instead of /scratch). In the next section, we’ll learn how to format submission scripts and submit our first SLURM job.
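
As a quick preview (the script name and job ID below are just placeholders), a typical submit-then-cancel exchange looks like this:

nml64@as-chm-cluster | ~ $ sbatch my_job.sh
Submitted batch job 8700
nml64@as-chm-cluster | ~ $ scancel 8700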




References

(1) SLURM Workload Manager
(2) SLURM Documentation
(3) SLURM Cheat Sheet