My First SLURM Job

Now that we know how to gather information about the system, how do we ask it to run a job for us?

SLURM needs to know two things to run a job: what we want to do and the resources we need to do it. We’ll use a shell script to specify both.

Let’s make a new script called submit.sh. In your text editor, copy and paste the following (minimal) submission script. The #SBATCH lines are the resource requests and everything below them is what we want the job to do; the partition name (chemq) is specific to the AS-CHEM cluster, so substitute your own if needed:
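
#!/bin/bash
#
# Resource requests
#SBATCH --job-name=hello        # name shown in the job queue
#SBATCH --partition=chemq       # partition (queue) to submit to
#SBATCH --nodes=1               # run on a single node
#SBATCH --ntasks=1              # run a single task...
#SBATCH --cpus-per-task=16      # ...with 16 CPUs
#SBATCH --mem=28G               # request 28 GB of memory
#SBATCH --output=out.txt        # redirect standard output to out.txt

# What we want to do
echo "Starting job"
./hello.sh
echo "Resting 30 sec"
sleep 30
echo "Job complete"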

Save this file in its own folder with a descriptive name like myFirstSlurmJob. Place the script hello.sh from the first exercise into this folder too. Now, in order to run this job, you need to be on a system that is managed by SLURM, so let’s log on to the AS-CHEM cluster.

You’ll need to be connected to the Cornell VPN to access the cluster. If you are a Cornell chemistry student and don’t have access to the cluster, go see ChemIT (or your group IT representative) to set up your cluster account. If you are not a Cornell chemistry student, you’ll need to follow your institution’s cluster login instructions. Depending on how your cluster is set up, some of the instructions below may not work; when in doubt, contact your system administrator.

Open up the command line, type ssh <yourNetID>@cluster2020.chem.cornell.edu, and a password prompt will appear. As you enter your password, nothing will appear on screen; this is normal. The terminal is recording your keystrokes as usual but will not display them, for security purposes.

NathanLui@local | ~ $ ssh nml64@cluster2020.chem.cornell.edu
Password: 

Last login: Sun Dec 26 14:04:09 2021 from <IP address>
nml64@as-chm-cluster | ~ $ 

Notice that the terminal prompt has changed from NathanLui@local to nml64@as-chm-cluster, indicating that I’m now working on the cluster. We can navigate the cluster with the same commands we learned earlier. To test our script, we’ll need to use our SFTP client (FileZilla) to transfer our scripts to the cluster. If you haven’t yet, go set up FileZilla using the directions in section 1. Once you’ve done that, open FileZilla, connect to the AS-CHEM cluster, and drag your whole myFirstSlurmJob folder into the cluster pane to transfer it.

Of course, you have the option to create a new folder on the cluster directly using mkdir and then drag the individual shell scripts into that folder, but as your experience grows, so will the number of files you have to keep track of. It will be much more manageable if the organization of your local system mirrors that of the cluster, and transferring whole directories ensures that paths will remain the same. For more, see best practices.
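
You can also make this transfer from the command line: scp with the -r (recursive) flag copies a whole directory and preserves its structure. From your local machine, something like this should work:

NathanLui@local | ~ $ scp -r myFirstSlurmJob nml64@cluster2020.chem.cornell.edu:~/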

Now, let’s navigate into that folder and take a look:

nml64@as-chm-cluster | ~ $ cd myFirstSlurmJob/
nml64@as-chm-cluster | ~/myFirstSlurmJob $ ls -l
total 8.0K
-rwxr-xr-x 1 nml64 collum  79 Dec 26 15:59 hello.sh
-rwxr-xr-x 1 nml64 collum 338 Dec 26 15:59 submit.sh

Now we can submit our job to the SLURM workload manager:

nml64@as-chm-cluster | ~/myFirstSlurmJob $ sbatch submit.sh
Submitted batch job 8716

Checking the job queue and node status shows us the progress of our new job:

nml64@as-chm-cluster | ~/myFirstSlurmJob $ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
  8716     chemq    hello    nml64  R       0:16      1 chem006 
nml64@as-chm-cluster | ~/myFirstSlurmJob $ pestat
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                            State Use/Tot              (MB)     (MB)  JobId User ... 
...
 chem006           chemq      mix  16  16    16.00    31935    29651  8716 nml64  
...

But wait a second! Where is our output? We’ve tasked 16 CPUs with 28 GB of memory to tell the whole world “Hello!”, so where did it all go? Let’s take a look at our directory.

nml64@as-chm-cluster | ~/myFirstSlurmJob $ ls -l
total 12K
-rwxr-xr-x 1 nml64 collum  79 Dec 26 15:59 hello.sh
-rw-r--r-- 1 nml64 collum  81 Dec 26 17:35 out.txt
-rwxr-xr-x 1 nml64 collum 338 Dec 26 15:59 submit.sh
nml64@as-chm-cluster | ~/myFirstSlurmJob $ cat out.txt
Starting job
Hello world!
My favorite food is pizza.
I am  years old.
Resting 30 sec
Job complete

So that’s where it all went! SLURM redirects all standard output from the terminal to the output file that we specified in the resource requests section of submit.sh.

There’s still another issue! The program doesn’t know how old we are because the environment variable we declared in the last tutorial doesn’t get transferred with the file (i.e., we’re in a different environment). So we have to redeclare age in this environment.
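
One fix is to declare the variable at the top of hello.sh itself. A sketch of the updated script (yours from the first exercise may look slightly different):

#!/bin/bash
age=26                          # declare age here; variables from your local shell don't carry over
echo "Hello world!"
echo "My favorite food is pizza."
echo "I am $age years old."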

This fixes our issue, and if we run the job again we can see that the script works as it’s supposed to!

nml64@as-chm-cluster | ~/myFirstSlurmJob $ sbatch submit.sh
Submitted batch job 8719
nml64@as-chm-cluster | ~/myFirstSlurmJob $ ls -l
total 12K
-rwxr-xr-x 1 nml64 collum 107 Dec 26 17:46 hello.sh
-rw-r--r-- 1 nml64 collum 100 Dec 26 17:56 out.txt
-rwxr-xr-x 1 nml64 collum 352 Dec 26 17:55 submit.sh
nml64@as-chm-cluster | ~/myFirstSlurmJob $ cat out.txt
Starting job
Hello world!
My favorite food is pizza.
I am 26 years old.
Resting 30 sec
Job complete

🍾👏🍾 Congrats!!! You just ran your first SLURM job 🍾👏🍾


SLURM will overwrite data files with the same name

One important thing to note is that we ran this job multiple times in the same directory, so SLURM wrote over out.txt the second time we ran the job. There is no way to get our first out.txt back (you could scroll up in the terminal history looking for our previous cat out.txt call, but that isn’t a generalizable solution). This could be problematic, since we might not remember how we got to the previous out.txt or how to recreate its results. In general, a single folder should represent a single program call so that unintentional overwrites cannot happen. In other words:

Every new job should begin in its own new folder.
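
If you do have to rerun a job in the same folder, SLURM’s output filename patterns offer some protection: a %j in the filename expands to the job ID, so each run writes to its own file. For example, in submit.sh:

#SBATCH --output=out_%j.txt     # e.g. out_8716.txt for job 8716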

In the next section, we’ll talk about the final part of our recipe: the Gaussian input file.

