# Instruction for users

## The summer school cluster

The [beluga-cloud](https://docs.alliancecan.ca/wiki/Cloud_resources) resources are provided by the Digital Research Alliance of Canada / Calcul Québec and powered by the [Magic Castle](https://github.com/ComputeCanada/magic_castle) HPC management software project.


## Connection with the ssh client


Windows users should install an SSH client to access the summer school compute cluster, such as [MobaXterm](https://mobaxterm.mobatek.net/) or [WSL](https://docs.microsoft.com/en-us/windows/wsl/).

(Linux and macOS users can skip this step: your laptop already has a terminal installed.)

The summer school cloud cluster is accessible via SSH. Open a terminal and enter (replace XX with your user number):

```
ssh userXX@quantum-cloud.ccs.usherbrooke.ca
```

Then type your password. Note that the characters won't appear on screen, but they are still being entered.
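
Once connected, your prompt should look something like this (the login node name may differ):

```
[userXX@login1 ~]$
```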

It is important not to launch jobs directly on the login node; please follow the "Submitting an interactive job" section below.


## Connection with JupyterHub

You can also reach the cluster with a web browser at the following link:
[https://c2.quantum.ccs.usherbrooke.ca](https://c2.quantum.ccs.usherbrooke.ca/)



## Interacting with the cluster

Interacting with the cluster requires knowledge of the Unix command line.
Basic Unix commands to navigate through files and directories are:

| Command      | Description   | 
|--------------|-----------|
| `pwd` | Return the absolute path to the current location |
| `ls` | List files and directories in the current location  | 
| `cd <dir>` | (Change Directory) Enter the directory named `<dir>` |
| `mv <orig> <dest>` | (MoVe) Move or rename the given `<orig>` file or directory to `<dest>` |
| `cp <orig> <dest>` | (CoPy) Copy the given `<orig>` file to a new file `<dest>` |
| `cp -r <orig> <dest>` | Copy the given `<orig>` directory to a new directory `<dest>` |
| `rm <file>` | (ReMove) Delete the file `<file>` (Warning: deleted permanently!) |
| `rm -r <dir>` | (ReMove) Delete the directory `<dir>` (Warning: deleted permanently!) |

Note that `./` is a shortcut for the current directory: for example, `cd ./` does nothing, and `cp folder1/foo.txt ./` copies the file `foo.txt` located in the directory `folder1` to the current directory.
Also, `..` indicates the parent directory. For example, `cd ..` moves up one directory.
Finally, `~` indicates your home directory, i.e. the directory you land in when you connect to the cluster.
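
As a short illustration, here is a hypothetical session chaining these commands (the directory `folder1` and file `foo.txt` are made up for the example):

```
pwd                        # e.g. /home/userXX
ls                         # list the contents of the home directory
mkdir folder1              # create a directory (mkdir = MaKe DIRectory)
touch folder1/foo.txt      # create an empty file inside it
cp folder1/foo.txt ./      # copy it to the current directory
mv foo.txt bar.txt         # rename the copy
rm bar.txt                 # delete it (permanently!)
cd ~                       # return to the home directory
```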

Commands to read and edit text (ASCII) files:

| Command      | Description   | 
|--------------|-----------|
| `cat <file>` | Print the contents of file `<file>` to the terminal |
| `head -n X <file>` | Print the first X lines of the file `<file>` to the terminal |
| `tail -n X <file>` | Print the last X lines of the file `<file>` to the terminal |
| `less <file>` | Print the contents of file `<file>` with scrolling (press `q` to quit) |
| `touch <file>` | Create an empty file named `<file>` |
| `nano <file>` | Open an interactive editor for file `<file>`. Press `Ctrl+O` to save and `Ctrl+X` to quit |
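
For example, assuming a file named `notes.txt` exists in the current directory (a hypothetical name for illustration):

```
cat notes.txt          # print the whole file
head -n 5 notes.txt    # print its first 5 lines
tail -n 5 notes.txt    # print its last 5 lines
less notes.txt         # scroll through it, press q to quit
nano notes.txt         # edit it, Ctrl+O to save, Ctrl+X to quit
```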

For a deeper introduction to the command line interface, see this online tutorial: [https://swcarpentry.github.io/shell-novice/](https://swcarpentry.github.io/shell-novice/)

### Loading software on your session


Each new session opened on the cluster starts with no software loaded by default; software must be loaded manually as modules.
To search the software stack and check whether a particular software is available in the right version:

```
module spider <software_name>
```

Then, more details on how to load the specific version can be found with:
```
module spider <software_name>/<version>
```

Sometimes, multiple modules need to be loaded before the software itself (for example, the dependencies of the software).

For example (a complex one), to check whether the software TRIQS is installed, see which versions are available, and finally load it:

```
module spider triqs                               # reports TRIQS is available in version 3.1.0
module spider triqs/3.1.0                         # reports we first need StdEnv/2020 gcc/10.3.0 openmpi/4.1.1
module load StdEnv/2020 gcc/10.3.0 openmpi/4.1.1  # load the dependencies
module load triqs                                 # finally, load TRIQS itself
```
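
To check what is currently loaded in your session, or to reset it to a clean state, the standard module commands are:

```
module list     # show the modules currently loaded
module purge    # unload all modules
```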

## Access summer school material

Shared directories containing summer school material are available in the `/project/` directory. 
To access this directory:

```
cd /project
```

In `/project` there is one directory per software used in the summer school and one per lesson (for example, `/project/nessi` and `/project/DMFT`).
These directories can also be accessed from the internet at `quantum-cloud.ccs.usherbrooke.ca/<directory_name>` (example: `quantum-cloud.ccs.usherbrooke.ca/nessi`).
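
The shared directories are typically not writable by regular users (an assumption here), so if you want to modify the material, copy it into your home directory first, for example:

```
cp -r /project/nessi ~/    # copy the whole nessi directory to your home
cd ~/nessi                 # work on your own copy
```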


## Submitting jobs to the compute cluster

Once logged into the compute cluster, you have access to the login node, which is designed for manipulating files and preparing your compute tasks (called jobs).
NO jobs should be run on this login node (they would be aborted and you would be warned).
Instead, jobs should be submitted to a scheduler, which will dispatch them to more powerful dedicated compute nodes.
Jobs are submitted to the scheduler using a batch script, which contains the resources requested (CPUs, RAM, time), followed by the sequence of commands that runs the compute tasks.
See the examples below, or the Digital Research Alliance of Canada [website](https://docs.computecanada.ca/wiki/Running_jobs) for further examples.

Please remember that the resources you have access to are shared.
The hardware configuration is:
* 20x nodes, 240GB RAM, 32 cores (Xeon Cascade Lake 2.2GHz)
* 10TB NFS-shared disk

Therefore, each user has access to approximately 8 CPU cores and 60 GB of RAM.
Please consult the lecturer if you want to request more.



### Submitting an interactive job

During the hands-on sessions, you will mostly use an "interactive job" via the `salloc` command.
For example, this command reserves 4 CPUs and 12GB of RAM for 3 hours on a "compute node":

```
salloc --time=3:0:0 --cpus-per-task=4 --mem=12G
```

Note that, once your job starts, the hostname in your prompt will change from **login1** to **nodeXX**, for example:

```
[user001@node18 ~]$
```

When you `exit` the shell, the job will terminate and you will go back to the "login" node.
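
Putting it together, a typical interactive session might look like the following (using the TRIQS modules from the earlier example; `my_script.py` is a placeholder for your own compute task):

```
[user001@login1 ~]$ salloc --time=3:0:0 --cpus-per-task=4 --mem=12G
[user001@node18 ~]$ module load StdEnv/2020 gcc/10.3.0 openmpi/4.1.1 triqs
[user001@node18 ~]$ python my_script.py    # run your compute task
[user001@node18 ~]$ exit                   # terminate the job
[user001@login1 ~]$
```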




### Submitting a simple job with one CPU core

1. Create an empty file for your scheduler (SLURM) script and start editing it using `nano`:
```
touch ./my_job.sh
nano ./my_job.sh
```

2. Edit your SLURM script. For example, to submit an ABINIT compute task defined in `abinit_simulation.abi`:

```
#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --mem=4GB           #memory requested

module load abinit/9.6.2 wannier90/3.0.1      #load the software
abinit ./abinit_simulation.abi    #path to the ABINIT input file, relative to where
                                  #the job is submitted; here: the same directory
```
Note that the compute nodes have access to the files in your home directory, so you don't have to worry about data transfer.

3. Submit the previous SLURM script to the scheduler:
```
sbatch my_job.sh
```
Right after submission, the scheduler gives you a job identifier that you can use to track the results of the simulation.
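
The submission typically prints the identifier directly, for example (the actual number will differ):

```
$ sbatch my_job.sh
Submitted batch job 123456
```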

4. After the calculation, the terminal output (some results, for example) is redirected to the file `slurm-<job_id>.out`.
You can read it with, for example, `less slurm-<job_id>.out`.


### Submitting a threaded (OpenMP) job

A threaded (OpenMP) job is a job where all processors (or threads) share the same memory.
For example, during a matrix-vector product, each processor can be assigned a part of the whole matrix to process.
Threaded jobs are, however, limited to one compute node.
The user is responsible for requesting the desired number of threads from the scheduler.
Submission is similar to a simple CPU job, but the SLURM script (step 2) now reads:
```
#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --cpus-per-task=8   #number of CPU cores requested
#SBATCH --mem=4GB           #memory requested and shared by all CPU cores

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   #MANDATORY

module load abinit/9.6.2 wannier90/3.0.1      #load the software
abinit ./abinit_simulation.abi           #path to the ABINIT input relative to where the job is submitted
```
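
To check that the thread count is picked up correctly, a minimal test script (a sketch, independent of any particular software) could be:

```
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=1GB

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "Running with $OMP_NUM_THREADS OpenMP threads"   # should print 8
```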

### Submitting a distributed (MPI) job on multiple nodes

A distributed (MPI) job differs from a threaded job in that each processor (or process) owns a private part of the memory (it is no longer shared).
For example, during a matrix-vector product, the matrix is first decomposed into several parts, then each processor processes its part, and finally the result is reassembled.
Therefore, the job (the matrix, for example) can be distributed over several compute nodes.
Submission is similar to the two previous jobs, but the SLURM script (step 2) now reads:
```
#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --ntasks=8          #number of MPI tasks (processes) requested
#SBATCH --mem-per-cpu=1024M #memory requested per CPU core

module load abinit/9.6.2 wannier90/3.0.1      #load the software
srun abinit ./abinit_simulation.abi           #path to the ABINIT input relative to where the job is submitted
```
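
To check how the tasks are distributed, a minimal test (a sketch) is to run `hostname` under `srun`; each of the 8 tasks prints the node it landed on:

```
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=256M

srun hostname    # one line per task, e.g. node18 (x4), node19 (x4)
```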

### Interact with running jobs

* `squeue -u userXX` lists all the jobs you have submitted to the scheduler and their state
* `scancel <jobid>` cancels a job
* `seff <jobid>` returns statistics about the job (percentage of memory and CPU used, efficiency, total runtime).
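
For example, the output of `squeue -u userXX` looks along these lines (the partition name is cluster-specific and shown here as a placeholder):

```
$ squeue -u user001
  JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
 123456       cpu my_job.s  user001  R   1:02      1 node18
```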


### Getting your results

For each job submitted to the scheduler, a file `slurm-<job_id>.out` is created, to which the output of the terminal is redirected.
If new files are created during the job, they will be located according to the program's default behavior (i.e. if a program creates and writes results in a new directory called `results`, this behavior is not changed by the scheduler).
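
To transfer result files back to your laptop, you can use `scp` from a terminal on your own machine (not on the cluster); for example, with a hypothetical job id 123456:

```
scp userXX@quantum-cloud.ccs.usherbrooke.ca:~/slurm-123456.out .
```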
