Instructions for users

Connecting to the cloud platform

Open https://quantum2026.ccs.usherbrooke.ca/ in your browser. Type in your username and password, then select Sign in. On the next page, choose the resources needed for the intended calculations. If the lecturer did not suggest any changes, keep the default values. In User interface, select JupyterLab. Finally, select Start.

It might take up to a few minutes for the cloud platform to allocate resources for your session. Once this is done, the JupyterLab interface will be displayed. Use the buttons in the Launcher tab to start a Python Notebook, Terminal, or other application, as instructed by the lecturer.

To increase editor or terminal font size or otherwise customise the interface, use the Settings menu at the top.

To log out of JupyterLab and close your session, select File > Log Out from the top menu. Please do not leave an idle session open: as long as it stays open, the cloud platform cannot reallocate its resources to other sessions and researchers.

Acknowledgments

Cloud computing resources are provided by Calcul Québec and the Digital Research Alliance of Canada on beluga-cloud, and powered by Magic Castle, a virtual cluster management software.

Using the Linux terminal

The cloud platform uses the Linux operating system. If you are not yet familiar with the Linux/Unix command line, we recommend having a look at the introductory lesson The Unix Shell, by Software Carpentry.

Calcul Québec offers this printable (PDF) “cheat sheet” of basic Linux commands. FOSSWire also offers a compact printable (PDF) Unix/Linux Command Reference.

Software modules

When you start your session, your environment contains only minimal software. In a terminal, you can manage software with the module command. Use module list to display the currently loaded software modules.

Use e.g. module load python to load the default version of a given module, here Python. Use module load python/3.11.5 to load a specific version.

To search for modules, use module spider. For instance, module spider abinit will list all available ABINIT versions, along with a short description of the software. Use e.g. module spider abinit/10.4.7 to display information about a specific version, including how to load the module. In this case, this would be:

module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5
module load abinit/10.4.7

Some modules can be loaded directly, such as python/3.11.5 in the above example. Some depend on other modules that must be loaded first, as is the case for abinit/10.4.7.

Use module purge to go back to the default modules.

Modules loaded in a terminal are only available in that terminal, not in other terminals or applications (e.g. Python notebooks).
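
Because modules are loaded per terminal, it can help to confirm that a program is actually reachable in the terminal you are using. The sketch below wraps this check in a small function; have_cmd is a hypothetical name chosen for this example and is not provided by the platform.

```shell
# have_cmd is a hypothetical helper, not something provided by the platform.
# It reports whether a command can be found on PATH in the current terminal.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: not found (try: module spider $1)"
        return 1
    fi
}

have_cmd sh                # a shell is always present
have_cmd abinit || true    # not found until an abinit module is loaded in this terminal
```

If a program is reported as not found, load its module in this same terminal and run the check again.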

You can also manage modules from the JupyterLab interface, using the Software Modules tab on the left. Look for the hexagonal, bolt-shaped icon at the bottom of the tab bar, directly underneath the puzzle-shaped icon. Modules loaded in the JupyterLab interface will be available in all newly started applications, including Python notebooks and terminals.

Summer School material

Each user has a personal home directory, where calculations should be done. For example:

/home/user99

Shared material is available in /project. Lecturers will place their material in a subdirectory, for instance:

/project/soft/abinit

Documentation (such as this page) is available in /project/doc, which you can access from your browser at https://doc.quantum2026.ccs.usherbrooke.ca/.
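
Since calculations should be done in your home directory, one possible workflow is to make a private copy of the shared material before working on it. The following is a minimal sketch under that assumption; copy_material and the destination directory name are hypothetical, and your lecturer may prefer a different layout.

```shell
# copy_material is a hypothetical helper; the destination name is up to you.
# It copies the contents of a shared directory into a private working directory.
copy_material() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# Example: work on a private copy of the ABINIT material in your home directory.
if [ -d /project/soft/abinit ]; then
    copy_material /project/soft/abinit "$HOME/abinit"
fi
```

Working on a copy keeps the shared directory untouched, so other participants always see the original files.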

Old instructions

Connection with the ssh client

Windows users should install an SSH client, such as MobaXterm or WSL, to access the summer school compute cluster.

(Linux and macOS users can skip this step: your laptop already has a terminal installed.)

The summer school cloud cluster is accessible by SSH. Open a terminal and enter the following command (replace XX with your user number):

ssh userXX@quantum2024.ccs.usherbrooke.ca

Then type your password. Note that the characters will not appear on screen as you type, but they are still being entered.

It is important not to launch jobs directly on the login node. Instead, follow the instructions for submitting an interactive job further down this page.

Submitting jobs to the compute cluster

Once logged in to the compute cluster, you are on the login node, which is designed for manipulating files and preparing your compute tasks (called jobs). NO jobs should be run on the login node (they will be aborted and you will be warned). Instead, jobs should be submitted to a scheduler, which dispatches them to more powerful dedicated compute nodes. Jobs are submitted to the scheduler using a batch script, which contains the resources requested (CPUs, RAM, time), followed by the sequence of commands that run the compute tasks. See the examples below, or the Digital Research Alliance of Canada website for further examples.

Please remember that the resources you have access to are shared. The hardware configuration is:

* 16 nodes, each with 240 GB of RAM and 32 cores (Xeon Cascade Lake, 2.2 GHz)
* 10 TB of shared NFS disk

Therefore, each user has access to approximately 8 CPU cores and 60 GB of RAM at most. Please consult the lecturer if you want to request more.

Submitting an interactive job

During the hands-on sessions, you will mostly use an “interactive job”, started with the salloc command. For example, this command reserves 4 CPUs and 30 GB of RAM for 3 hours on a “compute node”:

salloc --time=3:0:0 --cpus-per-task=4 --mem=30G

Note that, once the job starts, the hostname in your prompt changes from login1 to nodeXX, for example:

[user001@node18 ~]$

When you exit the shell, the job will terminate and you will go back to the “login” node.
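
Besides looking at the prompt, you can check whether the current shell belongs to a job by testing the SLURM_JOB_ID environment variable, which the scheduler sets inside a job. The sketch below is one way to do this; where_am_i is a hypothetical helper name.

```shell
# where_am_i is a hypothetical helper. SLURM_JOB_ID is set by the scheduler
# inside a job's shell and is normally unset on the login node.
where_am_i() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside job $SLURM_JOB_ID on $(uname -n)"    # uname -n prints the hostname
    else
        echo "not inside a job (host: $(uname -n))"
    fi
}

where_am_i
```

Running this check before starting a heavy calculation is a simple way to avoid launching it on the login node by mistake.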

Submitting a simple job with one CPU core

Create an empty file for your scheduler (SLURM) script and start editing it with nano:

touch ./my_job.sh
nano ./my_job.sh

Edit your SLURM script. For example, to submit an ABINIT compute task defined in abinit_simulation.abi:

#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --mem=4GB           #memory requested

module load abinit/9.6.2 wannier90/3.0.1      #load software
abinit ./abinit_simulation.abi    #path to the ABINIT input file, relative to the
                                  #directory where you run sbatch (here: same directory)

Note that the scheduler gives the compute nodes access to the files in your home directory, so you do not have to worry about data transfer.

Submit the previous SLURM script to the scheduler:

sbatch my_job.sh

Right after submission, the scheduler gives you a job identifier, which you can use to track down the results of the simulation.

After the calculation, the terminal output (some results, for example) is redirected to the file slurm-<job_id>.out. You can read it with, for example, less slurm-<job_id>.out.
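
If you would rather not copy the job identifier by hand, it can be extracted from the confirmation line that sbatch prints, which has the form "Submitted batch job <id>". The following is a minimal sketch; job_id_from_sbatch is a hypothetical helper name, and the submission itself only runs where sbatch is installed.

```shell
# job_id_from_sbatch is a hypothetical helper. It reads sbatch's usual
# confirmation line, "Submitted batch job <id>", and prints only the <id>.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Only attempt a submission where sbatch exists, i.e. on the cluster.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch my_job.sh | job_id_from_sbatch)
    echo "terminal output will be written to slurm-${jobid}.out"
fi
```

Keeping the identifier in a variable makes it easy to reuse in later commands that expect a job ID.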

Submitting a threaded (OpenMP) job

A threaded (OpenMP) job is a job in which all processors (or threads) share the same memory. For example, during a matrix-vector product, each processor can be assigned a part of the whole matrix to process. Threaded jobs are, however, limited to one compute node. It is the user's responsibility to request the desired number of threads from the scheduler. Submission is similar to a simple CPU job, but the SLURM script now reads:

#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --cpus-per-task=8   #number of CPU cores requested
#SBATCH --mem=4GB           #memory requested and shared by all CPU cores

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   #MANDATORY

module load abinit/9.6.2 wannier90/3.0.1      #load software
abinit ./abinit_simulation.abi           #path to the ABINIT input file, relative to where you run sbatch
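
The scheduler only defines SLURM_CPUS_PER_TASK when --cpus-per-task is requested, so the export in the script above would leave OMP_NUM_THREADS empty if that line were forgotten. A common defensive variant falls back to a single thread. The sketch below illustrates it; set_omp_threads is a hypothetical helper name.

```shell
# set_omp_threads is a hypothetical helper. It uses the core count granted by
# the scheduler, or 1 when SLURM_CPUS_PER_TASK is not set (e.g. outside a job).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OpenMP will use $OMP_NUM_THREADS thread(s)"
```

Printing the value at the start of a job also leaves a record in the job's output file of how many threads were actually used.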

Submitting a distributed (MPI) job on multiple nodes

A distributed (MPI) job differs from a threaded job in that each processor (or process) owns its own part of the memory (memory is no longer shared). For example, during a matrix-vector product, the matrix is first decomposed into several parts, then each processor processes its part, and finally the results are reassembled. Therefore, the job (the matrix, for example) can be distributed over several compute nodes. Submission is similar to the two previous jobs, but the SLURM script now reads:

#!/bin/bash
#SBATCH --time=01:00:00     #time: hours:minutes:seconds
#SBATCH --ntasks=8          #number of MPI processes (tasks) requested
#SBATCH --mem-per-cpu=1024M #memory requested per CPU core

module load abinit/9.6.2 wannier90/3.0.1      #load software
srun abinit ./abinit_simulation.abi           #path to the ABINIT input file, relative to where you run sbatch
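
With --mem-per-cpu, the total memory of the job scales with the number of tasks, so it is worth checking the total against your per-user share. The arithmetic for the script above can be sketched as follows, assuming one CPU core per task (the scheduler's default); the variable names are chosen for this example only.

```shell
# Values copied from the script above; one CPU core per task is assumed.
ntasks=8
mem_per_cpu_mb=1024

total_mb=$(( ntasks * mem_per_cpu_mb ))
echo "total memory requested: ${total_mb} MB ($(( total_mb / 1024 )) GB)"
```

Here the job requests 8192 MB (8 GB) in total, well within the approximately 60 GB available to each user.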

Interact with running jobs

squeue -u userXX lists all the jobs you have submitted to the scheduler, along with their state.
scancel <jobid> cancels a job.
seff <jobid> returns statistics about a job (percentage of memory and CPU used, efficiency, total runtime).

Getting your results

For each job submitted to the scheduler, a file slurm-<job_id>.out is created, to which the terminal output is redirected. If new files are created during the job, they are placed according to the program's default behavior (e.g. if a program creates a new directory called results and writes its results there, the scheduler does not change this behavior).
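
After several submissions, a directory can contain many slurm-<job_id>.out files. The sketch below shows one way to find the most recently modified one and display its last lines; latest_slurm_output is a hypothetical helper name, and the -nt file comparison it relies on is supported by bash and most other common shells.

```shell
# latest_slurm_output is a hypothetical helper. It prints the most recently
# modified slurm-*.out file in the given directory (default: current directory).
latest_slurm_output() {
    newest=""
    for f in "${1:-.}"/slurm-*.out; do
        [ -e "$f" ] || continue                  # skip the unexpanded pattern
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    fi
}

latest=$(latest_slurm_output)
if [ -n "$latest" ]; then
    tail -n 20 "$latest"    # show the last lines of that job's output
else
    echo "no slurm-*.out file in this directory yet"
fi
```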