Baltasar¶
Architecture¶
Baltasar Sete Sóis is accessible over the internet; incoming connections hop between three entry nodes, which give access to the cluster system.
Below is a summary of the specs for the computational nodes available.
Node name | CPUs | RAM (MB) |
---|---|---|
compute-1 | 48 | 256000 |
compute-2 | 48 | 256000 |
compute-3 | 48 | 256000 |
compute-4 | 48 | 248000 |
compute-5 | 48 | 120000 |
compute-6 | 48 | 128000 |
compute-7 | 48 | 128000 |
compute-8 | 48 | 128000 |
compute-9 | 48 | 128000 |
compute-10 | 48 | 128000 |
compute-11 | 96 | 256000 |
compute-12 | 96 | 256000 |
Totals:
- 672 CPUs
- 2610 GB RAM
- 90 TB Storage
Access Baltasar¶
After your access request has been authorized, you should be able to connect to Baltasar via ssh using your username and ssh key as follows:
username@localmachine:~$ ssh username@baltasar.tecnico.ulisboa.pt
or, if you want X11 forwarding for graphical applications
username@localmachine:~$ ssh -X username@baltasar.tecnico.ulisboa.pt
This should get you connected to one of the entry nodes. To run processes in the computational nodes, Baltasar uses a batch queuing system named Slurm.
Never run your jobs directly on the entry nodes; any jobs caught running there will be killed.
Slurm Workload Manager¶
There is a small learning curve to using a cluster like Baltasar, since programs run as jobs in a shared Slurm queue. Whereas on a non-queued server you run your programs as you would on your regular computer, on a Slurm cluster you need to request resources, either interactively or by creating and submitting a batch script.
A batch script consists of a definition of the resources your program requires. In a structured comments section, before the commands that run your code, you can define the number of nodes and CPUs, the amount of memory, and a running time limit, among others.
Below is a sample batch script that runs “program”, located at “/home/username/program”, with “parameters”.
Sample Batch Script¶
To use Slurm you should have a Batch script like the one below.
#!/bin/bash
#SBATCH --job-name=my-job-name
#SBATCH --output=/home/username/my-job-name-%j.out
#SBATCH --error=/home/username/my-job-name-%j.err
#SBATCH --mail-user=mail@example.com
#SBATCH --mail-type=ALL
#SBATCH --time=72:00:00
#SBATCH --mem=4G
RUNPATH=/home/username/
cd $RUNPATH
./program parameters
Batch Script Explained¶
- --job-name=<name>
- Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just “sbatch” if the script is read on sbatch’s standard input.
- --output=<filename pattern>
- Instruct Slurm to connect the batch script’s standard output directly to the file name specified in the “filename pattern”. By default both standard output and standard error are directed to the same file. For job arrays, the default file name is “slurm-%A_%a.out”, where “%A” is replaced by the job ID and “%a” by the array index. For other jobs, the default file name is “slurm-%j.out”, where “%j” is replaced by the job ID. See Filename Specifications for filename specification options.
- --error=<filename pattern>
- Instruct Slurm to connect the batch script’s standard error directly to the file name specified in the “filename pattern”. See --output.
- --nodes=<minnodes[-maxnodes]>
- Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, it is used as both the minimum and maximum node count.
- --ntasks=<number>
- sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks, and to provide sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default.
- --cpus-per-task=<ncpus>
- Advise the Slurm controller that ensuing job steps will require ncpus processors per task. Without this option, the controller will just try to allocate one processor per task.
- --mail-user=<mail@example.com>
- Specifies the email address to which notification messages are sent.
- --mail-type=<type>
- Notify the user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send emails for each array task). Multiple type values may be specified in a comma-separated list. The user to be notified is indicated with --mail-user. Unless the ARRAY_TASKS option is specified, mail notifications on job BEGIN, END and FAIL apply to a job array as a whole rather than generating individual email messages for each task in the job array.
- --time=<time-format>
- Specifies a limit on the total run time of the job; when the limit is reached, the job is automatically killed. A single number is interpreted as minutes. Acceptable time formats include “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes” and “days-hours:minutes:seconds”.
- --mem=<size[units]>
- Specify the real memory required per node, in megabytes by default. Different units can be specified using the suffix [KMGT].
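As a concrete illustration of how these options combine, the sketch below (all values hypothetical) requests a single node running 4 tasks with 12 CPUs each, i.e. 4 × 12 = 48 CPUs, which matches one full compute-1 to compute-10 node:

```shell
#!/bin/bash
# Hypothetical resource request: 1 node, 4 tasks, 12 CPUs per task,
# for a total of 4 x 12 = 48 CPUs.
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=12
#SBATCH --mem=16G
#SBATCH --time=1-00:00:00

# Inside an allocation, Slurm exports SLURM_CPUS_PER_TASK;
# the :-12 fallback only matters when run outside the cluster.
echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-12}"
```

Submitting this with sbatch queues it exactly like the sample script above; only the requested resources differ.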
Filename Specifications¶
The filename pattern may contain one or more replacement symbols, which are a percent sign “%” followed by a letter (e.g. %j).
Supported replacement symbols are:
- \\
- Do not process any of the replacement symbols
- %%
- The character “%”
- %A
- Job array’s master job allocation number
- %a
- Job array ID (index) number
- %j
- Job allocation number
- %N
- Node name. Only one file is created, so %N will be replaced by the name of the first node in the job, which is the one that runs the script
- %u
- User name
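To make the substitution concrete, the small bash function below mimics the expansion Slurm itself performs on a filename pattern (illustrative only; the function and its arguments are not part of Slurm):

```shell
# Illustrative helper: expand %j, %N and %u in a filename pattern
# the way Slurm would (Slurm performs this substitution itself).
expand_pattern() {
  local p=$1 jobid=$2 node=$3 user=$4
  p=${p//%j/$jobid}
  p=${p//%N/$node}
  p=${p//%u/$user}
  printf '%s\n' "$p"
}

# A pattern like the one in the sample batch script, for job 402
# started on compute-7 by "username":
expand_pattern "/home/%u/my-job-name-%j-%N.out" 402 compute-7 username
# -> /home/username/my-job-name-402-compute-7.out
```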
Batch Script Environment Variables¶
Variable | Description |
---|---|
SLURM_JOB_ID | Job ID number given to this job. |
SLURM_JOB_NAME | Name of this job. |
SLURM_JOB_NODELIST | List of nodes allocated to the job. |
SLURM_CPUS_PER_TASK | Number of CPUs requested per task. |
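A minimal batch script that just prints these variables can be a useful first test of the queue (the :-unset fallbacks only matter if the script is run outside an allocation):

```shell
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --time=00:01:00
#SBATCH --mem=1G

# Slurm sets these variables inside the allocation.
echo "Job ID:    ${SLURM_JOB_ID:-unset}"
echo "Job name:  ${SLURM_JOB_NAME:-unset}"
echo "Nodes:     ${SLURM_JOB_NODELIST:-unset}"
echo "CPUs/task: ${SLURM_CPUS_PER_TASK:-unset}"
```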
Slurm Usage Examples¶
Job Submission via batch script¶
username@baltasar-1:~$ sbatch my_job.sbatch
Submitted batch job 402
username@baltasar-1:~$
See the queue status¶
username@baltasar-1:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
370 base job9 username R 1-02:19:36 1 compute-7
401 base job123 sysadm R 14:07 1 compute-6
402 base my_job username R 00:07 1 compute-7
username@baltasar-1:~$
Check nodes status and information (CLI)¶
username@baltasar-1:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
base* up 3-00:00:00 1 mix compute-7
base* up 3-00:00:00 1 alloc compute-6
base* up 3-00:00:00 10 idle compute-[1-5,8-12]
username@baltasar-1:~$
Detailed Node status (GUI)¶
(Requires X Forwarding within SSH session)
username@baltasar-1:~$ sview &
Detailed Job status¶
username@baltasar-1:~$ scontrol show job 401
JobId=401 JobName=job123
UserId=sysadm(1111) GroupId=sysadm(1111)
Priority=4294901406 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:17:58 TimeLimit=3-00:00:00 TimeMin=N/A
SubmitTime=2018-11-28T16:23:39 EligibleTime=2018-11-28T16:23:39
StartTime=2018-11-28T16:23:39 EndTime=2018-12-01T16:23:39
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=base AllocNode:Sid=baltasar-1:24264
ReqNodeList=(null) ExcNodeList=(null)
NodeList=compute-6
BatchHost=compute-6
NumNodes=1 NumCPUs=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=48,mem=126976,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=124G MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=<path to executable declared within the batch script>
WorkDir=<path to working directory>
StdErr=<path to stderr redirect file>
StdIn=/dev/null
StdOut=<path to stdout redirect file>
Power= SICP=0
username@baltasar-1:~$
Cancel a Job¶
username@baltasar-1:~$ scancel 402
username@baltasar-1:~$
Re-queue a Job (job needs to be running)¶
username@baltasar-1:~$ scontrol requeue JOB_ID
username@baltasar-1:~$
or
username@baltasar-1:~$ scontrol requeue JOB_ID1,JOB_ID2,...,JOB_IDN
username@baltasar-1:~$
Interactive run of a bash shell¶
username@baltasar-1:~$ salloc
salloc: Granted job allocation 403
salloc: Waiting for resource configuration
salloc: Nodes compute-5 are ready for job
username@baltasar-1:~$ srun --pty /bin/bash
username@compute-5:~$ exit
username@baltasar-1:~$ exit
salloc: Relinquishing job allocation 403
username@baltasar-1:~$
Interactive run of a GUI program¶
(Requires X Forwarding within SSH session)
username@baltasar-1:~$ salloc
username@baltasar-1:~$ module load Mathematica
username@baltasar-1:~$ srun --pty --x11=first Mathematica
All other Slurm commands should work as in other computational clusters. Check the user documentation for details.
If you have any suggestions or questions, please consult the FAQ at Baltasar’s home page.
MPI Computation¶
Baltasar has MPICH installed and configured. You must load the MPICH module whenever MPI is needed.
username@baltasar-1:~$ module load MPICH
username@baltasar-1:~$ mpicc mpi_code.c
Or, even better, you can compile your code on a computational node. Below is an example batch script:
username@baltasar-1:~$ cat program-v1.1-compile.batch
#!/bin/bash
#SBATCH --job-name=my-code-compile
#SBATCH --output=/home/username/program-v1.1/my-code-compile-%j.out
#SBATCH --error=/home/username/program-v1.1/my-code-compile-%j.err
#SBATCH --cpus-per-task=48
#SBATCH --mail-user=mail@example.com
#SBATCH --mail-type=ALL
#SBATCH --time=72:00:00
#SBATCH --mem=64G
module load MPICH
RUNPATH=/home/username/program-v1.1
cd $RUNPATH
mkdir -p /home/username/program-v1.1/bin
./configure --prefix=/home/username/program-v1.1/bin && make -j 48 && make install
username@baltasar-1:~$ module list
No modules loaded
username@baltasar-1:~$ sbatch program-v1.1-compile.batch
When running MPI-enabled programs, you must have the MPICH module loaded beforehand or, as in the previous script, load it explicitly inside the batch script. Here is an example:
username@baltasar-1:~$ cat program-v1.1-run.batch
#!/bin/bash
#SBATCH --ntasks=128
#SBATCH --time=00:01:00
#SBATCH --mem=1G
#SBATCH --job-name=my-code-run
#SBATCH --mail-user=mail@example.com
#SBATCH --workdir=/home/username/program-v1.1
#SBATCH --output=/home/username/program-v1.1/my-code-run-%j.out
#SBATCH --error=/home/username/program-v1.1/my-code-run-%j.err
srun bin/program-v1.1
username@baltasar-1:~$ module list
No modules loaded
username@baltasar-1:~$ module load MPICH
username@baltasar-1:~$ sbatch program-v1.1-run.batch
NOTE the absence of the --cpus-per-task option and the use of --ntasks when running MPI jobs. Each of the requested tasks will be assigned to a CPU as resources become available; using --cpus-per-task in MPI jobs can lead to unexpected results.
Compiler Optimization¶
Baltasar’s entry/storage nodes have a different architecture from the computational nodes. If you use “native” architecture detection at compile time, your code will be optimized for the wrong architecture: it will run slower on the computational nodes, and may even contain CPU instructions the compute nodes do not support, producing Illegal Instruction errors.
Do not use -march=native or -mtune=native while compiling, as it will make your program run slower or not run at all.
This happens because the entry/storage nodes are Intel Xeons, whereas the computational nodes are AMD, of different flavours.
Computational nodes can be thought of in three sets:
- Compute 1-5:
- AMD Opteron(tm) Processor 6180 SE
- Compute 6-10:
- AMD Opteron(tm) Processor 6344
- Compute 11-12:
- AMD EPYC Processor 7401
You should use the following compiler flags for code that will be run on all or specific computational nodes.
- all nodes (generic)
- -O3 -mtune=generic -mmmx -msse -msse2 -msse4a -mabm -mpopcnt
- Compute 1-5
- -O3 -march=amdfam10
- Compute 6-10
- -O3 -march=bdver2
- Compute 11-12
- -O3 -march=znver1
Unfortunately you cannot specify which nodes you want to run on: if you passed them through the Slurm --nodelist option, you would have to wait until all of the specified nodes were available, which is probably not what you want when running a job on a single node.
Instead, you can specify which nodes you do not want to run on, via the --exclude option. For example, to run a program compiled specifically for the bdver2 architecture, you could use the following batch script:
username@baltasar-1:~$ cat program-v1.1-run-bdver2.batch
#!/bin/bash
#SBATCH --ntasks=128
#SBATCH --time=00:01:00
#SBATCH --exclude=compute-1,compute-2,compute-3,compute-4,compute-5,compute-11,compute-12
#SBATCH --mem=1G
#SBATCH --job-name=my-code-run
#SBATCH --mail-user=mail@example.com
#SBATCH --workdir=/home/username/program-v1.1
#SBATCH --output=/home/username/program-v1.1/my-code-run-%j.out
#SBATCH --error=/home/username/program-v1.1/my-code-run-%j.err
bin/program-v1.1
username@baltasar-1:~$
This way, when the resources are being allocated, the nodes in the --exclude list are not considered, and what is left are the nodes we want to target.
Modules¶
By now you should have noticed the module command. This command is available in all bash shells at the start of each session, or by reloading the profile configuration located at /etc/profile.d/lmod.sh.
(For other shells or environments, take a look at the folder /home/share/lmod/lmod/init/ to find the corresponding init file.)
The use of modules allows us to have sets of compatible software, compiled and linked with each other with different versions and/or capabilities.
Listing loaded modules¶
username@baltasar-1:~$ module list
Listing available modules¶
username@baltasar-1:~$ module av
Loading a specific module/toolchain¶
username@baltasar-1:~$ module load <Module Name>
Unloading a specific module/toolchain¶
username@baltasar-1:~$ module unload <Module Name>
Unload all loaded modules¶
username@baltasar-1:~$ module purge
Default loaded modules¶
At the beginning of each bash session, a toolchain named setesois is loaded for you. It loads the following libraries/tools:
- GCCcore/7.3.0
- binutils/2.30-GCCcore-7.3.0
- GCC/7.3.0-2.30
- MPICH/3.2.1-GCC-7.3.0-2.30-large
- BLIS/0.4.1-GCC-7.3.0-2.30-generic
- FLAME/5.0.0-GCC-7.3.0-2.30-generic
- FFTW/3.3.8-GCC-7.3.0-2.30-generic
- HDF5/1.8.20-GCC-7.3.0-2.30-generic
- amdlibm/3.2.2
- setesois/2018.10-generic
The generic suffix means the libraries/tools were compiled to run on all computational nodes. If you wish, you can load a specific version of the toolchain that targets one of the architectures described in Compiler Optimization. The available versions are:
- setesois/2018.10-amdfam10
- setesois/2018.10-bdver2
- setesois/2018.10-znver1
If, however, you do not want any of these and prefer to manage your own set, you can execute module purge and load modules one by one.
AMD Core Math Library¶
You may be familiar with ACML, which implements optimized BLAS and LAPACK routines for AMD CPUs. ACML was discontinued from its private development and has been split into three separate, open-source libraries:
- AMDLibM
- BLIS
- LibFlame
AMDLibM¶
AMD LibM is a software library containing a collection of basic math functions optimized for x86-64 processor based machines. It provides many routines from the list of standard C99 math functions. AMD LibM is a C library, which users can link into their applications to replace compiler-provided math functions. Generally, programmers access basic math functions through their compiler, but those who want better accuracy or performance than their compiler’s math functions can use this library to help improve their applications.
After loading the appropriate module via module load amdlibm, you can link against this library with the compiler options:
-I$EBROOTAMDLIBM/include -L$EBROOTAMDLIBM/lib/dynamic -lamdlibm -lm
for the dynamic version of the library, or
-I$EBROOTAMDLIBM/include -L$EBROOTAMDLIBM/lib/static -lamdlibm -lm
for the static version of the library.
For more information see the official website and the example files located in the cluster at $EBROOTAMDLIBM/examples.
BLIS¶
BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, enable high-performance implementations of most of its commonly used and computationally intensive operations. Select kernels have been optimized for the AMD EPYC(TM) processor family. The optimizations cover single- and double-precision routines.
If you are using -lblas when compiling your code, be it in a script or a Makefile, you should replace it with:
-L$EBROOTBLIS/lib -lblis
and you should now have BLIS functions available, as well as its BLAS compatibility layer.
LibFlame¶
libFLAME is a portable library for dense matrix computations, providing much of the functionality present in LAPACK. It includes a compatibility layer, FLAPACK, which provides a complete LAPACK implementation. The library gives the scientific and numerical computing communities a modern, high-performance dense linear algebra library that is extensible, easy to use, and available under an open source license. libFLAME is a C-only implementation and does not depend on any external Fortran libraries, including LAPACK. There is a backward compatibility layer, lapack2flame, that maps LAPACK routine invocations to their corresponding native C implementations in libFLAME. This allows legacy applications to start taking advantage of libFLAME with virtually no changes to their source code.
In combination with the AMD optimized BLIS library, libFLAME enables running high performing LAPACK functionalities on AMD platforms. The performance of libFLAME on AMD platforms can be improved by just linking with the AMD optimized BLIS.
If you are using -llapack when compiling your code, be it in a script or a Makefile, you should replace it with:
-L$EBROOTFLAME/lib -lflame
and you should now have LAPACK functions available.
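Putting the two libraries together, a program that needs both BLAS and LAPACK could be linked as in the sketch below. This is cluster-only: it assumes the BLIS and FLAME modules, which define the EBROOTBLIS and EBROOTFLAME variables, and “solver.c” is a placeholder for your source file:

```shell
# Cluster-only sketch: load the modules that define the EBROOT* paths.
module load BLIS FLAME

# Link libFLAME before BLIS, since libFLAME depends on BLIS symbols.
gcc -O3 solver.c \
    -L$EBROOTFLAME/lib -lflame \
    -L$EBROOTBLIS/lib  -lblis \
    -lm -o solver
```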
Support Request¶
Feel free to contact us if you run into trouble while using Baltasar. We’ll be glad to help.
Note that if you are contacting us on behalf of someone else, please state in the e-mail message who that person is and provide a contact for them.
E-mail Structure¶
This is an e-mail request template; copy it into your e-mail as is in order to get feedback as quickly as possible. Thank you, the IT crowd.
Subject¶
Baltasar Help
Include the following information¶
- Username
- Any relevant code/script folder location
- Any relevant log files/error messages
You can add extra information you may find relevant concerning your request.
As an alternative to manually copying this template, you can click the Baltasar Help Template, which uses your local e-mail client’s mailto: capability.