|
|
MALTA Cluster
|
|
|
MALTA Overview
|
|
The IBM BladeCenter based MALTA cluster is a 72-node cluster, 71 of which are compute nodes. The remaining node is both the master node that controls the cluster and the front end where users log in.
Each node holds two Intel Xeon four- or six-core processors,
for a total of 720 cores across the whole machine. Each core has a clock rate of at least 2.0 GHz and 2 GB of
RAM.
User home directories as well as the /opt directory are located in a 6 TB storage unit provided by two Ethernet-attached RAID arrays exported via the Network File System (NFS)
over a Gbit network.
Each user can store files either in their private home directory or in the /opt/groupname
directory, which is shared with all the members of the group the user belongs to.
All the nodes in MALTA have local storage (not accessible from any other node) ranging from 150 GB to 600 GB. Files may only reside in
this local space for the lifetime of a job; when a job exits, any files remaining there are purged.
Permanent files should be moved to the user's home directory.
An environment variable, "$SCRATCH", is defined at the beginning of each job,
pointing to the area of scratch space on each node that is allocated to the job.
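For example, inside a job script any results worth keeping must be copied out of $SCRATCH before the job ends (the program and file names below are only placeholders):
cd $SCRATCH                           # node-local scratch area assigned to the job
/home/user/myprogram.exe > result.dat
cp result.dat $HOME/                  # copy permanent results back to the NFS-mounted home directory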
|
|
|
Login to MALTA
|
|
The login process is extremely easy if you're running Linux/UNIX or Mac OS X on your personal
computer, and Windows doesn't make it much harder.
To connect to the MALTA Computing Centre (MCC) you must use the SSH (secure shell)
protocol, which offers both high speed and excellent security. If you want to use remote windowing via
SSH, you must enable X11 forwarding and have an X server running locally.
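For example, from a Linux/UNIX or Mac OS X terminal (the host name below is only a placeholder; use the address provided by the system administrator):
ssh -X user@malta.example.org    # -X enables X11 forwarding for remote graphics
On Windows, an SSH client such as PuTTY, together with a local X server such as Xming if you need graphics, provides the same functionality.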
|
|
|
Desktop Virtualization
|
|
In Linux, everything can be done from a shell. However, if you don't feel comfortable with the shell,
the cluster can also be used as a VNC server.
VNC stands for Virtual Network Computing; it lets you see the desktop of a remote machine
and control it with your local mouse and keyboard, just as if you were sitting in
front of that computer.
To use the VNC server, a VNC client (such as TightVNC, Vinagre, etc.) must be running locally on your machine.
Once you're logged in to MCC you have to start the server (i.e. export your desktop) by using:
[user@malta ~]$ vncserver
New 'malta:1 (user)' desktop is malta:1
Starting applications specified in /home/user/.vnc/xstartup
Log file is /home/user/.vnc/malta:1.log
Note that the first time you invoke the vncserver command you will be asked for a password,
which can be changed at any time with vncpasswd.
Once the server is up and running, you can start a session on your local machine following
the instructions of your VNC client.
To stop the virtualization, just type:
[user@malta ~]$ vncserver -kill :1
Killing Xvnc process ID xxxx
More info: man vncserver
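Since VNC traffic is not encrypted on its own, it is usually tunnelled through SSH. A minimal sketch from your local machine, assuming the desktop was exported as malta:1 (display :1 corresponds to TCP port 5901; the host name is a placeholder):
ssh -L 5901:localhost:5901 user@malta.example.org   # forward the VNC port over SSH
vncviewer localhost:1                               # then point your VNC client at the tunnel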
|
|
|
Operating System
|
|
Like any other operating system (OS), a cluster operating system must provide a user-friendly interface
between the user, the applications and the cluster software.
The operating system run on MALTA is Red Hat Enterprise Linux 5.2 (RHEL).
RHEL is a commercial Linux distribution and, as such, is very similar to any other Unix or Unix-like operating system.
If you are not familiar with this kind of OS, we encourage you to look for some basic information on the net and/or
visit the links below:
1. Red Hat manuals
2. The Linux Documentation Project
|
|
|
Software
|
|
Compilers

• gcc 4.1.2 (gfortran, g77, gcc, g++): default location
• Intel (ifort, icc) v11.081: /opt/intel/Compiler
• Python-1.4.3-24: default location
|
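As a quick illustration, a serial Fortran source file (the file name is a placeholder) can be built with either suite; the Intel compilers live under /opt/intel/Compiler and may require sourcing their environment script first:
gfortran -O2 mycode.f90 -o mycode.x   # GNU compiler, available in the default path
ifort -O2 mycode.f90 -o mycode.x      # Intel compiler v11.081, once its environment is set up
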
Math Libraries

• BLACS: /opt/blacs
• CBLAS (BLAS library for C): /opt/cblas
• FBLAS (BLAS library for Fortran): /opt/fblas
• FFTW: /opt/fftw/3.2.1
• Intel MKL: /opt/intel/mkl/10.1.1.019
• LAPACK: /opt/lapack/3.1.1
• ScaLAPACK: /opt/scalapack/1.8.0
|
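A typical link line against these installations might look like the sketch below; the library file names are an assumption, so check the actual contents of /opt/lapack/3.1.1 and /opt/fblas first:
# assumed library names: liblapack under /opt/lapack/3.1.1, libblas under /opt/fblas
gfortran mycode.f90 -o mycode.x -L/opt/lapack/3.1.1 -llapack -L/opt/fblas -lblas
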
Parallel Libraries

• MPICH: /opt/mpich/1.2.7p1
• MPICH2: /opt/mpich2/1.0.8 and /opt/mpich2/loadleveler
• OpenMPI: /opt/openmpi/1.3.1
|
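For MPI codes, the compiler wrappers shipped with each implementation are the simplest way to build; a minimal sketch using the MPICH2 installation (the source file names are placeholders):
/opt/mpich2/1.0.8/bin/mpicc -O2 mympi.c -o mympi.x      # C source
/opt/mpich2/1.0.8/bin/mpif90 -O2 mympi.f90 -o mympi.x   # Fortran source
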
Programs

• abinit: /opt/abinit/
• critic2: /opt/critic2
• elk: /opt/elk
• Quantum ESPRESSO: /opt/espresso/
• gamess: /opt/gamess
• gibbs2: /opt/gibbs2
• gnuplot: default location
• gromacs: /opt/gromacs
• gulp: /opt/gulp
• octave: default location
• siesta: /opt/siesta/
• VASP: /opt/vasp/
• Wien2K: /opt/wien2k/
|
|
|
How to submit a job: Loadleveler
|
|
LoadLeveler (LL) is a job management system that allows users to run more jobs in less time by matching
each job's processing needs with the available resources.
When a job is submitted to LL, a number of environment variables are created. Some of the most relevant
are listed below (a minimal example follows the list):
• $SCRATCH = local directory where the job runs.
This directory is automatically removed once the job finishes.
• $LL_WORKDIR = working directory where both the .log file and the output file(s) are placed after job completion.
• $LOG = .log file created during execution.
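A minimal sketch of a job that does nothing except record where LoadLeveler placed it, just to show how these variables can be used:
#!/bin/sh
#@ class = sexpress
#@ job_type = serial
#@ environment = COPY_ALL
#@ queue
# write the values of the LoadLeveler variables into the working directory
echo "scratch directory: $SCRATCH"    >  $LL_WORKDIR/where.txt
echo "working directory: $LL_WORKDIR" >> $LL_WORKDIR/where.txt
echo "log file: $LOG"                 >> $LL_WORKDIR/where.txt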
|
Serial Jobs
|
|
There are four serial classes or queues:
1. sexpress: up to 2h cpu time
2. ssmall: up to 24h cpu time
3. smedium: up to 1 week cpu time
4. slarge: up to 3 weeks cpu time
If none of these queues meets your needs, please contact the system administrator,
who will kindly do his best to deal with your request.
Examples of serial jobs:
#@ job_name = test1
## job name
#@ class = sexpress | ssmall | smedium | slarge ## queue
#@ initialdir = /home/user/test
## working directory = $LL_WORKDIR
#@ input = myprog.input
## stdin = a.out < myprog.input
#@ output = myprog.output
## stdout = a.out > myprog.output
#@ error = myprog.error
## stderr = a.out 2> myprog.error
#@ executable = myprogram
## executable file
#@ arguments = arg1 arg2 arg3
## executable arguments
#@ queue
## submit
This type of job must be used only when the executable file doesn't generate
large temporary or output files.
#@ = keyword
# = comment
If the executable keyword is not used, LL assumes that the script is the executable:
#!/bin/sh
#@ step_name = step_1
#@ initialdir = /home/user/test
#@ job_type = serial
#@ class = ssmall
#@ output = $(job_name).$(Process).out
## if no name is specified, an
#@ error = $(job_name).$(Process).err
## automatic one will be generated
#@ environment = COPY_ALL
## copy the environmental variables
#@ job_cpu_limit = 12:00
## 12h cpu time
#@ wall_clock_limit = 20:00
## 20h total time
#@ queue
# Copy all the necessary files from the initial
# directory into the scratch one
cp $LL_WORKDIR/data.1 $SCRATCH/
# Everything is written in $SCRATCH
cd $SCRATCH
/home/user/myprogram.exe < data.1 > output.1
# Copy the output back into the initial directory
cp output.1 $LL_WORKDIR/
#@ dependency = (step_1 == 0)
## only if the previous step results
## in a normal termination
#@ input = output.1
#@ output = $(job_name).$(job_step).$(Process).out
#@ error = $(job_name).$(job_step).$(Process).err
#@ queue
# Copy all the necessary files from the initial
# directory into the scratch one
cp $LL_WORKDIR/output.1 $SCRATCH/
# Everything is written in $SCRATCH
cd $SCRATCH
/home/user/myprogram.exe < output.1 > output.2
# Copy the output back into the initial directory
cp output.2 $LL_WORKDIR/
This job has two dependent steps, where the second one starts only if the first
has finished properly. If the keyword "dependency" is not used, both steps
are processed at the same time.
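Assuming the script above is saved as serial_job.cmd (the file name is just an example), it is submitted and monitored with the LoadLeveler commands described at the end of this page:
llsubmit serial_job.cmd   # submit the two-step job
llq                       # check its state in the queue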
|
|
Parallel jobs
|
|
There are six parallel queues; the p(12) notation means that each class exists in two versions, e.g. psmall and p12small, for the 8- and 12-processor nodes respectively:
1. p(12)small: up to 24h cpu time, 1 node, 8 (12) processors
2. p(12)medium: up to 1 week cpu time, 1 node, 8 (12) processors
3. p(12)large: up to 3 weeks cpu time, 1 node, 8 (12) processors
These classes are not permanent and can be changed at any time depending on the users' needs.
Examples of parallel jobs:
#!/bin/sh
#
#
#@ job_name = sample_mpich
#@ step_name = step1
#@ job_type = mpich
#@ output = test.$(job_name).$(job_step).$(Process).out
#@ error = test.$(job_name).$(job_step).$(Process).err
#@ class = pmedium
#@ environment = COPY_ALL
#@ node = 1
## number of nodes
#@ tasks_per_node = 1,8
## from 1 to 8 processors
#@ queue
# Copy all the necessary files from the initial
# directory into the scratch one
cp -rp $LL_WORKDIR/data $SCRATCH/
# Everything is written in $SCRATCH
cd $SCRATCH/data
/opt/mpich2/1.0.8/bin/mpirun -np $LOADL_TOTAL_TASKS /home/user/prog.exe
# Only useful files are copied back into the initial directory
rm -f $SCRATCH/data/file1
rm -f $SCRATCH/data/file2
rm -f $SCRATCH/data/file3
cp -rp $SCRATCH/data $LL_WORKDIR/
For the time being, the parallel queues are configured to use up to eight processors
within the same node. We are working hard to overcome this limitation and hope it will be sorted out soon.
In this example we ask for one node and a number of processors that ranges from 1 to 8.
This is a good way to minimize the time a job spends in the queue waiting for a free node. The variable
$LOADL_TOTAL_TASKS holds the total number of tasks that have been allocated to the job. Please
note that this variable is not available when the job type is set to parallel (i.e. job_type = parallel).
#@ class = plarge
#@ job_type = parallel
#@ node = 1
#@ tasks_per_node = 6
#@ initialdir = /home/user/myprogs
#@ executable = myopenmpcode
#@ input = inputfile1
#@ output = $(job_name).$(job_step).output
#@ error = $(job_name).$(job_step).error
#@ environment = COPY_ALL; OMP_NUM_THREADS=6
#@ queue
Here job_type = parallel, so the number of processors is fixed.
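The executable in this example is assumed to be an OpenMP code (hence OMP_NUM_THREADS=6); a minimal sketch of how such a binary might be built, with the source file name and compiler choice as assumptions:
gfortran -fopenmp myopenmpcode.f90 -o myopenmpcode   # GNU compilers
ifort -openmp myopenmpcode.f90 -o myopenmpcode       # Intel compilers v11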
|
|
|
|
Some Loadleveler Commands
|
|
• llsubmit job.cmd: submits the job to the queue
• llq: shows the queue status
• llq -s job.xyz: provides information on why a job or list of jobs remains in the NotQueued, Idle or Deferred state
• llcancel job.xyz: cancels one or more jobs from the queue
• llclass: displays the defined classes and usage information
• llstatus: provides information on the status of all the nodes in the cluster
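Putting these commands together, a typical session might look like the sketch below; the command file name and the job identifier are placeholders, and LoadLeveler prints the real identifier at submission time:
llsubmit parallel_job.cmd   # submit the job
llq                         # list queued and running jobs
llq -s malta.123.0          # ask why a particular job is still waiting
llcancel malta.123.0        # remove the job from the queue if necessary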
|
|
|
|
|
|