About OMP and MPI parallelisation

Hi, I started using SIESTA 4 months ago and I am still learning. Reading the manual, there isn't much documentation on SIESTA's performance with the different parallelisation strategies. The manual suggests some tips and tricks to increase parallel performance, like using Metis or parallelising over k-points. It also says that for MPI+OpenMP it is tricky to get good performance.
My question is why and how? Why should I use MPI over OpenMP, or a hybrid of the two, and in which situations?
How can I know what to do to get good performance, or which parallelisation I should use?
I don't have a GPU, but in the future I might, so in which cases should I use the ELSI GPU solvers?

Hi,

Performance is a very broad topic and you will need some time and effort to become comfortable with it. It is not only a question of selecting a parallelization strategy for your calculations, but also of knowing what the program you want to execute has been optimized for.

MPI is usually a good start: it provides a fair speed-up in many situations, and programs like SIESTA have usually been parallelized for MPI first. This kind of parallelization is not very sensitive to the kind of system you have at hand.
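As a concrete starting point, here is a minimal sketch of a pure-MPI run, assuming an MPI-enabled SIESTA binary called `siesta` and an input file `input.fdf` (both placeholders; the number of ranks is something to adapt to your machine):

```bash
# Pure-MPI run of SIESTA on 16 processes; adjust -np to your node size.
# Binary name and input/output file names are placeholders.
mpirun -np 16 siesta < input.fdf > siesta_16.out
```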

OpenMP can be of interest on small computers such as laptops, for running tests, and on clusters where the individual nodes have a large amount of RAM. Getting a high speed-up is quite sensitive to the kind of system you study, which means that you have to make tests with different numbers of threads (search for OMP_NUM_THREADS) before going to production.
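For such tests, a simple thread-count scan could look like the following sketch (the thread counts, binary name, and file names are assumptions to adapt to your setup):

```bash
# Scan over OpenMP thread counts with an OpenMP-enabled SIESTA binary.
# Compare the wall times of the runs to pick a thread count for production.
for t in 1 2 4 8; do
  export OMP_NUM_THREADS=$t
  siesta < input.fdf > "run_${t}_threads.out"
done
```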

GPUs can give you very high speed-ups if your system is well suited to GPU calculations, but they may degrade performance significantly when it is not. In brief, this is both system-sensitive and hardware-sensitive: what works for a specific GPU configuration might totally fail for another.

If, in addition, you want to combine these different approaches, you'll have to benchmark a set of combinations first and decide what to do for production calculations, as in the sketch below. In such a case, in particular if you include GPU, the best option is to ask the experts at your supercomputing center for help.
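Purely as an illustration, a small benchmark over MPI-rank / OpenMP-thread combinations could be scripted like this (all counts and file names are placeholders, and the grid should fit within one node's cores):

```bash
# Benchmark a grid of MPI x OpenMP combinations on a single node.
# Pick the fastest combination from the timings in the output files.
for np in 2 4 8; do
  for t in 1 2 4; do
    export OMP_NUM_THREADS=$t
    mpirun -np "$np" siesta < input.fdf > "bench_${np}ranks_${t}threads.out"
  done
done
```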

1. Oh, I see. Do you have any advice for running such tests?
2. What has your experience been?
3. What type of parallelism do you prefer, and when?
4. What can you tell me about MPI+OpenMP?
5. What about solution methods? Some of them imply a specific type of system, for example SIESTA's order-N solver. Are these methods sensitive to the parallelization strategy?

My usual approach is to start with MPI only, and to switch to MPI+OpenMP only if the calculation is too slow with pure MPI. For GPU, I cannot tell you much; it has been a long time since I last ran such calculations.
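One detail worth mentioning about MPI+OpenMP: thread pinning often decides whether the hybrid mode helps at all, which is part of why the manual calls it tricky. A minimal sketch of a pinned hybrid launch, assuming 4 ranks with 4 threads each (all numbers and file names are placeholders):

```bash
# Hybrid MPI+OpenMP run: 4 ranks x 4 threads = 16 cores.
# OMP_PROC_BIND / OMP_PLACES keep each rank's threads on nearby cores;
# without pinning, hybrid runs frequently underperform pure MPI.
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close
export OMP_PLACES=cores
mpirun -np 4 siesta < input.fdf > hybrid_4x4.out
```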

It's also difficult to extrapolate. In my case, MPI is enough for roughly 85% of the calculations, but this is system-dependent, property-dependent, and above all supercomputer-dependent. Finding out what works for you will be empirical rather than systematic. The general rule of thumb is: look for simplicity.

Should you get stuck at any point, do not hesitate to reach out to an HPC expert. Most HPC centers offer free tutorials, as well as specific support for your needs.
