
.. _howto-mpi:

How to use MPI
==============

.. note::
        This tutorial gives insights on how to set up eHive to run Jobs using
        Shared Memory Parallelism (threads) and Distributed Memory
        Parallelism (MPI).

First of all, your institution / compute-farm provider may have
documentation on this topic. Please refer to it for implementation
details (intranet-only links:
`EBI <http://www.ebi.ac.uk/systems-srv/public-wiki/index.php/EBI_Good_Computing_Guide_new>`__,
`Sanger Institute <http://mediawiki.internal.sanger.ac.uk/index.php/How_to_run_MPI_jobs_on_the_farm>`__).

You can find real examples in the
`ensembl-compara <https://github.com/Ensembl/ensembl-compara>`__
repository. It ships Runnables used for phylogenetic tree inference:
`RAxML <https://github.com/Ensembl/ensembl-compara/blob/HEAD/modules/Bio/EnsEMBL/Compara/RunnableDB/ProteinTrees/RAxML.pm>`__
and
`ExaML <https://github.com/Ensembl/ensembl-compara/blob/HEAD/modules/Bio/EnsEMBL/Compara/RunnableDB/ProteinTrees/ExaML.pm>`__.
They look very lightweight (only command-line definitions) because most
of the logic is in the base class (*GenericRunnable*), but they nevertheless
show the command lines used and the parametrisation of multi-core and
MPI runs.

.. The default language is set to perl. Non-perl code-blocks have to define
   their own language setting
.. highlight:: perl

How to set up a module using Shared Memory Parallelism (threads)
-----------------------------------------------------------------

If you have already compiled your code and know how to enable the
use of multiple threads / cores, this case should be very
straightforward. It essentially consists of defining the proper
Resource Class in your pipeline.

1. You need to set up a Resource Class that encodes those requirements,
   e.g. *16 cores and 24Gb of RAM*:

   ::

       sub resource_classes {
         my ($self) = @_;
         return {
           #...
           '24Gb_16_core_job' => { 'LSF' => '-n 16 -M24000  -R"select[mem>24000] span[hosts=1] rusage[mem=24000]"' },
           #...
         }
       }

2. You need to add the Analysis to your PipeConfig:

   ::

       {   -logic_name => 'app_multi_core',
           -module     => 'Namespace::Of::Thread_app',
           -parameters => {
                   'app_exe'    => $self->o('app_pthreads_exe'),
                   'cmd'        => '#app_exe# -T 16 -input #alignment_file#',
           },
           -rc_name    => '24Gb_16_core_job',
       },

   We would like to call your attention to the ``cmd`` parameter, where
   we define the command line used to run Thread\_app. The actual command
   line varies between programs, but in this case the ``-T`` parameter
   sets the number of threads to 16. Check the documentation of the
   program you want to run to find out how to control the number of
   threads it will use.

With just this basic configuration, eHive is able to run Thread\_app
on 16 cores.
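
If you would rather not keep the number of cores in sync by hand between the
Resource Class and the ``cmd`` parameter, both can be derived from a single
pipeline-wide option. Below is a minimal sketch assuming a hypothetical option
named ``num_cores`` declared in your PipeConfig's ``default_options`` (it is
not a built-in eHive option, just a regular pipeline option):

::

    sub resource_classes {
        my ($self) = @_;
        return {
            # The LSF requirement string is built from the same option ...
            '24Gb_multi_core_job' => { 'LSF' => '-n '.$self->o('num_cores').' -M24000 -R"select[mem>24000] span[hosts=1] rusage[mem=24000]"' },
        };
    }

    # ... as the command line of the Analysis
    {   -logic_name => 'app_multi_core',
        -module     => 'Namespace::Of::Thread_app',
        -parameters => {
            'app_exe'    => $self->o('app_pthreads_exe'),
            'cmd'        => '#app_exe# -T '.$self->o('num_cores').' -input #alignment_file#',
        },
        -rc_name    => '24Gb_multi_core_job',
    },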


How to set up a module using Distributed Memory Parallelism (MPI)
------------------------------------------------------------------

This case requires a bit more attention, so please be very careful
to include / load the right libraries / modules.
The instructions below may not apply to your system. If in doubt, contact your
systems administrators.

Tips for compiling for MPI
~~~~~~~~~~~~~~~~~~~~~~~~~~

MPI usually comes in two implementations: OpenMPI and MPICH. A
common source of problems is to compile the code with one MPI
implementation and try to run it with another. You must compile and run
your code with the **same** MPI implementation. This can easily be taken
care of by properly setting up your ``.bashrc``.

If you have access to Intel compilers, we strongly recommend trying to
compile your code with them and checking for performance improvements.

If your compute environment uses `Module <http://modules.sourceforge.net/>`__
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*Module* provides configuration files (module-files) for the dynamic
modification of your environment.

Here is how to list the modules that your system provides:

.. code-block:: none

        module avail

And how to load one (mpich3 in this example):

.. code-block:: none

        module load mpich3/mpich3-3.1-icc

Don't forget to put this line in your ``~/.bashrc`` so that it is
automatically loaded.

Otherwise, follow the recommended usage in your institute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you don't have modules for the MPI environment available on your
system, please make sure the right libraries can be found via ``PATH`` and
any other relevant environment variables.
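
For example, assuming MPICH is installed under a hypothetical prefix such as
``/software/mpich-3.1`` (adapt the paths to your system), you could add
something like this to your ``~/.bashrc``:

.. code-block:: none

        export PATH=/software/mpich-3.1/bin:$PATH
        export LD_LIBRARY_PATH=/software/mpich-3.1/lib:$LD_LIBRARY_PATH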

The eHive bit
~~~~~~~~~~~~~

Here again, once the environment is properly set up, we only have to
define the correct Resource Class and command lines in eHive.

1. You need to set up a Resource Class that uses, e.g., *64 cores and 16Gb
   of RAM*:

   ::

       sub resource_classes {
         my ($self) = @_;
         return {
           # ...
           '16Gb_64c_mpi' => {'LSF' => '-q mpi-rh7 -n 64 -M16000 -R"select[mem>16000] rusage[mem=16000] same[model] span[ptile=4]"' },
           # ...
         };
       }

   The Resource description is specific to our LSF environment, so adapt
   it to yours, but note that:

   -  ``-q mpi-rh7`` is needed to tell LSF you will run a job (Worker) in the
      MPI environment. Note that some LSF installations will require you
      to use an additional ``-a`` option.
   -  ``same[model]`` is needed to ensure that the selected compute nodes
      all have the same hardware. You may also need something like
      ``select[avx]`` to select the nodes that have the `AVX instruction
      set <https://en.wikipedia.org/wiki/Advanced_Vector_Extensions>`__.
   -  ``span[ptile=4]`` specifies the granularity with which LSF
      will split the job across nodes: in this example we ask for each machine
      to be allocated a multiple of four cores. This might affect queuing
      times. The memory requested is allocated per *ptile* group of cores (so
      64/4*16GB = 256GB in total in this example).

2. You need to add the Analysis to your PipeConfig:

   ::

       {   -logic_name => 'MPI_app',
           -module     => 'Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::MPI_app',
           -parameters => {
               'mpi_exe'     => $self->o('mpi_exe'),
           },
           -rc_name => '16Gb_64c_mpi',
           # ...
       },
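
   For reference, here is a minimal sketch of how the executables referenced
   above (and in the next section) might be declared in the same PipeConfig's
   ``default_options``. The paths are placeholders, not part of eHive; point
   them at binaries built with your MPI implementation:

   ::

       sub default_options {
           my ($self) = @_;
           return {
               %{ $self->SUPER::default_options },
               # Placeholder paths -- adapt them to your installation
               'mpi_exe'    => '/path/to/your_mpi_application',
               'mpirun_exe' => '/path/to/mpirun',
           };
       }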


How to write a module that uses MPI
-----------------------------------

Here is an excerpt of Ensembl Compara's
`ExaML <https://github.com/Ensembl/ensembl-compara/blob/HEAD/modules/Bio/EnsEMBL/Compara/RunnableDB/ProteinTrees/ExaML.pm>`__
MPI module. Note that LSF needs the MPI command to be run through
*mpirun*. You can also run several single-threaded commands in the same
Runnable.

::

    sub param_defaults {
        my $self = shift;
        return {
            %{ $self->SUPER::param_defaults },
            'cmd' => 'cmd 1 ; cmd  2 ; #mpirun_exe# #examl_exe# -examl_parameter_1 value1 -examl_parameter_2 value2',
        };
    }

.. _worker_temp_directory_name-mpi:

Temporary files
~~~~~~~~~~~~~~~

In our case, ExaML uses MPI and also needs to share data via the filesystem.
In this specific Runnable, ExaML is set to run in eHive's managed temporary
directory, which by default is under ``/tmp`` and is therefore not shared
across nodes on our compute cluster.
We have to override this default and use a shared directory (``$self->o('examl_dir')``) instead.

This can be done at the resource class level, by adding
``"-worker_base_tmp_dir ".$self->o('examl_dir')`` to the
``worker_cmd_args`` attribute of the Resource Class.
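
Here is a minimal sketch, assuming the ``16Gb_64c_mpi`` Resource Class shown
earlier, a pipeline-wide ``examl_dir`` option pointing at a shared filesystem,
and the two-string form of the resource description (where the second string
carries the ``worker_cmd_args``):

::

    sub resource_classes {
        my ($self) = @_;
        return {
            # ...
            '16Gb_64c_mpi' => { 'LSF' => [
                '-q mpi-rh7 -n 64 -M16000 -R"select[mem>16000] rusage[mem=16000] same[model] span[ptile=4]"',
                # Extra Worker command-line arguments: write temporary files to a shared directory
                '-worker_base_tmp_dir '.$self->o('examl_dir'),
            ] },
            # ...
        };
    }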