Commit 2e00b2b4 authored by Matthieu Muffato

Finished the section about the Runnable API

parent 5a3aa85f
.. _howto-mpi:
How to use MPI
==============
......@@ -192,6 +195,8 @@ Runnable.
};
}
.. _worker_temp_directory_name-mpi:
Temporary files
~~~~~~~~~~~~~~~
......
......@@ -6,32 +6,87 @@ eHive exposes an interface for Runnables (jobs) to interact with the
system:
- query their own parameters. See :ref:`parameters-in-jobs`
- control its own execution
- report issues
- run commands
- control its own execution and report issues
- run system commands
- trigger some *dataflow* events (e.g. create new jobs)
Execution control
-----------------
- worker_temp_directory
- worker_temp_directory_name
- cleanup_worker_temp_directory
Reporting and logging
---------------------
- warning
- say_with_header
- throw
- complete_early
- debug
System commands
---------------
Jobs can log messages to the standard output with the
``$self->say_with_header($message, $important)`` method. However, the messages are only printed
when the *debug* mode is enabled (see below) or when the ``$important`` flag is switched on.
Each message is prefixed with a standard header describing the
runtime context (worker, role, job).
The debug mode is controlled by the ``--debug X`` option of
:ref:`script-beekeeper` and :ref:`script-runWorker`. *X* is an integer,
allowing multiple levels of debug, although most of the modules will only
check whether it is 0 or not.
``$self->warning($message)`` calls ``$self->say_with_header($message, 1)``
(so that the messages are printed on the standard output) but also stores
them in the database (in the ``log_message`` table).
To terminate a job early (i.e. before it reaches
the end of ``write_output``), you can call:
- ``$self->complete_early($message)`` to mark the job as *DONE*
(successful run). Beware that this will trigger the *autoflow*.
- ``$self->throw($message)`` to log a failed attempt. The job may be given
additional retries, following the analysis' *max_retry_count* parameter;
otherwise it is marked as *FAILED* in the database.
System interactions
-------------------
All Runnables have access to the ``$self->run_system_command`` method to run
arbitrary system commands (the ``SystemCmd`` Runnable is merely a wrapper
around this method).
``run_system_command`` takes two arguments:
#. The command to run, given as a single string or an arrayref. Arrayrefs
are the preferred way as they simplify the handling of whitespace and
quotes in the command-line arguments. Arrayrefs that correspond to
straightforward commands, e.g. ``['find', '-type', 'd']``, are passed to
the underlying ``system`` function as lists. Arrayrefs that contain shell
meta-characters and delimiters such as ``>`` (to redirect the output to a
file), ``;`` (to separate two commands that have to be run sequentially)
or ``|`` (a pipe) are quoted, joined and passed to ``system``
as a single string.
#. A hashref of options. Accepted options are:
- ``use_bash_pipefail``: Normally, the exit status of a pipeline (e.g.
``cmd1 | cmd2``) is the exit status of the last command, meaning that
errors in the first command are not captured. With the option turned
on, the exit status of the pipeline will capture errors in any command
of the pipeline, and will only be 0 if *all* the commands exit
successfully
- ``use_bash_errexit``: Exit immediately if a command fails. This is
mostly useful for cases like ``cmd1; cmd2`` where by default, ``cmd2``
would always be executed, regardless of the exit status of ``cmd1``
- ``timeout``: the maximum number of seconds the command is allowed to
run for. The exit status will be set to -2 if the command had to be
aborted
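The following sketch illustrates the shell behaviour that the first two
options describe. It assumes, from the option names only, that they
correspond to bash's ``set -o pipefail`` and ``set -e``; it runs plain
``bash`` and does not call ``run_system_command`` itself.

```shell
# Pipeline, default behaviour: the status is that of the last command (true).
status_default=0
bash -c 'false | true' || status_default=$?

# Pipeline with pipefail: the failure of the first command is reported.
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

# Sequence, default behaviour: the second command runs despite the failure.
out_default=$(bash -c 'false; echo ran')

# Sequence with errexit: the shell stops at the first failing command.
status_errexit=0
out_errexit=$(bash -c 'set -e; false; echo ran') || status_errexit=$?

echo "pipeline: default=$status_default pipefail=$status_pipefail"
echo "sequence: default='$out_default' errexit='$out_errexit' (status $status_errexit)"
```

In the default case the pipeline reports success even though its first
command failed, which is the situation ``use_bash_pipefail`` is meant to
catch.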
During their execution, jobs may need to use temporary files.
eHive provides a directory that will exist throughout the lifespan of the
worker with the ``$self->worker_temp_directory`` method. The directory is created
the first time the method is called, and deleted when the worker ends. It is the Runnable's
responsibility to leave the directory in a clean-enough state for the next
job (by removing some files, for instance), or to clean it up completely
with ``$self->cleanup_worker_temp_directory``.
By default, this directory will be put under ``/tmp``, but the location can be overridden
by adding a ``worker_temp_directory_name`` method to the Runnable. This can
be used to:
- use a faster filesystem (although /tmp is usually local to the machine)
- use a network filesystem (needed for distributed applications, e.g. over
MPI). See :ref:`worker_temp_directory_name-mpi` in the :ref:`howto-mpi` section.
- run_system_command
Dataflows
---------
......@@ -41,7 +96,7 @@ are immediately reacted upon. The main event is called **Dataflow** (see
:ref:`dataflows` for more information) and
consists of sending some data somewhere. The destination of a Dataflow
event must be defined in the pipeline graph itself, and is then referred to
by a *branch number* (see :ref:`dataflows`).
by a *branch number*.
Within a Runnable, Dataflow events are performed via the ``$self->dataflow_output_id($data,
$branch_number)`` method.
......@@ -55,8 +110,8 @@ The payload ``$data`` must be of one of these types:
The branch number defaults to 1 and can be skipped. Generally speaking, it
has to be an integer.
Runnables can also use ``dataflow_output_ids_from_json($filename, $default_branch)``.
This method simply wraps ``dataflow_output_id``, allowing external programs
Runnables can also use ``$self->dataflow_output_ids_from_json($filename, $default_branch)``.
This method simply wraps ``$self->dataflow_output_id``, allowing external programs
to easily generate events. The method takes two arguments:
#. The path to a file containing one JSON object per line. Each line can be
......