
Runnable API
============

eHive exposes an interface for Runnables (jobs) to interact with the
system, so that they can:

  - query their own parameters (see :ref:`parameters-in-jobs`),
  - control their own execution and report issues,
  - run system commands,
  - trigger some *dataflow* events (e.g. create new jobs).

Reporting and logging
---------------------

Jobs can log messages to the standard output with the
``$self->say_with_header($message, $important)`` method. However, these
messages are only printed when the *debug* mode is enabled (see below) or
when the ``$important`` flag is switched on. They are also prefixed with a
standard prefix describing the runtime context (Worker, Role, Job).

The debug mode is controlled by the ``--debug X`` option of
:ref:`script-beekeeper` and :ref:`script-runWorker`. *X* is an integer,
allowing multiple levels of debugging, although most modules only check
whether it is 0 or not.

``$self->warning($message)`` calls ``$self->say_with_header($message, 1)``
(so that the messages are printed on the standard output) but also stores
them in the database (in the ``log_message`` table).

To terminate a Job early (i.e. before it reaches the end of
``write_output``), you can call:

- ``$self->complete_early($message)`` to mark the Job as *DONE*
  (successful run) and record the message in the database. Beware that this
  will still trigger the *autoflow*.
- ``$self->complete_early($message, $branch_code)``, a variation of the
  above that replaces the autoflow (branch 1) with a dataflow on the given
  branch.
- ``$self->throw($message)`` to log a failed attempt. Depending on the
  analysis' *max_retry_count* parameter, the Job may be retried; once it has
  no retries left, it is marked as *FAILED* in the database.

System interactions
-------------------

All Runnables have access to the ``$self->run_system_command`` method to run
arbitrary system commands (the ``SystemCmd`` Runnable is merely a wrapper
around this method).

``run_system_command`` takes two arguments:

#. The command to run, given as a single string or an arrayref. Arrayrefs
   are the preferred way as they simplify the handling of whitespace and
   quotes in the command-line arguments. Arrayrefs that correspond to
   straightforward commands, e.g. ``['find', '-type', 'd']``, are passed to
   the underlying ``system`` function as lists. Arrayrefs can also contain
   shell meta-characters and delimiters such as ``>`` (to redirect the
   output to a file), ``;`` (to separate two commands that have to be run
   sequentially) or ``|`` (a pipe); such arrayrefs are quoted, joined and
   passed to ``system`` as a single string.
#. A hashref of options. Accepted options are:

   - ``use_bash_pipefail``: Normally, the exit status of a pipeline (e.g.
     ``cmd1 | cmd2``) is the exit status of the last command, meaning that
     errors in the first command are not captured. With the option turned
     on, the exit status of the pipeline will capture errors in any command
     of the pipeline, and will only be 0 if *all* the commands exit
     successfully.
   - ``use_bash_errexit``: Exit immediately if a command fails. This is
     mostly useful for cases like ``cmd1; cmd2`` where, by default, ``cmd2``
     would always be executed, regardless of the exit status of ``cmd1``.
   - ``timeout``: the maximum number of seconds the command is allowed to
     run for. The exit status will be set to -2 if the command had to be
     stopped because it exceeded this limit.

During their execution, Jobs often need temporary files. eHive provides a
directory that exists throughout the lifespan of the Worker, accessible
with the ``$self->worker_temp_directory`` method. The directory is created
the first time the method is called, and deleted when the Worker ends. It
is the Runnable's responsibility to leave the directory in a clean-enough
state for the next Job (by removing some files, for instance), or to clean
it up completely with ``$self->cleanup_worker_temp_directory``.

By default, this directory is put under /tmp, but its location can be
overridden by adding a ``worker_temp_directory_name`` method to the
Runnable. This can be used to:

- use a faster filesystem (although /tmp is usually local to the machine),
- use a network filesystem (needed for distributed applications, e.g. over
  MPI). See :ref:`worker_temp_directory_name-mpi` in the :ref:`howto-mpi` section.


eHive is an *event-driven* system whereby agents trigger events that
are immediately reacted upon. The main event is called "Dataflow" (see
:ref:`dataflows` for more information) and consists of sending some data
somewhere. The destination of a Dataflow event must be defined in the
pipeline graph itself, and is then referred to by a "branch number".

Within a Runnable, Dataflow events are performed via the ``$self->dataflow_output_id($data,
$branch_number)`` method.

The payload ``$data`` must be one of these types:

- hash-reference that maps parameter names (strings) to their values,
- array-reference of hash-references of the above type,
- ``undef`` to propagate the Job's ``input_id``.

The branch number defaults to 1 and can be omitted. Generally speaking, it
has to be an integer.
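
To make the three accepted payload shapes concrete, here is a standalone
sketch. ``is_valid_dataflow_payload`` is a hypothetical helper (it is not part
of eHive) that simply encodes the list above, and the example parameter names
(``gene_id``, ``length``, ``chunk``) are made up. Inside a Runnable, such
payloads would be passed to ``$self->dataflow_output_id``, as shown in the
comments.

.. code-block:: perl

   use strict;
   use warnings;

   # Hypothetical helper (not part of eHive): true if $data has one of the
   # three shapes accepted as a dataflow_output_id payload.
   sub is_valid_dataflow_payload {
       my ($data) = @_;
       return 1 unless defined $data;          # undef: propagate the input_id
       return 1 if ref($data) eq 'HASH';       # a single hashref
       return 0 unless ref($data) eq 'ARRAY';
       for my $item (@$data) {                 # arrayref of hashrefs
           return 0 unless ref($item) eq 'HASH';
       }
       return 1;
   }

   # Made-up payloads. In a Runnable's write_output they would be used as:
   #   $self->dataflow_output_id($one_job, 2);   # one event on branch 2
   #   $self->dataflow_output_id($many_jobs);    # one event per hashref, branch 1
   #   $self->dataflow_output_id(undef);         # propagate the input_id, branch 1
   my $one_job   = { 'gene_id' => 'G1', 'length' => 1200 };
   my $many_jobs = [ map +{ 'chunk' => $_ }, 1 .. 3 ];

   for my $payload ($one_job, $many_jobs, undef, [ 1, 2 ], 'gene_id=G1') {
       my $verdict = is_valid_dataflow_payload($payload) ? 'accepted' : 'rejected';
       print "$verdict\n";
   }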

Runnables can also use ``$self->dataflow_output_ids_from_json($filename, $default_branch)``.
This method simply wraps ``$self->dataflow_output_id``, allowing external programs
to easily generate events. The method takes two arguments:

#. The path to a file containing one JSON object per line. Each line can be
   prefixed with a branch number (and some whitespace), which will override
   the default branch number.
#. The default branch number (also defaults to 1).
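
As an illustration of this file format, the standalone sketch below (using
the core ``JSON::PP`` and ``File::Temp`` modules) plays the role of an
external program that writes such a file, then reads it back applying the
branch-number rule. ``write_dataflow_file`` and ``read_dataflow_file`` are
hypothetical helpers written for this example; the reader only mirrors the
format described above and is not eHive's own parser. In a pipeline, the
generated file would instead be handed to
``$self->dataflow_output_ids_from_json``.

.. code-block:: perl

   use strict;
   use warnings;
   use JSON::PP;
   use File::Temp qw(tempfile);

   # Hypothetical writer: one JSON object per line, optionally prefixed with
   # a branch number and a space. Each event is [ $branch_or_undef, \%data ].
   sub write_dataflow_file {
       my ($out, @events) = @_;
       my $json = JSON::PP->new->canonical;
       for my $event (@events) {
           my ($branch, $data) = @$event;
           my $line = $json->encode($data);
           print {$out} defined $branch ? "$branch $line\n" : "$line\n";
       }
   }

   # Hypothetical reader applying the same rule: a leading number (followed
   # by whitespace) overrides the default branch.
   sub read_dataflow_file {
       my ($filename, $default_branch) = @_;
       $default_branch //= 1;
       my $json = JSON::PP->new;
       my @events;
       open my $in, '<', $filename or die "Cannot open $filename: $!";
       while (my $line = <$in>) {
           chomp $line;
           next unless $line =~ /\S/;    # skip blank lines
           my $branch = $line =~ s/^\s*(\d+)\s+// ? $1 : $default_branch;
           push @events, [ $branch, $json->decode($line) ];
       }
       close $in;
       return \@events;
   }

   my ($fh, $filename) = tempfile(UNLINK => 1);
   write_dataflow_file($fh,
       [ undef, { 'chunk' => 1 } ],    # no prefix: default branch
       [ 2,     { 'chunk' => 2 } ],    # explicit branch 2
       [ undef, { 'chunk' => 3 } ],
   );
   close $fh;

   my $events = read_dataflow_file($filename, 1);
   printf "branch %d: chunk %d\n", $_->[0], $_->[1]{'chunk'} for @$events;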