Commit 7928df15 authored by Brandon Walts, committed by ens-bwalts

Reworked small sections in creating_pipelines and creating_runnables

parent be6d743a
@@ -212,6 +212,9 @@ funnel. In eHive, this process is referred to as "semaphore propagation".
{ -logic_name => 'Delta',
},
Here, Delta will be blocked until all Beta Jobs have completed. It will
also be blocked until any child Gamma Jobs that may have been seeded by the
Beta Jobs are complete.
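In a PipeConfig's analysis definitions, this fan/funnel relationship is
declared with the semaphore-grouping arrow syntax in ``-flow_into``. A
minimal sketch, assuming the analysis names from the example above; the
branch numbers and the group letter "A" are illustrative choices::

    {   -logic_name => 'Alpha',
        -flow_into  => {
            '2->A' => [ 'Beta' ],     # fan: Beta Jobs join semaphore group "A"
            'A->1' => [ 'Delta' ],    # funnel: Delta waits for group "A"
        },
    },
    {   -logic_name => 'Beta',
        -flow_into  => { 2 => [ 'Gamma' ] },   # Gamma children propagate into "A"
    },
    {   -logic_name => 'Gamma',
    },
    {   -logic_name => 'Delta',
    },
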
Dataflow using special error handling branches
----------------------------------------------
@@ -2,7 +2,14 @@
Standard library of Runnables
=============================
Several Runnables are included in the standard eHive distribution,
providing a library of components that can be helpful when creating
pipelines. All of these are located in the directory
modules/Bio/EnsEMBL/Hive/RunnableDB/. In addition, there are Runnables
included with the examples under
modules/Bio/EnsEMBL/Hive/Examples/. Although those are written to fit into
specific example pipelines to illustrate specific eHive concepts, you may
find them useful in your own pipelines.
The included examples are:
@@ -100,46 +100,7 @@ param_required() 3 (die) 0 (die) 3 (die) 0 (die)
Exporting data from a Runnable (dataflow)
=========================================
Dataflow events (:ref:`dataflows <dataflows>`) are a key part of eHive pipelines. They provide both a mechanism for signalling other pipeline components and a mechanism for transmitting data. Functions are provided to allow Runnables to generate dataflow events with control over timing and data payload. These functions are covered in detail in the :ref:`runnable API documentation <runnable_api_dataflows>`.
Reading in data from external files and databases
@@ -91,36 +91,50 @@ be used to:
- use a network filesystem (needed for distributed applications, e.g. over
MPI). See :ref:`worker_temp_directory_name-mpi` in the :ref:`howto-mpi` section.
.. _runnable_api_dataflows:
Dataflows
---------
eHive is an *event-driven* system whereby agents trigger events that
are immediately reacted upon. The main event is called "dataflow" (see
:ref:`dataflows` for more information). A dataflow event is made up of
two parts: the event itself, which is identified by a "branch number",
and an attached data payload consisting of parameters. A Runnable can
create as many events as desired, whenever desired. The branch number can
be any integer, but note that -2, -1, 0, and 1 have special meanings
within eHive: -2, -1, and 0 are reserved for
:ref:`error handling <resource-limit-dataflow>`, and 1 is the autoflow branch.
.. warning::
If a Runnable explicitly generates a dataflow event on branch 1, then
no autoflow event will be generated when the Job finishes. This is
unusual behaviour -- many pipelines expect and depend on autoflow
coinciding with Job completion. Therefore, you should avoid explicitly
creating dataflow on branch 1, unless no alternative exists to produce
the correct logic in the Runnable. If you do override the autoflow by
creating an event on branch 1, be sure to clearly indicate this in the
Runnable's documentation.
Within a Runnable, dataflow events are performed via the ``$self->dataflow_output_id($data,
$branch_number)`` method.
The payload ``$data`` must be of one of these types:
- A hash-reference that maps parameter names (strings) to their values,
- An array-reference of hash-references of the above type, or
- ``undef`` to propagate the Job's input_id.
If no branch number is provided, it defaults to 1.
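For illustration, a Runnable's ``write_output`` method might fan out one
event per item of a list parameter. In this sketch the parameter name
``chunks``, the output parameter ``chunk_name``, and branch number 2 are
all assumptions; branch 2 would have to be wired up in the pipeline's
``-flow_into``::

    sub write_output {
        my $self = shift;

        # One dataflow event per chunk, each carrying one parameter, on branch 2
        foreach my $chunk (@{ $self->param('chunks') }) {
            $self->dataflow_output_id( { 'chunk_name' => $chunk }, 2 );
        }
    }
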
Runnables can also use ``dataflow_output_ids_from_json($filename, $default_branch)``.
This method simply wraps ``dataflow_output_id``, allowing external programs
to easily generate events. The method takes two arguments:
#. The path to a file containing one JSON object per line. Each line can be
prefixed with a branch number (and some whitespace), which will override
the default branch number.
#. The default branch number (defaults to 1).
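For illustration, an input file for this method might look as follows
(the parameter names are hypothetical). The first two lines override the
default branch number with 2; the last line falls back to the default::

    2 {"chunk_name": "chr1"}
    2 {"chunk_name": "chr2"}
    {"summary_file": "stats.txt"}
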