
=pod 

=head1 NAME

    Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf;

=head1 SYNOPSIS

    # initialize the database and build the graph in it (it will also print the value of EHIVE_URL):
    init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf -password <mypass>

    # optionally also seed it with your specific values:
    seed_pipeline.pl -url $EHIVE_URL -logic_name take_b_apart -input_id '{ "a_multiplier" => "12345678", "b_multiplier" => "3359559666" }'

    # run the pipeline:
    beekeeper.pl -url $EHIVE_URL -loop

=head1 DESCRIPTION

    This is the PipeConfig file for the long multiplication pipeline example.
    The main point of this pipeline is to provide an example of how to write Hive Runnables and link them together into a pipeline.

    Please refer to the Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf module to understand the interface implemented here.

    The setting: let's assume we are given two loooooong numbers to multiply. Reeeeally long.
    Soooo long that they do not fit into the registers of the CPU and have to be multiplied digit-by-digit.
    For the purposes of this example we also assume this task is very computationally intensive and has to be done in parallel.

    The long multiplication pipeline consists of three "analyses" (types of tasks):
        'take_b_apart', 'part_multiply' and 'add_together', which we use to exemplify various features of the Hive.

        * A 'take_b_apart' job takes in two string parameters, 'a_multiplier' and 'b_multiplier',
          takes the second one apart into digits, finds what _different_ digits are there,
          creates several jobs of the 'part_multiply' analysis and one job of 'add_together' analysis.

        * A 'part_multiply' job takes in 'a_multiplier' and 'digit', multiplies them and accumulates the result in the 'partial_product' accumulator.

        * An 'add_together' job waits for the first two analyses to complete,
          takes in 'a_multiplier', 'b_multiplier' and the 'partial_product' hash, and produces the final result in the 'final_result' table.

    Please see the implementation details in the Runnable modules themselves.
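As a self-contained illustration of the arithmetic the pipeline distributes, here is the same computation done serially in plain Perl (a sketch independent of Hive; the subroutine names multiply_by_digit, add_strings and long_mult are made up for this example):

```perl
use strict;
use warnings;

# Multiply a long decimal string by a single digit, schoolbook style
# (this is the arithmetic a 'part_multiply' job does for its digit):
sub multiply_by_digit {
    my ($number, $digit) = @_;
    my ($carry, $result) = (0, '');
    foreach my $d (reverse split //, $number) {
        my $p = $d * $digit + $carry;
        $result = ($p % 10) . $result;
        $carry  = int($p / 10);
    }
    return $carry ? $carry . $result : $result;
}

# Add two long decimal strings digit-by-digit:
sub add_strings {
    my ($x, $y) = @_;
    my @x = reverse split //, $x;
    my @y = reverse split //, $y;
    my ($carry, $result) = (0, '');
    for my $i (0 .. (@x > @y ? $#x : $#y)) {
        my $s = ($x[$i] // 0) + ($y[$i] // 0) + $carry;
        $result = ($s % 10) . $result;
        $carry  = int($s / 10);
    }
    return $carry ? $carry . $result : $result;
}

# The whole pipeline in one serial loop: compute each _distinct_ digit's
# partial product once ('take_b_apart' + 'part_multiply'), then
# shift-and-add per position of b_multiplier ('add_together'):
sub long_mult {
    my ($a_multiplier, $b_multiplier) = @_;
    my %seen            = map { $_ => 1 } split //, $b_multiplier;
    my %partial_product = map { $_ => multiply_by_digit($a_multiplier, $_) } keys %seen;

    my @digits = split //, $b_multiplier;
    my $result = '0';
    for my $i (0 .. $#digits) {
        my $shifted = $partial_product{ $digits[$i] } . ('0' x ($#digits - $i));
        $result = add_strings($result, $shifted);
    }
    $result =~ s/^0+(?=\d)//;    # trim leading zeros
    return $result;
}

print long_mult('9650156169', '327358788'), "\n";
```

In the actual pipeline each call to multiply_by_digit becomes an independent job, which is why only distinct digits of 'b_multiplier' need to be computed.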

=head1 CONTACT

    Please contact the ehive-users@ebi.ac.uk mailing list with questions/suggestions.

=cut


package Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf;

use strict;
use warnings;

use base ('Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf');  # All Hive databases configuration files should inherit from HiveGeneric, directly or indirectly


=head2 pipeline_create_commands

    Description : Implements pipeline_create_commands() interface method of Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf that lists the commands that will create and set up the Hive database.
                  In addition to the standard creation of the database and populating it with Hive tables and procedures it also creates two pipeline-specific tables used by Runnables to communicate.

=cut

sub pipeline_create_commands {
    my ($self) = @_;
    return [
        @{$self->SUPER::pipeline_create_commands},  # inheriting database and hive tables' creation

            # additional tables needed for long multiplication pipeline's operation:
        $self->db_cmd('CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'),
    ];
}


=head2 pipeline_wide_parameters

    Description : Interface method that should return a hash of pipeline_wide_parameter_name->pipeline_wide_parameter_value pairs.
    The value doesn't have to be a scalar; it can be any Perl structure (it will be stringified and de-stringified automagically).
                  Please see existing PipeConfig modules for examples.

=cut

sub pipeline_wide_parameters {
    my ($self) = @_;
    return {
        %{$self->SUPER::pipeline_wide_parameters},          # here we inherit anything from the base class

        'take_time'     => 1,
    };
}


=head2 pipeline_analyses

    Description : Implements pipeline_analyses() interface method of Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf that defines the structure of the pipeline: analyses, jobs, rules, etc.
                  Here it defines three analyses:
                    * 'take_b_apart' that is auto-seeded with a pair of jobs (to check the commutativity of multiplication).
                      Each job will dataflow (create more jobs) via branch #2 into 'part_multiply' and via branch #1 into 'add_together'.

                    * 'part_multiply' with jobs fed from take_b_apart#2.
                        It multiplies input parameters 'a_multiplier' and 'digit' and dataflows 'partial_product' parameter into branch #1.

                    * 'add_together' with jobs fed from take_b_apart#1.
                        It adds together results of partial multiplication computed by 'part_multiply'.
                        These results are accumulated in 'partial_product' hash.
                        Until the hash is complete the corresponding 'add_together' job is blocked by a semaphore.
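Parameter values in the dataflow templates below (and in '-input_id' hashes) are referenced with the #name# syntax. Purely to illustrate the substitution idea (this is not Hive's actual implementation; the substitute() helper is invented for this sketch):

```perl
use strict;
use warnings;

# Hypothetical helper: replace each #name# token in a template string
# with the corresponding value from a parameter hash.
sub substitute {
    my ($template, $params) = @_;
    (my $out = $template) =~ s/#(\w+)#/$params->{$1}/g;
    return $out;
}

my $filled = substitute(
    '{ "a_multiplier" => "#a_multiplier#", "digit" => "#digit#" }',
    { 'a_multiplier' => '9650156169', 'digit' => '7' },
);
print "$filled\n";   # { "a_multiplier" => "9650156169", "digit" => "7" }
```

In the real system the substitution is performed per job, so the '2->A' template in 'take_b_apart' below stamps each fan job with its own 'digit' value.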

=cut

sub pipeline_analyses {
    my ($self) = @_;
    return [
        {   -logic_name => 'take_b_apart',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::DigitFactory',
            -meadow_type=> 'LOCAL',     # do not bother the farm with such a simple task (and get it done faster)
            -analysis_capacity  =>  2,  # use per-analysis limiter
            -input_ids => [
                { 'a_multiplier' => '9650156169', 'b_multiplier' => '327358788' },
                { 'a_multiplier' => '327358788', 'b_multiplier' => '9650156169' },
            ],
            -flow_into => {
                    # will create a semaphored fan of jobs; will use a template to top-up the hashes:
                '2->A' => { 'part_multiply' => { 'a_multiplier' => '#a_multiplier#', 'digit' => '#digit#', 'take_time' => '#take_time#' } },
                    # will create a semaphored funnel job to wait for the fan to complete and add the results:
                'A->1' => [ 'add_together'  ],
            },
        },

        {   -logic_name => 'part_multiply',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply',
            -analysis_capacity  =>  4,  # use per-analysis limiter
            -flow_into => {
                1 => [ ':////accu?partial_product={digit}' ],
            },
        },
        
        {   -logic_name => 'add_together',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether',
#           -analysis_capacity  =>  0,  # this is a way to temporarily block a given analysis
            -flow_into => {
                1 => [ ':////final_result' ],
            },
        },
    ];
}

1;