@@ -109,7 +109,11 @@ sub pipeline_create_commands {
* 'part_multiply' initially without jobs (they will flow from 'start')
* 'add_together' initially without jobs (they will flow from 'start').
All 'add_together' jobs will wait for completion of *all* 'part_multiply' jobs before their own execution (to ensure all data is available).
All 'add_together' jobs will wait for completion of 'part_multiply' jobs before their own execution (to ensure all data is available).
There are two control modes in this pipeline:
A. The default mode is to use the '2' dataflow rule from 'start' analysis and a -wait_for rule in 'add_together' analysis for analysis-wide synchronization.
B. The semaphored mode is to use '2:1' semaphored dataflow rule from 'start' instead, and comment out the analysis-wide -wait_for rule, relying on semaphores.
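The two modes can be pictured as two variants of the analysis list. The following is only a sketch built from the description above: the '-flow_into' layout and the exact placement of the '2:1' key are assumptions, and module names and parameters are left out entirely, so it is not a working pipeline definition.

```perl
use strict;
use warnings;

# Mode A (default): plain dataflow on branch 2 plus an analysis-wide -wait_for.
# The exact shape of these hashes is an assumption made for illustration.
my $mode_a = [
    {   -logic_name => 'start',
        -flow_into  => {
            2 => [ 'part_multiply' ],
            1 => [ 'add_together' ],
        },
    },
    {   -logic_name => 'part_multiply' },
    {   -logic_name => 'add_together',
        -wait_for   => [ 'part_multiply' ],    # blocks until part_multiply jobs are done
    },
];

# Mode B (semaphored): '2:1' replaces '2', and the -wait_for rule is dropped.
my $mode_b = [
    {   -logic_name => 'start',
        -flow_into  => {
            '2:1' => [ 'part_multiply' ],
            1     => [ 'add_together' ],
        },
    },
    {   -logic_name => 'part_multiply' },
    {   -logic_name => 'add_together' },       # synchronization is left to semaphores
];

for my $mode ([ 'A', $mode_a ], [ 'B', $mode_b ]) {
    my ($name, $analyses) = @$mode;
    my @waiting = grep { exists $_->{'-wait_for'} } @$analyses;
    print "mode $name: ", scalar(@waiting), " analysis-wide wait rule(s)\n";
}
```

Only the dataflow key in 'start' and the presence of -wait_for in 'add_together' differ between the two variants.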
use base ('Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf');  # All Hive database configuration files should inherit from HiveGeneric, directly or indirectly
=head2 default_options
Description : Implements default_options() interface method of Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf that is used to initialize default options.
In addition to the standard options, it defines two more, 'first_mult' and 'second_mult', which hold the long numbers to be multiplied.
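The inherit-then-override idiom used by default_options() can be tried outside of Hive. In this minimal sketch, My::BaseConf is a hypothetical stand-in for HiveGeneric_conf (the real base class defines many more options), and the 'host' option is invented purely to show an inherited key surviving the merge.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real base class.
package My::BaseConf;
sub new { my ($class) = @_; return bless {}, $class; }
sub default_options {
    my ($self) = @_;
    return {
        'pipeline_name' => 'generic',
        'host'          => 'localhost',     # invented option, for illustration only
    };
}

package My::LongMultConf;
our @ISA = ('My::BaseConf');
sub default_options {
    my ($self) = @_;
    return {
        %{ $self->SUPER::default_options() },   # inherited defaults are expanded first
        'pipeline_name' => 'sema_long_mult',    # a later key overrides the inherited one
        'first_mult'    => '9650516169',
        'second_mult'   => '327358788',
    };
}

package main;

my $options = My::LongMultConf->new->default_options;
print "$_ = $options->{$_}\n" for sort keys %$options;
```

Because the inherited hash is expanded before the local keys, any key repeated in the subclass wins, while keys defined only in the base class are kept.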
=cut
sub default_options {
    my ($self) = @_;
    return {
        %{ $self->SUPER::default_options() },   # inherit other stuff from the base class
        'pipeline_name' => 'sema_long_mult',    # name used by the beekeeper to prefix job names on the farm
        'first_mult'    => '9650516169',        # the actual numbers to be multiplied can also be specified from the command line
        'second_mult'   => '327358788',
    };
}
=head2 pipeline_create_commands
Description : Implements pipeline_create_commands() interface method of Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf that lists the commands that will create and set up the Hive database.
In addition to the standard creation of the database and populating it with Hive tables and procedures it also creates two pipeline-specific tables used by Runnables to communicate.
=cut
sub pipeline_create_commands {
    my ($self) = @_;
    return [
        @{ $self->SUPER::pipeline_create_commands },    # inheriting database and hive tables' creation
        # additional tables needed for long multiplication pipeline's operation:
        'mysql ' . $self->dbconn_2_mysql('pipeline_db', 1) . " -e 'CREATE TABLE intermediate_result (a_multiplier char(40) NOT NULL, digit tinyint NOT NULL, result char(41) NOT NULL, PRIMARY KEY (a_multiplier, digit))'",
        'mysql ' . $self->dbconn_2_mysql('pipeline_db', 1) . " -e 'CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'",
    ];
}
=head2 pipeline_analyses
Description : Implements pipeline_analyses() interface method of Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf that defines the structure of the pipeline: analyses, jobs, rules, etc.
Here it defines three analyses:
* 'sema_start' with two jobs (multiply 'first_mult' by 'second_mult' and vice versa - to check the commutativity of multiplication).
Each job will dataflow (create more jobs) via branch #2 into 'part_multiply' and via branch #1 into 'add_together'.
Unlike LongMult_conf, there is no analysis-level control here, but SemaStart analysis itself is more intelligent
in that it can dataflow a group of partial multiplication jobs in branch #2 linked with one job in branch #1 by a semaphore.
* 'part_multiply' initially without jobs (they will flow from 'sema_start')
* 'add_together' initially without jobs (they will flow from 'sema_start').
Unlike LongMult_conf here we do not use analysis control and rely on job-level semaphores to keep the jobs in sync.
Description : Implements fetch_input() interface method of Bio::EnsEMBL::Hive::Process that is used to read in parameters and load data.
Here the task of fetch_input() is to read in the two multipliers, split the second one into digits and create a set of input_ids that will be used later.
param('a_multiplier'): The first long number (a string of digits - doesn't have to fit a register).
param('b_multiplier'): The second long number (also a string of digits).
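The digit-splitting step can be tried in isolation. The sketch below wraps the same loop in a hypothetical helper, partial_multiplication_ids(), which is not part of the module. The shape of the returned tasks (hashes with 'a_multiplier' and 'digit' keys) is an assumption modelled on the columns of the intermediate_result table. Digits 0 and 1 are skipped, presumably because multiplying by them needs no separate job.

```perl
use strict;
use warnings;

# Hypothetical helper: one task per distinct digit of $b_multiplier, skipping 0 and 1.
sub partial_multiplication_ids {
    my ($a_multiplier, $b_multiplier) = @_;

    my %digit_hash = ();
    foreach my $digit (split(//, $b_multiplier)) {
        next if (($digit eq '0') or ($digit eq '1'));
        $digit_hash{$digit}++;
    }

    # Key names are an assumption taken from the intermediate_result columns.
    # Sorting only makes the result order deterministic.
    return [ map { +{ 'a_multiplier' => $a_multiplier, 'digit' => $_ } } sort keys %digit_hash ];
}

my $ids = partial_multiplication_ids('9650516169', '327358788');
print join(',', map { $_->{'digit'} } @$ids), "\n";    # prints 2,3,5,7,8
```

Each distinct digit appears once however often it occurs in the multiplier, so the nine-digit number above yields five tasks.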
=cut
sub fetch_input {
    my $self = shift @_;
    my $a_multiplier = $self->param('a_multiplier') || die "'a_multiplier' is an obligatory parameter";
    my $b_multiplier = $self->param('b_multiplier') || die "'b_multiplier' is an obligatory parameter";
    my %digit_hash = ();
    foreach my $digit (split(//, $b_multiplier)) {
        next if (($digit eq '0') or ($digit eq '1'));
        $digit_hash{$digit}++;
    }
# output_ids of partial multiplications to be computed:
Description : Implements run() interface method of Bio::EnsEMBL::Hive::Process that is used to perform the main bulk of the job (minus input and output).
Here we don't have any real work to do, just input and output, so run() remains empty.
=cut
sub run {
}
=head2 write_output
Description : Implements write_output() interface method of Bio::EnsEMBL::Hive::Process that is used to deal with job's output after the execution.
Here we first dataflow the original task down branch-1 (create the semaphored "funnel job") - this yields $funnel_job_id,
then "fan out" the partial multiplication tasks into branch-2, and pass the $funnel_job_id to all of them.
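The funnel-then-fan ordering can be illustrated without a Hive database. The sketch below is a toy in-memory model, not the Hive dataflow API: create_funnel_job(), create_fan_job(), complete_fan_job() and is_runnable() are all hypothetical names. The funnel job is created first with a counter equal to the number of fan jobs, and each completed fan job decrements that counter.

```perl
use strict;
use warnings;

# Hypothetical in-memory job table; it only imitates the idea of a semaphored funnel job.
my %jobs;
my $next_job_id = 1;

# Create a job that stays blocked until its semaphore count reaches zero.
sub create_funnel_job {
    my ($input_id, $semaphore_count) = @_;
    my $job_id = $next_job_id++;
    $jobs{$job_id} = { 'input_id' => $input_id, 'semaphore_count' => $semaphore_count };
    return $job_id;
}

# Create a fan job that remembers which funnel job it must release on completion.
sub create_fan_job {
    my ($input_id, $funnel_job_id) = @_;
    my $job_id = $next_job_id++;
    $jobs{$job_id} = { 'input_id' => $input_id, 'semaphored_job_id' => $funnel_job_id };
    return $job_id;
}

# Completing a fan job decrements the semaphore of its funnel job.
sub complete_fan_job {
    my ($job_id) = @_;
    my $funnel_job_id = $jobs{$job_id}{'semaphored_job_id'};
    $jobs{$funnel_job_id}{'semaphore_count'}--;
    return;
}

# A job with no semaphore, or with a semaphore at zero, is free to run.
sub is_runnable {
    my ($job_id) = @_;
    return ( ($jobs{$job_id}{'semaphore_count'} // 0) == 0 ) ? 1 : 0;
}

my @output_ids = map { +{ 'a_multiplier' => '9650516169', 'digit' => $_ } } (2, 3, 5, 7, 8);

# Funnel first, so that its id is known before the fan jobs are created.
my $funnel_job_id = create_funnel_job(
    { 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' },
    scalar(@output_ids),
);
my @fan_job_ids = map { create_fan_job($_, $funnel_job_id) } @output_ids;

print "funnel runnable before the fan completes: ", is_runnable($funnel_job_id), "\n";
complete_fan_job($_) for @fan_job_ids;
print "funnel runnable after the fan completes: ", is_runnable($funnel_job_id), "\n";
```

Creating the funnel job before the fan is what allows every fan job to carry the funnel's id from the moment it exists.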
=cut
sub write_output {  # nothing to write out, but some dataflow to perform:
    my $self = shift @_;
    my $output_ids = $self->param('output_ids');
    # first we flow the branch#1 into the (semaphored) funnel job: