Commit 6efd59d5 authored by Leo Gordon

bringing docs up-to-date with new init_pipeline

parent 11863831
@@ -16,6 +16,15 @@ Summary:
Bio::EnsEMBL::Analysis::RunnableDB perl wrapper objects as nodes/blocks in
the graphs but could be adapted more generally.
12 May, 2010 : Leo Gordon
* init_pipeline.pl can be given a PipeConfig file name instead of full module name.
* init_pipeline.pl has its own help that displays pod documentation (same mechanism as other eHive scripts)
* 3 pipeline initialization modes supported:
full (default), -analysis_topup (pipeline development mode) and -job_topup (add more data to work with)
11 May, 2010 : Leo Gordon
@@ -124,83 +124,3 @@ It will be convenient to set a variable pointing at this directory for future use
3.4 In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult we keep bespoke RunnableDBs for the long multiplication example pipeline.
4 Long multiplication example pipeline.
The long multiplication pipeline solves the problem of multiplying two very long integers by pretending the computations have to be done in parallel on the farm.
While performing the task it uses various features of eHive, so by studying this and other examples you can learn how to put together your own pipelines.
4.1 The pipeline is defined in 4 files:
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/Start.pm splits a multiplication job into sub-tasks and creates corresponding jobs
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/AddTogether.pm waits for the partial multiplication results to be computed and adds them together into the final result
* ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm is the pipeline configuration module that links the previous Runnables into one pipeline
4.2 The main part of any PipeConfig file, the pipeline_analyses() method, defines the pipeline graph whose nodes are analyses and whose arcs are control and dataflow rules.
Each analysis hash must have:
-logic_name string name by which this analysis is referred to,
-module a name of the Runnable module that contains the code to be run (several analyses can use the same Runnable)
Optionally, it can also have:
-input_ids an array of hashes, each hash defining job-specific parameters (if empty, it means jobs are created dynamically using the dataflow mechanism)
-parameters usually a hash of analysis-wide parameters (each such parameter can be overridden by a parameter of the same name contained in an input_id hash)
-wait_for an array of other analyses, *controlling* this one (jobs of this analysis cannot start before all jobs of controlling analyses have completed)
-flow_into usually a hash that defines dataflow rules (rules of dynamic job creation during pipeline execution) from this particular analysis.
The meaning of these parameters should become clearer after some experimentation with the pipeline.
5 Initialization and running the long multiplication pipeline.
5.1 Before running the pipeline you will have to initialize it using the init_pipeline.pl script, supplying the PipeConfig module and the necessary parameters.
Have another look at the LongMult_conf.pm file. The default_options() method returns a hash that pretty much defines what parameters you can/should supply to init_pipeline.pl .
You will probably need to specify the following:
$ init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf \
-ensembl_cvs_root_dir $ENS_CODE_ROOT \
-pipeline_db -host=<your_mysql_host> \
-pipeline_db -user=<your_mysql_username> \
-pipeline_db -pass=<your_mysql_password>
This should create a fresh eHive database and initialize it with the long multiplication pipeline data (the two numbers to be multiplied are taken from the defaults).
Upon successful completion init_pipeline.pl will print several beekeeper commands and
a mysql command for connecting to the newly created database.
Copy and run the mysql command in a separate shell session to follow the progress of the pipeline.
5.2 Run the first beekeeper command that contains the '-sync' option. This will initialize the database's internal stats and determine which jobs can be run.
5.3 Now you have two options: either run beekeeper.pl in automatic mode using the '-loop' option and wait until it completes,
or run it in step-by-step mode, initiating every step by a separate execution of the 'beekeeper.pl ... -run' command.
We will use the step-by-step mode in order to see what is going on.
5.4 Go to the mysql window and check the contents of the analysis_job table:
MySQL> SELECT * FROM analysis_job;
It will only contain the jobs that set up the multiplication tasks, in 'READY' state - meaning 'ready to be taken by workers and executed'.
Go to the beekeeper window and run the 'beekeeper.pl ... -run' once.
It will submit a worker to the farm that will at some point get the 'start' job(s).
5.5 Go to the mysql window again and check the contents of the analysis_job table. Keep checking, as the worker may spend some time in the 'pending' state.
After the first worker is done you will see that the 'start' jobs are now done and new 'part_multiply' and 'add_together' jobs have been created.
Also check the contents of the 'intermediate_result' table; it should be empty at this point:
MySQL> SELECT * from intermediate_result;
Go back to the beekeeper window and run the 'beekeeper.pl ... -run' for the second time.
It will submit another worker to the farm that will at some point get the 'part_multiply' jobs.
5.6 Now check both the 'analysis_job' and 'intermediate_result' tables again.
At some point the 'part_multiply' jobs will have completed and their results will have gone into the 'intermediate_result' table;
the 'add_together' jobs are still to be done.
Check the contents of the 'final_result' table (it should be empty) and run the third and last round of 'beekeeper.pl ... -run'.
5.7 Eventually you will see that all jobs have completed and the 'final_result' table contains the final result(s) of the multiplication.
############################################################################################################################
#
# Bio::EnsEMBL::Hive::RunnableDB::LongMult is an example eHive pipeline that demonstrates the following features:
#
# A) A pipeline can have multiple analyses (this one has three: 'start', 'part_multiply' and 'add_together').
#
# B) A job of one analysis can create jobs of another analysis (one 'start' job creates up to 8 'part_multiply' jobs).
#
# C) A job of one analysis can "flow the data" into another analysis (a 'start' job "flows into" an 'add_together' job).
#
# D) Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed
# ('add_together' is blocked both by 'start' and 'part_multiply').
#
# E) As filesystems are frequently a bottleneck for big pipelines, it is advised that eHive processes store intermediate
# and final results in a database (in this pipeline, 'intermediate_result' and 'final_result' tables are used).
#
############################################################################################################################
# 0. Cache MySQL connection parameters in a variable (they will work as eHive connection parameters as well) :
export MYCONN="--host=hostname --port=port_number --user=username --password=secret"
#
# also, set the ENS_CODE_ROOT to the directory where ensembl packages are installed:
export ENS_CODE_ROOT="$HOME/ensembl_main"
# 1. Create an empty database:
mysql $MYCONN -e 'DROP DATABASE IF EXISTS long_mult_test'
mysql $MYCONN -e 'CREATE DATABASE long_mult_test'
# 2. Create eHive infrastructure:
mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/tables.sql
# 3. Create analyses/control_rules/dataflow_rules of the LongMult pipeline:
mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/create_long_mult.sql
# 4. "Load" the pipeline with a multiplication task:
mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/load_long_mult.sql
#
# or you can add your own task(s). Several tasks can be added at once:
mysql $MYCONN long_mult_test <<EoF
INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }");
INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }");
EoF
# 5. Initialize the newly created eHive for the first time:
beekeeper.pl $MYCONN --database=long_mult_test -sync
# 6. You can either execute three individual workers yourself, by running the following command three times (each worker picking one analysis of the pipeline):
runWorker.pl $MYCONN --database=long_mult_test
#
#
# ... or run an automatic loop that will run workers for you:
beekeeper.pl $MYCONN --database=long_mult_test -loop
# 7. The results of the computations are to be found in 'final_result' table:
mysql $MYCONN long_mult_test -e 'SELECT * FROM final_result'
# 8. You can add more multiplication tasks by repeating from step 4.
4 Long multiplication example pipeline.
4.1 The long multiplication pipeline solves the problem of multiplying two very long integers by pretending the computations have to be done in parallel on the farm.
While performing the task it demonstrates the use of the following features:
A) A pipeline can have multiple analyses (this one has three: 'start', 'part_multiply' and 'add_together').
B) A job of one analysis can create jobs of other analyses by 'flowing the data' down numbered channels or branches.
These branches are then assigned specific analysis names in the pipeline configuration file
(one 'start' job flows partial multiplication subtasks down branch #2 and a task of adding them together down branch #1).
C) Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed
('add_together' is blocked by 'part_multiply').
D) As filesystems are frequently a bottleneck for big pipelines, it is advised that eHive processes store intermediate
and final results in a database (in this pipeline, 'intermediate_result' and 'final_result' tables are used).
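
Before diving into the files it may help to see, outside eHive, the arithmetic that gets split up.
The following sketch is purely illustrative (it is not code from the Runnables): each distinct non-zero
digit of 'b_multiplier' gives one 'part_multiply'-style subtask (mirroring the (a_multiplier, digit) key
of the 'intermediate_result' table used later), and the 'add_together'-style step shifts the partial
products into place and sums them:

    #!/usr/bin/env perl
    # standalone illustration of the long multiplication decomposition (not the actual Runnable code)
    use strict;
    use warnings;
    use Math::BigInt;                       # keeps "very long" integers exact

    my ($a_multiplier, $b_multiplier) = ('9650516169', '327358788');

    # the "fan": one partial product per distinct non-zero digit of b_multiplier
    my %partial_product;
    foreach my $digit (split //, $b_multiplier) {
        next unless $digit;                                             # skip zero digits
        $partial_product{$digit} ||= Math::BigInt->new($a_multiplier)->bmul($digit);
    }

    # the "funnel": shift each partial product into its decimal position and add everything up
    my $result = Math::BigInt->bzero();
    my @digits = split //, $b_multiplier;
    foreach my $i (0 .. $#digits) {
        my $digit = $digits[$i] or next;
        $result->badd( $partial_product{$digit}->copy->blsft($#digits - $i, 10) );   # append trailing zeroes
    }
    print "$a_multiplier * $b_multiplier = $result\n";
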
4.2 The pipeline is defined in 4 files (a minimal Runnable skeleton is sketched after this list):
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/Start.pm splits a multiplication job into sub-tasks and creates corresponding jobs
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table
* ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/AddTogether.pm waits for the partial multiplication results to be computed and adds them together into the final result
* ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm is the pipeline configuration module that links the previous Runnables into one pipeline
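
All three Runnables listed above implement eHive's standard interface: a Runnable module inherits from
Bio::EnsEMBL::Hive::Process and provides fetch_input(), run() and write_output(), which eHive calls in
that order for every job. Below is a minimal sketch, loosely modelled on PartMultiply; the package name,
the trivial arithmetic and the exact SQL are illustrative only, and the $self->db->dbc->prepare(...)
calls assume the usual Ensembl database adaptor accessors available to a Process:

    package Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiplySketch;    # hypothetical name, for illustration only

    use strict;
    use warnings;
    use base ('Bio::EnsEMBL::Hive::Process');

    sub fetch_input {   # check that the job-specific parameters we rely on have been supplied
        my $self = shift;
        defined( $self->param('a_multiplier') ) or die "'a_multiplier' is an obligatory parameter";
        defined( $self->param('digit') )        or die "'digit' is an obligatory parameter";
    }

    sub run {           # do the computation and keep the result in a parameter for write_output()
        my $self = shift;
        # placeholder arithmetic, only valid for small numbers; the real PartMultiply handles arbitrarily long integers
        my $product = $self->param('a_multiplier') * $self->param('digit');
        $self->param('partial_product', $product);
    }

    sub write_output {  # store the result into the 'intermediate_result' table created by the pipeline
        my $self = shift;
        my $sth = $self->db->dbc->prepare(
            'REPLACE INTO intermediate_result (a_multiplier, digit, result) VALUES (?, ?, ?)' );
        $sth->execute( $self->param('a_multiplier'), $self->param('digit'), $self->param('partial_product') );
    }

    1;
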
4.3 The main part of any PipeConfig file, the pipeline_analyses() method, defines the pipeline graph whose nodes are analyses and whose arcs are control and dataflow rules.
Each analysis hash must have:
-logic_name string name by which this analysis is referred to,
-module a name of the Runnable module that contains the code to be run (several analyses can use the same Runnable)
Optionally, it can also have:
-input_ids an array of hashes, each hash defining job-specific parameters (if empty, it means jobs are created dynamically using the dataflow mechanism)
-parameters usually a hash of analysis-wide parameters (each such parameter can be overridden by a parameter of the same name contained in an input_id hash)
-wait_for an array of other analyses, *controlling* this one (jobs of this analysis cannot start before all jobs of controlling analyses have completed)
-flow_into usually a hash that defines dataflow rules (rules of dynamic job creation during pipeline execution) from this particular analysis.
The meaning of these parameters should become clearer after some experimentation with the pipeline.
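
For orientation, here is a made-up two-analysis fragment of what a pipeline_analyses() return value can
look like; the logic_names, the Runnable module name and the parameters are hypothetical, chosen only to
show each of the keys described above in context (the real LongMult analyses appear in the configuration
files further down):

    sub pipeline_analyses {
        my ($self) = @_;
        return [
            {   -logic_name => 'split_task',                                   # hypothetical analysis
                -module     => 'Bio::EnsEMBL::Hive::RunnableDB::YourRunnable', # hypothetical Runnable (several analyses may share one)
                -parameters => { 'chunk_size' => 100 },                        # analysis-wide default, can be overridden per job
                -input_ids  => [ { 'chunk_size' => 10 } ],                     # one seed job, overriding 'chunk_size'
                -flow_into  => { 1 => [ 'process_chunk' ] },                   # dataflow rule: create 'process_chunk' jobs on branch #1
            },
            {   -logic_name => 'process_chunk',
                -module     => 'Bio::EnsEMBL::Hive::RunnableDB::YourRunnable',
                -input_ids  => [],                                             # empty: jobs are created dynamically via dataflow
                -wait_for   => [ 'split_task' ],                               # control rule: blocked until all 'split_task' jobs complete
            },
        ];
    }
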
5 Initialization and running the long multiplication pipeline.
5.1 Before running the pipeline you will have to initialize it using the init_pipeline.pl script, supplying the PipeConfig module and the necessary parameters.
Have another look at the LongMult_conf.pm file. The default_options() method returns a hash that pretty much defines what parameters you can/should supply to init_pipeline.pl .
You will probably need to specify the following:
$ init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf \
-ensembl_cvs_root_dir $ENS_CODE_ROOT \
-pipeline_db -host=<your_mysql_host> \
-pipeline_db -user=<your_mysql_username> \
-pipeline_db -pass=<your_mysql_password>
This should create a fresh eHive database and initialize it with the long multiplication pipeline data (the two numbers to be multiplied are taken from the defaults).
Upon successful completion init_pipeline.pl will print several beekeeper commands and
a mysql command for connecting to the newly created database.
Copy and run the mysql command in a separate shell session to follow the progress of the pipeline.
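
To see where the '-ensembl_cvs_root_dir' and '-pipeline_db ...' options above end up, here is an abridged
sketch of the shape a default_options() method typically has in a PipeConfig module; the exact keys and
defaults of the real LongMult_conf.pm may differ, and the 'password' option name is an assumption used
here only to illustrate the $self->o() deferred-option mechanism:

    sub default_options {
        my ($self) = @_;
        return {
            %{ $self->SUPER::default_options() },               # inherit the standard options of the base class

            'ensembl_cvs_root_dir' => $ENV{'HOME'}.'/work',     # overridden above via -ensembl_cvs_root_dir

            'pipeline_db' => {                                  # nested keys are overridden via "-pipeline_db -host=..." etc.
                -host   => 'localhost',
                -port   => 3306,
                -user   => 'ensadmin',
                -pass   => $self->o('password'),                # deferred: expected to come from the command line
                -dbname => $ENV{'USER'}.'_long_mult',
            },
        };
    }

Any top-level key of such a hash can be overridden with a plain '-key value' option, and keys of nested
hashes with the repeated '-pipeline_db -key=value' syntax used in the command above.
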
5.2 Run the first beekeeper command that contains the '-sync' option. This will initialize the database's internal stats and determine which jobs can be run.
5.3 Now you have two options: either run beekeeper.pl in automatic mode using the '-loop' option and wait until it completes,
or run it in step-by-step mode, initiating every step by a separate execution of the 'beekeeper.pl ... -run' command.
We will use the step-by-step mode in order to see what is going on.
5.4 Go to the mysql window and check the contents of the analysis_job table:
MySQL> SELECT * FROM analysis_job;
It will only contain the jobs that set up the multiplication tasks, in 'READY' state - meaning 'ready to be taken by workers and executed'.
Go to the beekeeper window and run the 'beekeeper.pl ... -run' once.
It will submit a worker to the farm that will at some point get the 'start' job(s).
5.5 Go to the mysql window again and check the contents of the analysis_job table. Keep checking, as the worker may spend some time in the 'pending' state.
After the first worker is done you will see that the 'start' jobs are now done and new 'part_multiply' and 'add_together' jobs have been created.
Also check the contents of the 'intermediate_result' table; it should be empty at this point:
MySQL> SELECT * from intermediate_result;
Go back to the beekeeper window and run the 'beekeeper.pl ... -run' for the second time.
It will submit another worker to the farm that will at some point get the 'part_multiply' jobs.
5.6 Now check both the 'analysis_job' and 'intermediate_result' tables again.
At some point the 'part_multiply' jobs will have completed and their results will have gone into the 'intermediate_result' table;
the 'add_together' jobs are still to be done.
Check the contents of the 'final_result' table (it should be empty) and run the third and last round of 'beekeeper.pl ... -run'.
5.7 Eventually you will see that all jobs have completed and the 'final_result' table contains the final result(s) of the multiplication.
## Configuration file for the long multiplication pipeline example
#
## Run it like this:
#
# init_pipeline_old.pl -conf long_mult_pipeline.conf
#
# code directories:
my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/work';
#my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/ensembl_main'; ## for some Compara developers
# long multiplication pipeline database connection parameters:
my $pipeline_db = {
-host => 'compara2',
-port => 3306,
-user => 'ensadmin',
-pass => 'ensembl',
-dbname => $ENV{USER}.'_long_mult_pipeline',
};
return {
# pass connection parameters into the pipeline initialization script to create adaptors:
-pipeline_db => $pipeline_db,
# shell commands that create and possibly pre-fill the pipeline database:
-pipeline_create_commands => [
'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'",
# standard eHive tables and procedures:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/tables.sql",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/procedures.sql",
# additional tables needed for long multiplication pipeline's operation:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE intermediate_result (a_multiplier char(40) NOT NULL, digit tinyint NOT NULL, result char(41) NOT NULL, PRIMARY KEY (a_multiplier, digit))'",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'",
# name the pipeline to differentiate the submitted processes:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'INSERT INTO meta (meta_key, meta_value) VALUES (\"name\", \"lmult\")'",
],
-pipeline_analyses => [
{ -logic_name => 'start',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start',
-parameters => {},
-input_ids => [
{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' },
{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' },
],
-flow_into => {
2 => [ 'part_multiply' ], # will create a fan of jobs
1 => [ 'add_together' ], # will create a funnel job to wait for the fan to complete and add the results
},
},
{ -logic_name => 'part_multiply',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply',
-parameters => {},
-input_ids => [
# (jobs for this analysis will be flown_into via branch-2 from 'start' jobs above)
],
},
{ -logic_name => 'add_together',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether',
-parameters => {},
-input_ids => [
# (jobs for this analysis will be flown_into via branch-1 from 'start' jobs above)
],
-wait_for => [ 'part_multiply' ], # we can only start adding when all partial products have been computed
},
],
};
## Configuration file for the long multiplication semaphored pipeline example
#
## Run it like this:
#
# init_pipeline_old.pl -conf long_mult_sema_pipeline.conf
#
# code directories:
my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/work';
#my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/ensembl_main'; ## for some Compara developers
# long multiplication pipeline database connection parameters:
my $pipeline_db = {
-host => 'compara2',
-port => 3306,
-user => 'ensadmin',
-pass => 'ensembl',
-dbname => $ENV{USER}.'_long_mult_sema_pipeline',
};
return {
# pass connection parameters into the pipeline initialization script to create adaptors:
-pipeline_db => $pipeline_db,
# shell commands that create and possibly pre-fill the pipeline database:
-pipeline_create_commands => [
'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'",
# standard eHive tables and procedures:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/tables.sql",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/procedures.sql",
# additional tables needed for long multiplication pipeline's operation:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE intermediate_result (a_multiplier char(40) NOT NULL, digit tinyint NOT NULL, result char(41) NOT NULL, PRIMARY KEY (a_multiplier, digit))'",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'",
# name the pipeline to differentiate the submitted processes:
'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'INSERT INTO meta (meta_key, meta_value) VALUES (\"name\", \"slmult\")'",
],
-pipeline_analyses => [
{ -logic_name => 'sema_start',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::SemaStart',
-parameters => {},
-input_ids => [
{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' },
{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' },
],
-flow_into => {
1 => [ 'add_together' ], # will create a semaphored funnel job to wait for the fan to complete and add the results
2 => [ 'part_multiply' ], # will create a fan of jobs that control the semaphored funnel
},
},
{ -logic_name => 'part_multiply',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply',
-parameters => {},
-input_ids => [
# (jobs for this analysis will be flown_into via branch-2 from 'start' jobs above)
],
},
{ -logic_name => 'add_together',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether',
-parameters => {},
-input_ids => [
# (jobs for this analysis will be flown_into via branch-1 from 'start' jobs above)
],
# jobs in this analysis are semaphored, so there is no need for '-wait_for'
},
],
};
############################################################################################################################
#
# This is a follow-up on 'long_mult_example_pipeline.txt' (so make sure you have read it first),
# showing how to set up a pipeline with counting semaphores.
#
############################################################################################################################
# 0. Cache MySQL connection parameters in a variable (they will work as eHive connection parameters as well) :
export MYCONN="--host=hostname --port=port_number --user=username --password=secret"
#
# also, set the ENS_CODE_ROOT to the directory where ensembl packages are installed:
export ENS_CODE_ROOT="$HOME/ensembl_main"
# 1. Create an empty database:
mysql $MYCONN -e 'DROP DATABASE IF EXISTS long_mult_test'
mysql $MYCONN -e 'CREATE DATABASE long_mult_test'
# 2. Create eHive infrastructure:
mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/tables.sql
# 3. Create analyses/control_rules/dataflow_rules of the LongMult pipeline:
mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/create_sema_long_mult.sql
# 4. "Load" the pipeline with a multiplication task:
mysql $MYCONN long_mult_test <<EoF
INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }");
INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }");
EoF
# 5. Initialize the newly created eHive for the first time:
beekeeper.pl $MYCONN --database=long_mult_test -sync
# 6. You can either execute three individual workers yourself, by running the following command three times (each worker picking one analysis of the pipeline):
runWorker.pl $MYCONN --database=long_mult_test
#
# ... or run an automatic loop that will run workers for you:
beekeeper.pl $MYCONN --database=long_mult_test -loop
#
# KNOWN BUG: if you keep suggesting your own analysis_id/logic_name, the system may sometimes think there is no work,
# when in fact some previously semaphored jobs have become available but remain invisible to some workers.
# KNOWN FIX: just run "beekeeper.pl $MYCONN --database=long_mult_test -sync" once, and the problem should rectify itself.
# 7. The results of the computations are to be found in 'final_result' table:
mysql $MYCONN long_mult_test -e 'SELECT * FROM final_result'
# 8. You can add more multiplication tasks by repeating from step 4.
# mini-pipeline for testing meta-parameter evaluation and SqlCmd in "external_db" mode
my $cvs_root_dir = $ENV{'HOME'}.'/work';
# family database connection parameters (our main database):
my $pipeline_db = {
-host => 'compara3',
-port => 3306,
-user => 'ensadmin',
-pass => 'ensembl',
-dbname => "lg4_test_sqlcmd",
};
my $slave_db = {
-host => 'compara3',
-port => 3306,
-user => 'ensadmin',
-pass => 'ensembl',
-dbname => "lg4_test_sqlcmd_slave",
};
return {
# pass connection parameters into the pipeline initialization script to create adaptors:
-pipeline_db => $pipeline_db,
# shell commands that create and pre-fill the pipeline database:
-pipeline_create_commands => [
'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$cvs_root_dir/ensembl-hive/sql/tables.sql",
'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$cvs_root_dir/ensembl-hive/sql/procedures.sql",
'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $slave_db->{-dbname}'",
],
-pipeline_wide_parameters => { # these parameter values are visible to all analyses, can be overridden by parameters{} and input_id{}
'db_conn' => $slave_db, # testing the stringification of a structure here
},
-resource_classes => {
0 => { -desc => 'default, 8h', 'LSF' => '' },
1 => { -desc => 'urgent', 'LSF' => '-q yesterday' },
},
-pipeline_analyses => [
{ -logic_name => 'create_table',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SqlCmd',
-parameters => { },
-hive_capacity => 20, # to enable parallel branches
-input_ids => [
{ 'sql' => 'CREATE TABLE distance (place_from char(40) NOT NULL, place_to char(40) NOT NULL, miles float, PRIMARY KEY (place_from, place_to))', },
],
-rc_id => 1,
},
{ -logic_name => 'fill_in_table',
-module => 'Bio::EnsEMBL::Hive::RunnableDB::SqlCmd',
-parameters => {
'sql' => [ "INSERT INTO distance (place_from, place_to, miles) VALUES ('#from#', '#to#', #miles#)",
"INSERT INTO distance (place_from, place_to, miles) VALUES ('#to#', '#from#', #miles#)", ],
},
-hive_capacity => 20, # to enable parallel branches
-input_ids => [
{ 'from' => 'Cambridge', 'to' => 'Ely', 'miles' => 18.3 },
{ 'from' => 'London', 'to' => 'Cambridge', 'miles' => 60 },
],
-wait_for => 'create_table',
-rc_id => 1,
},
],
};
@@ -9,7 +9,7 @@
# and
# ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/SemaLongMult_conf.pm
#
# which are used to load the Long Multiplicaton pipeline in "analysis control" and "semaphore job control" modes respectively.
# which are used to load the Long Multiplication pipeline in "analysis control" and "semaphore job control" modes respectively.
#
#
# Create these pipelines using init_pipeline.pl and run them using beekeeper.pl in step-by-step mode (use -run instead of -loop option).
# create the 3 analyses we are going to use:
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'start', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start');
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'part_multiply', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply');
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'add_together', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether');
# link the analyses with control- and dataflow-rules:
# 'add_together' waits for 'part_multiply':
INSERT INTO analysis_ctrl_rule (condition_analysis_url, ctrled_analysis_id) VALUES ('part_multiply', (SELECT analysis_id FROM analysis WHERE logic_name='add_together'));
# 'start' flows into a fan:
INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'part_multiply', 2);
# 'start' flows into a funnel:
INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'add_together', 1);
# create a table for holding intermediate results (written by 'part_multiply' and read by 'add_together')
CREATE TABLE intermediate_result (
a_multiplier char(40) NOT NULL,
digit tinyint NOT NULL,
result char(41) NOT NULL,
PRIMARY KEY (a_multiplier, digit)
);
# create a table for holding final results (written by 'add_together')
CREATE TABLE final_result (
a_multiplier char(40) NOT NULL,
b_multiplier char(40) NOT NULL,
result char(80) NOT NULL,
PRIMARY KEY (a_multiplier, b_multiplier)
);
# create the 3 analyses we are going to use:
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'start', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::SemaStart');
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'part_multiply', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply');
INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'add_together', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether');
# (no control rules anymore, jobs are controlled via semaphores)
# 'start' flows into a fan:
INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'part_multiply', 2);
# 'start' flows into a funnel:
INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'add_together', 1);
# create a table for holding intermediate results (written by 'part_multiply' and read by 'add_together')
CREATE TABLE intermediate_result (
a_multiplier char(40) NOT NULL,
digit tinyint NOT NULL,
result char(41) NOT NULL,
PRIMARY KEY (a_multiplier, digit)
);
# create a table for holding final results (written by 'add_together')
CREATE TABLE final_result (
a_multiplier char(40) NOT NULL,
b_multiplier char(40) NOT NULL,
result char(80) NOT NULL,
PRIMARY KEY (a_multiplier, b_multiplier)
);
# To multiply two long numbers using the long_mult pipeline
# we have to create the 'start' job and provide the two multipliers:
INSERT INTO analysis_job (analysis_id, input_id) VALUES (
(SELECT analysis_id FROM analysis WHERE logic_name='start'),
"{ 'a_multiplier' => '123456789', 'b_multiplier' => '90319' }");