diff --git a/README b/README index 126d8994b126dbfbb8136111bd37c465bae4a5ee..d48e3ec08eaf86359cbf0441935f5c3089020bf9 100644 --- a/README +++ b/README @@ -16,6 +16,15 @@ Summary: Bio::EnsEMBL::Analysis::RunnableDB perl wrapper objects as nodes/blocks in the graphs but could be adapted more generally. +12 May, 2010 : Leo Gordon + +* init_pipeline.pl can be given a PipeConfig file name instead of full module name. + +* init_pipeline.pl has its own help that displays pod documentation (same mechanism as other eHive scripts) + +* 3 pipeline initialization modes supported: + full (default), -analysis_topup (pipeline development mode) and -job_topup (add more data to work with) + 11 May, 2010 : Leo Gordon diff --git a/docs/eHive_install_usage.txt b/docs/eHive_install_usage.txt index 3469fb8c862834b43943c85f252147b8144a79ba..068f0e12bcdacc4379173a82d2f8fd01e1b45adf 100644 --- a/docs/eHive_install_usage.txt +++ b/docs/eHive_install_usage.txt @@ -124,83 +124,3 @@ It will be convenient to set a variable pointing at this directory for future us 3.4 In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult we keep bespoke RunnableDBs for long multiplication example pipeline. - -4 Long multiplication example pipeline. - - Long multiplication pipeline solves a problem of multiplying two very long integer numbers by pretending the computations have to be done in parallel on the farm. - While performing the task it uses various features of eHive, so by studying this and other examples you can learn how to put together your own pipeines. - -4.1 The pipeline is defined in 4 files: - - * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/Start.pm splits a multiplication job into sub-tasks and creates corresponding jobs - - * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table - - * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/AddTogether.pm waits for partial multiplication results to compute and adds them together into final result - - * ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm the pipeline configuration module that links the previous Runnables into one pipeline - -4.2 The main part of any PipeConfig file, pipeline_analyses() method, defines the pipeline graph whose nodes are analyses and whose arcs are control and dataflow rules. - Each analysis hash must have: - -logic_name string name by which this analysis is referred to, - -module a name of the Runnable module that contains the code to be run (several analyses can use the same Runnable) - Optionally, it can also have: - -input_ids an array of hashes, each hash defining job-specific parameters (if empty it means jobs are created dynamically using dataflow mechanism) - -parameters usually a hash of analysis-wide parameters (each such parameter can be overriden by the same name parameter contained in an input_id hash) - -wait_for an array of other analyses, *controlling* this one (jobs of this analysis cannot start before all jobs of controlling analyses have completed) - -flow_into usually a hash that defines dataflow rules (rules of dynamic job creation during pipeline execution) from this particular analysis. - - The meaning of these parameters should become clearer after some experimentation with the pipeline. - - -5 Initialization and running the long multiplication pipeline. 
- -5.1 Before running the pipeline you will have to initialize it using init_pipeline.pl script supplying PipeConfig module and the necessary parameters. - Have another look at LongMult_conf.pm file. The default_options() method returns a hash that pretty much defines what parameters you can/should supply to init_pipeline.pl . - You will probably need to specify the following: - - $ init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf \ - -ensembl_cvs_root_dir $ENS_CODE_ROOT \ - -pipeline_db -host=<your_mysql_host> \ - -pipeline_db -user=<your_mysql_username> \ - -pipeline_db -user=<your_mysql_password> \ - - This should create a fresh eHive database and initalize it with long multiplication pipeline data (the two numbers to be multiplied are taken from defaults). - - Upon successful completion init_pipeline.pl will print several beekeeper commands and - a mysql command for connecting to the newly created database. - Copy and run the mysql command in a separate shell session to follow the progress of the pipeline. - -5.2 Run the first beekeeper command that contains '-sync' option. This will initialize database's internal stats and determine which jobs can be run. - -5.3 Now you have two options: either to run the beekeeper.pl in automatic mode using '-loop' option and wait until it completes, - or run it in step-by-step mode, initiating every step by separate executions of 'beekeeper.pl ... -run' command. - We will use the step-by-step mode in order to see what is going on. - -5.4 Go to mysql window and check the contents of analysis_job table: - - MySQL> SELECT * FROM analysis_job; - - It will only contain jobs that set up the multiplication tasks in 'READY' mode - meaning 'ready to be taken by workers and executed'. - - Go to the beekeeper window and run the 'beekeeper.pl ... -run' once. - It will submit a worker to the farm that will at some point get the 'start' job(s). - -5.5 Go to mysql window again and check the contents of analysis_job table. Keep checking as the worker may spend some time in 'pending' state. - - After the first worker is done you will see that 'start' jobs are now done and new 'part_multiply' and 'add_together' jobs have been created. - Also check the contents of 'intermediate_result' table, it should be empty at that moment: - - MySQL> SELECT * from intermediate_result; - - Go back to the beekeeper window and run the 'beekeeper.pl ... -run' for the second time. - It will submit another worker to the farm that will at some point get the 'part_multiply' jobs. - -5.6 Now check both 'analysis_job' and 'intermediate_result' tables again. - At some moment 'part_multiply' jobs will have been completed and the results will go into 'intermediate_result' table; - 'add_together' jobs are still to be done. - - Check the contents of 'final_result' table (should be empty) and run the third and the last round of 'beekeeper.pl ... -run' - -5.7 Eventually you will see that all jobs have completed and the 'final_result' table contains final result(s) of multiplication. 
- diff --git a/docs/long_mult_example_pipeline.txt b/docs/long_mult_example_pipeline.txt index 7591d9b3da256908012bb9b65237673aba6a1527..e824db4cdc24724a9d962a732bfcdf9221825ddf 100644 --- a/docs/long_mult_example_pipeline.txt +++ b/docs/long_mult_example_pipeline.txt @@ -1,58 +1,92 @@ -############################################################################################################################ -# -# Bio::EnsEMBL::Hive::RunnableDB::LongMult is an example eHive pipeline that demonstates the following features: -# -# A) A pipeline can have multiple analyses (this one has three: 'start', 'part_multiply' and 'add_together'). -# -# B) A job of one analysis can create jobs of another analysis (one 'start' job creates up to 8 'part_multiply' jobs). -# -# C) A job of one analysis can "flow the data" into another analysis (a 'start' job "flows into" an 'add_together' job). -# -# D) Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed -# ('add_together' is blocked both by 'start' and 'part_multiply'). -# -# E) As filesystems are frequently a bottleneck for big pipelines, it is advised that eHive processes store intermediate -# and final results in a database (in this pipeline, 'intermediate_result' and 'final_result' tables are used). -# -############################################################################################################################ - -# 0. Cache MySQL connection parameters in a variable (they will work as eHive connection parameters as well) : -export MYCONN="--host=hostname --port=port_number --user=username --password=secret" -# -# also, set the ENS_CODE_ROOT to the directory where ensembl packages are installed: -export ENS_CODE_ROOT="$HOME/ensembl_main" - -# 1. Create an empty database: -mysql $MYCONN -e 'DROP DATABASE IF EXISTS long_mult_test' -mysql $MYCONN -e 'CREATE DATABASE long_mult_test' - -# 2. Create eHive infrastructure: -mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/tables.sql - -# 3. Create analyses/control_rules/dataflow_rules of the LongMult pipeline: -mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/create_long_mult.sql - -# 4. "Load" the pipeline with a multiplication task: -mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/load_long_mult.sql -# -# or you can add your own task(s). Several tasks can be added at once: -mysql $MYCONN long_mult_test <<EoF -INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }"); -INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }"); -EoF - -# 5. Initialize the newly created eHive for the first time: -beekeeper.pl $MYCONN --database=long_mult_test -sync - -# 6. You can either execute three individual workers (each picking one analysis of the pipeline): -runWorker.pl $MYCONN --database=long_mult_test -# -# -# ... or run an automatic loop that will run workers for you: -beekeeper.pl $MYCONN --database=long_mult_test -loop - -# 7. The results of the computations are to be found in 'final_result' table: -mysql $MYCONN long_mult_test -e 'SELECT * FROM final_result' - -# 8. You can add more multiplication tasks by repeating from step 4. + +4 Long multiplication example pipeline. + +4.1 Long multiplication pipeline solves a problem of multiplying two very long integer numbers by pretending the computations have to be done in parallel on the farm. 
+       While performing the task it demonstrates the use of the following features:
+
+       A) A pipeline can have multiple analyses (this one has three: 'start', 'part_multiply' and 'add_together').
+
+       B) A job of one analysis can create jobs of other analyses by 'flowing the data' down numbered channels or branches.
+          These branches are then assigned specific analysis names in the pipeline configuration file
+          (one 'start' job flows partial multiplication subtasks down branch #2 and a task of adding them together down branch #1).
+
+       C) Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed
+          ('add_together' is blocked by 'part_multiply').
+
+       D) As filesystems are frequently a bottleneck for big pipelines, it is advised that eHive processes store intermediate
+          and final results in a database (in this pipeline, the 'intermediate_result' and 'final_result' tables are used).
+
+4.2 The pipeline is defined in 4 files:
+
+       * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/Start.pm splits a multiplication job into sub-tasks and creates corresponding jobs
+
+       * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table
+
+       * ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/AddTogether.pm waits for the partial multiplication results to be computed and adds them together into the final result
+
+       * ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm is the pipeline configuration module that links the previous Runnables into one pipeline
+
+4.3 The main part of any PipeConfig file, the pipeline_analyses() method, defines the pipeline graph whose nodes are analyses and whose arcs are control and dataflow rules.
+    Each analysis hash must have:
+       -logic_name     the string name by which this analysis is referred to,
+       -module         the name of the Runnable module that contains the code to be run (several analyses can use the same Runnable)
+    Optionally, it can also have:
+       -input_ids      an array of hashes, each hash defining job-specific parameters (if empty, jobs are created dynamically via the dataflow mechanism)
+       -parameters     usually a hash of analysis-wide parameters (each such parameter can be overridden by a same-name parameter contained in an input_id hash)
+       -wait_for       an array of other analyses, *controlling* this one (jobs of this analysis cannot start before all jobs of the controlling analyses have completed)
+       -flow_into      usually a hash that defines dataflow rules (rules of dynamic job creation during pipeline execution) from this particular analysis.
+
+    The meaning of these parameters should become clearer after some experimentation with the pipeline; a sketch of one such entry follows.
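+
+    For illustration, the 'start' analysis entry could look roughly like this
+    (a simplified sketch based on the older .conf examples; see LongMult_conf.pm itself for the authoritative definition):
+
+        { -logic_name => 'start',
+          -module     => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start',
+          -input_ids  => [
+              { 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' },
+          ],
+          -flow_into  => {
+              2 => [ 'part_multiply' ],   # branch #2 creates a fan of partial multiplication jobs
+              1 => [ 'add_together' ],    # branch #1 creates the funnel job that will add the partial results
+          },
+        },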
+
+
+5 Initializing and running the long multiplication pipeline.
+
+5.1 Before running the pipeline you will have to initialize it using the init_pipeline.pl script, supplying the PipeConfig module and the necessary parameters.
+    Have another look at the LongMult_conf.pm file. The default_options() method returns a hash that pretty much defines what parameters you can/should supply to init_pipeline.pl .
+    You will probably need to specify the following:
+
+        $ init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf \
+            -ensembl_cvs_root_dir $ENS_CODE_ROOT \
+            -pipeline_db -host=<your_mysql_host> \
+            -pipeline_db -user=<your_mysql_username> \
+            -pipeline_db -pass=<your_mysql_password>
+
+    This should create a fresh eHive database and initialize it with the long multiplication pipeline data (the two numbers to be multiplied are taken from the defaults).
+
+    Upon successful completion init_pipeline.pl will print several beekeeper commands and
+    a mysql command for connecting to the newly created database.
+    Copy and run the mysql command in a separate shell session to follow the progress of the pipeline.
+
+5.2 Run the first beekeeper command, the one that contains the '-sync' option. This will initialize the database's internal stats and determine which jobs can be run.
+
+5.3 Now you have two options: either run beekeeper.pl in automatic mode using the '-loop' option and wait until it completes,
+    or run it in step-by-step mode, initiating every step by a separate execution of the 'beekeeper.pl ... -run' command.
+    We will use the step-by-step mode in order to see what is going on.
+
+5.4 Go to the mysql window and check the contents of the analysis_job table:
+
+        MySQL> SELECT * FROM analysis_job;
+
+    It will only contain the jobs that set up the multiplication tasks, in 'READY' state - meaning 'ready to be taken by workers and executed'.
+
+    Go to the beekeeper window and run 'beekeeper.pl ... -run' once.
+    It will submit a worker to the farm that will at some point get the 'start' job(s).
+
+5.5 Go to the mysql window again and check the contents of the analysis_job table. Keep checking, as the worker may spend some time in 'pending' state.
+
+    After the first worker is done you will see that the 'start' jobs are done and new 'part_multiply' and 'add_together' jobs have been created.
+    Also check the contents of the 'intermediate_result' table; it should be empty at that moment:
+
+        MySQL> SELECT * FROM intermediate_result;
+
+    Go back to the beekeeper window and run 'beekeeper.pl ... -run' a second time.
+    It will submit another worker to the farm that will at some point get the 'part_multiply' jobs.
+
+5.6 Now check both the 'analysis_job' and 'intermediate_result' tables again.
+    At some point the 'part_multiply' jobs will have completed and their results will have gone into the 'intermediate_result' table;
+    the 'add_together' jobs are still to be done.
+
+    Check the contents of the 'final_result' table (it should still be empty) and run the third and last round of 'beekeeper.pl ... -run' .
+
+5.7 Eventually you will see that all jobs have completed and the 'final_result' table contains the final result(s) of the multiplication.
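+
+    You can check them with the same kind of query as before:
+
+        MySQL> SELECT * FROM final_result;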
diff --git a/docs/long_mult_pipeline.conf b/docs/long_mult_pipeline.conf deleted file mode 100755 index 335447100e2ea5ef93ca1018233a51c774acd025..0000000000000000000000000000000000000000 --- a/docs/long_mult_pipeline.conf +++ /dev/null @@ -1,73 +0,0 @@ -## Configuration file for the long multiplication pipeline example -# -## Run it like this: -# -# init_pipeline_old.pl -conf long_mult_pipeline.conf -# - - # code directories: -my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/work'; -#my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/ensembl_main'; ## for some Compara developers - - # long multiplication pipeline database connection parameters: -my $pipeline_db = { - -host => 'compara2', - -port => 3306, - -user => 'ensadmin', - -pass => 'ensembl', - -dbname => $ENV{USER}.'_long_mult_pipeline', -}; - -return { - # pass connection parameters into the pipeline initialization script to create adaptors: - -pipeline_db => $pipeline_db, - - # shell commands that create and possibly pre-fill the pipeline database: - -pipeline_create_commands => [ - 'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'", - - # standard eHive tables and procedures: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/tables.sql", - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/procedures.sql", - - # additional tables needed for long multiplication pipeline's operation: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE intermediate_result (a_multiplier char(40) NOT NULL, digit tinyint NOT NULL, result char(41) NOT NULL, PRIMARY KEY (a_multiplier, digit))'", - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'", - - # name the pipeline to differentiate the submitted processes: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." 
-e 'INSERT INTO meta (meta_key, meta_value) VALUES (\"name\", \"lmult\")'", - ], - - -pipeline_analyses => [ - { -logic_name => 'start', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start', - -parameters => {}, - -input_ids => [ - { 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }, - { 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }, - ], - -flow_into => { - 2 => [ 'part_multiply' ], # will create a fan of jobs - 1 => [ 'add_together' ], # will create a funnel job to wait for the fan to complete and add the results - }, - }, - - { -logic_name => 'part_multiply', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply', - -parameters => {}, - -input_ids => [ - # (jobs for this analysis will be flown_into via branch-2 from 'start' jobs above) - ], - }, - - { -logic_name => 'add_together', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether', - -parameters => {}, - -input_ids => [ - # (jobs for this analysis will be flown_into via branch-1 from 'start' jobs above) - ], - -wait_for => [ 'part_multiply' ], # we can only start adding when all partial products have been computed - }, - ], -}; - diff --git a/docs/long_mult_sema_pipeline.conf b/docs/long_mult_sema_pipeline.conf deleted file mode 100755 index 99060217981d4a4323be18409076f1bff6c67ff9..0000000000000000000000000000000000000000 --- a/docs/long_mult_sema_pipeline.conf +++ /dev/null @@ -1,73 +0,0 @@ -## Configuration file for the long multiplication semaphored pipeline example -# -## Run it like this: -# -# init_pipeline_old.pl -conf long_mult_sema_pipeline.conf -# - - # code directories: -my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/work'; -#my $ensembl_cvs_root_dir = $ENV{'HOME'}.'/ensembl_main'; ## for some Compara developers - - # long multiplication pipeline database connection parameters: -my $pipeline_db = { - -host => 'compara2', - -port => 3306, - -user => 'ensadmin', - -pass => 'ensembl', - -dbname => $ENV{USER}.'_long_mult_sema_pipeline', -}; - -return { - # pass connection parameters into the pipeline initialization script to create adaptors: - -pipeline_db => $pipeline_db, - - # shell commands that create and possibly pre-fill the pipeline database: - -pipeline_create_commands => [ - 'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'", - - # standard eHive tables and procedures: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/tables.sql", - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$ensembl_cvs_root_dir/ensembl-hive/sql/procedures.sql", - - # additional tables needed for long multiplication pipeline's operation: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE intermediate_result (a_multiplier char(40) NOT NULL, digit tinyint NOT NULL, result char(41) NOT NULL, PRIMARY KEY (a_multiplier, digit))'", - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." -e 'CREATE TABLE final_result (a_multiplier char(40) NOT NULL, b_multiplier char(40) NOT NULL, result char(80) NOT NULL, PRIMARY KEY (a_multiplier, b_multiplier))'", - - # name the pipeline to differentiate the submitted processes: - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." 
-e 'INSERT INTO meta (meta_key, meta_value) VALUES (\"name\", \"slmult\")'", - ], - - -pipeline_analyses => [ - { -logic_name => 'sema_start', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::SemaStart', - -parameters => {}, - -input_ids => [ - { 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }, - { 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }, - ], - -flow_into => { - 1 => [ 'add_together' ], # will create a semaphored funnel job to wait for the fan to complete and add the results - 2 => [ 'part_multiply' ], # will create a fan of jobs that control the semaphored funnel - }, - }, - - { -logic_name => 'part_multiply', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply', - -parameters => {}, - -input_ids => [ - # (jobs for this analysis will be flown_into via branch-2 from 'start' jobs above) - ], - }, - - { -logic_name => 'add_together', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether', - -parameters => {}, - -input_ids => [ - # (jobs for this analysis will be flown_into via branch-1 from 'start' jobs above) - ], - # jobs in this analyses are semaphored, so no need to '-wait_for' - }, - ], -}; - diff --git a/docs/long_mult_semaphores.txt b/docs/long_mult_semaphores.txt deleted file mode 100644 index 77b3e76ef74dc958a4d74abbcb5913a850183ab1..0000000000000000000000000000000000000000 --- a/docs/long_mult_semaphores.txt +++ /dev/null @@ -1,49 +0,0 @@ -############################################################################################################################ -# -# Please see the long_mult_example_pipeline.txt first. -# -# This is an example ( a follow-up on 'long_mult_example_pipeline.txt', so make sure you have read it first) -# of how to set up a pipeline with counting semaphores. -# -############################################################################################################################ - -# 0. Cache MySQL connection parameters in a variable (they will work as eHive connection parameters as well) : -export MYCONN="--host=hostname --port=port_number --user=username --password=secret" -# -# also, set the ENS_CODE_ROOT to the directory where ensembl packages are installed: -export ENS_CODE_ROOT="$HOME/ensembl_main" - -# 1. Create an empty database: -mysql $MYCONN -e 'DROP DATABASE IF EXISTS long_mult_test' -mysql $MYCONN -e 'CREATE DATABASE long_mult_test' - -# 2. Create eHive infrastructure: -mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/tables.sql - -# 3. Create analyses/control_rules/dataflow_rules of the LongMult pipeline: -mysql $MYCONN long_mult_test <$ENS_CODE_ROOT/ensembl-hive/sql/create_sema_long_mult.sql - -# 4. "Load" the pipeline with a multiplication task: -mysql $MYCONN long_mult_test <<EoF -INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '9650516169', 'b_multiplier' => '327358788' }"); -INSERT INTO analysis_job (analysis_id, input_id) VALUES ( 1, "{ 'a_multiplier' => '327358788', 'b_multiplier' => '9650516169' }"); -EoF - -# 5. Initialize the newly created eHive for the first time: -beekeeper.pl $MYCONN --database=long_mult_test -sync - -# 6. You can either execute three individual workers (each picking one analysis of the pipeline): -runWorker.pl $MYCONN --database=long_mult_test -# -# ... 
or run an automatic loop that will run workers for you: -beekeeper.pl $MYCONN --database=long_mult_test -loop -# -# KNOWN BUG: if you keep suggesting your own analysis_id/logic_name, the system may sometimes think there is no work, -# where actually there will be some previously semaphored jobs that have become available yet invisible to some workers. -# KNOWN FIX: just run "beekeeper.pl $MYCONN --database=long_mult_test -sync" once, and the problem should rectify itself. - -# 7. The results of the computations are to be found in 'final_result' table: -mysql $MYCONN long_mult_test -e 'SELECT * FROM final_result' - -# 8. You can add more multiplication tasks by repeating from step 4. - diff --git a/docs/test_SqlCmd.conf b/docs/test_SqlCmd.conf deleted file mode 100755 index 581603346fef9db59f0ee3dc54fac7eaf7fb6100..0000000000000000000000000000000000000000 --- a/docs/test_SqlCmd.conf +++ /dev/null @@ -1,73 +0,0 @@ -# mini-pipeline for testing meta-parameter evaluation and SqlCmd in "external_db" mode - -my $cvs_root_dir = $ENV{'HOME'}.'/work'; - - # family database connection parameters (our main database): -my $pipeline_db = { - -host => 'compara3', - -port => 3306, - -user => 'ensadmin', - -pass => 'ensembl', - -dbname => "lg4_test_sqlcmd", -}; - -my $slave_db = { - -host => 'compara3', - -port => 3306, - -user => 'ensadmin', - -pass => 'ensembl', - -dbname => "lg4_test_sqlcmd_slave", -}; - -return { - # pass connection parameters into the pipeline initialization script to create adaptors: - -pipeline_db => $pipeline_db, - - # shell commands that create and pre-fill the pipeline database: - -pipeline_create_commands => [ - 'mysql '.dbconn_2_mysql($pipeline_db, 0)." -e 'CREATE DATABASE $pipeline_db->{-dbname}'", - - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$cvs_root_dir/ensembl-hive/sql/tables.sql", - 'mysql '.dbconn_2_mysql($pipeline_db, 1)." <$cvs_root_dir/ensembl-hive/sql/procedures.sql", - - 'mysql '.dbconn_2_mysql($pipeline_db, 0)." 
-e 'CREATE DATABASE $slave_db->{-dbname}'", - ], - - -pipeline_wide_parameters => { # these parameter values are visible to all analyses, can be overridden by parameters{} and input_id{} - - 'db_conn' => $slave_db, # testing the stringification of a structure here - }, - - -resource_classes => { - 0 => { -desc => 'default, 8h', 'LSF' => '' }, - 1 => { -desc => 'urgent', 'LSF' => '-q yesterday' }, - }, - - -pipeline_analyses => [ - { -logic_name => 'create_table', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::SqlCmd', - -parameters => { }, - -hive_capacity => 20, # to enable parallel branches - -input_ids => [ - { 'sql' => 'CREATE TABLE distance (place_from char(40) NOT NULL, place_to char(40) NOT NULL, miles float, PRIMARY KEY (place_from, place_to))', }, - ], - -rc_id => 1, - }, - - { -logic_name => 'fill_in_table', - -module => 'Bio::EnsEMBL::Hive::RunnableDB::SqlCmd', - -parameters => { - 'sql' => [ "INSERT INTO distance (place_from, place_to, miles) VALUES ('#from#', '#to#', #miles#)", - "INSERT INTO distance (place_from, place_to, miles) VALUES ('#to#', '#from#', #miles#)", ], - }, - -hive_capacity => 20, # to enable parallel branches - -input_ids => [ - { 'from' => 'Cambridge', 'to' => 'Ely', 'miles' => 18.3 }, - { 'from' => 'London', 'to' => 'Cambridge', 'miles' => 60 }, - ], - -wait_for => 'create_table', - -rc_id => 1, - }, - ], -}; - diff --git a/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/README b/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/README index 82e8e59f1ae25dc8e213221f01e93d51f6fd2319..f0b3c33403b867469d91e9cd9db3be08e460fe07 100644 --- a/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/README +++ b/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/README @@ -9,7 +9,7 @@ # and # ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig/SemaLongMult_conf.pm # -# which are used to load the Long Multiplicaton pipeline in "analysis control" and "semaphore job control" modes respectively. +# which are used to load the Long Multiplication pipeline in "analysis control" and "semaphore job control" modes respectively. # # # Create these pipelines using init_pipeline.pl and run them using beekeeper.pl in step-by-step mode (use -run instead of -loop option). 
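+#
+# For example (a sketch only; substitute your own MySQL connection parameters, as described in docs/long_mult_example_pipeline.txt):
+#
+#   init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::LongMult_conf \
+#       -pipeline_db -host=<your_mysql_host> -pipeline_db -user=<your_mysql_username> -pipeline_db -pass=<your_mysql_password>
+#
+# and then run, one round at a time, the 'beekeeper.pl ... -sync' and 'beekeeper.pl ... -run' commands that it prints.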
diff --git a/sql/create_long_mult.sql b/sql/create_long_mult.sql deleted file mode 100644 index f91945ee1ce91577b92b544c4db9c51ac03bdee0..0000000000000000000000000000000000000000 --- a/sql/create_long_mult.sql +++ /dev/null @@ -1,33 +0,0 @@ - - # create the 3 analyses we are going to use: -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'start', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::Start'); -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'part_multiply', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply'); -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'add_together', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether'); - -# link the analyses with control- and dataflow-rules: - - # 'all_together' waits for 'part_multiply': -INSERT INTO analysis_ctrl_rule (condition_analysis_url, ctrled_analysis_id) VALUES ('part_multiply', (SELECT analysis_id FROM analysis WHERE logic_name='add_together')); - - # 'start' flows into a fan: -INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'part_multiply', 2); - - # 'start' flows into a funnel: -INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'add_together', 1); - - # create a table for holding intermediate results (written by 'part_multiply' and read by 'add_together') -CREATE TABLE intermediate_result ( - a_multiplier char(40) NOT NULL, - digit tinyint NOT NULL, - result char(41) NOT NULL, - PRIMARY KEY (a_multiplier, digit) -); - - # create a table for holding final results (written by 'add_together') -CREATE TABLE final_result ( - a_multiplier char(40) NOT NULL, - b_multiplier char(40) NOT NULL, - result char(80) NOT NULL, - PRIMARY KEY (a_multiplier, b_multiplier) -); - diff --git a/sql/create_sema_long_mult.sql b/sql/create_sema_long_mult.sql deleted file mode 100644 index 6a200eeeebd400ef06c45a2263e1b8b84abf355b..0000000000000000000000000000000000000000 --- a/sql/create_sema_long_mult.sql +++ /dev/null @@ -1,31 +0,0 @@ - - # create the 3 analyses we are going to use: -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'start', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::SemaStart'); -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'part_multiply', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::PartMultiply'); -INSERT INTO analysis (created, logic_name, module) VALUES (NOW(), 'add_together', 'Bio::EnsEMBL::Hive::RunnableDB::LongMult::AddTogether'); - -# (no control rules anymore, jobs are controlled via semaphores) - - # 'start' flows into a fan: -INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'part_multiply', 2); - - # 'start' flows into a funnel: -INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code) VALUES ((SELECT analysis_id FROM analysis WHERE logic_name='start'), 'add_together', 1); - - - # create a table for holding intermediate results (written by 'part_multiply' and read by 'add_together') -CREATE TABLE intermediate_result ( - a_multiplier char(40) NOT NULL, - digit tinyint NOT NULL, - result char(41) NOT NULL, - PRIMARY KEY (a_multiplier, digit) -); - - # create a table for holding final results (written by 'add_together') -CREATE TABLE final_result ( - a_multiplier char(40) NOT NULL, - b_multiplier char(40) NOT NULL, - result char(80) 
NOT NULL, - PRIMARY KEY (a_multiplier, b_multiplier) -); - diff --git a/sql/load_long_mult.sql b/sql/load_long_mult.sql deleted file mode 100644 index 687912ee1539c08bde3a94f7e04351b10bf10b7a..0000000000000000000000000000000000000000 --- a/sql/load_long_mult.sql +++ /dev/null @@ -1,8 +0,0 @@ - - # To multiply two long numbers using the long_mult pipeline - # we have to create the 'start' job and provide the two multipliers: - -INSERT INTO analysis_job (analysis_id, input_id) VALUES ( - (SELECT analysis_id FROM analysis WHERE logic_name='start'), - "{ 'a_multiplier' => '123456789', 'b_multiplier' => '90319' }"); -