Fixing issues with the pipeline module used

fb5b2eb8 · Andy Yates · 1cd2f6d2 · fb5b2eb8 · fb5b2eb8
Commit fb5b2eb8 authored 12 years ago by Andy Yates
--- a/docs/pipelines/fasta.html
+++ b/docs/pipelines/fasta.html
@@ -52,21 +52,21 @@
 </code></pre><p>If you do override the config then you should use the package name for your overridden config in the upcoming example commands.</p><h2 id="Environment">Environment</h2><h3 id="PERL5LIB">PERL5LIB</h3><ul><li>ensembl</li><li>ensembl-hive</li><li>bioperl</li></ul><h3 id="PATH">PATH</h3><ul><li>ensembl-hive/scripts</li><li>faToTwoBit (if not using a custom location)</li><li>xdformat (if not using a custom location)</li></ul><h3 id="ENSEMBLCVSROOTDIR">ENSEMBL_CVS_ROOT_DIR</h3><p>Set to the base checkout of Ensembl. We should be able to add <strong>ensembl-hive/sql</strong> onto this path to find the SQL directory for hive e.g.</p><pre><code>	export ENSEMBL_CVS_ROOT_DIR=$HOME/work/ensembl-checkouts
 </code></pre><h3 id="ENSADMINPSW">ENSADMIN_PSW</h3><p>Give the password to use to log into a database server e.g.</p><pre><code>	export ENSADMIN_PSW=wibble
 </code></pre><h2 id="CommandLineArguments">Command Line Arguments</h2><p>Where <strong>Multiple Supported</strong> is supported we allow the user to specify the parameter more than once on the command line. For example species is one of these options e.g. </p><pre><code>-species human -species cele -species yeast@
-</code></pre><table><tr><th>Name </th><th> Type</th><th>Multiple Supported</th><th> Description</th><th>Default</th><th> Required</th></tr><tr><td><code>-registry</code></td><td>String</td><td>No</td><td>Location of the Ensembl registry to use with this pipeline</td><td>-</td><td><strong>YES</strong></td></tr><tr><td><code>-base_path</code></td><td>String</td><td>No</td><td>Location of the dumps</td><td>-</td><td><strong>YES</strong></td></tr><tr><td><code>-pipeline_db -host=</code></td><td>String</td><td>No</td><td>Specify a host for the hive database e.g. <code>-pipeline_db -host=myserver.mysql</code></td><td>See hive generic config</td><td><strong>YES</strong></td></tr><tr><td><code>-pipeline_db -dbname=</code></td><td>String</td><td>No</td><td>Specify a different database to use as the hive DB e.g. <code>-pipeline_db -dbname=my_dumps_test</code></td><td>Uses pipeline name by default</td><td><strong>NO</strong></td></tr><tr><td><code>-ftp_dir</code></td><td>String</td><td>No</td><td>Location of the current FTP directory with the previous release&#8217;s files. We will use this to copy DNA files from one release to another. If not given we do not do any reuse</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-species</code></td><td>String</td><td>Yes</td><td>Specify one or more species to process. Pipeline will only <em>consider</em> these species. Use <strong>-force_species</strong> if you want to force a species run</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-force_species</code></td><td>String</td><td>Yes</td><td>Specify one or more species to force through the pipeline. This is useful to force a dump when you know reuse will not do the <em>"right thing"</em></td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-dump_types</code></td><td>String</td><td>Yes</td><td>Specify each type of dump you want to produce. Supported values are <strong>dna</strong>, <strong>cdna</strong> and <strong>ncrna</strong></td><td>All</td><td><strong>NO</strong></td></tr><tr><td><code>-db_types</code></td><td>String</td><td>Yes</td><td>The database types to use. Supports the normal db adaptor groups e.g. <strong>core</strong>, <strong>otherfeatures</strong> etc.</td><td>core</td><td><strong>NO</strong></td></tr><tr><td><code>-release</code></td><td>Integer</td><td>No</td><td>The release to dump</td><td>Software version</td><td><strong>NO</strong></td></tr><tr><td><code>-previous_release</code></td><td>Integer</td><td>No</td><td>The previous release to use. Used to calculate reuse</td><td>Software version minus 1</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_servers</code></td><td>String</td><td>Yes</td><td>The servers to copy blast indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_genomic_dir</code></td><td>String</td><td>No</td><td>Location to copy the DNA blast indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_genes_dir</code></td><td>String</td><td>No</td><td>Location to copy the DNA gene (cdna, ncrna and protein) indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-scp_user</code></td><td>String</td><td>No</td><td>User to perform the SCP as. Defaults to the current user</td><td>Current user</td><td><strong>NO</strong></td></tr><tr><td><code>-scp_identity</code></td><td>String</td><td>No</td><td>The SSH identity file to use when performing SCPs. Normally used in conjunction with <strong>-scp_user</strong></td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-no_scp</code></td><td>Boolean</td><td>No</td><td>Skip SCP altogether</td><td>0</td><td><strong>NO</strong></td></tr><tr><td><code>-pipeline_name</code></td><td>String</td><td>No</td><td>Name to use for the pipeline</td><td>fasta_dump_$release</td><td><strong>NO</strong></td></tr><tr><td><code>-wublast_exe</code></td><td>String</td><td>No</td><td>Location of the WUBlast indexing binary</td><td>xdformat</td><td><strong>NO</strong></td></tr><tr><td><code>-blat_exe</code></td><td>String</td><td>No</td><td>Location of the Blat indexing binary</td><td>faToTwoBit</td><td><strong>NO</strong></td></tr><tr><td><code>-port_offset</code></td><td>Integer</td><td>No</td><td>The offset of the ports to use when generating blat indexes. This figure is added onto the web database species ID</td><td>30000</td><td><strong>NO</strong></td></tr><tr><td><code>-email</code></td><td>String</td><td>No</td><td>Email to send pipeline summaries to upon its successful completion</td><td>$USER@sanger.ac.uk</td><td><strong>NO</strong></td></tr></table><h2 id="ExampleCommands">Example Commands</h2><h3 id="Toloadusenormally">To load use normally:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><table><tr><th>Name </th><th> Type</th><th>Multiple Supported</th><th> Description</th><th>Default</th><th> Required</th></tr><tr><td><code>-registry</code></td><td>String</td><td>No</td><td>Location of the Ensembl registry to use with this pipeline</td><td>-</td><td><strong>YES</strong></td></tr><tr><td><code>-base_path</code></td><td>String</td><td>No</td><td>Location of the dumps</td><td>-</td><td><strong>YES</strong></td></tr><tr><td><code>-pipeline_db -host=</code></td><td>String</td><td>No</td><td>Specify a host for the hive database e.g. <code>-pipeline_db -host=myserver.mysql</code></td><td>See hive generic config</td><td><strong>YES</strong></td></tr><tr><td><code>-pipeline_db -dbname=</code></td><td>String</td><td>No</td><td>Specify a different database to use as the hive DB e.g. <code>-pipeline_db -dbname=my_dumps_test</code></td><td>Uses pipeline name by default</td><td><strong>NO</strong></td></tr><tr><td><code>-ftp_dir</code></td><td>String</td><td>No</td><td>Location of the current FTP directory with the previous release&#8217;s files. We will use this to copy DNA files from one release to another. If not given we do not do any reuse</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-species</code></td><td>String</td><td>Yes</td><td>Specify one or more species to process. Pipeline will only <em>consider</em> these species. Use <strong>-force_species</strong> if you want to force a species run</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-force_species</code></td><td>String</td><td>Yes</td><td>Specify one or more species to force through the pipeline. This is useful to force a dump when you know reuse will not do the <em>"right thing"</em></td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-dump_types</code></td><td>String</td><td>Yes</td><td>Specify each type of dump you want to produce. Supported values are <strong>dna</strong>, <strong>cdna</strong> and <strong>ncrna</strong></td><td>All</td><td><strong>NO</strong></td></tr><tr><td><code>-db_types</code></td><td>String</td><td>Yes</td><td>The database types to use. Supports the normal db adaptor groups e.g. <strong>core</strong>, <strong>otherfeatures</strong> etc.</td><td>core</td><td><strong>NO</strong></td></tr><tr><td><code>-release</code></td><td>Integer</td><td>No</td><td>The release to dump</td><td>Software version</td><td><strong>NO</strong></td></tr><tr><td><code>-previous_release</code></td><td>Integer</td><td>No</td><td>The previous release to use. Used to calculate reuse</td><td>Software version minus 1</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_servers</code></td><td>String</td><td>Yes</td><td>The servers to copy blast indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_genomic_dir</code></td><td>String</td><td>No</td><td>Location to copy the DNA blast indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-blast_genes_dir</code></td><td>String</td><td>No</td><td>Location to copy the DNA gene (cdna, ncrna and protein) indexes to</td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-scp_user</code></td><td>String</td><td>No</td><td>User to perform the SCP as. Defaults to the current user</td><td>Current user</td><td><strong>NO</strong></td></tr><tr><td><code>-scp_identity</code></td><td>String</td><td>No</td><td>The SSH identity file to use when performing SCPs. Normally used in conjunction with <strong>-scp_user</strong></td><td>-</td><td><strong>NO</strong></td></tr><tr><td><code>-no_scp</code></td><td>Boolean</td><td>No</td><td>Skip SCP altogether</td><td>0</td><td><strong>NO</strong></td></tr><tr><td><code>-pipeline_name</code></td><td>String</td><td>No</td><td>Name to use for the pipeline</td><td>fasta_dump_$release</td><td><strong>NO</strong></td></tr><tr><td><code>-wublast_exe</code></td><td>String</td><td>No</td><td>Location of the WUBlast indexing binary</td><td>xdformat</td><td><strong>NO</strong></td></tr><tr><td><code>-blat_exe</code></td><td>String</td><td>No</td><td>Location of the Blat indexing binary</td><td>faToTwoBit</td><td><strong>NO</strong></td></tr><tr><td><code>-port_offset</code></td><td>Integer</td><td>No</td><td>The offset of the ports to use when generating blat indexes. This figure is added onto the web database species ID</td><td>30000</td><td><strong>NO</strong></td></tr><tr><td><code>-email</code></td><td>String</td><td>No</td><td>Email to send pipeline summaries to upon its successful completion</td><td>$USER@sanger.ac.uk</td><td><strong>NO</strong></td></tr></table><h2 id="ExampleCommands">Example Commands</h2><h3 id="Toloadusenormally">To load use normally:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -base_path /path/to/dumps -registry reg.pm
-</code></pre><h3 id="Runasubsetofspeciesnoforcingsupportsregistryaliases">Run a subset of species (no forcing &amp; supports registry aliases):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><h3 id="Runasubsetofspeciesnoforcingsupportsregistryaliases">Run a subset of species (no forcing &amp; supports registry aliases):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -species anolis -species celegans -species human \
 	-base_path /path/to/dumps -registry reg.pm
-</code></pre><h3 id="Specifyingspeciestoforcesupportsallregistryaliases">Specifying species to force (supports all registry aliases):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><h3 id="Specifyingspeciestoforcesupportsallregistryaliases">Specifying species to force (supports all registry aliases):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -force_species anolis -force_species celegans -force_species human \
 	-base_path /path/to/dumps -registry reg.pm
-</code></pre><h3 id="Runningforcingaspecies">Running &amp; forcing a species:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><h3 id="Runningforcingaspecies">Running &amp; forcing a species:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -species celegans -force_species celegans \
 	-base_path /path/to/dumps -registry reg.pm
-</code></pre><h3 id="DumpingjustgenedatanoDNAorncRNA">Dumping just gene data (no DNA or ncRNA):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><h3 id="DumpingjustgenedatanoDNAorncRNA">Dumping just gene data (no DNA or ncRNA):</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -dump_type cdna \
 	-base_path /path/to/dumps -registry reg.pm
-</code></pre><h3 id="UsingadifferentSCPuseridentity">Using a different SCP user &amp; identity:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+</code></pre><h3 id="UsingadifferentSCPuseridentity">Using a different SCP user &amp; identity:</h3><pre><code>	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -scp_user anotherusr -scp_identity /users/anotherusr/.pri/identity \
 	-base_path /path/to/dumps -registry reg.pm
 </code></pre><h2 id="RunningthePipeline">Running the Pipeline</h2><ol><li>Start a screen session or get ready to run the beekeeper with a <code>nohup</code></li><li>Choose a dump location<ul><li>A fasta, blast and blat directory will be created 1 level below</li></ul></li><li>Use an <code>init_pipeline.pl</code> configuration from above<ul><li>Make sure to give it the <code>-base_path</code> parameter</li></ul></li><li>Sync the database using one of the displayed from <code>init_pipeline.pl</code></li><li>Run the pipeline in a loop with a good sleep between submissions and redirect log output (the following assumes you are using <strong>bash</strong>)<ul><li><code>2&gt;&amp;1</code> is important as this clobbers STDERR into STDOUT</li><li><code>&gt; my_run.log</code> then sends the output to this file. Use <code>tail -f</code> to track the pipeline</li></ul></li><li><code>beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 2&gt;&amp;1 &gt; my_run.log &amp;</code></li><li>Wait</li></ol></body></html>
\ No newline at end of file
--- a/docs/pipelines/fasta.textile
+++ b/docs/pipelines/fasta.textile
@@ -154,41 +154,41 @@ h2. Example Commands
 h3. To load use normally:

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -base_path /path/to/dumps -registry reg.pm

 h3. Run a subset of species (no forcing & supports registry aliases):

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -species anolis -species celegans -species human \
 	-base_path /path/to/dumps -registry reg.pm

 h3. Specifying species to force (supports all registry aliases):

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -force_species anolis -force_species celegans -force_species human \
 	-base_path /path/to/dumps -registry reg.pm

 h3. Running & forcing a species:

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -species celegans -force_species celegans \
 	-base_path /path/to/dumps -registry reg.pm

 h3. Dumping just gene data (no DNA or ncRNA):

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -dump_type cdna \
 	-base_path /path/to/dumps -registry reg.pm

 h3. Using a different SCP user & identity:

 bc. 
-	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig:FASTA_conf \
+	init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FASTA_conf \
 	-pipeline_db -host=my-db-host -scp_user anotherusr -scp_identity /users/anotherusr/.pri/identity \
 	-base_path /path/to/dumps -registry reg.pm