Commit 8e72f3ea authored by Andy Yates

Adding documentation and also switching to using type

parent 68ffc69b
@@ -43,8 +43,15 @@
</code></pre><h3 id="DumpingjustEMBLdatanogenbank">Dumping just EMBL data (no genbank):</h3><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::Flatfile_conf \
-pipeline_db -host=my-db-host -type embl \
-base_path /path/to/dumps -registry reg.pm
</code></pre><h2 id="RunningthePipeline">Running the Pipeline</h2><ol><li>Start a screen session or get ready to run the beekeeper with a <code>nohup</code></li><li>Choose a dump location<ul><li>A fasta, blast and blat directory will be created 1 level below</li></ul></li><li>Use an <code>init_pipeline.pl</code> configuration from above<ul><li>Make sure to give it the <code>-base_path</code> parameter</li></ul></li><li>Sync the database using one of the displayed from <code>init_pipeline.pl</code></li><li>Run the pipeline in a loop with a good sleep between submissions and redirect log output (the following assumes you are using <strong>bash</strong>)<ul><li><code>2&gt;&amp;1</code> is important as this clobbers STDERR into STDOUT</li><li><code>&gt; my_run.log</code> then sends the output to this file. Use <code>tail -f</code> to track the pipeline</li></ul></li><li><code>beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 2&gt;&amp;1 &gt; my_run.log &amp;</code></li><li>Wait</li></ol><h2 id="RunningwithoutaPipeline">Running without a Pipeline</h2><p>Hive gives us the ability to run any Process outside of a database pipeline <br/>run using <code>standaloneJob.pl</code>. We will list some useful commands to run</p><h3 id="DumpingaSingleSpecies">Dumping a Single Species</h3><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::FASTA::DumpFile \
</code></pre><h2 id="RunningthePipeline">Running the Pipeline</h2><ol><li>Start a screen session or get ready to run the beekeeper with a <code>nohup</code></li><li>Choose a dump location<ul><li>A fasta, blast and blat directory will be created 1 level below</li></ul></li><li>Use an <code>init_pipeline.pl</code> configuration from above<ul><li>Make sure to give it the <code>-base_path</code> parameter</li></ul></li><li>Sync the database using one of the displayed from <code>init_pipeline.pl</code></li><li>Run the pipeline in a loop with a good sleep between submissions and redirect log output (the following assumes you are using <strong>bash</strong>)<ul><li><code>2&gt;&amp;1</code> is important as this clobbers STDERR into STDOUT</li><li><code>&gt; my_run.log</code> then sends the output to this file. Use <code>tail -f</code> to track the pipeline</li></ul></li><li><code>beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 2&gt;&amp;1 &gt; my_run.log &amp;</code></li><li>Wait</li></ol><h2 id="RunningwithoutaPipeline">Running without a Pipeline</h2><p>Hive gives us the ability to run any Process outside of a database pipeline <br/>run using <code>standaloneJob.pl</code>. We will list some useful commands to run</p><h3 id="DumpingaSingleSpecies">Dumping a Single Species</h3><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::DumpFile \
-reg_conf reg.pm -debug 2 \
- -release 67 -species homo_sapiens \
+ -release 67 -species homo_sapiens -type embl \
-base_path /path/to/dumps
</code></pre><h2 id="Verification">Verification</h2><p>Another pipeline is provided which can verify the files produced by this <br/>pipeline. Nothing else other than a basic prodding of file contents is<br/>attempted.</p><h3 id="RunningwithaPipeline">Running with a Pipeline</h3><p>The code works with a SQLite database so you do not need a MySQL database<br/>to schedule these jobs. You will have to schedule two pipelines; one<br/>to work with embl and another to work with genbank.</p><p>The pipeline searches for all files matching the format *.dat.gz.</p><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/embl/dumps -type embl
</code></pre><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/genbank/dumps -type genbank
</code></pre><h3 id="RunningwithoutaPipeline2">Running without a Pipeline</h3><p>You can run this module without a pipeline if you need to check a single<br/>file.</p><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::CheckFlatfile \
-file /path/to/embl/dumps/homo_sapiens/Homo_sapiens.chromosome.1.dat.gz \
-type embl
</code></pre></body></html>
\ No newline at end of file
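The "Running the Pipeline" steps above say to sync the hive database with one of the commands printed by init_pipeline.pl before looping the beekeeper, but do not spell the sequence out. A minimal sketch, assuming a bash shell and the same placeholder URL used in the docs (note the redirection order > my_run.log 2>&1, which sends both STDOUT and STDERR to the log file):

bc.
 # sync once (a sketch; use the exact command printed by init_pipeline.pl)
 beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -sync
 # then loop with a generous sleep and send all output to a log you can tail -f
 beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 > my_run.log 2>&1 &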
@@ -143,8 +143,40 @@ run using @standaloneJob.pl@. We will list some useful commands to run
h3. Dumping a Single Species
bc.
- standaloneJob.pl Bio::EnsEMBL::Pipeline::FASTA::DumpFile \
+ standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::DumpFile \
-reg_conf reg.pm -debug 2 \
- -release 67 -species homo_sapiens \
+ -release 67 -species homo_sapiens -type embl \
-base_path /path/to/dumps
h2. Verification
Another pipeline is provided which can verify the files produced by this
pipeline. Nothing more than a basic check of the file contents is
attempted.
h3. Running with a Pipeline
The code works with a SQLite database, so you do not need a MySQL database
to schedule these jobs. You will have to schedule two pipelines: one
to check the EMBL dumps and another to check the GenBank dumps.
The pipeline searches for all files matching the pattern @*.dat.gz@.
bc.
init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/embl/dumps -type embl
bc.
init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/genbank/dumps -type genbank
h3. Running without a Pipeline
You can run this module without a pipeline if you need to check a single
file.
bc.
standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::CheckFlatfile \
-file /path/to/embl/dumps/homo_sapiens/Homo_sapiens.chromosome.1.dat.gz \
-type embl
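Because the checker's pipeline_db defaults to a SQLite driver (see the PipeConfig change below), the beekeeper can be pointed at a local SQLite file rather than a MySQL server. A minimal sketch, where the database name flatfile_dump_check_embl is only a placeholder; use the exact URL that init_pipeline.pl reports:

bc.
 # loop the EMBL checker against its local SQLite hive database (URL is a placeholder)
 beekeeper.pl -url sqlite:///flatfile_dump_check_embl -loop -sleep 5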
@@ -33,7 +33,7 @@ Allowed parameters are:
=item file - The file to parse
- =item format - Passed into SeqIO; the format to parse
+ =item type - Passed into SeqIO; the format to parse
=back
@@ -51,15 +51,15 @@ use base qw/Bio::EnsEMBL::Pipeline::Flatfile::Base/;
sub fetch_input {
my ($self) = @_;
$self->throw("No 'file' parameter specified") unless $self->param('file');
$self->throw("No 'format' parameter specified") unless $self->param('format');
$self->throw("No 'type' parameter specified") unless $self->param('type');
return;
}
sub run {
my ($self) = @_;
my $fh = $self->get_fh();
- my $format = $self->param('format');
- my $stream = Bio::SeqIO->new(-FH => $fh, -FORMAT => $format);
+ my $type = $self->param('type');
+ my $stream = Bio::SeqIO->new(-FH => $fh, -FORMAT => $type);
my $count = 0;
while ( (my $seq = $stream->next_seq()) ) {
$self->fine("Found the record %s", $seq->accession());
@@ -13,11 +13,15 @@ sub default_options {
%{ $self->SUPER::default_options() },
# 'base_path' => '', #where do you want your files
- # 'format' => '',
+ # 'type' => '',
### Defaults
pipeline_name => 'flatfile_dump_check_'.$self->o('format'),
pipeline_db => {
-driver => 'sqlite',
}
};
}
@@ -61,7 +65,7 @@ sub pipeline_wide_parameters {
my ($self) = @_;
return {
%{ $self->SUPER::pipeline_wide_parameters() },
- format => $self->o('format'),
+ format => $self->o('type'),
};
}