Commit 8e72f3ea authored by Andy Yates

Adding documentation and also switching to using type

parent 68ffc69b
@@ -43,8 +43,15 @@
</code></pre><h3 id="DumpingjustEMBLdatanogenbank">Dumping just EMBL data (no genbank):</h3><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::Flatfile_conf \
-pipeline_db -host=my-db-host -type embl \
-base_path /path/to/dumps -registry reg.pm
</code></pre><h2 id="RunningthePipeline">Running the Pipeline</h2><ol><li>Start a screen session or get ready to run the beekeeper with a <code>nohup</code></li><li>Choose a dump location<ul><li>A fasta, blast and blat directory will be created 1 level below</li></ul></li><li>Use an <code>init_pipeline.pl</code> configuration from above<ul><li>Make sure to give it the <code>-base_path</code> parameter</li></ul></li><li>Sync the database using one of the displayed from <code>init_pipeline.pl</code></li><li>Run the pipeline in a loop with a good sleep between submissions and redirect log output (the following assumes you are using <strong>bash</strong>)<ul><li><code>2&gt;&amp;1</code> is important as this clobbers STDERR into STDOUT</li><li><code>&gt; my_run.log</code> then sends the output to this file. Use <code>tail -f</code> to track the pipeline</li></ul></li><li><code>beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 2&gt;&amp;1 &gt; my_run.log &amp;</code></li><li>Wait</li></ol><h2 id="RunningwithoutaPipeline">Running without a Pipeline</h2><p>Hive gives us the ability to run any Process outside of a database pipeline <br/>run using <code>standaloneJob.pl</code>. We will list some useful commands to run</p><h3 id="DumpingaSingleSpecies">Dumping a Single Species</h3><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::FASTA::DumpFile \
</code></pre><h2 id="RunningthePipeline">Running the Pipeline</h2><ol><li>Start a screen session or get ready to run the beekeeper with a <code>nohup</code></li><li>Choose a dump location<ul><li>A fasta, blast and blat directory will be created 1 level below</li></ul></li><li>Use an <code>init_pipeline.pl</code> configuration from above<ul><li>Make sure to give it the <code>-base_path</code> parameter</li></ul></li><li>Sync the database using one of the displayed from <code>init_pipeline.pl</code></li><li>Run the pipeline in a loop with a good sleep between submissions and redirect log output (the following assumes you are using <strong>bash</strong>)<ul><li><code>2&gt;&amp;1</code> is important as this clobbers STDERR into STDOUT</li><li><code>&gt; my_run.log</code> then sends the output to this file. Use <code>tail -f</code> to track the pipeline</li></ul></li><li><code>beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 2&gt;&amp;1 &gt; my_run.log &amp;</code></li><li>Wait</li></ol><h2 id="RunningwithoutaPipeline">Running without a Pipeline</h2><p>Hive gives us the ability to run any Process outside of a database pipeline <br/>run using <code>standaloneJob.pl</code>. We will list some useful commands to run</p><h3 id="DumpingaSingleSpecies">Dumping a Single Species</h3><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::DumpFile \
-reg_conf reg.pm -debug 2 \
- -release 67 -species homo_sapiens \
+ -release 67 -species homo_sapiens -type embl \
-base_path /path/to/dumps
</code></pre><h2 id="Verification">Verification</h2><p>Another pipeline is provided which can verify the files produced by this <br/>pipeline. Nothing else other than a basic prodding of file contents is<br/>attempted.</p><h3 id="RunningwithaPipeline">Running with a Pipeline</h3><p>The code works with a SQLite database so you do not need a MySQL database<br/>to schedule these jobs. You will have to schedule two pipelines; one<br/>to work with embl and another to work with genbank.</p><p>The pipeline searches for all files matching the format *.dat.gz.</p><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/embl/dumps -type embl
</code></pre><pre><code> init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/genbank/dumps -type genbank
</code></pre><h3 id="RunningwithoutaPipeline2">Running without a Pipeline</h3><p>You can run this module without a pipeline if you need to check a single<br/>file.</p><pre><code> standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::CheckFlatfile \
-file /path/to/embl/dumps/homo_sapiens/Homo_sapiens.chromosome.1.dat.gz \
-type embl
</code></pre></body></html>
\ No newline at end of file
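The "Running the Pipeline" steps above say to sync the hive database with one of the commands printed by init_pipeline.pl before looping the beekeeper, but do not spell the sequence out. A minimal sketch, assuming a bash shell and the same placeholder URL used in the docs (note the redirection order > my_run.log 2>&1, which sends both STDOUT and STDERR to the log file):

bc.
 # sync once (a sketch; use the exact command printed by init_pipeline.pl)
 beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -sync
 # then loop with a generous sleep and send all output to a log you can tail -f
 beekeeper.pl -url mysql://usr:pass@server:port/db -reg_conf reg.pm -loop -sleep 5 > my_run.log 2>&1 &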
@@ -143,8 +143,40 @@ run using @standaloneJob.pl@. We will list some useful commands to run
h3. Dumping a Single Species
bc.
- standaloneJob.pl Bio::EnsEMBL::Pipeline::FASTA::DumpFile \
+ standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::DumpFile \
-reg_conf reg.pm -debug 2 \
- -release 67 -species homo_sapiens \
+ -release 67 -species homo_sapiens -type embl \
-base_path /path/to/dumps
h2. Verification
Another pipeline is provided which can verify the files produced by this
pipeline. Nothing more than a basic check of the file contents is
attempted.
h3. Running with a Pipeline
The code works with a SQLite database, so you do not need a MySQL database
to schedule these jobs. You will have to schedule two pipelines: one
to check the EMBL dumps and another to check the GenBank dumps.
The pipeline searches for all files matching the pattern @*.dat.gz@.
bc.
init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/embl/dumps -type embl
bc.
init_pipeline.pl Bio::EnsEMBL::Pipeline::PipeConfig::FlatfileChecker_conf \
-base_path /path/to/genbank/dumps -type genbank
h3. Running without a Pipeline
You can run this module without a pipeline if you need to check a single
file.
bc.
standaloneJob.pl Bio::EnsEMBL::Pipeline::Flatfile::CheckFlatfile \
-file /path/to/embl/dumps/homo_sapiens/Homo_sapiens.chromosome.1.dat.gz \
-type embl
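Because the checker's pipeline_db defaults to a SQLite driver (see the PipeConfig change below), the beekeeper can be pointed at a local SQLite file rather than a MySQL server. A minimal sketch, where the database name flatfile_dump_check_embl is only a placeholder; use the exact URL that init_pipeline.pl reports:

bc.
 # loop the EMBL checker against its local SQLite hive database (URL is a placeholder)
 beekeeper.pl -url sqlite:///flatfile_dump_check_embl -loop -sleep 5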
@@ -33,7 +33,7 @@ Allowed parameters are:
=item file - The file to parse
- =item format - Passed into SeqIO; the format to parse
+ =item type - Passed into SeqIO; the format to parse
=back
@@ -51,15 +51,15 @@ use base qw/Bio::EnsEMBL::Pipeline::Flatfile::Base/;
sub fetch_input {
my ($self) = @_;
$self->throw("No 'file' parameter specified") unless $self->param('file');
$self->throw("No 'format' parameter specified") unless $self->param('format');
$self->throw("No 'type' parameter specified") unless $self->param('type');
return;
}
sub run {
my ($self) = @_;
my $fh = $self->get_fh();
- my $format = $self->param('format');
- my $stream = Bio::SeqIO->new(-FH => $fh, -FORMAT => $format);
+ my $type = $self->param('type');
+ my $stream = Bio::SeqIO->new(-FH => $fh, -FORMAT => $type);
my $count = 0;
while ( (my $seq = $stream->next_seq()) ) {
$self->fine("Found the record %s", $seq->accession());
@@ -13,11 +13,15 @@ sub default_options {
%{ $self->SUPER::default_options() },
# 'base_path' => '', #where do you want your files
- # 'format' => '',
+ # 'type' => '',
### Defaults
pipeline_name => 'flatfile_dump_check_'.$self->o('format'),
pipeline_db => {
-driver => 'sqlite',
}
};
}
@@ -61,7 +65,7 @@ sub pipeline_wide_parameters {
my ($self) = @_;
return {
%{ $self->SUPER::pipeline_wide_parameters() },
- format => $self->o('format'),
+ format => $self->o('type'),
};
}