ensembl-hive  2.1
Bio::EnsEMBL::Hive::RunnableDB::FastaFactory Class Reference

Public Member Functions

public param_defaults ()
 
public fetch_input ()
 
public run ()
 
public write_output ()
 
Public Member Functions inherited from Bio::EnsEMBL::Hive::Process
public new ()
 
public life_cycle ()
 
public say_with_header ()
 
public enter_status ()
 
public warning ()
 
public strict_hash_format ()
 
public param_defaults ()
 
public fetch_input ()
 
public run ()
 
public write_output ()
 
public Bio::EnsEMBL::Hive::Worker worker ()
 
public Boolean execute_writes ()
 
public Bio::EnsEMBL::Hive::DBSQL::DBAdaptor db ()
 
public Bio::EnsEMBL::Hive::DBSQL::DBConnection dbc ()
 
public Bio::EnsEMBL::Hive::DBSQL::DBConnection data_dbc ()
 
public Bio::EnsEMBL::Hive::AnalysisJob input_job ()
 
public input_id ()
 
public param ()
 
public param_required ()
 
public param_is_defined ()
 
public param_substitute ()
 
public dataflow_output_id ()
 
public throw ()
 
public complete_early ()
 
public Int debug ()
 
public worker_temp_directory ()
 
public worker_temp_directory_name ()
 
public cleanup_worker_temp_directory ()
 

Detailed Description

Synopsis

    standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FastaFactory --inputfile reference.fasta --max_chunk_length 600000

    standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FastaFactory \
        --inputfile reference.fasta \
        --max_chunk_length 700000 \
        --output_prefix ref_chunk \
        --flow_into "{ 2 => ['mysql://ensadmin:${ENSADMIN_PSW}@127.0.0.1/lg4_split_fasta/analysis?logic_name=blast']}"

Description

    This is a bioinformatics-specific "Factory" Runnable that splits a given Fasta file into smaller chunks
    and dataflows one job per chunk.

    The following parameters are supported:

        param('inputfile');         # The original Fasta file: 'inputfile' => 'my_sequences.fasta'

        param('max_chunk_length');  # Maximum total length of sequences in a chunk: 'max_chunk_length' => '200000'

        param('output_prefix');     # A common prefix for output files: 'output_prefix' => 'my_special_chunk_'

        param('output_suffix');     # A common suffix for output files: 'output_suffix' => '.nt'

Member Function Documentation

public Bio::EnsEMBL::Hive::RunnableDB::FastaFactory::fetch_input ( )
    Description : Implements the fetch_input() interface method of Bio::EnsEMBL::Hive::Process, which is used to read in parameters and load data.
                    Here we only check that the 'inputfile' parameter is present and try to parse the file (all other parameters have defaults).
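
The "one required parameter, defaults for the rest" behaviour described above can be illustrated with a minimal Python sketch. This is not the module's Perl code: the function name resolve_params and the default values shown are hypothetical, chosen only for illustration; the real defaults are whatever param_defaults() returns.

```python
# Hypothetical defaults, for illustration only; the real values are
# defined by the module's param_defaults() method.
PARAM_DEFAULTS = {
    "max_chunk_length": 100000,
    "output_prefix": "my_chunk_",
    "output_suffix": ".fasta",
}


def resolve_params(user_params):
    """Overlay user-supplied parameters on the defaults and require 'inputfile'."""
    params = {**PARAM_DEFAULTS, **user_params}
    if not params.get("inputfile"):
        # 'inputfile' has no default, so a job without it cannot proceed.
        raise ValueError("'inputfile' is an obligatory parameter")
    return params
```

Calling resolve_params({"inputfile": "reference.fasta", "max_chunk_length": 600000}) would keep the assumed prefix and suffix defaults while overriding the chunk length, which mirrors the first command in the Synopsis.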
 
public Bio::EnsEMBL::Hive::RunnableDB::FastaFactory::param_defaults ( )
    Description : Implements the param_defaults() interface method of Bio::EnsEMBL::Hive::Process, which defines module defaults for parameters.
 
public Bio::EnsEMBL::Hive::RunnableDB::FastaFactory::run ( )
    Description : Implements the run() interface method of Bio::EnsEMBL::Hive::Process, which is used to perform the main bulk of the job (minus input and output).
                    Because we want to stream the data more efficiently, all functionality is in write_output().
 
public Bio::EnsEMBL::Hive::RunnableDB::FastaFactory::write_output ( )
    Description : Implements the write_output() interface method of Bio::EnsEMBL::Hive::Process, which is used to deal with the job's output after execution.
                    The main bulk of this Runnable's functionality is here.
                    Iterates through all sequences in input_seqio, splits them into separate files ("chunks") using a cut-off length, and dataflows one job per chunk.
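
The streaming split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's Perl implementation: the function names, the job keys (chunk_name, chunk_number, chunk_length, chunk_size), and the policy of closing a chunk as soon as its accumulated length exceeds the cut-off are all assumptions. Sequences are never split, so under this policy a chunk can overshoot max_chunk_length by part of its last sequence. The sketch returns job descriptions instead of writing files or dataflowing.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, parts = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records, max_chunk_length):
    """Group records, closing a chunk once its total length exceeds the cut-off."""
    chunk, chunk_length = [], 0
    for header, seq in records:
        chunk.append((header, seq))
        chunk_length += len(seq)
        if chunk_length > max_chunk_length:
            yield chunk
            chunk, chunk_length = [], 0
    if chunk:
        # Emit whatever is left over as the final, possibly short, chunk.
        yield chunk


def split_fasta(handle, max_chunk_length, output_prefix, output_suffix):
    """Return one job description per chunk (hypothetical keys); writes no files."""
    jobs = []
    chunks = chunk_records(read_fasta(handle), max_chunk_length)
    for number, chunk in enumerate(chunks, start=1):
        jobs.append({
            "chunk_name": f"{output_prefix}{number}{output_suffix}",
            "chunk_number": number,
            "chunk_length": sum(len(seq) for _, seq in chunk),
            "chunk_size": len(chunk),
        })
    return jobs


if __name__ == "__main__":
    example = io.StringIO(">a\nACGT\n>b\nAC\nGT\n>c\nA\n")
    for job in split_fasta(example, 5, "chunk_", ".fa"):
        print(job)
```

Because both helpers are generators, only one chunk is held in memory at a time, which is the point of doing the work in a single streaming pass.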
 

The documentation for this class was generated from the following file: