# Bio::EnsEMBL::Hive project
#
# Copyright Team Ensembl
# You may distribute this package under the same terms as perl itself

Contact:
  Please contact ehive-users@ebi.ac.uk mailing list with questions/suggestions.

Summary:
  This is a distributed processing system based on 'autonomous agents' and
  the Hive behavioural structure of Honey Bees. It implements all functionality
  of both data-flow graphs and block-branch diagrams, which should allow it to
  codify any program, algorithm, or parallel processing job control system.
  It is not bound to any processing 'farm' system and can be adapted to any GRID.
  It builds on the design of the Ensembl Pipeline/Analysis and presently uses
  Bio::EnsEMBL::Analysis::RunnableDB perl wrapper objects as nodes/blocks in
  the graphs, but could be adapted more generally.

3 December, 2009 : Leo Gordon

  beekeeper.pl, runWorker.pl and cmd_hive.pl got new built-in documentation
  accessible via perldoc or directly.

2 December, 2009 : Leo Gordon

  Bio::EnsEMBL::Hive::RunnableDB::LongMult example toy pipeline has been created
  to show how to do various things "adult pipelines" perform
  (job creation, data flow, control blocking rules, usage of intermediate tables, etc).

  Read Bio::EnsEMBL::Hive::RunnableDB::LongMult for step-by-step instructions
  on how to create and run this pipeline.

30 November, 2009 : Leo Gordon

  Bio::EnsEMBL::Hive::RunnableDB::JobFactory module has been added.
  It is a generic way of creating batches of jobs with the parameters
  given by a file or a range of ids.
  Entries in the file can also be randomly shuffled.

13 July, 2009 : Leo Gordon

  Merging the "Meadow" code from this March's development branch.
  Because it separates LSF-specific code from higher-level code, it will be easier to update.

-------------------------------------------------------------------------------------------------------
Albert, sorry - in the process of merging into the development branch I had to remove your HIGHMEM code.
I hope it is a temporary measure and we will be having hive-wide queue control soon.
If not - you can restore the pre-merger state by updating with the following command:

    cvs update -r lg4_pre_merger_20090713

('maximise_concurrency' option was carried over)
-------------------------------------------------------------------------------------------------------

3 April, 2009 : Albert Vilella

  Added a new maximise_concurrency 1/0 option. When set to 1, it fetches the
  jobs that need to be run in an order that maximises the number of different
  analyses running at once. This is useful for cases where different analyses
  hit different tables and the overall sql load can be kept higher without
  breaking the server, instead of having lots of jobs for the same analysis
  trying to hit the same tables.

  Added quick HIGHMEM option. This option is useful when a small percentage of
  jobs are too big and fail in normal conditions. The runnable can check whether
  it is the second time it is trying to run the job, whether the failure is due
  to big data (e.g. gene_count > 200) and whether it is not already in HIGHMEM
  mode. If so, it calls reset_highmem_job_by_dbID and quits:

  # retry_count == 1 means this is the second attempt at the job
  if ($self->input_job->retry_count == 1) {
    # big input, and this worker is not already running in HIGHMEM mode
    if ($self->{'protein_tree'}->get_tagvalue('gene_count') > 200 && !defined($self->worker->{HIGHMEM})) {
      # hand the job over to the HIGHMEM workers, then stop
      $self->input_job->adaptor->reset_highmem_job_by_dbID($self->input_job->dbID);
      $self->DESTROY;
      throw("Alignment job too big: send to highmem and quit");
    }
  }

  Assuming there is a

    beekeeper.pl -url <blah> -highmem -lsf_options "<lots of mem>"

  running, or a

    runWorker.pl <blah> -highmem 1

  with lots of mem running, it will fetch the HIGHMEM jobs as if they were
  "READY but needs HIGHMEM".

  Also added a modification to Queen so that it synchronizes less often when
  more than 450 jobs are running and the load is above 0.9, so that the queries
  to analysis tables are not hitting the sql server too much.

23 July, 2008 : Will Spooner

  Removed remaining ensembl-pipeline dependencies.

11 March, 2005 : Jessica Severin

  Project is reaching a very stable state. New 'node' object
  Bio::EnsEMBL::Hive::Process allows for independence from Ensembl Pipeline
  and provides extended process functionality to manipulate hive job objects,
  branch, modify hive graphs, create jobs, and other hive process specific tasks.
  Some of this extended 'Process' API may still evolve.

7 June, 2004 : Jessica Severin

  This project is under active development and should be classified as pre-alpha.
  Most of the design has been settled and I'm in the process of implementing
  the details, but entire objects could disappear or drastically change as I
  approach the end.
  Watch this space for further developments.
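
Illustration (not part of the hive code): the job ordering behind the
'maximise_concurrency' option from the 3 April, 2009 entry can be pictured as
a round-robin over analyses. The following is a minimal Perl sketch under that
assumption; the job records, the analysis names and the interleave_by_analysis
helper are hypothetical and do not reflect the hive schema or the Queen's
actual job-fetching code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for READY jobs: each record only carries an id and
# the name of its analysis. These fields are illustrative, not the hive schema.
my @ready_jobs = (
    { job_id => 1, analysis => 'blast' },
    { job_id => 2, analysis => 'blast' },
    { job_id => 3, analysis => 'blast' },
    { job_id => 4, analysis => 'align' },
    { job_id => 5, analysis => 'tree'  },
    { job_id => 6, analysis => 'align' },
);

# Interleave jobs round-robin across analyses, so that as many different
# analyses as possible are started before any analysis gets a second job.
sub interleave_by_analysis {
    my @jobs = @_;

    # Group jobs per analysis, remembering the order analyses first appear in.
    my (%queue_for, @analysis_order);
    for my $job (@jobs) {
        my $analysis = $job->{analysis};
        push @analysis_order, $analysis unless exists $queue_for{$analysis};
        push @{ $queue_for{$analysis} }, $job;
    }

    # Take one job from each non-empty queue per pass until all are placed.
    my @ordered;
    while (@ordered < @jobs) {
        for my $analysis (@analysis_order) {
            my $next = shift @{ $queue_for{$analysis} };
            push @ordered, $next if defined $next;
        }
    }
    return @ordered;
}

my @ordered = interleave_by_analysis(@ready_jobs);
print join(' ', map { "$_->{analysis}:$_->{job_id}" } @ordered), "\n";
```

Running the sketch prints "blast:1 align:4 tree:5 blast:2 align:6 blast:3":
three different analyses are started before any analysis receives a second job,
which is the load-spreading effect the option is described as aiming for.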