# Bio::EnsEMBL::Hive project
#
# Copyright Team Ensembl
# You may distribute this package under the same terms as perl itself

Contact:
  Please contact the ehive-users@ebi.ac.uk mailing list with questions/suggestions.

Summary:
  This is a distributed processing system based on 'autonomous agents' and
  the behavioural structure of a honey bee hive.  It implements the functionality
  of both data-flow graphs and block-branch diagrams, which should allow it to
  codify any program, algorithm, or parallel job-control system.  It is not bound
  to any particular processing 'farm' system and can be adapted to any GRID.
  It builds on the design of the Ensembl Pipeline/Analysis system and presently uses
  Bio::EnsEMBL::Analysis::RunnableDB Perl wrapper objects as the nodes/blocks in
  the graphs, but could be adapted more generally.

3 December, 2009 : Leo Gordon

beekeeper.pl, runWorker.pl and cmd_hive.pl
have gained built-in documentation, accessible via perldoc or directly.
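
For example (assuming the eHive scripts directory is on your PATH; otherwise give the full path to the script):

    perldoc beekeeper.pl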


2 December, 2009 : Leo Gordon

The Bio::EnsEMBL::Hive::RunnableDB::LongMult example toy pipeline has been created
to show how to do the various things that "adult" pipelines perform
(job creation, data flow, control/blocking rules, usage of intermediate tables, etc).

Read the Bio::EnsEMBL::Hive::RunnableDB::LongMult documentation for step-by-step instructions
on how to create and run this pipeline.


30 November, 2009 : Leo Gordon

The Bio::EnsEMBL::Hive::RunnableDB::JobFactory module has been added.
It provides a generic way of creating batches of jobs whose parameters
are taken from a file or generated from a range of ids.
Entries in the file can also be randomly shuffled.
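
The module's own parameter interface is not reproduced here; purely as an
illustration of the idea (parameters taken from a file or an id range, with
optional shuffling of file entries), a self-contained Perl sketch might look
like this (the 'input_id' key is just a label for the sketch):

  # Illustrative sketch only -- not the JobFactory API itself.
  # Produces one parameter hash per future job, either from the lines of a file
  # or from a numeric id range, with optional shuffling of the file entries.
  use strict;
  use warnings;
  use List::Util qw(shuffle);

  sub make_job_parameters {
      my ($source, $randomize) = @_;    # $source: a filename or a "min..max" id range

      my @entries;
      if ($source =~ /^(\d+)\.\.(\d+)$/) {              # a numeric range such as "1..100"
          @entries = ($1 .. $2);
      } else {                                          # otherwise treat $source as a file
          open(my $fh, '<', $source) or die "Cannot open $source: $!";
          chomp(@entries = <$fh>);
          close($fh);
          @entries = shuffle(@entries) if $randomize;   # optional random order
      }

      return map { +{ 'input_id' => $_ } } @entries;    # one parameter hash per job
  }

  # e.g.  my @job_params = make_job_parameters('ids.txt', 1);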


13 July, 2009 : Leo Gordon

Merged the "Meadow" code from this March's development branch.
Because it separates the LSF-specific code from the higher-level logic, it will be easier to update.

-------------------------------------------------------------------------------------------------------
Albert, sorry - in the process of merging into the development branch I had to remove your HIGHMEM code.
I hope this is a temporary measure and that we will have hive-wide queue control soon.
If not, you can restore the pre-merger state by updating with the following command:

    cvs update -r lg4_pre_merger_20090713

('maximise_concurrency' option was carried over)
-------------------------------------------------------------------------------------------------------


3 April, 2009 : Albert Vilella

  Added a new maximise_concurrency 1/0 option. When set to 1, jobs are
  fetched in an order that maximises the number of different analyses
  being run at the same time. This is useful when different analyses
  hit different tables: the overall SQL load can be kept higher without
  overwhelming the server, instead of having lots of jobs for the same
  analysis all trying to hit the same tables.
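
  The real claiming logic lives in the hive's adaptors and is not reproduced
  here; as a rough illustration of the idea (interleaving queued jobs so that
  consecutive claims come from different analyses), a sketch might look like this:

  # Illustrative sketch only -- not the actual job-claiming code.
  # Round-robins over per-analysis queues so that consecutive jobs belong to
  # different analyses wherever possible.
  use strict;
  use warnings;

  sub interleave_by_analysis {
      my (@jobs) = @_;    # each job is a hashref with an 'analysis_id' key

      my %queue;
      push @{ $queue{ $_->{'analysis_id'} } }, $_ for @jobs;

      my @interleaved;
      while (%queue) {
          foreach my $analysis_id (sort keys %queue) {
              push @interleaved, shift @{ $queue{$analysis_id} };
              delete $queue{$analysis_id} unless @{ $queue{$analysis_id} };
          }
      }
      return @interleaved;
  }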

  Added a quick HIGHMEM option. This option is useful when a small
  percentage of jobs are too big and fail under normal conditions. The
  runnable can check whether this is the second attempt at running the
  job, whether the job involves big data (e.g. gene_count > 200), and
  whether it is not already in HIGHMEM mode. If so, it calls
  reset_highmem_job_by_dbID and quits:

  if ($self->input_job->retry_count == 1) {      # second attempt: the job has already failed once
    # the input is big (a protein tree with more than 200 genes) and we are not already in a HIGHMEM worker
    if ($self->{'protein_tree'}->get_tagvalue('gene_count') > 200 && !defined($self->worker->{HIGHMEM})) {
      $self->input_job->adaptor->reset_highmem_job_by_dbID($self->input_job->dbID);   # re-queue the job as HIGHMEM
      $self->DESTROY;
      throw("Alignment job too big: send to highmem and quit");
    }
  }

  Assuming there is a

    beekeeper.pl -url <blah> -highmem -lsf_options "<lots of mem>"

  running, or a

    runWorker.pl <blah> -highmem 1

  with lots of mem running, it will fetch the HIGHMEM jobs as if they
  were "READY but needs HIGHMEM".

  Also modified the Queen so that it does not synchronize as often when
  more than 450 jobs are running and the load is above 0.9, so that the
  queries against the analysis tables do not hit the SQL server too much.
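
  The exact spot in the Queen where this check happens is not shown here; the
  throttle amounts to a condition roughly like the following (the subroutine
  and variable names are illustrative, not the Queen's actual attributes):

  # Illustrative sketch of the sync throttle -- the names are made up for clarity.
  use strict;
  use warnings;

  sub should_skip_sync {
      my ($running_jobs, $hive_load) = @_;
      # Skip the (expensive) analysis-table synchronisation while the hive is busy enough:
      return ($running_jobs > 450) && ($hive_load > 0.9);
  }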

23 July, 2008 : Will Spooner
  Removed remaining ensembl-pipeline dependencies.

11 March, 2005 : Jessica Severin
  The project is reaching a very stable state.  The new 'node' object Bio::EnsEMBL::Hive::Process
  allows for independence from the Ensembl Pipeline and provides extended process functionality
  to manipulate hive job objects, branch, modify hive graphs, create jobs, and perform other
  hive-process-specific tasks.  Some of this extended 'Process' API may still evolve.

7 June, 2004 : Jessica Severin
  This project is under active development and should be classified as pre-alpha.
  Most of the design has been settled and I'm in the process of implementing the details,
  but entire objects could disappear or drastically change as I approach the end.
  Watch this space for further developments.
