This project is mirrored from https://github.com/Ensembl/ensembl-hive.git.
  1. 30 Nov, 2004 1 commit
  2. 25 Nov, 2004 3 commits
  3. 24 Nov, 2004 4 commits
  4. 22 Nov, 2004 1 commit
  5. 20 Nov, 2004 3 commits
      new lsf_beekeeper to correspond to new distributed Queen system. · 7a104fd3
      Jessica Severin authored
      no longer does Queen::synchronize_hive as part of autonomous loop
      -sync option allows user to manually trigger a hard sync.
      also removed default display of full hive status and added option -status
      which will print this full status.
      Also removed adjusting needed worker count for 'pending' workers. lsf will
      sometimes leave jobs in pending state for no apparent reason (a newly bsub'ed job
      will run yet an older pending job stays pending). The current 'pending' count also didn't
      differentiate between lsf_beekeeper-submitted jobs and manually submitted
      jobs. This pend adjustment isn't a critical subsystem so I've removed it for
      now. If a runWorker starts (after a long pend) and there is no work left
      it will die immediately. I may rewrite a smarter 'pending' adjustment
      in the future.
      7a104fd3
      New distributed Queen system. Queen/hive updates its state in an incremental · e3d44c7e
      Jessica Severin authored
      and distributed manner as it interacts with the workers over the course of its life.
      When a runWorker.pl script starts and asks a queen to create a worker, the queen has
      a list of known analyses which are 'above the surface', where full hive analysis has
      been done and the number of needed workers has been calculated. A full sync requires
      joining data between the analysis, analysis_job, analysis_stats, and hive tables.
      When this reached 10e7 jobs, 10e4 analyses, 10e3 workers a full hard sync took minutes
      and it was clear this bit of the system wasn't scaling and wasn't going to make it
      to the next order of magnitude. This occurred in the compara blastz pipeline between
      mouse and rat.
      Now there are some analyses 'below the surface' that have partial synchronization.
      These analyses have been flagged as having 'x' new jobs (AnalysisJobAdaptor updating
      analysis_stats on job insert). If no analysis is found to assign to the newly
      created worker, the queen will dip below the surface and start checking
      the analyses with the highest probability of needing the most workers.
      This incremental sync is also done in Queen::get_num_needed_workers:
      when calculating a total worker count ahead of time, this routine will also dip below
      the surface until the hive reaches its current defined worker saturation.
      A beekeeper is no longer a required component for the system to function.
      If workers can get onto CPUs, the hive will run. The beekeeper is now mainly a
      user display program showing the status of the hive. There is no longer any
      central process doing work, and one hive can potentially scale
      beyond 10e9 jobs in graphs of 10e6 analysis nodes and 10e6 running workers.
      e3d44c7e
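The "dip below the surface" scheme the commit describes can be sketched roughly as follows. This is a hypothetical illustration in Python, not the actual Perl Queen API: the names `assign_analysis`, `needed_workers`, `new_jobs`, and the `sync` callback are all stand-ins for the real adaptor and analysis_stats machinery.

```python
# Hypothetical sketch of the incremental-sync assignment logic described
# in the commit above; names are illustrative, not the real Queen code.

def assign_analysis(above_surface, below_surface, sync):
    """Pick an analysis for a newly created worker.

    above_surface: analyses whose stats are fully synchronized, each a
      dict with a precomputed 'needed_workers' count.
    below_surface: analyses with only partial stats (here just a count
      of newly inserted jobs, per the analysis_stats flagging).
    sync: callable performing the expensive per-analysis hard sync and
      returning the updated needed_workers count.
    """
    # First try analyses whose worker counts are already known.
    for analysis in above_surface:
        if analysis["needed_workers"] > 0:
            analysis["needed_workers"] -= 1
            return analysis
    # Otherwise dip below the surface: sync one analysis at a time,
    # most promising first, instead of hard-syncing the whole hive.
    for analysis in sorted(below_surface,
                           key=lambda a: a["new_jobs"], reverse=True):
        analysis["needed_workers"] = sync(analysis)
        if analysis["needed_workers"] > 0:
            analysis["needed_workers"] -= 1
            return analysis
    return None  # no work anywhere: the worker dies immediately
```

The point of the ordering is that only the analyses most likely to need workers pay the sync cost, which is what lets the design avoid the minutes-long full joins mentioned above.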
  6. 19 Nov, 2004 4 commits
  7. 18 Nov, 2004 2 commits
  8. 17 Nov, 2004 3 commits
  9. 16 Nov, 2004 2 commits
  10. 10 Nov, 2004 1 commit
  11. 09 Nov, 2004 8 commits
  12. 05 Nov, 2004 1 commit
  13. 04 Nov, 2004 1 commit
  14. 27 Oct, 2004 1 commit
  15. 20 Oct, 2004 4 commits
  16. 19 Oct, 2004 1 commit
      extended input_id syntax: · 911d6847
      Jessica Severin authored
      1) input_id is the command
      2) input_id is formatted like '{did=>123}',
        where did is shorthand for analysis_data_id and the real command
        is stored in the analysis_data table
      911d6847
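The two input_id forms above can be resolved with logic along these lines. This is an illustrative Python sketch, not the actual Perl implementation; `resolve_command` and `fetch_analysis_data` are hypothetical names, and the real lookup would query the analysis_data table.

```python
import re

def resolve_command(input_id, fetch_analysis_data):
    """Resolve an input_id per the extended syntax in the commit above.

    fetch_analysis_data is a stand-in for the database lookup that
    returns the command stored in the analysis_data table for a given
    analysis_data_id.
    """
    match = re.fullmatch(r"\{did=>(\d+)\}", input_id.strip())
    if match:
        # Form 2: 'did' is shorthand for analysis_data_id; the real
        # command lives in the analysis_data table.
        return fetch_analysis_data(int(match.group(1)))
    # Form 1: the input_id is the command itself.
    return input_id
```

Storing long commands in analysis_data keeps the input_id column small while still letting short commands be stored inline.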