  1. 16 Feb, 2005 1 commit
  2. 10 Feb, 2005 1 commit
  3. 11 Jan, 2005 1 commit
  4. 08 Jan, 2005 1 commit
  5. 14 Dec, 2004 1 commit
  6. 09 Dec, 2004 1 commit
  7. 25 Nov, 2004 2 commits
  8. 24 Nov, 2004 1 commit
  9. 20 Nov, 2004 1 commit
    • e3d44c7e · Jessica Severin authored
      New distributed Queen system. Queen/hive updates its state in an incremental
      and distributed manner as it interacts with the workers over the course of its life.
      When a script starts and asks a queen to create a worker, the queen has
      a list of known analyses which are 'above the surface', where full hive analysis
      has been done and the number of needed workers has been calculated. A full sync
      requires joining data between the analysis, analysis_job, analysis_stats, and
      hive tables. When this reached 10e7 jobs, 10e4 analyses, and 10e3 workers, a full
      hard sync took minutes, and it was clear this part of the system wasn't scaling
      and wasn't going to make it to the next order of magnitude. This occurred in the
      compara blastz pipeline between mouse and rat.
      Now there are some analyses 'below the surface' that have partial synchronization.
      These analyses have been flagged as having 'x' new jobs (AnalysisJobAdaptor updates
      analysis_stats on job insert).  If no analysis is found to assign to the newly
      created worker, the queen will dip below the surface and start checking
      the analyses with the highest probability of needing the most workers.
      This incremental sync is also done in Queen::get_num_needed_workers.
      When calculating a total worker count ahead of time, this routine will also dip
      below the surface until the hive reaches its currently defined worker saturation.
      A beekeeper is no longer a required component for the system to function.
      If workers can get onto CPUs, the hive will run.  The beekeeper is now mainly a
      user display program showing the status of the hive.  There is no longer any
      central process doing work, and one hive can potentially scale
      beyond 10e9 jobs in graphs of 10e6 analysis nodes and 10e6 running workers.
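The "dip below the surface" scheduling described above can be sketched as follows. This is a minimal, hypothetical Python illustration of the idea, not the actual ensembl-hive Perl code; all class and method names here are invented stand-ins.

```python
# Illustrative sketch: the queen first assigns workers from fully synched
# analyses ("above the surface"); only when none needs workers does it
# incrementally sync the unsynched analyses, checking those flagged with
# the most new jobs first. Names are hypothetical, not the real API.

class AnalysisStats:
    def __init__(self, analysis_id, unclaimed_jobs=0, synched=False):
        self.analysis_id = analysis_id
        self.unclaimed_jobs = unclaimed_jobs  # bumped on each job insert
        self.synched = synched                # True once fully synced
        self.num_required_workers = 0

class Queen:
    def __init__(self, all_stats):
        self.all_stats = all_stats

    def specialize_new_worker(self):
        # Above the surface: analyses with a precomputed worker need.
        for stats in self.all_stats:
            if stats.synched and stats.num_required_workers > 0:
                stats.num_required_workers -= 1
                return stats.analysis_id
        # Below the surface: sync unsynched analyses in order of how many
        # new jobs were flagged, a cheap proxy for likely worker demand.
        unsynched = (s for s in self.all_stats if not s.synched)
        for stats in sorted(unsynched,
                            key=lambda s: s.unclaimed_jobs, reverse=True):
            self._sync(stats)
            if stats.num_required_workers > 0:
                stats.num_required_workers -= 1
                return stats.analysis_id
        return None  # nothing needs a worker right now

    def _sync(self, stats):
        # Stand-in for the expensive join across the analysis,
        # analysis_job, analysis_stats and hive tables.
        stats.num_required_workers = min(stats.unclaimed_jobs, 10)
        stats.synched = True
```

The key point is that the expensive sync is paid one analysis at a time, on demand, instead of across the whole graph at once.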
  10. 09 Nov, 2004 2 commits
    • 088529b5 · Jessica Severin authored
      reformatted code (removed all the tabs)
    • e6fb56d1 · Jessica Severin authored
      refactored synchronization logic to allow for worker-distributed syncing.
      The synchronization of the analysis_stats summary statistics was done by
      the beekeeper at the top of its loop.  For graphs with 40,000+ analyses
      this centralized syncing became a bottleneck.  The new system allows
      the Queen attached to each worker process to synchronize its analysis.
      Syncing happens when a worker 'checks in' and when it dies.  The sync on
      'check in' only updates if the stats are more than 60 seconds out of date,
      to prevent over-syncing.
      The beekeeper still needs to do whole-system syncs when a subsection has
      finished and the next section needs to be 'unblocked'.  For homology this
      will happen twice in a 16-hour run.
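The staleness check described above can be sketched as a simple throttle: sync only when the cached stats are more than 60 seconds out of date. This is a hedged illustration in Python; the class name, the injectable clock, and the callback are all hypothetical, not the real ensembl-hive interface.

```python
# Illustrative sketch of throttled syncing: a worker's Queen re-syncs
# analysis_stats on "check in" only if the stats are more than 60s
# out of date, to prevent over-syncing. Names are hypothetical.

import time

SYNC_INTERVAL_SECS = 60

class ThrottledSync:
    def __init__(self, do_sync, now=time.time):
        self.do_sync = do_sync   # the expensive stats synchronization
        self.now = now           # injectable clock, handy for testing
        self.last_synced = 0.0

    def check_in(self):
        """Sync only if the stats are more than 60s out of date."""
        if self.now() - self.last_synced > SYNC_INTERVAL_SECS:
            self.do_sync()
            self.last_synced = self.now()
            return True
        return False  # stats still fresh; skip the sync
```

With many workers checking in concurrently, this per-worker throttle bounds how often any one analysis gets re-synced, which is what removes the central bottleneck.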
  11. 20 Oct, 2004 1 commit
  12. 12 Oct, 2004 1 commit
  13. 11 Aug, 2004 2 commits
  14. 06 Aug, 2004 1 commit
  15. 03 Aug, 2004 1 commit
  16. 16 Jul, 2004 2 commits
  17. 15 Jul, 2004 1 commit
  18. 14 Jul, 2004 1 commit
  19. 13 Jul, 2004 3 commits
  20. 09 Jul, 2004 1 commit
    • 54927cb4 · Jessica Severin authored
      changed Queen->create_new_worker method to use rearrange formatting.
      Also added functionality so that runWorker can be run without
      specifying an analysis.  The create_new_worker method will now
      query the AnalysisStats adaptor for a 'needed worker' analysis when
      the analysis_id is undef.  This simplifies the API interface between the
      Queen and the beekeepers: now the beekeeper only needs to receive a count
      of workers.  The workers can still be run with explicit analyses for
      testing, or for situations where one wants to manually control the processing.
      Now one can simply do
      bsub -JW[1-100] runWorker -url mysql://ensadmin:<pass>@ecs2:3361/compara_hive_jess_23
      to create 100 workers, each of which will become whatever analysis needs to be done.
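The fallback described above (worker specialization when no analysis is given) can be sketched as follows. This is a hypothetical Python illustration; the adaptor, its method, and the returned structure are invented stand-ins for the real Perl API.

```python
# Illustrative sketch: when runWorker is started without an analysis,
# create_new_worker asks the stats adaptor for an analysis that still
# needs workers. All names here are hypothetical stand-ins.

class AnalysisStats:
    def __init__(self, analysis_id, num_required_workers):
        self.analysis_id = analysis_id
        self.num_required_workers = num_required_workers

class AnalysisStatsAdaptor:
    """Toy adaptor holding per-analysis stats in memory."""
    def __init__(self, all_stats):
        self.all_stats = all_stats

    def fetch_by_needed_workers(self):
        # return the analysis that currently needs the most workers
        needy = [s for s in self.all_stats if s.num_required_workers > 0]
        return max(needy, key=lambda s: s.num_required_workers,
                   default=None)

def create_new_worker(stats_adaptor, analysis_id=None):
    if analysis_id is None:
        stats = stats_adaptor.fetch_by_needed_workers()
        if stats is None:
            return None  # nothing needs workers right now
        analysis_id = stats.analysis_id
    return {"analysis_id": analysis_id, "status": "READY"}
```

This is why the beekeeper only needs a worker count: each worker decides for itself what it should become.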
  21. 17 Jun, 2004 1 commit
  22. 16 Jun, 2004 1 commit
  23. 14 Jun, 2004 5 commits
  24. 08 Jun, 2004 3 commits
  25. 07 Jun, 2004 1 commit
    • e45d4761 · Jessica Severin authored
      complete switch over to new DataflowRule design. Dataflow rules use
      URLs to specify analysis objects from MySQL databases distributed
      across a network.  AnalysisJobAdaptor was switched to create jobs with
      a class method that gets the db connection from the analysis object that
      is passed.  Thus the system now exists in a distributed state.
      The dataflow rule also implements branching via the branch_code.
      SimpleRule will be deprecated.
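The URL-based, branching dataflow described above can be sketched like this. The URL scheme mirrors the mysql:// form shown in the log; everything else (class shape, function names, branch-code default) is an illustrative assumption, not the actual DataflowRule implementation.

```python
# Illustrative sketch: a dataflow rule maps a source analysis plus a
# branch_code to a target analysis URL, so jobs can flow into analyses
# living in other MySQL databases. Names are hypothetical.

class DataflowRule:
    def __init__(self, from_analysis, branch_code, to_url):
        self.from_analysis = from_analysis
        self.branch_code = branch_code  # assume branch 1 is the default
        self.to_url = to_url            # e.g. mysql://host:3361/db/...

def dataflow(rules, from_analysis, branch_code=1):
    """Return target URLs for jobs flowing out on the given branch."""
    return [r.to_url for r in rules
            if r.from_analysis == from_analysis
            and r.branch_code == branch_code]
```

Because the target is a URL rather than a local analysis id, the rule table itself is what lets one pipeline span several databases.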
  26. 04 Jun, 2004 1 commit
  27. 27 May, 2004 1 commit
  28. 25 May, 2004 1 commit