  1. 30 Nov, 2004 1 commit
  2. 25 Nov, 2004 2 commits
  3. 24 Nov, 2004 2 commits
  4. 22 Nov, 2004 1 commit
  5. 20 Nov, 2004 1 commit
    • New distributed Queen system. Queen/hive updates its state in an incremental · e3d44c7e
      Jessica Severin authored
      and distributed manner as it interacts with the workers over the course of its life.
      When a runWorker.pl script starts and asks the queen to create a worker, the queen has
      a list of known analyses which are 'above the surface', where full hive analysis has
      been done and the number of needed workers has been calculated. A full sync requires
      joining data between the analysis, analysis_job, analysis_stats, and hive tables.
      When this reached 10e7 jobs, 10e4 analyses, and 10e3 workers, a full hard sync took
      minutes, and it was clear this part of the system wasn't scaling and wasn't going to
      make it to the next order of magnitude. This occurred in the compara blastz pipeline
      between mouse and rat.
      Now there are some analyses 'below the surface' that have partial synchronization.
      These analyses have been flagged as having 'x' new jobs (AnalysisJobAdaptor updating
      analysis_stats on job insert). If no analysis is found to assign to the newly
      created worker, the queen will dip below the surface and start checking
      the analyses with the highest probability of needing the most workers.
      This incremental sync is also done in Queen::get_num_needed_workers: when
      calculating a total worker count ahead of time, this routine will also dip below
      the surface until the hive reaches its currently defined worker saturation.
      A beekeeper is no longer a required component for the system to function.
      If workers can get onto CPUs, the hive will run. The beekeeper is now mainly a
      user display program showing the status of the hive. There is no longer any
      central process doing work, and one hive can potentially scale
      beyond 10e9 jobs in graphs of 10e6 analysis nodes and 10e6 running workers.
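      The "dip below the surface" step is the heart of this change, so here is a
      minimal sketch of what the incremental sync could look like, in the spirit of
      the description above rather than the actual Queen code. The analysis_stats
      column names (last_update, num_required_workers) and the $sync_analysis
      callback are assumptions made for illustration.

      use strict;
      use warnings;

      # Hypothetical helper: scan the stalest analysis_stats rows and incrementally
      # sync them one at a time until one turns out to need workers.
      sub pick_analysis_below_surface {
          my ($dbh, $sync_analysis) = @_;

          # Check the rows that have gone longest without a sync first,
          # since they are the most likely to be hiding unclaimed jobs.
          my $sth = $dbh->prepare(q{
              SELECT analysis_id
              FROM   analysis_stats
              WHERE  status NOT IN ('BLOCKED', 'DONE')
              ORDER  BY last_update ASC
              LIMIT  20
          });
          $sth->execute;

          while (my ($analysis_id) = $sth->fetchrow_array) {
              # Sync just this one analysis instead of the whole hive.
              my $stats = $sync_analysis->($dbh, $analysis_id);
              return $analysis_id if $stats->{num_required_workers} > 0;
          }
          return;    # nothing below the surface needs workers right now
      }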
  6. 19 Nov, 2004 2 commits
    • changed fetch_by_status to order returned results so the stats which have · af03e291
      Jessica Severin authored
      the most time since last update are at the top of the returned list
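      In query terms this change amounts to something like the following sketch (not
      the actual adaptor code); the last_update column name is an assumption.

      use strict;
      use warnings;

      # Hypothetical stand-in for fetch_by_status: return stats rows for a given
      # status, with the entries that have waited longest for an update first.
      sub fetch_stats_by_status {
          my ($dbh, $status) = @_;
          return $dbh->selectall_arrayref(
              q{ SELECT analysis_id, status, last_update
                 FROM   analysis_stats
                 WHERE  status = ?
                 ORDER  BY last_update ASC },   # stalest rows come first
              { Slice => {} },
              $status,
          );
      }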
    • Change for distributed smart Queen system. · c05ce49d
      Jessica Severin authored
      When jobs are inserted into the analysis_job table, the analysis_stats table
      for the given analysis is updated by incrementing the total_job_count and
      unclaimed_job_count and setting the status to 'LOADING'.
      If the analysis is 'BLOCKED', this incremental update does not happen.
      When an analysis_stats entry is 'BLOCKED' and then unblocked, a resync is
      triggered automatically, so this partial progress update is not needed.
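      The update described above boils down to a single guarded statement; a sketch
      follows (with an assumed analysis_id key column, not the adaptor's actual code).

      use strict;
      use warnings;

      # On job insert, bump the counters and mark the analysis 'LOADING', but leave
      # 'BLOCKED' analyses untouched since unblocking them triggers a full resync.
      sub bump_stats_on_job_insert {
          my ($dbh, $analysis_id) = @_;
          $dbh->do(q{
              UPDATE analysis_stats
              SET    total_job_count     = total_job_count + 1,
                     unclaimed_job_count = unclaimed_job_count + 1,
                     status              = 'LOADING'
              WHERE  analysis_id = ?
                AND  status != 'BLOCKED'
          }, undef, $analysis_id);
      }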
  7. 18 Nov, 2004 2 commits
  8. 17 Nov, 2004 3 commits
  9. 16 Nov, 2004 1 commit
  10. 09 Nov, 2004 4 commits
  11. 05 Nov, 2004 1 commit
  12. 04 Nov, 2004 1 commit
  13. 27 Oct, 2004 1 commit
  14. 20 Oct, 2004 4 commits
  15. 19 Oct, 2004 2 commits
  16. 18 Oct, 2004 1 commit
  17. 15 Oct, 2004 1 commit
  18. 12 Oct, 2004 1 commit
  19. 06 Oct, 2004 1 commit
  20. 05 Oct, 2004 1 commit
    • Second insert into analysis_data for job_creation added extra overhead. · f7182485
      Jessica Severin authored
      Removed the select before store (added a new method store_if_needed if that
      functionality is required by users) and added an option in
      AnalysisJobAdaptor::CreateNewJob to pass input_analysis_data_id, so if it is
      already known, CreateNewJob will be as fast as before. Plus there are no limits
      on the size of the input_id string.
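      A sketch of the calling pattern this enables. The method names follow the
      commit message (store_if_needed, CreateNewJob taking an analysis_data id), but
      the exact parameter names and adaptor accessors here are assumptions, not the
      verified API.

      use strict;
      use warnings;
      use Bio::EnsEMBL::Hive::DBSQL::AnalysisJobAdaptor;

      sub create_jobs_with_shared_input {
          my ($hive_dba, $analysis, $long_input_id_string, $n_jobs) = @_;

          # Store the (possibly very long) input_id string once in analysis_data and
          # keep its dbID; store_if_needed only inserts when no identical row exists.
          my $data_id = $hive_dba->get_AnalysisDataAdaptor
                                 ->store_if_needed($long_input_id_string);

          # Creating jobs by ID skips the extra analysis_data work per job.
          for (1 .. $n_jobs) {
              Bio::EnsEMBL::Hive::DBSQL::AnalysisJobAdaptor->CreateNewJob(
                  -analysis               => $analysis,
                  -input_analysis_data_id => $data_id,   # assumed option name
              );
          }
      }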
  21. 04 Oct, 2004 1 commit
  22. 30 Sep, 2004 2 commits
    • modified analysis_job table: replaced input_id varchar(100) with · 2be90ea9
      Jessica Severin authored
      input_analysis_data_id int(10), which joins to the analysis_data table.
      Added output_analysis_data_id int(10) for storing output_id.
      The external analysis_data.data column is LongText, which allows much longer
      parameter sets to be passed around than was previously possible.
      AnalysisData will also allow processes to manually store 'other' data and
      pass it around by ID reference now.
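      A rough sketch of the table change this describes, expressed as SQL run through
      DBI; the exact column types, keys, and connection details are assumptions for
      illustration, not the project's actual schema file.

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect('dbi:mysql:database=hive_db;host=localhost',
                             'hive_user', 'hive_pass', { RaiseError => 1 });

      # analysis_job now points at analysis_data rows instead of carrying input_id inline.
      $dbh->do(q{ ALTER TABLE analysis_job DROP COLUMN input_id });
      $dbh->do(q{ ALTER TABLE analysis_job ADD COLUMN input_analysis_data_id  int(10) NOT NULL });
      $dbh->do(q{ ALTER TABLE analysis_job ADD COLUMN output_analysis_data_id int(10) });

      # A LONGTEXT payload lets arbitrarily large parameter sets be passed by ID.
      $dbh->do(q{
          CREATE TABLE IF NOT EXISTS analysis_data (
              analysis_data_id  int(10) NOT NULL AUTO_INCREMENT,
              data              longtext,
              PRIMARY KEY (analysis_data_id)
          )
      });

      $dbh->disconnect;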
    • debugged syntax · 49fed033
      Jessica Severin authored
  23. 27 Sep, 2004 3 commits
  24. 23 Sep, 2004 1 commit