  1. 08 Jul, 2004 3 commits
      added hive_id index to analysis_job table to help with dead_worker · 27403dda
      Jessica Severin authored
      job resetting. This allowed a direct UPDATE ... WHERE ... SQL statement to be used.
      Also changed the retry_count system: retry_count is now incremented only
      for jobs that failed (status in ('GET_INPUT','RUN','WRITE_OUTPUT')).
      Jobs that were CLAIMED by the dead worker are simply reset without
      incrementing retry_count, since they were never run.
      The fetching of claimed jobs now also uses 'ORDER BY retry_count',
      so that jobs that have failed sit at the bottom of the list of jobs
      to process. This allows the 'bad' jobs to filter themselves out.
      implemented a proper 'dead worker on lsf' checking system. Workers are · ba5578d5
      Jessica Severin authored
      created and registered with the LSF beekeeper, and the 'dead' check is done only
      when the beekeeper is LSF and the worker is 15 minutes overdue for its check-in.
      The check is done by ssh-ing to the worker's registered host machine and
      running a 'ps' command to see whether the worker's registered process_id is
      still running. This allows jobs to be submitted via LSF job arrays (which give
      only a single LSF job id for the entire array) while still allowing each worker
      to be checked separately.
      added beekeeper to interface between queens and an LSF controlled · 81e809d5
      Jessica Severin authored
      compute resource
  2. 06 Jul, 2004 4 commits
  3. 21 Jun, 2004 1 commit
  4. 19 Jun, 2004 1 commit
  5. 17 Jun, 2004 6 commits
  6. 16 Jun, 2004 5 commits
  7. 15 Jun, 2004 1 commit
  8. 14 Jun, 2004 19 commits
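The job-reset policy described in commit 27403dda (increment retry_count only for jobs that actually started, reset CLAIMED jobs without penalty, and fetch claimed jobs ordered by retry_count) can be sketched as follows. This is a hypothetical illustration against an in-memory SQLite table, not the project's actual schema or code: the column names analysis_job_id and hive_id, the reset value 'READY', and the convention that hive_id = 0 means "unassigned" are all assumptions.

```python
import sqlite3

# Statuses meaning the worker had actually started the job (from the commit message).
FAILED_STATES = ("GET_INPUT", "RUN", "WRITE_OUTPUT")


def make_db() -> sqlite3.Connection:
    """Create a hypothetical, minimal analysis_job table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE analysis_job (
               analysis_job_id INTEGER PRIMARY KEY,
               hive_id         INTEGER NOT NULL DEFAULT 0,   -- 0 = unassigned (assumed)
               status          TEXT    NOT NULL DEFAULT 'READY',
               retry_count     INTEGER NOT NULL DEFAULT 0)"""
    )
    # Index on hive_id so the per-worker UPDATE ... WHERE hive_id = ? avoids a full scan.
    conn.execute("CREATE INDEX hive_id_idx ON analysis_job (hive_id)")
    return conn


def reset_dead_worker_jobs(conn: sqlite3.Connection, hive_id: int) -> None:
    """Release every job held by a dead worker, in one transaction."""
    with conn:
        # Jobs the worker had started: count the failed attempt.
        conn.execute(
            "UPDATE analysis_job SET status = 'READY', hive_id = 0, "
            "retry_count = retry_count + 1 "
            "WHERE hive_id = ? AND status IN (?, ?, ?)",
            (hive_id, *FAILED_STATES),
        )
        # Jobs that were only CLAIMED were never run: reset without penalty.
        conn.execute(
            "UPDATE analysis_job SET status = 'READY', hive_id = 0 "
            "WHERE hive_id = ? AND status = 'CLAIMED'",
            (hive_id,),
        )


def fetch_claimed_jobs(conn: sqlite3.Connection, hive_id: int):
    """Return (job_id, retry_count) for a worker's claimed jobs, least-failed first."""
    rows = conn.execute(
        "SELECT analysis_job_id, retry_count FROM analysis_job "
        "WHERE hive_id = ? AND status = 'CLAIMED' ORDER BY retry_count",
        (hive_id,),
    )
    return rows.fetchall()
```

Splitting the reset into two UPDATE statements keeps each one a plain indexed UPDATE ... WHERE, and ordering by retry_count means jobs that keep failing are always picked up last.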
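The 'dead worker on lsf' check from commit ba5578d5 (only LSF workers, only when 15 minutes overdue, then ssh to the registered host and run 'ps' on the registered process_id) might look roughly like this. The Worker fields, the function names, and the use of `ps -p PID` (whose exit status is 0 when the process exists) are assumptions for illustration; the liveness probe is injectable so the logic can be exercised without a real cluster.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List

OVERDUE = timedelta(minutes=15)  # check-in grace period taken from the commit message


@dataclass
class Worker:
    """Hypothetical registration record for one worker."""
    hive_id: int
    beekeeper: str
    host: str
    process_id: int
    last_check_in: datetime


def ssh_ps(host: str, pid: int) -> bool:
    """Return True if `ps -p PID` on the remote host reports the process.

    Note: an ssh failure (e.g. host unreachable, exit code 255) also yields
    a non-zero code and is therefore treated as "not running".
    """
    result = subprocess.run(
        ["ssh", host, "ps", "-p", str(pid)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_dead_workers(
    workers: List[Worker],
    now: datetime,
    process_alive: Callable[[str, int], bool] = ssh_ps,
) -> List[Worker]:
    """Return LSF workers that are overdue and whose process is gone."""
    dead = []
    for w in workers:
        if w.beekeeper != "LSF":
            continue  # only LSF-managed workers are checked
        if now - w.last_check_in < OVERDUE:
            continue  # not yet overdue for its check-in
        if not process_alive(w.host, w.process_id):
            dead.append(w)
    return dead
```

Because each worker is probed by its own host and process_id rather than by LSF job id, workers started from a single LSF job array can still be judged individually.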