This project is mirrored from https://:*****@github.com/Ensembl/ensembl-hive.git.
- 15 Jul, 2004 6 commits
-
-
Jessica Severin authored
it syncs and displays a full summary of the state of the hive, including which workers were overdue, how many are needed, and which workers are running. Also changed check_for_dead to use the LSF bjobs command, since I now store an LSF job_id and array_index in the process_id for LSF workers. Also changed the overdue time limit to 75 minutes, since the expected lifetime is 60 minutes.
-
Jessica Severin authored
as well also set the beekeeper to LSF. Commented this.
-
Jessica Severin authored
to prevent excessive workers from saturating the system.
-
Jessica Severin authored
when a bsub is issued without a job array and bjobs does not like 234234[0] as a job_id
-
Jessica Severin authored
added to a user's path and the programs run from any directory
-
Jessica Severin authored
which causes a more even distribution of worker types as they are created by the Queen.
-
- 14 Jul, 2004 5 commits
-
-
Jessica Severin authored
LSF job arrays (e.g. 72344[3]). Also changed the analysis_job index (analysis_id, status) so that analysis_id is indexed first, which lets the index be used when only the analysis_id is specified.
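The index-order change above relies on the usual leftmost-prefix behaviour of composite indexes. The sketch below is a Python illustration that uses SQLite as a stand-in for the project's MySQL database; the table layout, the index name `job_lookup_idx`, and the helper `query_plan` are assumptions made for the example and are not taken from the hive schema.

```python
import sqlite3


def query_plan(index_columns, where_clause):
    """Return SQLite's query-plan text for a lookup on a toy analysis_job table.

    Hypothetical helper: builds an in-memory table, creates one composite
    index on the given column order, and asks SQLite how it would run the query.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysis_job ("
        "analysis_job_id INTEGER PRIMARY KEY, "
        "analysis_id INTEGER, status TEXT, input_id TEXT)"
    )
    # Assumed index name; the column order is what the commit changes.
    conn.execute(f"CREATE INDEX job_lookup_idx ON analysis_job ({index_columns})")
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM analysis_job WHERE {where_clause}"
    ).fetchall()
    conn.close()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(query_plan("analysis_id, status", "analysis_id = 1"))
    print(query_plan("status, analysis_id", "analysis_id = 1"))
```

Running the script prints the plan for both column orders so they can be compared; the exact wording of the plan text varies between SQLite versions.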
-
Jessica Severin authored
also changed time intervals (overdue workers now to 75 minutes and poll interval to 5 minutes). The polling load to check hive status is essentially zero.
-
Jessica Severin authored
Can't seem to figure out a way of passing the LSF job_id and array_index as parameters, so hardcoded access to the environment variables LSB_JOBID and LSB_JOBINDEX inside the runWorker script. If both variables are set, this should imply that the worker was created by an LSF daemon, and these values are used to check the worker's life state in time and space. Otherwise the process_id falls back on the ppid of the process (on the 'host' it's running on).
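The fallback described in this entry can be sketched as follows. This is a minimal Python illustration, not the runWorker code itself (which is Perl); the function name `worker_process_id` and the `job_id[index]` string format are assumptions modelled on the `72344[3]` example elsewhere in this log.

```python
import os


def worker_process_id(environ=None, ppid=None):
    """Return an identifier that can later be used to check on this worker.

    Hypothetical helper: if both LSB_JOBID and LSB_JOBINDEX are set, assume the
    worker was started by LSF and combine them; otherwise use the parent pid.
    """
    env = os.environ if environ is None else environ
    job_id = env.get("LSB_JOBID")
    array_index = env.get("LSB_JOBINDEX")
    if job_id and array_index:
        # Assumed format, e.g. "72344[3]" for element 3 of job array 72344.
        return f"{job_id}[{array_index}]"
    # Not under LSF (or only partly configured): fall back on the ppid.
    return str(os.getppid() if ppid is None else ppid)


if __name__ == "__main__":
    print(worker_process_id({"LSB_JOBID": "72344", "LSB_JOBINDEX": "3"}))
    print(worker_process_id({}, ppid=4321))
```

Passing the environment and ppid as arguments keeps the sketch testable without an LSF installation.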
-
Jessica Severin authored
setting (e.g. LSF job_id and array_index)
-
Jessica Severin authored
are stored in the worker.out output file.
-
- 13 Jul, 2004 5 commits
-
-
Jessica Severin authored
workers currently running
-
Jessica Severin authored
are running so that any unregistered worker can be assumed to be a fatality and its jobs reset. Also added -loop option to run the beekeeper in an autonomous manner.
-
Jessica Severin authored
the analysis has effectively no system load and an unlimited number of workers of that analysis can run at any one time.
-
Jessica Severin authored
added get_num_needed_workers method which does a load analysis between the living workers and the workers needed to complete the available jobs. Returns a simple count which a beekeeper can use to allocate workers on computers. The workers are created without specific analyses but get assigned one by the Queen when they are created.
-
Jessica Severin authored
system load (higher hive_capacity) are picked first
-
- 09 Jul, 2004 5 commits
-
-
Jessica Severin authored
Also added functionality so that runWorker can be run without specifying an analysis. The create_new_worker method will now query for a 'needed worker' analysis from the AnalysisStats adaptor when the analysis_id is undef. This simplifies the API between the Queen and the beekeepers: the beekeeper now only needs to receive a count of workers. The workers can still be run with explicit analyses for testing, or in situations where one wants to manually control the processing. Now one can simply do bsub -JW[1-100] runWorker -url mysql://ensadmin:<pass>@ecs2:3361/compara_hive_jess_23 to create 100 workers, each of which will take on whichever analysis needs to be done.
-
Jessica Severin authored
-
Jessica Severin authored
decrement_needed_workers method. These are used by the Queen to pick an analysis for a newly created worker when one wasn't specified
-
Jessica Severin authored
-
Jessica Severin authored
-
- 08 Jul, 2004 5 commits
-
-
Jessica Severin authored
-
Jessica Severin authored
use when one knows all the workers are done running, so that any worker not registered properly is assumed to be a fatality.
-
Jessica Severin authored
job resetting. This allowed direct UPDATE..WHERE.. SQL to be used. Also changed the retry_count system: retry_count is only incremented for jobs that failed (status in ('GET_INPUT','RUN','WRITE_OUTPUT')). Jobs that were CLAIMED by the dead worker are just reset without incrementing the retry_count, since they were never attempted. Also the fetching of claimed jobs now has an 'ORDER BY retry_count' so that jobs that have failed are at the bottom of the list of jobs to process. This allows the 'bad' jobs to filter themselves out.
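The retry_count rule in this entry can be illustrated with a small sketch. It uses Python with an in-memory SQLite database as a stand-in for the project's Perl and MySQL code; the table layout, the `worker_id` column, the `'READY'` status that jobs are reset to, and all function names are assumptions made for the example.

```python
import sqlite3

# Statuses that mean the job had actually started when its worker died.
FAILED_STATES = ("GET_INPUT", "RUN", "WRITE_OUTPUT")


def reset_dead_worker_jobs(conn, worker_id):
    """Release a dead worker's jobs using direct UPDATE ... WHERE statements."""
    # Jobs that were being run count as failed attempts: bump retry_count.
    conn.execute(
        "UPDATE analysis_job SET retry_count = retry_count + 1, "
        "status = 'READY', worker_id = NULL "
        "WHERE worker_id = ? AND status IN (?, ?, ?)",
        (worker_id, *FAILED_STATES),
    )
    # Jobs that were only CLAIMED were never attempted: reset without penalty.
    conn.execute(
        "UPDATE analysis_job SET status = 'READY', worker_id = NULL "
        "WHERE worker_id = ? AND status = 'CLAIMED'",
        (worker_id,),
    )


def fetch_jobs_by_retry_count(conn, status):
    """Fetch jobs so that previously failed ones sort to the bottom."""
    return conn.execute(
        "SELECT analysis_job_id, retry_count FROM analysis_job "
        "WHERE status = ? ORDER BY retry_count, analysis_job_id",
        (status,),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysis_job ("
        "analysis_job_id INTEGER PRIMARY KEY, worker_id INTEGER, "
        "status TEXT, retry_count INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO analysis_job (analysis_job_id, worker_id, status) "
        "VALUES (?, ?, ?)",
        [(1, 7, "RUN"), (2, 7, "CLAIMED"), (3, 7, "WRITE_OUTPUT"), (4, 8, "RUN")],
    )
    reset_dead_worker_jobs(conn, 7)  # worker 7 is the dead one
    return fetch_jobs_by_retry_count(conn, "READY")


if __name__ == "__main__":
    print(demo())
```

In the demo, the job that was only claimed comes back first with a retry_count of 0, while the two jobs that were mid-run sort after it with a retry_count of 1; the job belonging to the living worker is untouched.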
-
Jessica Severin authored
created registered to the LSF beekeeper, and the 'dead' check is done only where the beekeeper is LSF and the worker is 15 minutes overdue for its check-in. The check is done with an ssh to the worker's registered host machine and a 'ps' command to see if the registered process_id of the worker is still running. This allows jobs to be submitted via LSF arrays (which only give a single LSF job id for the entire array), but still allows each worker to be checked separately.
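A rough sketch of the check described in this entry is given below, in Python rather than the project's Perl. The function names, the use of `ps -p`, and the decision to treat any non-zero exit status as "dead" are assumptions; the log only says that an ssh and a 'ps' command are used.

```python
import subprocess
from datetime import datetime, timedelta

# Limit taken from the log entry: 15 minutes without a check-in.
OVERDUE_LIMIT = timedelta(minutes=15)


def is_overdue(last_check_in, now, limit=OVERDUE_LIMIT):
    """True when the worker has gone longer than `limit` without checking in."""
    return now - last_check_in > limit


def ps_check_command(host, process_id):
    """Build the remote command (assumed form: ssh <host> ps -p <pid>)."""
    return ["ssh", host, "ps", "-p", str(process_id)]


def worker_is_dead(beekeeper, host, process_id, last_check_in, now,
                   run=subprocess.run):
    """Only LSF workers that are overdue are probed on their host."""
    if beekeeper != "LSF" or not is_overdue(last_check_in, now):
        return False
    result = run(ps_check_command(host, process_id), capture_output=True)
    # Caveat: a failed ssh connection also gives a non-zero status here,
    # so this sketch cannot tell an unreachable host from a dead worker.
    return result.returncode != 0


if __name__ == "__main__":
    checked_in = datetime(2004, 7, 8, 12, 0)
    print(is_overdue(checked_in, checked_in + timedelta(minutes=20)))
    print(ps_check_command("ecs2", 1234))
```

The `run` parameter exists only so the probe can be replaced in tests; by default it calls `subprocess.run`.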
-
Jessica Severin authored
compute resource
-
- 06 Jul, 2004 4 commits
-
-
Abel Ureta-Vidal authored
-
Abel Ureta-Vidal authored
Modified the way the worker_cmd is built, so that it now allows adding -limit $batch_size when the hive_capacity is < 0.
-
Abel Ureta-Vidal authored
-
Abel Ureta-Vidal authored
-
- 21 Jun, 2004 1 commit
-
-
Jessica Severin authored
-
- 19 Jun, 2004 1 commit
-
-
Jessica Severin authored
analysis_id to identify which analysis this worker is to use
-
- 17 Jun, 2004 6 commits
-
-
Jessica Severin authored
need that vacation ;-)
-
Jessica Severin authored
-
Abel Ureta-Vidal authored
Removed the single quotes around a question mark in the prepare statement. Otherwise it is treated as a literal string rather than a placeholder for an argument passed in the later execute statement.
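The effect of quoting a placeholder can be reproduced in a few lines. The example below uses Python's sqlite3 module as a stand-in for the Perl DBI code the commit actually touched; the table `analysis`, the column `logic_name`, and the helper `run_query` are hypothetical names chosen for the illustration.

```python
import sqlite3


def run_query(sql, params):
    """Run one parameterised query against a tiny throwaway table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE analysis (logic_name TEXT)")
    conn.execute("INSERT INTO analysis VALUES ('blast')")
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.ProgrammingError as err:
        # A quoted '?' is a string literal, so the statement has no
        # placeholders and the supplied argument has nowhere to go.
        return f"error: {err}"
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query("SELECT logic_name FROM analysis WHERE logic_name = ?", ("blast",)))
    print(run_query("SELECT logic_name FROM analysis WHERE logic_name = '?'", ("blast",)))
```

With the bare `?` the argument is bound and the row is found; with `'?'` the driver reports a mismatch between the number of placeholders and the number of arguments.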
-
Abel Ureta-Vidal authored
-
Jessica Severin authored
defined in the analysis_stats table
-
Jessica Severin authored
(was still using the original idea of setting batch_size as constant in the RunnableDB class). Need to deprecate that idea.
-
- 16 Jun, 2004 2 commits
-
-
Jessica Severin authored
-
Jessica Severin authored
-