- Mar 04, 2005
Jessica Severin authored
made the worse (Tim Cutts). This will do until we figure this out.... I like the '>/dev/null + rerun failed jobs manually with debug' option personally :)
Jessica Severin authored
- Mar 03, 2005
Jessica Severin authored
each digit becomes a directory, with a final directory created with the full hive_id:
  hive_id=1234 => <base_dir>/1/2/3/4/hive_id_1234/
  hive_id=12   => <base_dir>/1/2/hive_id_12/
this should distribute the output directories
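A minimal sketch of the digit-hashing scheme described above; the helper name and base-directory argument are illustrative, not the actual hive API:

    use strict;
    use warnings;
    use File::Spec;

    # One directory level per digit of the hive_id, then a final
    # "hive_id_<id>" directory, so output spreads across the filesystem.
    sub hive_output_dir {    # hypothetical helper
        my ($base_dir, $hive_id) = @_;
        my @digits = split //, $hive_id;
        return File::Spec->catdir($base_dir, @digits, "hive_id_$hive_id");
    }

    print hive_output_dir('/tmp/hive', 1234), "\n";   # /tmp/hive/1/2/3/4/hive_id_1234
    print hive_output_dir('/tmp/hive', 12),   "\n";   # /tmp/hive/1/2/hive_id_12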
Jessica Severin authored
is calculated. If batch_size > 0, use batch_size; else use the avg_msec_per_job equation.
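A hedged sketch of that decision; the actual avg_msec_per_job equation is not shown in this log, so the timed branch below (target batch time, fallback) is an assumption:

    use strict;
    use warnings;

    # An explicit batch_size wins; otherwise size the batch from measured
    # per-job timings so a batch takes roughly a fixed wall-clock slice.
    sub effective_batch_size {    # hypothetical helper
        my ($batch_size, $avg_msec_per_job) = @_;
        return $batch_size if $batch_size > 0;
        my $target_batch_msec = 2 * 60 * 1000;   # assumed target: ~2 minutes of work
        my $jobs = int($target_batch_msec / ($avg_msec_per_job || 1));
        return $jobs > 0 ? $jobs : 1;
    }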
Jessica Severin authored
to RunnableDB to allow full benefit of dataflow graph capabilities.
- Removed from Extension.pm the branch_code, analysis_job_id, reset_job extensions to RunnableDB (no longer trying to shoe-horn hive 'extra' functions into them).
- Bio::EnsEMBL::Hive::Process mirrors some of the RunnableDB interface (new, analysis, fetch_input, run, write_output) but uses a new job interface (input_job, dataflow_output_id) instead of input_id. It provides a convenience method $self->input_id which redirects to $self->input_job->input_id to simplify porting.
- Changed Worker to only use hive 'extended' functions if the processing module isa(Bio::EnsEMBL::Hive::Process). This still allows all RunnableDB modules to be used (or any object which implements a minimal 'RunnableDB interface': new, input_id, db, fetch_input, run, write_output).
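A minimal sketch of a module written against the Process interface described above; the package name and method bodies are illustrative, only the method names come from this entry:

    package My::HiveRunnable;    # hypothetical example module
    use strict;
    use warnings;
    use base ('Bio::EnsEMBL::Hive::Process');

    sub fetch_input {
        my $self = shift;
        # the convenience redirect: input_id -> input_job->input_id
        my $input_id = $self->input_id;
        return 1;
    }

    sub run {
        my $self = shift;
        return 1;    # real analysis work would happen here
    }

    sub write_output {
        my $self = shift;
        # flow a result into whatever analysis is wired to branch 1
        $self->dataflow_output_id($self->input_id, 1);
        return 1;
    }

    1;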
Jessica Severin authored
reordered where the blocking checks are done (added, deleted, moved).
Jessica Severin authored
Jessica Severin authored
Jessica Severin authored
needed workers after this worker is done. Useful in debugging one's dataflow and blocking_ctrl graphs by running one worker at a time (like stepping in a debugger)
Jessica Severin authored
- Mar 02, 2005
Jessica Severin authored
a job that has been flowed into an analysis/process
Jessica Severin authored
Jessica Severin authored
- Feb 23, 2005
Jessica Severin authored
Jessica Severin authored
added option -no_pend which ignores the pending_count when figuring out how many workers to submit. Removed some superfluous calls to Queen::get_num_running_workers.
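A hypothetical invocation; the script name and -url value are assumptions (only -no_pend and -loop appear elsewhere in this log):

    beekeeper.pl -url mysql://user:pass@host/my_hive_db -loop -no_pend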
Jessica Severin authored
Jessica Severin authored
when debugging an analysis which fails and would increment the retry_count.
Jessica Severin authored
Jessica Severin authored
Jessica Severin authored
Jessica Severin authored
to be promoted to 'DONE'
- Feb 22, 2005
Jessica Severin authored
- Feb 21, 2005
Jessica Severin authored
needed to better manage the hive system's load on the database housing all the hive related tables (in case the database is overloaded by multiple users).
- Added analysis_stats.sync_lock column (and correspondingly in Object and Adaptor).
- Added Queen::safe_synchronize_AnalysisStats method which wraps the synchronize_AnalysisStats method and does various checks and locks to ensure that only one worker is trying to do a 'synchronize' on a given analysis at any given moment.
- Cleaned up the API between Queen and Worker so that the worker only talks directly to the Queen, rather than getting the underlying database adaptor.
- Added analysis_job columns runtime_msec and query_count to provide more data on how the jobs hammer a database (queries/sec).
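A plausible sketch of that guard, assuming sync_lock is claimed with an atomic conditional UPDATE and that $self->dbc exposes a DBI-style do(); the SQL and method body are illustrative, not the actual Queen code:

    # Claim the per-analysis lock before syncing, so only one worker
    # synchronizes a given analysis at any moment.
    sub safe_synchronize_AnalysisStats_sketch {
        my ($self, $stats) = @_;
        my $claimed = $self->dbc->do(
            'UPDATE analysis_stats SET sync_lock=1 WHERE sync_lock=0 AND analysis_id=?',
            undef, $stats->analysis_id);
        return if $claimed == 0;    # another worker holds the lock (DBI's '0E0')

        $self->synchronize_AnalysisStats($stats);    # the real sync (name from this log)

        $self->dbc->do('UPDATE analysis_stats SET sync_lock=0 WHERE analysis_id=?',
                       undef, $stats->analysis_id);
    }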
- Feb 17, 2005
Jessica Severin authored
called when a worker dies, to replace itself in the needed_workers count, since that count is decremented when the worker is born and the worker is counted as living (and subtracted) as long as it's running. This guarantees that another worker will quickly be created after this one dies (and won't need to wait for a synch to happen).
Jessica Severin authored
- Feb 16, 2005
Jessica Severin authored
is when there are lots of workers 'WORKING', so as to avoid them falling over each other. The 'WORKING' state only exists in the middle of a large run. When the last worker dies, the state is 'ALL_CLAIMED', so the sync on death will happen properly. As the last pile of workers die they will all do a synch, but that's OK since the system needs to be properly synched when the last one dies, since there won't be anybody left to do it. Also added a 10 minute check for the 'SYNCHING' state to deal with the case where a worker dies in the middle of 'SYNCHING'.
Jessica Severin authored
Jessica Severin authored
Jessica Severin authored
so as to reduce the synchronization frequency.
Jessica Severin authored
Jessica Severin authored
call lower down isn't needed. Also needed to move the printing of the analysis_stats up higher to better display with the new printing order. Now -loop -analysis_stats looks right.
Jessica Severin authored
added a check/set of status to 'SYNCHING' right before the synch procedure, to prevent multiple workers from trying to synch at the same time.
Jessica Severin authored
- Feb 14, 2005
Jessica Severin authored
will return 0E0 if zero rows are inserted, which perl interprets as true, so I need to check for it explicitly. Also, the store method now returns 1 on 'new insert' and 0 on 'already stored'.
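The quirk being worked around, in a standalone sketch (the table is illustrative and DBD::SQLite is assumed; the '0E0' behaviour itself is standard DBI):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
    $dbh->do('CREATE TABLE analysis_job (analysis_job_id INTEGER PRIMARY KEY, status TEXT)');

    # A statement touching zero rows returns the string '0E0':
    # TRUE in boolean context, numerically 0.
    my $rows = $dbh->do(q{UPDATE analysis_job SET status = 'READY' WHERE 1 = 0});
    print 'boolean: ', ($rows ? 'true' : 'false'), "\n";   # true
    print 'numeric: ', $rows + 0, "\n";                    # 0
    my $was_new_insert = ($rows == 0) ? 0 : 1;             # explicit numeric check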
- Feb 10, 2005
Jessica Severin authored
complete an analysis. If no job has been run (0 msec) it will assume 1 job per worker up to the hive_capacity (maximum parallelization). Also changed worker->process_id to be the pid of the process, not the ppid.
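A hedged sketch of that estimate; the zero-timing branch and the hive_capacity cap come from this entry, while the timed branch is an assumed formula:

    use strict;
    use warnings;

    # Estimate workers needed to complete an analysis.
    sub estimate_needed_workers {    # hypothetical helper
        my ($unclaimed_jobs, $avg_msec_per_job, $worker_life_msec, $hive_capacity) = @_;
        my $workers;
        if ($avg_msec_per_job == 0) {
            $workers = $unclaimed_jobs;   # no timings yet: 1 job per worker
        } else {
            my $jobs_per_worker = int($worker_life_msec / $avg_msec_per_job) || 1;
            $workers = int(($unclaimed_jobs + $jobs_per_worker - 1) / $jobs_per_worker);
        }
        $workers = $hive_capacity if $hive_capacity > 0 && $workers > $hive_capacity;
        return $workers;   # capped at maximum parallelization
    }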
Jessica Severin authored
if it runs properly, the job looks like a normally claimed/fetched/run job
Jessica Severin authored
is asked to 're-run' a specific job. By reclaiming, this job is properly processed so when it finishes it looks like it was run normally by the system.
- Feb 09, 2005
Jessica Severin authored
- Feb 08, 2005
Jessica Severin authored
extended display in automation looping to print stats on currently running workers and an overall statistic on the progress of the whole hive (% and total jobs)
- Feb 07, 2005
Jessica Severin authored
-debug <level> : turn on debug messages at <level>
-no_cleanup    : don't perform global_cleanup when worker exits
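A hypothetical worker invocation; the script name and -url value are assumptions, only -debug and -no_cleanup come from this entry:

    runWorker.pl -url mysql://user:pass@host/my_hive_db -debug 2 -no_cleanup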