This project is mirrored from https://github.com/Ensembl/ensembl-hive.git.
- Jan 13, 2005
Jessica Severin authored
Initially used to manually re-run a job with runWorker.pl -job_id
-
Jessica Severin authored
and Analysis::RunnableDB superclasses
-
Jessica Severin authored
-
- Jan 12, 2005
Jessica Severin authored
properly handle RunnableDBs that throw exceptions in the fetch_input stage.
-
- Jan 11, 2005
Jessica Severin authored
changed INSERT syntax to be more SQL compliant
-
Jessica Severin authored
changed INSERT syntax to be more SQL compliant
-
Jessica Severin authored
-
Jessica Severin authored
changed INSERT syntax to be more SQL compliant
-
Jessica Severin authored
-
Jessica Severin authored
-
- Jan 08, 2005
Abel Ureta-Vidal authored
Set OUTPUT_AUTOFLUSH=1 to get info immediately when in -loop mode. Also print out the time when the next loop will occur
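A minimal sketch of the change this commit describes: enabling autoflush so that status lines appear immediately while looping, and reporting when the next loop will occur. The loop interval and message wording are illustrative, not the actual beekeeper.pl code.

```perl
# Assumed illustration: $| is Perl's $OUTPUT_AUTOFLUSH variable;
# setting it to 1 flushes STDOUT after every print, so -loop mode
# output is visible immediately instead of buffered.
$| = 1;

my $loop_interval_minutes = 1;    # illustrative interval, not the real default
my $next_loop_time = time() + $loop_interval_minutes * 60;
print "next loop at ", scalar localtime($next_loop_time), "\n";
```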
-
Abel Ureta-Vidal authored
In synchronize_AnalysisStats method, added a POSIX::ceil when setting the num_required_workers for an AnalysisStats object
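A minimal sketch of the rounding fix described above: using POSIX::ceil so that a fractional worker count is rounded up and a remainder of jobs still gets a worker. The variable names and batch size are illustrative, not taken from the real synchronize_AnalysisStats code.

```perl
use POSIX qw(ceil);

my $unclaimed_job_count = 10;
my $batch_size          = 3;    # jobs each worker claims per batch (assumed)

# Without ceil, integer truncation of 10/3 would request only 3 workers,
# leaving one job's worth of work without a worker; ceil rounds up to 4.
my $num_required_workers = ceil($unclaimed_job_count / $batch_size);
```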
-
- Jan 07, 2005
Will Spooner authored
-
- Jan 06, 2005
Jessica Severin authored
-
- Dec 14, 2004
Jessica Severin authored
-
Jessica Severin authored
-
- Dec 13, 2004
Jessica Severin authored
used to always store a full URL
-
- Dec 10, 2004
Jessica Severin authored
-
- Dec 09, 2004
Jessica Severin authored
in AnalysisStatsAdaptor::fetch_by_analysis_id
-
Jessica Severin authored
modified Bio::EnsEMBL::Analysis::stats to not do any exception catching. If AnalysisStatsAdaptor->fetch_by_analysis_id fails, there is something very wrong and the exception should propagate out and cause the program to fail
-
Jessica Severin authored
-
- Nov 30, 2004
Jessica Severin authored
Bio::EnsEMBL::Analysis::RunnableDB via namespace extension syntax so that hive system can use analysis.modules that inherit from Bio::EnsEMBL::Analysis::RunnableDB
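A hypothetical illustration of the inheritance arrangement described above: an analysis module deriving from Bio::EnsEMBL::Analysis::RunnableDB so the hive system can drive it. A tiny stub stands in for the real base class here so the sketch is self-contained; the derived module name and method bodies are made up, not real pipeline code.

```perl
package Bio::EnsEMBL::Analysis::RunnableDB;    # stub, NOT the real module
sub new         { my $class = shift; return bless {}, $class; }
sub fetch_input { }
sub run         { }

package MyPipeline::ExampleAnalysis;           # hypothetical analysis module
our @ISA = ('Bio::EnsEMBL::Analysis::RunnableDB');
sub run { my $self = shift; $self->{ran} = 1; }    # override one stage

package main;
my $runnable = MyPipeline::ExampleAnalysis->new();  # inherited constructor
$runnable->fetch_input();                           # inherited stage
$runnable->run();                                   # overridden stage
```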
-
- Nov 25, 2004
Jessica Severin authored
and gives the user a good overview of where processing stands. Added -analysis_stats and -worker_stats, which give full statistics on all analyses and all running workers respectively.
-
Jessica Severin authored
'total jobs' count so one can calculate a 'progress bar'
-
Jessica Severin authored
and print_running_worker_status.
-
- Nov 24, 2004
Jessica Severin authored
-
Jessica Severin authored
-
Jessica Severin authored
used by lsf_beekeeper to decide when it needs to do a hard resync.
-
Jessica Severin authored
to run with, e.g. $worker->run($job); This job can be pulled from the database or created on the fly. This is to accommodate debug modes of runWorker.pl
-
- Nov 22, 2004
Jessica Severin authored
-
- Nov 20, 2004
Jessica Severin authored
-
Jessica Severin authored
no longer does Queen::synchronize_hive as part of the autonomous loop; the -sync option allows the user to manually trigger a hard sync. Also removed the default display of the full hive status and added the option -status, which will print this full status. Also removed adjusting the needed worker count for 'pending' workers: LSF will sometimes leave jobs in a pending state for no apparent reason (a newly bsub'ed job will run while an older pending job stays pending), and the current 'pending' count also didn't differentiate between lsf_beekeeper-submitted jobs and manually submitted jobs. This pending adjustment isn't a critical subsystem, so I've removed it for now. If a runWorker starts (after a long pend) and there is no work left, it will die immediately. I may rewrite a smarter 'pending' adjustment in the future.
-
Jessica Severin authored
and distributed manner as it interacts with the workers over the course of its life. When a runWorker.pl script starts and asks a queen to create a worker, the queen has a list of known analyses which are 'above the surface', where full hive analysis has been done and the number of needed workers has been calculated.

Full sync requires joining data between the analysis, analysis_job, analysis_stats, and hive tables. When this reached 10e7 jobs, 10e4 analyses, and 10e3 workers, a full hard sync took minutes, and it was clear this bit of the system wasn't scaling and wasn't going to make it to the next order of magnitude. This occurred in the compara blastz pipeline between mouse and rat.

Now there are some analyses 'below the surface' that have partial synchronization. These analyses have been flagged as having 'x' new jobs (AnalysisJobAdaptor updating analysis_stats on job insert). If no analysis is found to assign to the newly created worker, the queen will dip below the surface and start checking the analyses with the highest probability of needing the most workers. This incremental sync is also done in Queen::get_num_needed_workers: when calculating ahead a total worker count, this routine will also dip below the surface until the hive reaches its currently defined worker saturation.

A beekeeper is no longer a required component for the system to function. If workers can get onto cpus, the hive will run. The beekeeper is now mainly a user display program showing the status of the hive. There is no longer any central process doing work, and one hive can potentially scale beyond 10e9 jobs in graphs of 10e6 analysis nodes and 10e6 running workers.
-
- Nov 19, 2004
Jessica Severin authored
the most time since last update are at the top of the returned list
-
Jessica Severin authored
When jobs are inserted into the analysis_job table, the analysis_stats table for the given analysis is updated by incrementing the total_job_count and unclaimed_job_count and setting the status to 'LOADING'. If the analysis is 'BLOCKED', this incremental update does not happen. When an analysis_stats is 'BLOCKED' and then unblocked, this will automatically trigger a resync, so this partial progress update is not needed.
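An in-memory sketch of the rule this commit describes: on job insert, bump the two counters and flip the status to 'LOADING', unless the analysis is 'BLOCKED'. The hash stands in for a row of the analysis_stats table; the field names follow the commit message, but the subroutine name is made up for illustration.

```perl
# Hypothetical stand-in for one analysis_stats row.
my %stats_row = (
    status              => 'READY',
    total_job_count     => 0,
    unclaimed_job_count => 0,
);

sub register_inserted_jobs {
    my ($stats, $num_new_jobs) = @_;
    # Blocked analyses skip the incremental update; unblocking
    # triggers a full resync instead.
    return if $stats->{status} eq 'BLOCKED';
    $stats->{total_job_count}     += $num_new_jobs;
    $stats->{unclaimed_job_count} += $num_new_jobs;
    $stats->{status} = 'LOADING';
}

register_inserted_jobs(\%stats_row, 5);
```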
-
Jessica Severin authored
sync unless some jobs have been loaded.
-
Jessica Severin authored
also changed default to 'LOADING' so that it can trigger a sync
-
- Nov 18, 2004
Jessica Severin authored
-
Jessica Severin authored
-
- Nov 17, 2004
Jessica Severin authored
than one status
-