  1. Oct 16, 2009
  2. Sep 23, 2009
  3. Jul 13, 2009
  4. Apr 03, 2009
  5. Feb 15, 2009
  6. May 28, 2008
  7. Nov 16, 2007
  8. Oct 12, 2006
  9. Sep 04, 2006
  10. Jun 12, 2006
  11. Oct 01, 2005
  12. Aug 16, 2005
    • added system for job-level blocking/unblocking. This is a very fine-grained · faead1e0
      Jessica Severin authored
      control structure in which a process/program has been made aware of the job(s)
      it is responsible for controlling.  This is facilitated via a job URL:
         mysql://ia64e:3306/jessica_compara32b_tree/analysis_job?dbID=6065355
      AnalysisJobAdaptor::CreateNewJob now returns this URL on job creation.
      When a job is dataflowed, an array of these URLs is returned (one for each rule).
      Jobs can now be dataflowed from a Process subclass with blocking enabled.
      A job can be fetched directly with one of these URLs.
      A command-line utility, ehive_unblock.pl, has been added to unblock a job by its URL.
      To unblock a job do:
         Bio::EnsEMBL::Hive::URLFactory->fetch($url)->update_status('READY');
      This is primarily useful in asynchronous split process/parsing situations.
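      Below is a minimal sketch of how a controlling process might unblock jobs it created, assuming it kept the URLs returned at creation time and a Perl environment with the hive modules installed; only the URLFactory->fetch(...)->update_status('READY') call is taken from this commit, the surrounding loop and variable names are illustrative.

         use strict;
         use warnings;
         use Bio::EnsEMBL::Hive::URLFactory;

         # Job URLs of the form mysql://host:port/dbname/analysis_job?dbID=NNN,
         # e.g. collected from CreateNewJob when the jobs were created blocked.
         my @blocked_job_urls = @ARGV;

         foreach my $url (@blocked_job_urls) {
             # Fetch the job directly via its URL and mark it ready to run,
             # as ehive_unblock.pl does.
             my $job = Bio::EnsEMBL::Hive::URLFactory->fetch($url);
             $job->update_status('READY') if $job;
         }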
  13. Aug 11, 2005
  14. Aug 09, 2005
  15. Jun 13, 2005
  16. Mar 04, 2005
  17. Mar 02, 2005
  18. Feb 23, 2005
  19. Feb 21, 2005
    • YAHRF (Yet Another Hive ReFactor).....chapter 1 · 7675c31c
      Jessica Severin authored
      needed to better manage the hive system's load on the database housing all
      the hive-related tables (in case the database is overloaded by multiple users).
      Added analysis_stats.sync_lock column (and correspondingly in the Object and Adaptor).
      Added Queen::safe_synchronize_AnalysisStats method which wraps over the
        synchronize_AnalysisStats method and does various checks and locks to ensure
        that only one worker is trying to do a 'synchronize' on a given analysis at
        any given moment.
      Cleaned up API between Queen/Worker so that worker only talks directly to the
        Queen, rather than getting the underlying database adaptor.
      Added analysis_job columns runtime_msec, query_count to provide more data on
        how the jobs hammer a database (queries/sec).
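      Returning to the sync_lock column added above: a minimal sketch of the lock-claim idea behind safe_synchronize_AnalysisStats, assuming a connected DBI handle $dbh; this is illustrative, not the actual Queen code.

         sub try_claim_sync_lock {
             my ($dbh, $analysis_id) = @_;

             # Atomically flip sync_lock from 0 to 1; the affected-row count tells
             # this worker whether it won the race to synchronize the analysis.
             my $rows = $dbh->do(
                 'UPDATE analysis_stats SET sync_lock = 1
                   WHERE analysis_id = ? AND sync_lock = 0',
                 undef, $analysis_id);

             return ($rows && $rows == 1);   # exactly one row updated => lock held
         }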
  20. Feb 10, 2005
  21. Feb 04, 2005
    • oops, forgot to comment out debug line... · 3f5d31fb
      Jessica Severin authored
    • added OODB logic to analysis_job.input_id · 48c12a6e
      Jessica Severin authored
      analysis_job.input_id is kept as varchar(255) to allow UNIQUE(analysis_id,input_id),
      but the adaptor now has logic so that if the input_id in the AnalysisJob object exceeds
      the 255-character limit, it is stored in (and fetched from) the analysis_data table.  The input_id
      in the analysis_job table becomes '_ext_input_analysis_data_id ##', a unique
      internal marker that tells the fetch routine to retrieve the 'real' input_id
      from the analysis_data table.
      NO MORE 255-character limit on input_id, and it is completely transparent to the API user.
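      A sketch of the store-side overflow logic described above, assuming a connected DBI handle $dbh; apart from the '_ext_input_analysis_data_id' marker and the 255-character limit, the names here are illustrative rather than the actual adaptor code.

         sub input_id_for_storage {
             my ($dbh, $input_id) = @_;

             # Short input_ids fit directly into analysis_job.input_id (varchar(255)).
             return $input_id if length($input_id) <= 255;

             # Oversized input_ids go into analysis_data; the job row only keeps a
             # marker holding the analysis_data_id, which the fetch code resolves
             # back into the 'real' input_id.
             $dbh->do('INSERT INTO analysis_data (data) VALUES (?)', undef, $input_id);
             my $data_id = $dbh->last_insert_id(undef, undef, 'analysis_data', 'analysis_data_id');
             return "_ext_input_analysis_data_id $data_id";
         }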
  22. Feb 01, 2005
  23. Jan 18, 2005
  24. Jan 13, 2005
  25. Jan 11, 2005
  26. Nov 22, 2004
  27. Nov 19, 2004
    • Change for distributed smart Queen system. · c05ce49d
      Jessica Severin authored
      When jobs are inserted into the analysis_job table, the analysis_stats table
      for the given analysis is updated by incrementing total_job_count and
      unclaimed_job_count and setting the status to 'LOADING'.
      If the analysis is 'BLOCKED', this incremental update does not happen.
      When an analysis_stats entry is 'BLOCKED' and then unblocked, this automatically
      triggers a resync, so this partial progress update is not needed.
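      A sketch of that incremental bookkeeping, assuming a connected DBI handle $dbh; the column and status names come from the message above, the SQL itself is illustrative.

         sub bump_stats_after_job_insert {
             my ($dbh, $analysis_id, $new_jobs) = @_;

             # Skip the partial update for BLOCKED analyses; unblocking triggers
             # a full resync anyway.
             $dbh->do(
                 q{UPDATE analysis_stats
                      SET total_job_count     = total_job_count + ?,
                          unclaimed_job_count = unclaimed_job_count + ?,
                          status              = 'LOADING'
                    WHERE analysis_id = ? AND status != 'BLOCKED'},
                 undef, $new_jobs, $new_jobs, $analysis_id);
         }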
  28. Nov 09, 2004
  29. Oct 20, 2004
    • switched back to analysis_job.input_id · 77675743
      Jessica Severin authored
      changed to varchar(255) (but dropped the join to the analysis_data table).
      If modules need more than 255 characters of input_id,
      they can pass the analysis_data_id via the varchar(255); example: {adid=>365902}
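      A sketch of how a consuming module might resolve such an input_id, assuming a connected DBI handle $dbh; the {adid=>NNN} convention is from the message above, the parsing and fetch code is illustrative.

         sub resolve_input_id {
             my ($dbh, $input_id) = @_;

             # An input_id like '{adid=>365902}' points at a row in analysis_data.
             if ($input_id =~ /adid\s*=>\s*(\d+)/) {
                 my ($data) = $dbh->selectrow_array(
                     'SELECT data FROM analysis_data WHERE analysis_data_id = ?',
                     undef, $1);
                 return $data;
             }
             return $input_id;   # ordinary input_ids are used as-is
         }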
  30. Oct 06, 2004
  31. Oct 05, 2004
    • Second insert into analysis_data for job creation added extra overhead. · f7182485
      Jessica Severin authored
      Removed the select-before-store (added a new method, store_if_needed, for cases where that functionality is required)
      and added an option in AnalysisJobAdaptor::CreateNewJob to pass input_analysis_data_id,
      so if it is already known, CreateNewJob will be as fast as before.  Plus there are no longer
      any limits on the size of the input_id string.
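      A sketch of the select-before-store behaviour now isolated in store_if_needed, assuming a connected DBI handle $dbh; the method name is from the message above, its body here is illustrative rather than the actual adaptor code.

         sub store_if_needed {
             my ($dbh, $data) = @_;

             # Reuse an existing analysis_data row if an identical blob is already
             # stored (this extra SELECT is the overhead CreateNewJob now avoids
             # when input_analysis_data_id is passed in directly)...
             my ($existing_id) = $dbh->selectrow_array(
                 'SELECT analysis_data_id FROM analysis_data WHERE data = ?',
                 undef, $data);
             return $existing_id if $existing_id;

             # ...otherwise insert the blob and return the new id.
             $dbh->do('INSERT INTO analysis_data (data) VALUES (?)', undef, $data);
             return $dbh->last_insert_id(undef, undef, 'analysis_data', 'analysis_data_id');
         }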
  32. Sep 30, 2004
    • modified analysis_job table : replaced input_id varchar(100) with · 2be90ea9
      Jessica Severin authored
      input_analysis_data_id int(10), which joins to the analysis_data table.
      added output_analysis_data_id int(10) for storing the output_id.
      The external analysis_data.data column is LONGTEXT, which allows much longer
      parameter sets to be passed around than was previously possible.
      AnalysisData will also allow processes to manually store 'other' data and
      pass it around by ID reference.
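      A rough MySQL equivalent of the analysis_job change described above, reconstructed from the commit message and run via an assumed DBI handle; the historical DDL may have differed in detail.

         sub apply_analysis_job_schema_change {
             my ($dbh) = @_;
             # analysis_data.data is LONGTEXT, as noted above, so these ids can
             # stand in for arbitrarily large input/output parameter strings.
             for my $sql (
                 q{ALTER TABLE analysis_job DROP COLUMN input_id},
                 q{ALTER TABLE analysis_job ADD COLUMN input_analysis_data_id  INT(10) NOT NULL},
                 q{ALTER TABLE analysis_job ADD COLUMN output_analysis_data_id INT(10)},
             ) {
                 $dbh->do($sql);   # apply each schema change in turn
             }
         }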
  33. Aug 03, 2004
  34. Aug 02, 2004
  35. Jul 21, 2004
  36. Jul 16, 2004
  37. Jul 08, 2004
    • added hive_id index to analysis_job table to help with dead_worker · 27403dda
      Jessica Severin authored
      job resetting.  This allowed direct UPDATE..WHERE.. SQL to be used.
      Also changed the retry_count system: retry_count is only incremented
      for jobs that failed (status in ('GET_INPUT','RUN','WRITE_OUTPUT')).
      Jobs that were CLAIMED by the dead worker are just reset without
      incrementing the retry_count, since no attempt was made to run them.
      Also, the fetching of claimed jobs now has an 'ORDER BY retry_count'
      so that jobs that have failed are at the bottom of the list of jobs
      to process.  This allows the 'bad' jobs to filter themselves out.
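      A sketch of the dead-worker reset described above, assuming a connected DBI handle $dbh; the statuses and columns are from the message, but the exact SQL (including setting hive_id back to NULL and status to 'READY') is an assumption, not the adaptor's actual statements.

         sub reset_jobs_of_dead_worker {
             my ($dbh, $dead_hive_id) = @_;

             # Jobs the dead worker had actually started: count the failed attempt.
             $dbh->do(
                 q{UPDATE analysis_job
                      SET status = 'READY', hive_id = NULL, retry_count = retry_count + 1
                    WHERE hive_id = ? AND status IN ('GET_INPUT','RUN','WRITE_OUTPUT')},
                 undef, $dead_hive_id);

             # Jobs it had merely CLAIMED: reset without touching retry_count.
             $dbh->do(
                 q{UPDATE analysis_job
                      SET status = 'READY', hive_id = NULL
                    WHERE hive_id = ? AND status = 'CLAIMED'},
                 undef, $dead_hive_id);
         }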
  38. Jun 16, 2004