#
# You may distribute this module under the same terms as perl itself
#
# POD documentation - main docs before the code

=pod 

=head1 NAME

Bio::EnsEMBL::Hive::AnalysisJob

=cut

=head1 SYNOPSIS

Object which encapsulates the details of how to find jobs, how to run those
jobs, and how to check the rules to create the next jobs in the chain.
Essentially it knows where to find data, how to process data, and where to
put it when it's done (in the next person's INBOX) so the next Worker
in the chain can find data to work on.

Hive-based processing is a concept based on a more controlled version
of an autonomous agent type system.  A worker is not told what to do
(as it would be by a centralized control system such as the current pipeline
system) but rather queries a central database for jobs ("give me jobs").

Each worker is linked to an analysis_id, registers itself on creation
into the Hive, creates a RunnableDB instance of the Analysis->module,
gets $runnable->batch_size() jobs from the analysis_job table, does its
work, and creates the next layer of analysis_job entries by querying the
simple_rule table where condition_analysis_id = $self->analysis_id.  It repeats
this cycle until it has lived out its lifetime or until there are no more jobs
left.  The lifetime limit is just a safety limit to prevent these from
'infecting' a system.
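
The cycle above can be sketched in a few lines of plain Perl.  This is only
an illustration under stated assumptions, not the Worker implementation: the
in-memory @job_queue stands in for the analysis_job table, and $batch_size
and $life_span are hypothetical values chosen for the example.

    use strict;
    use warnings;

    # Hypothetical stand-in for the analysis_job table: an in-memory queue.
    my @job_queue  = map { +{ input_id => "job_$_", status => 'READY' } } 1 .. 7;
    my $batch_size = 3;     # assumed value of $runnable->batch_size()
    my $life_span  = 60;    # assumed lifetime limit, in seconds
    my $born       = time;
    my $done       = 0;

    while (time - $born < $life_span) {
        # Claim up to $batch_size jobs; stop once the queue is empty.
        my @batch = splice(@job_queue, 0, $batch_size);
        last unless @batch;

        for my $job (@batch) {
            # A real Worker would run its RunnableDB on the job here.
            $job->{status} = 'DONE';
            $done++;
        }
    }

    print "processed $done jobs\n";    # prints "processed 7 jobs"

The loop has the same two exit conditions as the text: the lifetime limit
and an empty queue.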

The Queen's job is simply to birth Workers of the correct analysis_id to get the
work done.  The only other thing the Queen does is free up jobs that were
claimed by Workers that died unexpectedly so that other workers can take
over the work.
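
A minimal usage sketch of the accessors defined in this module.  It assumes
that Bio::EnsEMBL::Root supplies a plain new() constructor returning an empty
object; the input_id, analysis_id and status values are made up for
illustration.

    use strict;
    use Bio::EnsEMBL::Hive::AnalysisJob;

    # Assumption: new() is inherited from Bio::EnsEMBL::Root.
    my $job = Bio::EnsEMBL::Hive::AnalysisJob->new;

    $job->input_id('chr1:1-100000');   # hypothetical input_id
    $job->analysis_id(12);             # hypothetical analysis_id

    # branch_code() falls back to 1 when no value has been set.
    print "branch_code=", $job->branch_code, "\n";

    # With no adaptor attached, status() only changes the in-memory object;
    # with an adaptor it also calls $adaptor->update_status($job).
    $job->status('READY');             # illustrative status string
    print "status=", $job->status, "\n";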

=cut

=head1 DESCRIPTION

=cut

=head1 CONTACT

Jessica Severin, jessica@ebi.ac.uk

=cut

=head1 APPENDIX

The rest of the documentation details each of the object methods. 
Internal methods are usually preceded with a _

=cut

package Bio::EnsEMBL::Hive::AnalysisJob;

use strict;

use Bio::EnsEMBL::Root;
use Bio::EnsEMBL::Analysis;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Hive::Worker;

use vars qw(@ISA);

@ISA = qw(Bio::EnsEMBL::Root);


sub adaptor {
  my $self = shift;
  $self->{'_adaptor'} = shift if(@_);
  return $self->{'_adaptor'};
}

sub dbID {
  my $self = shift;
  $self->{'_dbID'} = shift if(@_);
  return $self->{'_dbID'};
}

sub input_id {
  my( $self, $value ) = @_;
  $self->{'_input_id'} = $value if($value);
  return $self->{'_input_id'};
}

sub hive_id {
  my $self = shift;
  $self->{'_hive_id'} = shift if(@_);
  return $self->{'_hive_id'};
}

sub analysis_id {
  my( $self, $value ) = @_;
  $self->{'_analysis_id'} = $value if($value);
  return $self->{'_analysis_id'};
}

sub job_claim {
  my( $self, $value ) = @_;
  $self->{'_job_claim'} = $value if($value);
  return $self->{'_job_claim'};
}

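sub status {
  my( $self, $value ) = @_;

  # Setting a status also writes it through to the database
  # when an adaptor is attached to this job.
  if($value) {
    $self->{'_status'} = $value;
    $self->adaptor->update_status($self) if($self->adaptor);
  }
  return $self->{'_status'};
}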

sub retry_count {
  my( $self, $value ) = @_;
  # Test with defined() rather than truth so that a count of 0 can be stored.
  $self->{'_retry_count'} = $value if(defined($value));
  return $self->{'_retry_count'};
}

sub completed {
  my( $self, $value ) = @_;
  $self->{'_completed'} = $value if($value);
  return $self->{'_completed'};
}

sub branch_code {
  my( $self, $value ) = @_;
  $self->{'_branch_code'} = $value if(defined($value));
  $self->{'_branch_code'} = 1 unless(defined($self->{'_branch_code'}));
  return $self->{'_branch_code'};
}

sub stdout_file {
  my( $self, $value ) = @_;
  $self->{'_stdout_file'} = $value if(defined($value));
  return $self->{'_stdout_file'};
}

sub stderr_file {
  my( $self, $value ) = @_;
  $self->{'_stderr_file'} = $value if(defined($value));
  return $self->{'_stderr_file'};
}

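sub print_job {
  my $self = shift;
  # host() and process_id() are not defined in this class, so calling them
  # here would die; report the job's own fields instead.
  print("JOB: dbID=",$self->dbID,
     " analysis_id=",$self->analysis_id,
     " input_id=",$self->input_id,
     " hive_id=",$self->hive_id,
     " status=",$self->status,
     "\n");
}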

1;