Skip to content
Snippets Groups Projects
Commit 75c83af4 authored by Patrick Meidl's avatar Patrick Meidl
Browse files

added support for retrieving features and slices to be projected from a...

added support for retrieving features and slices to be projected from a different database; documentation
parent 19331e11
No related branches found
No related tags found
No related merge requests found
...@@ -7,12 +7,36 @@ utility class to post-process projections from one assembly to another ...@@ -7,12 +7,36 @@ utility class to post-process projections from one assembly to another
=head1 SYNOPSIS =head1 SYNOPSIS
# connect to an old database
my $dba_old = new Bio::EnsEMBL::DBSQL::DBAdaptor(
-host => 'ensembldb.ensembl.org',
-port => 3306,
-user => ensro,
-dbname => 'mus_musculus_core_46_36g',
-group => 'core_old',
);
# connect to the new database containing the mapping between old and new
# assembly
my $dba_new = new Bio::EnsEMBL::DBSQL::DBAdaptor(
-host => 'ensembldb.ensembl.org',
-port => 3306,
-user => ensro,
-dbname => 'mus_musculus_core_47_37',
-group => 'core_new',
);
my $assembly_projector = Bio::EnsEMBL::Utils::AssemblyProjector->new( my $assembly_projector = Bio::EnsEMBL::Utils::AssemblyProjector->new(
-OLD_ASSEMBLY => NCBIM36, -OLD_ASSEMBLY => 'NCBIM36',
-NEW_ASSEMBLY => NCBIM37, -NEW_ASSEMBLY => 'NCBIM37',
-ADAPTOR => $dba_new,
-EXTERNAL_SOURCE => 1,
-MERGE_FRAGMENTS => 1,
-CHECK_LENGTH => 0,
); );
# fetch a slice on the old assembly # fetch a slice on the old assembly
my $slice_adaptor = $dba_old->get_SliceAdaptor;
my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, undef, undef, my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, undef, undef,
undef, 'NCBIM36'); undef, 'NCBIM36');
...@@ -20,11 +44,45 @@ utility class to post-process projections from one assembly to another ...@@ -20,11 +44,45 @@ utility class to post-process projections from one assembly to another
=head1 DESCRIPTION =head1 DESCRIPTION
This class implements some utility functions which apply sensible rules to the This class implements some utility functions for converting coordinates between
results of projecting a feature or slice from one assembly to another. assemblies. A mapping between the two assemblies has to present the database for
this to work, see the 'Related Modules' section below on how to generate the
mapping.
In addition to the "raw" projecting of features and slices, the methods in this
module also apply some sensible rules to the results of the projection (like
discarding unwanted results or merging fragmented projections). These are the
rules (depending on configuration):
Discard the projected feature/slice if:
1. it doesn't project at all (no segments returned)
2. [unless MERGE_FRAGMENTS is set] the projection is fragmented (more
than one segment)
3. [if CHECK_LENGTH is set] the projection doesn't have the same length
as the original feature/slice
4. all segments are on same chromosome and strand
If a projection fails any of these rules, undef is returned instead of a
projected feature/slice.
Also note that when projecting features, only a shallow projection is performed,
i.e. other features attached to your features (e.g. the transcripts of a gene)
are not projected automatically, so it will be the responsability of the user
code project all levels of features involved.
=head1 METHODS =head1 METHODS
new
project
old_to_new
new_to_old
adaptor
external_source
old_assembly
new_assembly
merge_fragments
check_length
=head1 REALTED MODULES =head1 REALTED MODULES
...@@ -65,15 +123,20 @@ use Bio::EnsEMBL::Slice; ...@@ -65,15 +123,20 @@ use Bio::EnsEMBL::Slice;
=head2 new =head2 new
Arg [ADAPTOR] : Bio::EnsEMBL::DBSQL::DBAdaptor $adaptor - a db adaptor
for a database containing the assembly mapping
Arg [EXTERNAL_SOURCE] : (optional) Boolean $external_source - indicates if
source is from a different database
Arg [OLD_ASSEMBLY] : name of the old assembly Arg [OLD_ASSEMBLY] : name of the old assembly
Arg [OLD_ASSEMBLY] : name of the new assembly Arg [OLD_ASSEMBLY] : name of the new assembly
Arg [OBJECT_TYPE] : (optional) object type ('slice' or 'feature') Arg [OBJECT_TYPE] : (optional) object type ('slice' or 'feature')
Arg [MERGE_FRAGMENTS] : (optional) Boolean - determines if segments are merged Arg [MERGE_FRAGMENTS] : (optional) Boolean - determines if segments are merged
to return a single object spannin all segments to return a single object spanning all segments
(default: true) (default: true)
Arg [CHECK_LENGTH] : (optional) Boolean - determines if projected objects Arg [CHECK_LENGTH] : (optional) Boolean - determines if projected objects
have to have same length as original (default: false) have to have same length as original (default: false)
Example : my $ap = Bio::EnsEMBL::Utils::AssemblyProjector->new( Example : my $ap = Bio::EnsEMBL::Utils::AssemblyProjector->new(
-DBADAPTOR => $dba,
-OLD_ASSEMBLY => NCBIM36, -OLD_ASSEMBLY => NCBIM36,
-NEW_ASSEMBLY => NCBIM37, -NEW_ASSEMBLY => NCBIM37,
); );
...@@ -91,26 +154,26 @@ sub new { ...@@ -91,26 +154,26 @@ sub new {
my $caller = shift; my $caller = shift;
my $class = ref($caller) || $caller; my $class = ref($caller) || $caller;
my ($old_assembly, $new_assembly, $object_type, $merge_fragments, my ($adaptor, $external_source, $old_assembly, $new_assembly,
$check_length) = rearrange([qw(OLD_ASSEMBLY NEW_ASSEMBLY OBJECT_TYPE $merge_fragments, $check_length) = rearrange([qw(ADAPTOR EXTERNAL_SOURCE
MERGE_FRAGMENTS CHECK_LENGTH)], @_); OLD_ASSEMBLY NEW_ASSEMBLY MERGE_FRAGMENTS CHECK_LENGTH)], @_);
unless ($old_assembly and $new_assembly) { unless ($adaptor and ref($adaptor) and
throw("You must provide an old and new assembly name."); $adaptor->isa('Bio::EnsEMBL::DBSQL::DBAdaptor')) {
throw("You must provide a DBAdaptor to a database containing the assembly mapping.");
} }
if ($object_type and ! unless ($old_assembly and $new_assembly) {
(lc($object_type) eq 'feature' or lc($object_type) eq 'slice')) { throw("You must provide an old and new assembly name.");
throw("Type must be 'feature' or 'slice'");
} }
my $self = {}; my $self = {};
bless ($self, $class); bless ($self, $class);
# initialise # initialise
$self->{'adaptor'} = $adaptor;
$self->{'old_assembly'} = $old_assembly; $self->{'old_assembly'} = $old_assembly;
$self->{'new_assembly'} = $new_assembly; $self->{'new_assembly'} = $new_assembly;
$self->{'object_type'} = $object_type;
# by default, merge fragments # by default, merge fragments
$self->{'merge_fragments'} = $merge_fragments || 1; $self->{'merge_fragments'} = $merge_fragments || 1;
...@@ -118,6 +181,10 @@ sub new { ...@@ -118,6 +181,10 @@ sub new {
# by default, do not check length # by default, do not check length
$self->{'check_length'} = $check_length || 0; $self->{'check_length'} = $check_length || 0;
# by default, features and slices are expected in same database as the
# assembly mapping
$self->{'external_source'} = $external_source || 0;
return $self; return $self;
} }
...@@ -136,13 +203,17 @@ sub new { ...@@ -136,13 +203,17 @@ sub new {
seq_region and strand. If -MERGE_FRAGMENTS is set, gaps will be seq_region and strand. If -MERGE_FRAGMENTS is set, gaps will be
bridged by creating a single object from first_segment_start to bridged by creating a single object from first_segment_start to
last_segment_end. If -CHECK_LENGTH is set, the projected object last_segment_end. If -CHECK_LENGTH is set, the projected object
will have to have the same length as the original. will have to have the same length as the original. Please see
the comments in the code for more details about these rules.
The return value of this method will always be a single object. The return value of this method will always be a single object,
or undef if the projection fails any of the rules.
Please see the comments in the code for details about these Note that when projecting features, only a "shallow" projection
rules. is performed, i.e. attached features aren't projected
Return type : same a Arg 1 automatically! (e.g. if you project a gene, its transcripts will
have to be projected manually before storing the new gene)
Return type : same a Arg 1, or undef if projection fails any of the rules
Exceptions : thrown on invalid arguments Exceptions : thrown on invalid arguments
Caller : general, $self->old_to_new, $self->new_to_old Caller : general, $self->old_to_new, $self->new_to_old
Status : At Risk Status : At Risk
...@@ -159,24 +230,55 @@ sub project { ...@@ -159,24 +230,55 @@ sub project {
my ($slice, $object_type); my ($slice, $object_type);
if ($object->isa('Bio::EnsEMBL::Feature')) { if ($object->isa('Bio::EnsEMBL::Feature')) {
$slice = $object->feature_Slice;
$object_type = 'feature'; $object_type = 'feature';
} elsif ($object->isa('Bio::EnsEMBL::Slice')) { } elsif ($object->isa('Bio::EnsEMBL::Slice')) {
$slice = $object;
$object_type = 'slice'; $object_type = 'slice';
} else { } else {
throw("Need a Feature or Slice to project."); throw("Need a Feature or Slice to project.");
} }
# if the feature or slice is sourced from another db, we have to "transfer"
# it to the db that contains the assembly mapping. the transfer is very
# shallow but that should do for our purposes
if ($self->external_source) {
my $slice_adaptor = $self->adaptor->get_SliceAdaptor;
if ($object_type eq 'feature') {
# createa a new slice from the target db
my $f_slice = $object->slice;
my $target_slice = $slice_adaptor->fetch_by_name($f_slice->name);
# now change the feature so that it appears it's from the target db
$object->slice($target_slice);
} else {
# createa a new slice from the target db
$object = $slice_adaptor->fetch_by_name($object->name);
}
}
if ($object_type eq 'feature') {
$slice = $object->feature_Slice;
} else {
$slice = $object;
}
# warn if trying to project to assembly version the object already is on # warn if trying to project to assembly version the object already is on
if ($slice->coord_system->version eq $to_assembly) { if ($slice->coord_system->version eq $to_assembly) {
warning("Assembly version to project to ($to_assembly) is the same as your object's assembly (".$slice->coord_system->version.")."); warning("Assembly version to project to ($to_assembly) is the same as your object's assembly (".$slice->coord_system->version.").");
} }
# now project the slice
my $cs_name = $slice->coord_system_name; my $cs_name = $slice->coord_system_name;
my @segments = @{ $slice->project($cs_name, $to_assembly) };
# [todo] project $slice instead? what is more efficient for Features? # we need to reverse the projection segment list if the orignial
my @segments = @{ $object->project($cs_name, $to_assembly) }; if ($slice->strand == -1) {
@segments = reverse(@segments);
}
# apply rules to projection results # apply rules to projection results
# #
...@@ -190,15 +292,18 @@ sub project { ...@@ -190,15 +292,18 @@ sub project {
# test (1) # test (1)
return undef unless (@segments); return undef unless (@segments);
#warn "DEBUG: passed test 1\n";
# test (2) # test (2)
return undef if (!($self->merge_fragments) and scalar(@segments) > 1); return undef if (!($self->merge_fragments) and scalar(@segments) > 1);
#warn "DEBUG: passed test 2\n";
# test (3) # test (3)
my $first_slice = $segments[0]->to_Slice; my $first_slice = $segments[0]->to_Slice;
my $last_slice = $segments[-1]->to_Slice; my $last_slice = $segments[-1]->to_Slice;
return undef if ($self->check_length and return undef if ($self->check_length and
($last_slice->end - $first_slice->start + 1) != $object->length); ($last_slice->end - $first_slice->start + 1) != $object->length);
#warn "DEBUG: passed test 3\n";
# test (4) # test (4)
my %sr_names = (); my %sr_names = ();
...@@ -209,6 +314,7 @@ sub project { ...@@ -209,6 +314,7 @@ sub project {
$strands{$sl->strand}++; $strands{$sl->strand}++;
} }
return undef if (scalar(keys %sr_names) > 1 or scalar(keys %strands) > 1); return undef if (scalar(keys %sr_names) > 1 or scalar(keys %strands) > 1);
#warn "DEBUG: passed test 4\n";
# everything looks fine, so adjust the coords of your feature/slice # everything looks fine, so adjust the coords of your feature/slice
my $new_slice = $first_slice; my $new_slice = $first_slice;
...@@ -217,17 +323,22 @@ sub project { ...@@ -217,17 +323,22 @@ sub project {
if ($object_type eq 'slice') { if ($object_type eq 'slice') {
return $new_slice; return $new_slice;
} else { } else {
$object->start($new_slice->start); $object->start($new_slice->start);
$object->end($new_slice->end); $object->end($new_slice->end);
$object->strand($new_slice->strand); # ???? $object->strand($new_slice->strand);
$object->slice($new_slice->seq_region_Slice); $object->slice($new_slice->seq_region_Slice);
# undef dbID and adaptor so you can store the feature in the target db
$object->dbID(undef);
$object->adaptor(undef);
return $object; return $object;
} }
} }
=head2 old_to_new =head2 old_to_new
Arg[1] : Bio::EnsEMBL::Slice or Bio::EnsEMBL::Feature $object - Arg[1] : Bio::EnsEMBL::Slice or Bio::EnsEMBL::Feature $object -
...@@ -235,7 +346,7 @@ sub project { ...@@ -235,7 +346,7 @@ sub project {
Example : my $new_slice = $assembly_projector->old_to_new($old_slice); Example : my $new_slice = $assembly_projector->old_to_new($old_slice);
Description : Projects a Slice or Feature from old to new assembly. Description : Projects a Slice or Feature from old to new assembly.
This method is just a convenience wrapper for $self->project. This method is just a convenience wrapper for $self->project.
Return type : same a Arg 1 Return type : same a Arg 1, or undef
Exceptions : none Exceptions : none
Caller : general Caller : general
Status : At Risk Status : At Risk
...@@ -256,7 +367,7 @@ sub old_to_new { ...@@ -256,7 +367,7 @@ sub old_to_new {
Example : my $old_slice = $assembly_projector->new_to_old($new_slice, 1); Example : my $old_slice = $assembly_projector->new_to_old($new_slice, 1);
Description : Projects a Slice or Feature from new to old assembly. Description : Projects a Slice or Feature from new to old assembly.
This method is just a convenience wrapper for $self->project. This method is just a convenience wrapper for $self->project.
Return type : same a Arg 1 Return type : same a Arg 1, or undef
Exceptions : none Exceptions : none
Caller : general Caller : general
Status : At Risk Status : At Risk
...@@ -273,6 +384,20 @@ sub new_to_old { ...@@ -273,6 +384,20 @@ sub new_to_old {
# #
# accessors # accessors
# #
sub adaptor {
my $self = shift;
$self->{'adaptor'} = shift if (@_);
return $self->{'adaptor'};
}
sub external_source {
my $self = shift;
$self->{'external_source'} = shift if (@_);
return $self->{'external_source'};
}
sub old_assembly { sub old_assembly {
my $self = shift; my $self = shift;
$self->{'old_assembly'} = shift if (@_); $self->{'old_assembly'} = shift if (@_);
...@@ -287,13 +412,6 @@ sub new_assembly { ...@@ -287,13 +412,6 @@ sub new_assembly {
} }
sub object_type {
my $self = shift;
$self->{'object_type'} = shift if (@_);
return $self->{'object_type'};
}
sub merge_fragments { sub merge_fragments {
my $self = shift; my $self = shift;
$self->{'merge_fragments'} = shift if (@_); $self->{'merge_fragments'} = shift if (@_);
......
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment