Commit f96f7bcf authored by Alessandro Vullo

[ENSCORESW-2230]. Checking whether the same fragment is mapped to a different...

[ENSCORESW-2230]. Detecting that the same fragment is mapped to a different seq_region relies on checking
whether the last target coordinate has already been set and lies on a different seq_region. The loop goes
sequentially over the mapped fragments and decides whether each one needs to be considered; when one did not,
the loop jumped to the next fragment, but only AFTER having registered the skipped fragment as the
last seen target coordinate.
A subsequent iteration might then consider a fragment on the same seq_region as the previously discarded fragment.
In that case the check (last target coordinate set and on a different seq_region) fails, so the code can miss
that the same source fragment is mapped to a different assembly bit, which is what happens
in the particular cases on chr10 reported in the issue.
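
The ordering problem described above can be pictured with a minimal Python sketch. This is not the Ensembl code:
the Fragment class, the relevant flag and count_region_switches are hypothetical names introduced only for
illustration, and the coordinates of the skipped fragment are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for a mapped fragment on a target seq_region."""
    seq_region: str
    start: int
    end: int
    relevant: bool = True  # False means the loop should skip this fragment


def count_region_switches(fragments: List[Fragment], register_before_skip: bool) -> int:
    """Count how often a considered fragment lies on a different seq_region
    than the last registered target coordinate.

    register_before_skip=True mimics the old ordering (a skipped fragment is
    still recorded as the last seen one); False mimics the corrected ordering.
    """
    last: Optional[Fragment] = None
    switches = 0
    for frag in fragments:
        if not frag.relevant:
            if register_before_skip:
                last = frag  # old ordering: skipped fragment pollutes 'last'
            continue
        if last is not None and last.seq_region != frag.seq_region:
            switches += 1
        last = frag
    return switches


fragments = [
    Fragment("10", 97989077, 97989657),
    Fragment("HG339_PATCH", 1, 100, relevant=False),  # made-up fragment to be skipped
    Fragment("HG339_PATCH", 98024843, 98025423),
]

old_ordering = count_region_switches(fragments, register_before_skip=True)
new_ordering = count_region_switches(fragments, register_before_skip=False)
```

With the old ordering the skipped fragment hides the change of seq_region, so no switch is detected; with the
corrected ordering the switch from 10 to HG339_PATCH is seen.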

WARNING: this change addresses the issue, as can be seen by the following output:

Found chromosome:GRCh38:10:96229321:96229901:1
Retrieved 2 projected bits
1-581 - chromosome:GRCh37:10:97989077:97989657:1
582-1162 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1

but does not entirely solve the problem. As can be seen, the coordinates of the fragments in the slice of the
source coordinate system, projected to two different seq_regions, are reported as sequential whereas in reality
they overlap (they are the same).
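
One way to picture the difference between the two numberings is the Python sketch below. It is only an assumption
about how such ranges could be derived, not the actual projection code; sequential_ranges and
source_relative_ranges are hypothetical helpers.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def sequential_ranges(lengths: List[int]) -> List[Range]:
    """Number projected bits one after the other, as if they never overlapped."""
    ranges = []
    start = 1
    for length in lengths:
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def source_relative_ranges(slice_start: int, source_intervals: List[Range]) -> List[Range]:
    """Number each projected bit relative to the start of the source slice."""
    return [(s - slice_start + 1, e - slice_start + 1) for s, e in source_intervals]


# Source slice chromosome:GRCh38:10:96229321:96229901:1; both projected bits
# cover the whole slice (581 bases).
slice_start, slice_end = 96229321, 96229901
length = slice_end - slice_start + 1

sequential = sequential_ranges([length, length])
relative = source_relative_ranges(slice_start, [(slice_start, slice_end)] * 2)
```

Here sequential comes out as (1, 581) and (582, 1162), matching the output above, while relative gives (1, 581)
for both bits.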

NOTE: switching to the experimental branch as it is at this stage solves the two problems simultaneously:
- the interval tree simply detects and then considers the two overlapping fragments, hence it does not have to
loop over irrelevant fragments which might compromise the logic; even if it did, the code in the experimental
branch already skips irrelevant fragments at the beginning of the loop;
- the coordinates of the projected bits in the original coordinate system are already correctly returned
by virtue of a previous fix [ENSCORESW-2289].

This is the output obtained with the new branch:

1-581 - chromosome:GRCh37:10:97989077:97989657:1
1-581 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1
parent 0e4e3a03