[ENSCORESW-2230]. Checking whether the same fragment is mapped to a different...
[ENSCORESW-2230]. Detecting whether the same fragment is mapped to a different seq_region relies on checking whether the last target coordinate has already been set and lies on a different seq_region.

The loop iterates sequentially over the mapped fragments and checks whether each one needs to be considered. When a fragment did not need to be considered, the loop jumped to the next fragment, but only AFTER registering the skipped fragment as the last seen target coordinate. A later iteration may then consider a fragment on the same seq_region as the previously discarded one. In that case the check (last target coordinate set and on a different seq_region) fails, so the code can miss that the same source fragment is mapped to a different assembly bit. This is what happens in the particular chr10 cases reported in the issue.

WARNING: this change addresses the issue, as the following output shows:

    Found chromosome:GRCh38:10:96229321:96229901:1
    Retrieved 2 projected bits
    1-581 - chromosome:GRCh37:10:97989077:97989657:1
    582-1162 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1

but it does not solve every problem. As the output shows, the source-slice coordinates of the fragments projected to the two different seq_regions are reported as sequential (1-581 and 582-1162), whereas in reality they overlap (they are the same).

NOTE: switching to the experimental branch, as it stands at this stage, solves both problems simultaneously:

- the interval tree simply detects and then considers the two overlapping fragments, so it does not have to loop over irrelevant fragments that might compromise the logic; even if it did, the code in the experimental branch already skips irrelevant fragments at the beginning of the loop;
- the coordinates of the projected bits in the original coordinate system are already returned correctly by virtue of a previous fix [ENSCORESW-2289].
This is the output obtained with the new branch:

    1-581 - chromosome:GRCh37:10:97989077:97989657:1
    1-581 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1
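
The ordering problem described in this message can be illustrated with a minimal Python sketch. It is not the Ensembl code: the `Fragment` type, the `relevant` flag, the `maps_to_multiple_regions` function and the middle (skipped) fragment are all hypothetical, chosen only to show how registering a skipped fragment as the last seen target can hide a change of seq_region.

```python
from typing import NamedTuple, Optional, Sequence


class Fragment(NamedTuple):
    """Hypothetical stand-in for a mapped fragment (not an Ensembl class)."""
    seq_region: str
    relevant: bool  # assumed flag: does this fragment need to be considered?


def maps_to_multiple_regions(fragments: Sequence[Fragment],
                             register_skipped: bool) -> bool:
    """Return True if a considered fragment lies on a different seq_region
    than the last registered target.

    register_skipped=True mimics the old behaviour, where a fragment that
    is about to be skipped is still recorded as the last seen target.
    """
    last_region: Optional[str] = None
    for frag in fragments:
        if not frag.relevant:
            if register_skipped:
                # Old behaviour: the skipped fragment overwrites the last target.
                last_region = frag.seq_region
            continue
        # The check: last target is set and is on a different seq_region.
        if last_region is not None and last_region != frag.seq_region:
            return True
        last_region = frag.seq_region
    return False


# Illustrative input: the irrelevant middle fragment is an assumption.
fragments = [
    Fragment("10", relevant=True),
    Fragment("HG339_PATCH", relevant=False),
    Fragment("HG339_PATCH", relevant=True),
]

old_result = maps_to_multiple_regions(fragments, register_skipped=True)
new_result = maps_to_multiple_regions(fragments, register_skipped=False)
```

With this input, the old ordering compares the last fragment against the skipped one, which is on the same seq_region, and so reports no change. Skipping before registering compares it against the fragment on 10 and detects the second seq_region.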