    f96f7bcf
    [ENSCORESW-2230]. Checking whether the same fragment is mapped to a different... · f96f7bcf
    Alessandro Vullo authored
    [ENSCORESW-2230]. Detecting that the same fragment is mapped to a different seq_region relies on checking
    whether the last target coordinate has already been set and lies on a different seq_region. While looping
    sequentially over the mapped fragments, each one is tested to see whether it needs to be considered. If it did
    not, the loop jumped to the next fragment, but only AFTER having registered the skipped fragment as the
    last seen target coordinate.
    A later iteration might then consider a fragment on the same seq_region as the previously discarded fragment.
    In that case the check fails, because the last target coordinate is set but is not on a different seq_region,
    so the code can fail to detect that the same source fragment is mapped to a different assembly bit. This is what
    happens in the particular cases on chr10 reported in the issue.
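
    The ordering problem described above can be illustrated with a simplified model. This is only a sketch, not
    the mapper code itself: the Fragment class, its relevant flag and the register_before_skip switch are
    hypothetical names introduced for illustration, and the real check works on target coordinates rather than
    on bare seq_region names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for a mapped fragment."""
    seq_region: str   # target seq_region the fragment maps to
    relevant: bool    # whether the loop should consider this fragment


def detect_region_changes(fragments: List[Fragment],
                          register_before_skip: bool) -> List[Fragment]:
    """Return the fragments at which a switch of target seq_region is detected.

    register_before_skip=True models the old ordering: a fragment that is
    going to be skipped is still recorded as the last seen target.
    """
    last_region: Optional[str] = None
    detected: List[Fragment] = []
    for frag in fragments:
        if not frag.relevant:
            if register_before_skip:
                # Old ordering: the skipped fragment overwrites the last target.
                last_region = frag.seq_region
            continue
        # The check from the commit message: last target set AND on a different region.
        if last_region is not None and last_region != frag.seq_region:
            detected.append(frag)
        last_region = frag.seq_region
    return detected


# Assumed scenario: a skipped fragment on the patch precedes a relevant one.
fragments = [
    Fragment("10", relevant=True),
    Fragment("HG339_PATCH", relevant=False),
    Fragment("HG339_PATCH", relevant=True),
]
old_result = detect_region_changes(fragments, register_before_skip=True)
new_result = detect_region_changes(fragments, register_before_skip=False)
```

    With the old ordering the skipped patch fragment masks the switch, so old_result is empty. When the skip
    happens before anything is registered, new_result contains the relevant HG339_PATCH fragment.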
    
    WARNING: this change addresses the issue, as can be seen by the following output:
    
    Found chromosome:GRCh38:10:96229321:96229901:1
    Retrieved 2 projected bits
    1-581 - chromosome:GRCh37:10:97989077:97989657:1
    582-1162 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1
    
    but does not entirely solve the problem. As can be seen, the fragments projected to two different seq_regions
    are given sequential coordinates in the slice of the source coordinate system, whereas in reality their
    coordinates overlap (they are the same).
    
    NOTE: switching to the experimental branch, as it is at this stage, solves the two problems simultaneously:
    - the interval tree simply detects and then considers the two overlapping fragments, so it does not have to loop
    over irrelevant fragments, which might compromise the logic; even if it did, the code in the experimental
    branch already skips irrelevant fragments at the beginning of the loop;
    - the coordinates of the projected bits in the original coordinate system are already returned correctly
    by virtue of a previous fix [ENSCORESW-2289].
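
    The first point can be sketched as an overlap query that returns only the fragments intersecting the requested
    source interval. This is a minimal sketch under stated assumptions: a linear scan stands in for the interval
    tree, the MappedFragment class and the overlapping function are hypothetical names, and the third fragment
    below is invented to represent an irrelevant one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MappedFragment:
    """Hypothetical fragment: a source interval and the target seq_region it maps to."""
    source_start: int
    source_end: int
    target_region: str


def overlapping(fragments: List[MappedFragment],
                start: int, end: int) -> List[MappedFragment]:
    """Return fragments whose closed source interval overlaps [start, end].

    A linear scan is used here for brevity; an interval tree would answer
    the same query without visiting every fragment.
    """
    return [f for f in fragments
            if f.source_start <= end and start <= f.source_end]


fragments = [
    MappedFragment(1, 581, "10"),
    MappedFragment(1, 581, "HG339_PATCH"),
    MappedFragment(2000, 2500, "10"),  # invented irrelevant fragment
]
hits = overlapping(fragments, 1, 581)
hit_regions = [f.target_region for f in hits]
```

    Both fragments covering 1-581 are returned and the unrelated one is never considered, so there is no skipped
    fragment that could disturb the seq_region check.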
    
    This is the output obtained with the new branch:
    
    1-581 - chromosome:GRCh37:10:97989077:97989657:1
    1-581 - chromosome:GRCh37:HG339_PATCH:98024843:98025423:1