Bugfix for warning messages when input exons are fully UTR, introduced in "ENSCORESW-2545"

Created by: duartemolha

Description

The commit here: https://github.com/Ensembl/ensembl/commit/74e499abb4fef9cba9b030e61c91604961941882

was trying to correct some discrepancies in the soft-masking of non-coding sequences. However, completely non-coding exons have an undefined $ex->coding_region_start, resulting in warning messages such as:

Use of uninitialized value in numeric gt (>) at ...EnsEMBL/Transcript.pm

After the whole sequence is set to lower case at line 837 with $exon_seq = lc($exon_seq), an exon that does not have a defined coding start needs no further processing.

The if statements if ($ex->coding_region_start($self) > $ex->start()) { and if ($ex->coding_region_end($self) < $ex->end()) { should never be evaluated in that case, since coding_region_start and coding_region_end will both be undefined whenever if (!defined ($ex->coding_region_start($self))) is true.
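The mechanism behind the warning can be reproduced outside Ensembl. The following is a minimal sketch, assuming plain scalar stand-ins ($coding_region_start and $exon_start are hypothetical variables, not calls into Bio::EnsEMBL): it compares an undefined value with > once unguarded and once behind a defined check, and captures any warning instead of printing it to STDERR.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: a fully UTR exon has no coding start.
my $coding_region_start = undef;
my $exon_start          = 100;

# Collect warnings in an array so they can be inspected afterwards.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Unguarded comparison: undef is treated as 0 and Perl emits
# "Use of uninitialized value ... in numeric gt (>)".
my $unguarded = $coding_region_start > $exon_start;

# Guarded comparison: the numeric test only runs when the value is defined,
# so no warning is emitted here.
my $guarded = defined($coding_region_start)
    && $coding_region_start > $exon_start;

print scalar(@warnings), " warning(s) captured\n";
print $warnings[0] if @warnings;
```

Only the unguarded comparison should contribute a warning, which is the same situation the defined check in this change avoids.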

Use case

This bug outputs warning messages for completely UTR exons when the optional soft-masking has been requested,

for example for gene DDR2, transcript ID ENST00000367921.

Even though the soft-masked output is correct both before and after my code change, the updated code no longer emits warning messages such as Use of uninitialized value in numeric gt (>) at .../ensembl/modules/Bio/EnsEMBL/Transcript.pm line XXX.

Benefits

With this change, when soft masking is requested, a completely UTR exon is set to all lowercase and the comparisons of the coding start and coding end with the start and end of the exon are skipped.

Those comparisons are only done if $ex->coding_region_start is defined:

    if (!defined ($ex->coding_region_start($self))) {
      $exon_seq = lc($exon_seq);
    } else {
      if ($ex->coding_region_start($self) > $ex->start()) {
        ...
      }

      if ($ex->coding_region_end($self) < $ex->end()) {
        ...
      }
    }
    $seq_string .= $exon_seq;

Possible Drawbacks

none that I can see

Testing

No, I have not created tests for this. The existing tests check whether the boundaries between lowercase and uppercase match. For exons that are completely UTR, both the previous code and the new code make the entire exon sequence lowercase. The only difference is that after my change there are no invalid if comparisons with undefined values, and therefore no warning messages.
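If a regression test were wanted, one possible shape is sketched below. It is an assumption-laden illustration, not the Ensembl test suite: soft_mask_exon is a hypothetical standalone helper that mirrors the guarded logic (forward strand, 1-based coordinates), and warnings are captured through $SIG{__WARN__} so the test can assert that none were emitted.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper mirroring the guarded logic described in this request.
# It is NOT the Bio::EnsEMBL::Transcript code: forward strand only,
# 1-based inclusive coordinates, coding bounds passed in directly.
sub soft_mask_exon {
    my ($seq, $start, $end, $cds_start, $cds_end) = @_;

    # Start from an all-lowercase exon.
    my $masked = lc($seq);

    # Fully UTR exon: no coding bounds, so skip every numeric comparison.
    return $masked unless defined($cds_start) && defined($cds_end);

    # Uppercase only the coding portion of the exon.
    my $offset = $cds_start - $start;
    my $length = $cds_end - $cds_start + 1;
    substr($masked, $offset, $length) = uc(substr($masked, $offset, $length));
    return $masked;
}

# Capture warnings so their absence can be asserted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

is(soft_mask_exon('ACGTACGT', 1, 8, undef, undef), 'acgtacgt',
    'fully UTR exon is all lowercase');
is(soft_mask_exon('ACGTACGT', 1, 8, 3, 6), 'acGTACgt',
    'coding portion is uppercase, UTR flanks are lowercase');
is(scalar(@warnings), 0, 'no warnings emitted');
```

A test against the real code would presumably load a transcript with a fully UTR exon (such as the one named in the use case) from a test database and make the same no-warnings assertion.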

Edited by Stefano Giorgetti
