Fix for ENSCORESW-2340 (handling NULL translation versions)

Marek Szuba requested to merge github/fork/james-monkeyshines/master into master

Created by: james-monkeyshines

Description

If a translation version is 'undef' (i.e. NULL in the database), override the 'new' method's default version (i.e. 1) immediately after the translation object is created.

Use case

The code that generates EMBL and GenBank dumps (SeqDumper.pm) adds spurious version numbers to translation stable IDs (the "protein_id" rows in the file). This happens because the code pre-emptively loads the transcripts (via 'get_all_Genes'), which saves time but means that translations are created with the 'new' method. The translation data is extracted from the database, so NULL versions lead to $version being set to undef. That would be fine if 'new' didn't then default the version to '1' when it is undefined...

The suggestion to not load transcripts via get_all_Genes in the comments of ENSCORESW-2340 is undesirable, because it already takes quite a long time to generate these files (~12 hours all told, for all divisions, excluding bacteria).

The fix proposed here sets the object's version to accurately reflect what is in the database, and makes the Translation::stable_id_version method behave consistently, whether you lazy load or not.
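To illustrate the behaviour described above, here is a minimal, self-contained sketch. It does not use the real Ensembl classes: `My::Translation`, `translation_from_row`, and the `EXAMPLE_P1` stable ID are hypothetical stand-ins, and the real `stable_id_version` implementation may differ in detail. It only assumes what this description states, namely that the constructor defaults an undefined version to 1 and that the fix resets the version to undef right after the object is created.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a translation class; NOT the real
# Bio::EnsEMBL::Translation, just enough to show the defaulting behaviour.
package My::Translation;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        stable_id => $args{stable_id},
        # Constructor default: an undefined version silently becomes 1.
        version   => defined $args{version} ? $args{version} : 1,
    }, $class;
    return $self;
}

# Combined getter/setter; passing undef explicitly clears the version.
sub version {
    my $self = shift;
    $self->{version} = shift if @_;
    return $self->{version};
}

# Append ".version" only when a version is actually set.
sub stable_id_version {
    my $self    = shift;
    my $version = $self->version;
    return defined $version
        ? $self->{stable_id} . '.' . $version
        : $self->{stable_id};
}

package main;

# Build a translation from a database row, modelled here as a hash ref.
# $apply_fix toggles the behaviour proposed in this request.
sub translation_from_row {
    my ($row, $apply_fix) = @_;
    my $translation = My::Translation->new(
        stable_id => $row->{stable_id},
        version   => $row->{version},
    );
    # The fix: restore undef right after construction when the row had NULL.
    $translation->version(undef) if $apply_fix && !defined $row->{version};
    return $translation;
}

# A row whose version column is NULL in the database.
my $row = { stable_id => 'EXAMPLE_P1', version => undef };

my $without_fix = translation_from_row($row, 0);
my $with_fix    = translation_from_row($row, 1);

print $without_fix->stable_id_version, "\n";  # EXAMPLE_P1.1 (spurious version)
print $with_fix->stable_id_version, "\n";     # EXAMPLE_P1
```

Overriding the version after construction, rather than changing the constructor's default, leaves every other caller of 'new' unaffected, which is presumably why the change is scoped this way.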

Benefits

EMBL and GenBank dumps have correct stable IDs for EG species, and UniParc don't have to send us an email every release, telling us our dumps are wrong...

Possible Drawbacks

Can't think of anything

Testing

Have you added/modified unit tests to test the changes? No

Have you run the entire test suite and no regression was detected? Yes.