Allow translation start and stop to be defined within a transcript-level seq-edit

Marek Szuba requested to merge github/fork/james-monkeyshines/master into master

Created by: james-monkeyshines

The way that translation start and end are defined in the core db schema makes it impossible to insert sequence at the start or end of a transcript that contains both UTR and CDS; in other words, translation cannot start or stop within a seq-edit.

This situation is common in fragmented assemblies that need seq_edits, since genes tend to be truncated at one or both ends rather than missing coding sequence in the middle. The best workaround with the current code is to remove the UTRs, which is obviously not ideal.

There are probably much cleverer, and correspondingly more complicated, ways to address this. The simple/stupid solution I propose here is a translation_attrib that overrides the start and/or stop derived from the translation table. The attrib is only looked up when seq-edits are present, so it shouldn't slow down the module by searching for an attrib that won't exist for most species. I used an underscore prefix for the attrib codes, since that seemed to be the convention for seq-edit things. If this PR is accepted, I'll add the attribs to the production db.
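To make the intended behaviour concrete, here is a minimal sketch of the override logic in Python. It is not the code in this PR: the attrib codes `_hypothetical_start_override` and `_hypothetical_end_override`, the `Transcript` class, and the `coding_region` function are all made up for illustration, and coordinates are assumed to be 1-based positions in the edited cDNA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical attrib codes, invented for this sketch. They only follow the
# underscore-prefix convention mentioned above; they are not the real codes.
START_OVERRIDE = "_hypothetical_start_override"
END_OVERRIDE = "_hypothetical_end_override"


@dataclass
class Transcript:
    """Toy stand-in for a transcript; not the Ensembl API object."""
    derived_coding_start: int  # start as derived from the translation table
    derived_coding_end: int    # end as derived from the translation table
    # Each seq-edit is (start, end, replacement sequence).
    seq_edits: List[Tuple[int, int, str]] = field(default_factory=list)
    translation_attribs: Dict[str, int] = field(default_factory=dict)


def coding_region(t: Transcript) -> Tuple[int, int]:
    """Return (coding_start, coding_end), applying overrides if relevant."""
    start, end = t.derived_coding_start, t.derived_coding_end
    # Overrides are consulted only when seq-edits exist, so transcripts
    # without edits never pay for the extra attrib lookup.
    if t.seq_edits:
        start = t.translation_attribs.get(START_OVERRIDE, start)
        end = t.translation_attribs.get(END_OVERRIDE, end)
    return start, end


# 30 bases inserted at the 5' end: 12 bases of UTR followed by 18 of CDS.
# The derived start (31) sits after the insert; the override moves it to 13.
edited = Transcript(
    derived_coding_start=31,
    derived_coding_end=330,
    seq_edits=[(1, 0, "N" * 30)],
    translation_attribs={START_OVERRIDE: 13},
)
unedited = Transcript(
    derived_coding_start=31,
    derived_coding_end=330,
    translation_attribs={START_OVERRIDE: 13},
)
```

In this sketch `coding_region(edited)` uses the overridden start and the derived end, while `coding_region(unedited)` ignores the attrib entirely because there are no seq-edits.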

I'd like this to be available for VectorBase, which is currently running on release 88 code. So if possible (and assuming you are happy to accept this change, of course), could this please be cherry-picked onto release/88 and release/89?
