Created by: s-mm
This PR combines changes required to resolve ENSCORESW-3411 and ENSCORESW-3390.
The following reasons were observed for author changes at transcript level:
Missing attributes
Difference in number of the same attribute
Difference in letter case for vega hashkey for transcript
Trailing white space in vega hashkey
Difference in status of biotypes
Similar reasons were observed for author changes at gene level.
The missing attributes are 'vega_name', 'TAGENE_transcript', 'MANE_Select', 'ccds_transcript', 'miRNA', 'ncRNA', 'Frameshift'. Whenever a locus is edited, the code checks whether these attributes are present. Because these attributes are not added to the gene_attrib/transcript_attrib table, the code assumes the gene/transcript has been edited even when it has not.
'vega_name', 'TAGENE_transcript', 'MANE_Select', 'ccds_transcript', 'miRNA', 'ncRNA', 'Frameshift' are set for transcripts. 'vega_name' is set for genes.
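To illustrate why a missing attribute makes an untouched locus look edited, here is a minimal sketch in Python. It is not the code changed in this PR: the function name `attributes_differ` and the sample values are hypothetical, and attributes are assumed to be simple (code, value) pairs. Comparing the two sides as multisets also covers the "difference in number of the same attribute" case.

```python
from collections import Counter

def attributes_differ(stored, incoming):
    """Return True when two lists of (code, value) attributes differ.

    Hypothetical helper: the lists are compared as multisets, so both a
    missing attribute and a different count of the same attribute are
    reported as a difference. Order is ignored.
    """
    return Counter(stored) != Counter(incoming)

# Attributes as stored in the attrib table (assumed sample data):
# 'vega_name' was never written.
stored = [("name", "ABC-001")]

# Attributes on the object being compared: 'vega_name' is present.
incoming = [("name", "ABC-001"), ("vega_name", "ABC-001")]

# The transcript was not edited, yet the comparison reports a change.
print(attributes_differ(stored, incoming))

# Once the missing attribute is stored as well, no change is reported.
stored_fixed = stored + [("vega_name", "ABC-001")]
print(attributes_differ(stored_fixed, incoming))
```

Under these assumptions the first comparison reports a difference and the second does not, which mirrors the spurious author changes described above.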
Note: the upstream_ATG, parent_exon_key and parent_sid attributes have not been handled. This is due to lack of testing.
Author changes have been noticed for transcripts due to a change in status for a few biotypes (miRNA, ncRNA). Author changes have been noticed for genes because the code sets the 'name' attribute for these genes. This has been noticed for genes that were introduced in the DB as part of the NoMerge process.
Random author changes have been noticed for ENSE and ENSP. This is due to a change in vega_hashkey.
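The letter-case and trailing-white-space differences in the vega hashkey can be sketched as follows. This is only an illustration in Python, not the fix applied in this PR: `normalise_hashkey` and `hashkeys_match` are hypothetical names, the sample keys are made up, and treating keys as case-insensitive is an assumption.

```python
def normalise_hashkey(key: str) -> str:
    """Strip trailing white space and fold letter case (assumed rules)."""
    return key.rstrip().lower()

def hashkeys_match(old_key: str, new_key: str) -> bool:
    """Compare two hashkeys after normalisation (hypothetical helper)."""
    return normalise_hashkey(old_key) == normalise_hashkey(new_key)

# Made-up keys that differ only in case and trailing white space.
old_key = "chr1-100-200-1-Protein_Coding  "
new_key = "chr1-100-200-1-protein_coding"

# A raw string comparison sees a change; the normalised one does not.
print(old_key == new_key)
print(hashkeys_match(old_key, new_key))
```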