ENSCORESW-2742: keep existing descriptions. Only for merged species
- Jun 04, 2018
Magali Ruffier authored ac34a0e4
Created by: magaliruffier
Currently, the code deletes all gene descriptions in the core database before loading the xrefs, then writes new descriptions based on the xref mappings. The proposed change leaves existing descriptions in place and overwrites them only where the xref mapping assigns a description.
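The difference between the two behaviours can be sketched as follows. This is an illustrative Python model only, not the pipeline's Perl code: the function names, the gene IDs and the dictionary representation are hypothetical, and it assumes that an empty description from the xref mapping never replaces an existing one.

```python
from typing import Dict, Optional

Descriptions = Dict[str, Optional[str]]


def update_descriptions_old(existing: Descriptions, from_xrefs: Descriptions) -> Descriptions:
    """Old behaviour: delete every description, then write the xref-derived ones."""
    result: Descriptions = {gene_id: None for gene_id in existing}
    for gene_id, description in from_xrefs.items():
        if gene_id in result and description:
            result[gene_id] = description
    return result


def update_descriptions_new(existing: Descriptions, from_xrefs: Descriptions) -> Descriptions:
    """Proposed behaviour: keep existing descriptions, overwrite where the mapping has one."""
    result: Descriptions = dict(existing)
    for gene_id, description in from_xrefs.items():
        # Assumption: an empty or missing xref description leaves the old value alone.
        if gene_id in result and description:
            result[gene_id] = description
    return result


if __name__ == "__main__":
    # Hypothetical gene IDs; GENE_A has only a manually annotated description.
    existing = {
        "GENE_A": "novel TRPM8 channel-associated factor family pseudogene",
        "GENE_B": "outdated description",
    }
    from_xrefs = {"GENE_B": "description from the display xref"}

    print(update_descriptions_old(existing, from_xrefs))
    print(update_descriptions_new(existing, from_xrefs))
```

In this model, GENE_A loses its description under the old behaviour and keeps it under the proposed one, while GENE_B receives the xref-derived description in both cases.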
Gene descriptions are set by the xref pipeline. For species with manual annotation, we use the description from the xref chosen as the display xref. Not all genes will get a valid display_xref (the clone name is used in that case), and not all genes with a valid display_xref have a useful description. By leaving the existing descriptions in, we increase the percentage of genes with a useful description.
For example, see http://e92.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000283537;r=7:143620943-143645675, which currently has no description. With the proposed change, this gene would get a meaningful description from Havana, i.e. 'novel TRPM8 channel-associated factor family pseudogene'.
The number of genes with descriptions increases. The change only applies to species with manual annotation, as it is in the set_display_xrefs_from_stable_table method, which is only used for human, mouse, rat, zebrafish and pig. By default, other species use the set_display_xrefs method.
Some descriptions from Havana might not be meaningful (e.g. 'novel transcript'). If old descriptions are kept in the database, they will not be cleaned up by the new xref update.
Have you added/modified unit tests to test the changes?
NA
If so, do the tests pass/fail?
NA
Have you run the entire test suite and no regression was detected?
NA