ENSCORESW-2742: keep existing descriptions. Only for merged species

Marek Szuba requested to merge feature/havana_desc into master

Created by: magaliruffier

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • if you change the schema you must patch the test databases as well, see Updating the schema
    • the PR must not fail unit testing

Description

Currently, the code deletes all gene descriptions in the core database prior to loading the xrefs, then writes new descriptions based on the xref mappings. The proposed change leaves existing descriptions in place and overwrites them only where the xref mapping assigns a description.
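
As a rough illustration of the difference (not the actual pipeline code), the two behaviours can be sketched with plain dictionaries standing in for the gene table. The function names and the gene IDs `GENE_A` and `GENE_B` are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the gene table: gene ID -> description (None = no description)
GeneDescriptions = Dict[str, Optional[str]]


def update_descriptions_current(
    existing: GeneDescriptions, mapped: Dict[str, str]
) -> GeneDescriptions:
    """Current behaviour: wipe every description, then write the mapped ones."""
    result: GeneDescriptions = {gene_id: None for gene_id in existing}
    result.update(mapped)
    return result


def update_descriptions_proposed(
    existing: GeneDescriptions, mapped: Dict[str, str]
) -> GeneDescriptions:
    """Proposed behaviour: keep existing descriptions, overwrite only mapped genes."""
    result: GeneDescriptions = dict(existing)
    result.update(mapped)
    return result


# Illustrative data only: GENE_A has a manual description and no xref-derived one,
# GENE_B has an old description that the xref mapping replaces.
existing = {
    "GENE_A": "novel TRPM8 channel-associated factor family pseudogene",
    "GENE_B": "old description",
}
mapped = {"GENE_B": "description from display xref"}

before = update_descriptions_current(existing, mapped)
after = update_descriptions_proposed(existing, mapped)
```

In this sketch, `GENE_A` ends up without a description under the current behaviour but keeps its manual description under the proposed one, while `GENE_B` gets the xref-derived description in both cases.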

Use case

Gene descriptions are set by the xref pipeline. For species with manual annotation, we use the description from the xref chosen as the display xref. Not all genes will get a valid display_xref (the clone name is used in that case), and not all genes with a valid display_xref have a useful description. By leaving the existing descriptions in, we increase the percentage of genes with a useful description.

For example, see http://e92.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000283537;r=7:143620943-143645675 which does not have any description. With the proposed change, this gene would get a meaningful description from Havana, i.e. 'novel TRPM8 channel-associated factor family pseudogene'.

Benefits

Increased number of genes with descriptions. The change only applies to species with manual annotation, as it is in the set_display_xrefs_from_stable_table method which is only used for human, mouse, rat, zebrafish and pig. By default, species use the set_display_xrefs method.

Possible Drawbacks

Some descriptions from Havana might not be meaningful (e.g. 'novel transcript'). Because old descriptions are kept in the database, any that are out of date will not be cleaned up by a new xref update.

Testing

Have you added/modified unit tests to test the changes?

NA

If so, do the tests pass/fail?

NA

Have you run the entire test suite and no regression was detected?

NA