ENSCORESW-3553: store RGD separately based on type of link

Marek Szuba requested to merge bugfix/RGD_source_clean into master

Created by: magaliruffier

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • if you change the schema you must patch the test databases as well, see Updating the schema
    • the PR must not fail unit testing

Description

For RGD accessions that have a direct link to an Ensembl stable ID, use a different source than for RGD accessions whose link is inferred via RefSeq.

Use case

RGD accessions can be linked to a RefSeq accession or directly to an Ensembl stable ID. Storing these links as two separate xref entries means we can treat them separately and prioritise the better link when available. Without the distinction, we can have the same accession mapped to two different stable IDs and one link is arbitrarily chosen over the other. In some cases, the chosen link is invalid and no link to RGD is kept.
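The prioritisation described above can be illustrated with a minimal Python sketch. It is not the code in this merge request: the source labels `RGD_direct` and `RGD_via_refseq`, the record fields, and the helper names are all hypothetical, chosen only to show how two separate sources make the choice of link deterministic.

```python
from collections import defaultdict

# Hypothetical source labels; the names used by the actual parser may differ.
DIRECT_SOURCE = "RGD_direct"
INFERRED_SOURCE = "RGD_via_refseq"

# Lower number means the source is preferred.
PRIORITY = {DIRECT_SOURCE: 1, INFERRED_SOURCE: 2}


def classify(record):
    """Return one (accession, source, target) tuple per link in the record."""
    links = []
    if record.get("ensembl_id"):
        links.append((record["accession"], DIRECT_SOURCE, record["ensembl_id"]))
    if record.get("refseq_id"):
        links.append((record["accession"], INFERRED_SOURCE, record["refseq_id"]))
    return links


def best_links(records):
    """Keep, for each accession, only the links from its best-ranked source."""
    by_accession = defaultdict(list)
    for record in records:
        for accession, source, target in classify(record):
            by_accession[accession].append((source, target))

    chosen = {}
    for accession, links in by_accession.items():
        top = min(PRIORITY[source] for source, _ in links)
        chosen[accession] = sorted(
            (source, target) for source, target in links if PRIORITY[source] == top
        )
    return chosen


# Made-up identifiers, for illustration only.
records = [
    {"accession": "RGD:1", "ensembl_id": "ENSRNOG_A", "refseq_id": "NM_A"},
    {"accession": "RGD:2", "refseq_id": "NM_B"},
]
result = best_links(records)
```

In this sketch an accession with both kinds of link keeps only the direct one, while an accession with only a RefSeq link still keeps its inferred link, so no RGD accession is dropped just because the arbitrarily chosen link turned out to be invalid.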

Benefits

More links to RGD are retained, and the choice between links is reproducible and more reliable.

Possible Drawbacks

NA

Testing

Have you added/modified unit tests to test the changes? NA
If so, do the tests pass/fail? NA
Have you run the entire test suite and no regression was detected? NA

The xref pipeline currently fails for rat because some links become invalid after loading into the core database. With the proposed fix, the pipeline runs successfully and RGD xrefs are correctly mapped to Ensembl genes.
