ENSCORESW-3553: store RGD separately based on type of link
Created by: magaliruffier
Requirements
- Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
- Review the contributing guidelines for this repository; remember in particular:
- do not modify code without testing for regression
- provide simple unit tests to test the changes
- if you change the schema you must patch the test databases as well, see Updating the schema
- the PR must not fail unit testing
Description
For RGD accessions with a direct link to an Ensembl stable ID, use a different source than for RGD accessions whose link is inferred via RefSeq.
Use case
RGD accessions can be linked to a RefSeq accession or directly to an Ensembl stable ID. Storing these links as two separate xref entries means we can treat them separately and prioritise the better link when available. Without the distinction, we can have the same accession mapped to two different stable IDs and one link is arbitrarily chosen over the other. In some cases, the chosen link is invalid and no link to RGD is kept.
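The prioritisation described above can be sketched as follows. This is a minimal illustration, not the xref pipeline's actual code. The source labels `direct` and `refseq_inferred`, the function `pick_links`, and the accessions and stable IDs in the example are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical source labels (not the real xref source names).
# A lower number means a higher priority.
SOURCE_PRIORITY = {"direct": 1, "refseq_inferred": 2}


def pick_links(links):
    """For each RGD accession, keep only the links from its best-priority source.

    `links` is an iterable of (accession, stable_id, source) tuples.
    Returns a dict mapping each accession to a sorted list of stable IDs.
    """
    by_accession = defaultdict(list)
    for accession, stable_id, source in links:
        by_accession[accession].append((SOURCE_PRIORITY[source], stable_id))

    chosen = {}
    for accession, candidates in by_accession.items():
        best = min(priority for priority, _ in candidates)
        chosen[accession] = sorted(
            stable_id for priority, stable_id in candidates if priority == best
        )
    return chosen


# Placeholder data: the first accession has both kinds of link,
# the second only has an inferred one.
links = [
    ("RGD:0001", "ENSRNOG_A", "refseq_inferred"),
    ("RGD:0001", "ENSRNOG_B", "direct"),
    ("RGD:0002", "ENSRNOG_C", "refseq_inferred"),
]
result = pick_links(links)
```

Because the two kinds of link carry different source labels, the choice between them is deterministic rather than arbitrary, and an accession with only an inferred link still keeps that link.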
Benefits
More links to RGD are kept, and the mapping is more reproducible and more reliable.
Possible Drawbacks
NA
Testing
Have you added/modified unit tests to test the changes? NA
If so, do the tests pass/fail? NA
Have you run the entire test suite and no regression was detected? NA
The xref pipeline currently fails for rat because some links become invalid after loading into the core database. With the proposed fix, the pipeline runs successfully and RGD xrefs are correctly mapped to Ensembl genes.