ENSCORESW-2792: separate mouse UCSC to avoid clashes
Created by: magaliruffier
Requirements
- Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
- Review the contributing guidelines for this repository; remember in particular:
  - do not modify code without testing for regression
  - provide simple unit tests to test the changes
  - if you change the schema you must patch the test databases as well, see Updating the schema
  - the PR must not fail unit testing
Description
Using one or more sentences, describe in detail the proposed changes. Mouse UCSC xrefs are now parsed by a dedicated parser, separate from the one used for human UCSC xrefs, so that IDs are not re-used between the two sources.
Use case
Describe the problem. Please provide an example representing the motivation behind the need for having these changes in place. As we scale up the number of species, any source can be parsed for any species, and the taxon ID is extracted from the source. The original UCSC data, however, contains no information about the species. As a workaround, we use two separate parsers, one for each of the two species that use UCSC data (human and mouse).
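To illustrate the workaround, here is a minimal sketch in Python of one parser per species, each tagging its xrefs with a fixed source name and taxon ID. It is not the code in this PR: the class names, the source names (`UCSC_human`, `UCSC_mouse`) and the example accessions are all hypothetical. Only the NCBI taxon IDs (9606 for human, 10090 for mouse) are real.

```python
# Hypothetical sketch: none of these class or source names come from the
# actual xref pipeline; they only illustrate the one-parser-per-species idea.
from dataclasses import dataclass

HUMAN_TAXON_ID = 9606   # NCBI taxonomy ID for Homo sapiens
MOUSE_TAXON_ID = 10090  # NCBI taxonomy ID for Mus musculus


@dataclass(frozen=True)
class Xref:
    accession: str
    source: str
    taxon_id: int


class UCSCParser:
    """Shared parsing logic; the input rows carry no species information."""

    source = None    # set by each species-specific subclass
    taxon_id = None  # set by each species-specific subclass

    def parse(self, accessions):
        # The species is taken from the parser, never from the data.
        return [Xref(acc, self.source, self.taxon_id) for acc in accessions]


class UCSCHumanParser(UCSCParser):
    source = "UCSC_human"
    taxon_id = HUMAN_TAXON_ID


class UCSCMouseParser(UCSCParser):
    source = "UCSC_mouse"
    taxon_id = MOUSE_TAXON_ID


# Illustrative accessions only, not taken from a real UCSC file.
human = UCSCHumanParser().parse(["uc001aaa.3"])
mouse = UCSCMouseParser().parse(["uc001aaa.3"])
```

Because the source name and taxon ID are part of each record, the same accession parsed by both parsers yields two distinct xrefs rather than one shared entry.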
Benefits
If applicable, describe the advantages the changes will have. Human xrefs will be mapped only to human and mouse xrefs only to mouse, with no cross-contamination. This will also prevent the core foreign key breakages that occurred previously.
Possible Drawbacks
If applicable, describe any possible undesirable consequence of the changes. This approach will not scale well if we want UCSC xrefs for more species, since each additional species would need its own parser.
Testing
Have you added/modified unit tests to test the changes? The pipeline was run on human and mouse in parallel to check for any possible contamination.
If so, do the tests pass/fail? There are no CoreForeignKey HC failures, the number of xrefs obtained is comparable to previous releases, and the data looks sane.
Have you run the entire test suite and no regression was detected? NA