Feature/xref optimisation (!509) · Merge requests · ensembl-gh-mirror / ensembl

Marek Szuba requested to merge feature/xref_optimisation into master Aug 26, 2020

Created by: magaliruffier

Requirements

Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
Review the contributing guidelines for this repository. Remember in particular:
- do not modify code without testing for regression
- provide simple unit tests to test the changes
- if you change the schema you must patch the test databases as well, see Updating the schema
- the PR must not fail unit testing

Description

The proposed changes simplify the stage where data is copied from the xref database into the core database.

Use case

Some of the queries run at this stage are heavy on the MySQL server. When multiple species are processed in parallel, these queries can get stuck on a heavily loaded server. Because they deal with data that is not used in the core database, removing these steps from the pipeline makes the code simpler and less onerous to run.

Benefits

Less contention on the MySQL server when multiple jobs run in parallel.

Possible Drawbacks

Historical unmapped entries are not conserved.

Testing

Have you added/modified unit tests to test the changes?

No. No test cases are available, but the pipeline was run on about 20 species with and without the proposed changes. The results of the two sets of runs are comparable, and the mapping stage runs faster with the changes.

If so, do the tests pass/fail?

Have you run the entire test suite and no regression was detected?
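
The with-and-without comparison described under Testing could be automated along the following lines. This is only an illustrative sketch, not the check that was actually performed for this PR: the function name `compare_xref_counts`, the 1% tolerance, and the sample counts are all hypothetical, and it assumes per-source xref counts have already been exported from the core database of each run.

```python
from typing import Dict, List, Tuple


def compare_xref_counts(
    baseline: Dict[str, int],
    candidate: Dict[str, int],
    tolerance: float = 0.01,
) -> List[Tuple[str, int, int]]:
    """Return (source, baseline_count, candidate_count) for every source whose
    count differs by more than `tolerance`, relative to the baseline count."""
    differences = []
    # Look at every source seen in either run, in a stable order.
    for source in sorted(set(baseline) | set(candidate)):
        before = baseline.get(source, 0)
        after = candidate.get(source, 0)
        # Guard against division by zero when a source is new in the candidate run.
        denominator = max(before, 1)
        if abs(after - before) / denominator > tolerance:
            differences.append((source, before, after))
    return differences


if __name__ == "__main__":
    # Made-up placeholder counts, NOT results from the actual runs.
    master_run = {"source_a": 1000, "source_b": 5000, "source_c": 3000}
    branch_run = {"source_a": 1000, "source_b": 5010, "source_c": 2400}
    for source, before, after in compare_xref_counts(master_run, branch_run):
        print(f"{source}: {before} -> {after}")
```

A relative tolerance is used here so that small fluctuations between runs are not reported; a source that is missing from one run is treated as having a count of zero and is therefore always reported.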
