sequence/proteome/:species GET endpoint added

Marek Szuba requested to merge github/fork/vsitnik/vb_proteome_download into master

Created by: vsitnik

Adds a sequence/proteome/:species GET endpoint that allows bulk (whole-proteome) download.

core.meta.proteome_download_allowed is used as the guard for vb_proteome_download.

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • the PR must not fail unit testing
    • if you're adding/updating documentation of an endpoint, make sure you add/update the necessary parameters to the (template) configuration files in the ensembl-rest_private repo

Description

The new endpoint allows downloading all protein sequences for the specified species. It is enabled only for species whose core database has meta.proteome_download_allowed set to 'true'; for all other species the request is forbidden.
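The guard described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names are hypothetical, the meta table is modelled as a plain dict, the comparison is assumed to be an exact match on the string 'true', and mapping "forbidden" to HTTP 403 is an assumption.

```python
# Hypothetical sketch of the proteome download guard (not the PR's actual code).
# The species' core meta table is modelled as a simple key -> value dict.

META_KEY = "proteome_download_allowed"


def proteome_download_allowed(meta: dict) -> bool:
    """Return True only if the meta key is present and equals 'true'.

    Assumption: the check is an exact string match; the real
    implementation may treat case or whitespace differently.
    """
    return meta.get(META_KEY) == "true"


def proteome_status(meta: dict) -> int:
    """Return the HTTP status the endpoint would answer with.

    Assumption: 'forbidden' corresponds to HTTP 403.
    """
    return 200 if proteome_download_allowed(meta) else 403
```

Under these assumptions, a species with no such meta entry is treated the same as one where the value is anything other than 'true'.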

Use case

Benefits

Will be used by UniProt to download protein FASTA files from vectorbase.org. For example, it allows downloading all 'canonical' protein sequences for Anopheles atroparvus in 2 minutes 36 seconds, instead of approximately 3 hours with the current approach.
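A client for this use case might look roughly like the sketch below. It only builds the request URL and parses a FASTA body; the server address in the usage note is a placeholder, the helper names are hypothetical, and it assumes the endpoint returns plain FASTA text, which this description does not spell out.

```python
# Hypothetical client-side helpers for the proteome endpoint (a sketch).
from urllib.parse import quote


def proteome_url(server: str, species: str) -> str:
    """Build the URL for sequence/proteome/:species on the given server."""
    return f"{server.rstrip('/')}/sequence/proteome/{quote(species, safe='')}"


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {sequence id: sequence}.

    The id is taken as the first whitespace-delimited token of each
    header line; multi-line sequences are joined together.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, proteome_url("https://rest.example.org", "anopheles_atroparvus") gives the URL to fetch with any HTTP client, and the response body can then be passed to parse_fasta.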

Possible Drawbacks

Still slow, and probably not appropriate for large genomes. Setting meta.proteome_download_allowed should therefore be done with caution.

Testing

Have you added/modified unit tests to test the changes?

If so, do the tests pass/fail?

Have you run the entire test suite and no regression was detected?

Changelog

sequence/proteome/:species allows whole proteome downloads, but only for a subset of species.
