Created by: vsitnik
- Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
- Review the contributing guidelines for this repository; remember in particular:
  - do not modify code without testing for regression
  - provide simple unit tests to test the changes
  - the PR must not fail unit testing
  - if you're adding/updating documentation of an endpoint, make sure you add/update the necessary parameters to the (template) configuration files in the ensembl-rest_private repo
The new endpoint allows downloading all protein sequences for the specified species. It is enabled only for species that have meta.proteome_download_allowed set to 'true' in their core database. For all other species the feature is forbidden.
It will be used by UniProt to download protein FASTA files from vectorbase.org.
wget --header='Content-type:text/x-fasta' 'http://127.0.0.1:34274/sequence/proteome/Anopheles atroparvus?' -O - | gzip - > Aatr.prot.fasta.gz
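For clients that prefer a script over wget, the same request can be sketched in Python. This is a minimal sketch, not part of the PR: the function names `proteome_url` and `download_proteome` are hypothetical, the base URL is the local test server from the wget command, and the species name is percent-encoded on the assumption that the server accepts `%20` for the space.

```python
import gzip
import shutil
import urllib.parse
import urllib.request

# Assumption: the local test server used in the wget command.
BASE_URL = "http://127.0.0.1:34274"


def proteome_url(species: str, base_url: str = BASE_URL) -> str:
    """Build the sequence/proteome URL, percent-encoding the species name."""
    return f"{base_url}/sequence/proteome/{urllib.parse.quote(species)}"


def download_proteome(species: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Stream the FASTA response into a gzip file without loading it into memory."""
    request = urllib.request.Request(
        proteome_url(species, base_url),
        # Same header as the wget command; it selects FASTA output.
        headers={"Content-type": "text/x-fasta"},
    )
    with urllib.request.urlopen(request) as response, gzip.open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Calling `download_proteome("Anopheles atroparvus", "Aatr.prot.fasta.gz")` would write the same compressed file as the wget pipeline, provided the server is running.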
The endpoint downloads all 'canonical' protein sequences for Anopheles atroparvus in 2 minutes 25 seconds, compared with approximately 3 hours using the current approach.
This is still slow and will probably not be appropriate for large genomes, so meta.proteome_download_allowed should be set with caution. The seq_regions should have proper 'coding_cnt' and 'toplevel' attributes set.
t/sequences.t was updated to test the new endpoint behaviour.
No regression was seen for the affected features.
The VectorBase production database anopheles_atroparvus_core_1810_93_3 was used for the performance testing.
It's a new endpoint, which allows whole proteome fasta downloads.