ensembl-gh-mirror / ensembl-rest / Merge requests / !293

sequence/proteome/:species GET endpoint for whole proteome download

Closed Marek Szuba requested to merge github/fork/vsitnik/vb_proteome_download into master Aug 26, 2018

Created by: vsitnik

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • the PR must not fail unit testing
    • if you're adding/updating documentation of an endpoint, make sure you add/update the necessary parameters to the (template) configuration files in the ensembl-rest_private repo

Description

The new endpoint allows downloading all protein sequences for the specified species. It is enabled only for species whose core database has meta.proteome_download_allowed set to 'true'. For all other species the request is forbidden.
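The gating rule described above can be sketched as follows. This is only an illustration, not the code in this PR (the endpoint itself lives in the Perl ensembl-rest codebase). The function name `check_proteome_access` is hypothetical, the `meta` mapping stands in for the key/value rows of the core database's meta table, and the 403 status is an assumption based on the word "forbidden".

```python
from typing import Mapping, Tuple

# Meta key named in this description.
META_KEY = "proteome_download_allowed"


def check_proteome_access(meta: Mapping[str, str]) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for one species.

    `meta` is a stand-in for the core database's meta table.
    The 403 status code is an assumption, not confirmed by the PR.
    """
    if meta.get(META_KEY) == "true":
        return 200, "proteome download allowed"
    # Missing key or any other value: treat the download as forbidden.
    return 403, "proteome download not enabled for this species"


print(check_proteome_access({"proteome_download_allowed": "true"}))
print(check_proteome_access({}))
```

Note that in this sketch a missing key behaves the same as an explicit non-'true' value, so species are opted in rather than opted out.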

Use case

Will be used by UniProt to download protein FASTA files from vectorbase.org. Example:

wget --header='Content-type:text/x-fasta' 'http://127.0.0.1:34274/sequence/proteome/Anopheles atroparvus?' -O - | gzip - > Aatr.prot.fasta.gz
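For clients that prefer a script over wget, the same request could look roughly like this in Python, using only the standard library. The helper names (`build_proteome_request`, `write_gzipped`, `download_proteome`) are made up for this sketch; the path and the Content-type header are taken from the wget example, and the species name is percent-encoded here because it contains a space.

```python
import gzip
import urllib.parse
import urllib.request
from typing import BinaryIO, Iterable


def build_proteome_request(base_url: str, species: str) -> urllib.request.Request:
    """Build the GET request for sequence/proteome/:species."""
    path = "/sequence/proteome/" + urllib.parse.quote(species)
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Content-type": "text/x-fasta"},
    )


def write_gzipped(chunks: Iterable[bytes], out: BinaryIO) -> None:
    """Compress the chunks into `out`, like `| gzip -` in the shell example."""
    with gzip.GzipFile(fileobj=out, mode="wb") as gz:
        for chunk in chunks:
            gz.write(chunk)


def download_proteome(base_url: str, species: str, dest_path: str) -> None:
    """Stream the FASTA response into a gzip file without holding it in memory."""
    request = build_proteome_request(base_url, species)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        # Read fixed-size blocks until the response is exhausted.
        write_gzipped(iter(lambda: response.read(64 * 1024), b""), out)


# Example call (needs a running server, so it is left commented out):
# download_proteome("http://127.0.0.1:34274", "Anopheles atroparvus", "Aatr.prot.fasta.gz")
```

Streaming in blocks matters here because a whole proteome can be large and the download takes minutes.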

Benefits

The endpoint downloads all 'canonical' protein sequences for Anopheles atroparvus in 2 minutes 25 seconds, compared with approximately 3 hours using the current approach.
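To put the two timings side by side, the ratio works out as below. The 3-hour figure is approximate, so the result is a rough estimate only.

```python
current_seconds = 3 * 60 * 60    # "approximately 3 hours" with the current approach
endpoint_seconds = 2 * 60 + 25   # "2 minutes 25 seconds" with the new endpoint

speedup = current_seconds / endpoint_seconds
print(f"{speedup:.1f}x faster")  # prints "74.5x faster"
```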

Possible Drawbacks

Still slow, and probably not appropriate for large genomes. meta.proteome_download_allowed should therefore be set with caution. The seq_regions must also have proper 'coding_cnt' and 'toplevel' attributes set.
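Before enabling the meta key for a species, a quick sanity check of the seq_region attributes may help. The sketch below assumes, without confirmation from the PR code, that the endpoint only considers top-level regions with a non-zero 'coding_cnt'. The function names and the in-memory `regions` mapping (region name to attribute code/value pairs) are hypothetical.

```python
from typing import List, Mapping

Regions = Mapping[str, Mapping[str, str]]


def regions_to_scan(regions: Regions) -> List[str]:
    """Top-level regions that report at least one coding gene (assumed selection rule)."""
    return sorted(
        name
        for name, attribs in regions.items()
        if "toplevel" in attribs and int(attribs.get("coding_cnt", "0")) > 0
    )


def toplevel_without_coding_cnt(regions: Regions) -> List[str]:
    """Top-level regions with no 'coding_cnt' attribute at all; worth fixing first."""
    return sorted(
        name
        for name, attribs in regions.items()
        if "toplevel" in attribs and "coding_cnt" not in attribs
    )


example = {
    "scaffold_1": {"toplevel": "1", "coding_cnt": "12"},
    "scaffold_2": {"toplevel": "1"},                       # attribute missing
    "scaffold_3": {"toplevel": "1", "coding_cnt": "0"},
    "contig_9": {"coding_cnt": "3"},                       # not top-level
}
print(regions_to_scan(example))
print(toplevel_without_coding_cnt(example))
```

Under this assumption, a region like scaffold_2 would be silently skipped, which is why the attributes need to be in place before the download is allowed.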

Testing

t/sequences.t was updated to test the new endpoint's behaviour. No regressions were seen in the affected features.

The VectorBase production database anopheles_atroparvus_core_1810_93_3 was used for the performance testing.

Changelog

It's a new endpoint, which allows whole proteome fasta downloads.

Source branch: github/fork/vsitnik/vb_proteome_download