Compara performance improvements

Created by: muffato

There are three categories of improvements:

  1. Number of queries. I have added several helper methods in the Compara API that call fetch_all_by_dbID_list(). This is, as you can imagine, much faster than calling fetch_by_dbID() repeatedly (PhyloXML export of a big tree went down from 30 sec to 6 sec), and the helper methods are called automatically whenever needed in the Compara API. That's why I could remove a number of preload() calls from ensembl-rest. Another consequence is that it now takes a constant number of queries to fetch the data for the gene-tree endpoints (it used to depend on the number of taxa and chromosomes) and for the homology endpoints (regardless of the number of actual homologies). This is essentially all done automatically in the Compara API.

  2. Memory leaks. I simply called each endpoint 1000 times and found a number of places where the gene-trees were not released (there are circular references in this part of the Compara API).

  3. Automatic population of Compara's FullCache adaptors in the Registry preload() method.
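
To illustrate the first point, here is a minimal Python sketch of why one batched fetch beats repeated single fetches. It is not the Compara code (which is Perl): `CountingStore`, `fetch_by_id`, `fetch_all_by_id_list`, `load_one_by_one` and `load_batched` are hypothetical names that only mimic the fetch_by_dbID() / fetch_all_by_dbID_list() pattern, and the "database" is an in-memory dict that counts round trips.

```python
class CountingStore:
    """Toy stand-in for a database adaptor; counts simulated round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.queries = 0

    def fetch_by_id(self, db_id):
        # One round trip per object (analogous to a single-ID fetch).
        self.queries += 1
        return self._rows.get(db_id)

    def fetch_all_by_id_list(self, db_ids):
        # One round trip for the whole list (analogous to a batched fetch).
        self.queries += 1
        return [self._rows[i] for i in db_ids if i in self._rows]


def load_one_by_one(store, db_ids):
    """N ids cost N queries."""
    return [store.fetch_by_id(i) for i in db_ids]


def load_batched(store, db_ids):
    """N ids cost a single query, whatever N is."""
    unique_ids = list(dict.fromkeys(db_ids))  # de-duplicate, keep order
    by_id = {row["id"]: row for row in store.fetch_all_by_id_list(unique_ids)}
    # Return rows in the order the caller asked for them.
    return [by_id.get(i) for i in db_ids]


if __name__ == "__main__":
    rows = {i: {"id": i, "name": f"member_{i}"} for i in range(1, 501)}
    ids = list(rows)

    slow = CountingStore(rows)
    load_one_by_one(slow, ids)

    fast = CountingStore(rows)
    load_batched(fast, ids)

    print("one by one:", slow.queries, "queries")
    print("batched:   ", fast.queries, "queries")
```

The batched loader issues the same number of queries however many IDs are requested, which is the property described above for the gene-tree and homology endpoints.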

Other changes:

  1. I have moved some code to the Compara API, so that it can be used outside of the REST context.
  2. I have added a "subtree_node_id" parameter to the gene-tree endpoints. This is needed by a new web view that shows the tree and the multiple alignment of a subtree.
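
As a rough sketch of how a client might pass the new parameter, the Python helper below builds a gene-tree request URL. Everything except the "subtree_node_id" parameter name is an assumption made for illustration: the helper `genetree_url` is hypothetical, the server and the `genetree/id/` path are taken to be the public Ensembl REST ones, and the node ID `12345` is a made-up value.

```python
from urllib.parse import urlencode


def genetree_url(tree_id, subtree_node_id=None,
                 server="https://rest.ensembl.org"):
    """Build a gene-tree request URL (hypothetical helper, not part of any API).

    The server and path are assumptions; only the "subtree_node_id"
    parameter name comes from this pull request.
    """
    params = {"content-type": "application/json"}
    if subtree_node_id is not None:
        # Restrict the response to the subtree rooted at this node.
        params["subtree_node_id"] = subtree_node_id
    return f"{server}/genetree/id/{tree_id}?{urlencode(params)}"


if __name__ == "__main__":
    # Whole tree, then only the subtree under a (made-up) node ID.
    print(genetree_url("ENSGT00390000003602"))
    print(genetree_url("ENSGT00390000003602", subtree_node_id=12345))
```

Leaving the parameter out keeps the request identical to what existing clients already send, consistent with there being no format change.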

There are no format changes, so there is no version upgrade to request.
