Update to annotation provider in EMBL/GenBank files
Created by: james-monkeyshines
Description
We used to have a meta_key, provider.name (and an associated provider.url), but this did not allow us to capture the fact that sometimes the assembly needs to be credited to one provider, and the gene annotation to another. Vertebrate databases used the field to indicate the annotation provider, and non-vertebrates used it for the assembly provider. To better represent these use cases, provider.name was replaced by two new meta_keys, assembly.provider_name and annotation.provider_name (https://www.ebi.ac.uk/panda/jira/browse/ENSINT-361).
In the comments of EMBL and GenBank files, provider.name was formerly used to indicate the source of the annotation. A previous PR (#506) replaced it with assembly.provider_name, but it would be more accurate to use annotation.provider_name, and to fall back to assembly.provider_name only if that is undefined.
Furthermore, since an annotation can have multiple providers, it is better to list them all in the comments than to select just one.
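The selection rule described above (prefer annotation.provider_name, fall back to assembly.provider_name, then to the generic 'Ensembl', and list every provider found) can be sketched in Python. This is a minimal illustration, not the production code this PR changes: the dict-of-lists representation of meta values, the function names, and the "Annotated by" comment wording are all hypothetical.

```python
from typing import Mapping, Sequence

# Hypothetical representation: each meta_key maps to all of its values,
# because one meta_key may have several rows (e.g. several providers).
Meta = Mapping[str, Sequence[str]]

DEFAULT_PROVIDER = "Ensembl"


def annotation_providers(meta: Meta) -> list[str]:
    """Return every provider name to credit for the annotation.

    Prefer annotation.provider_name; only if it is undefined, fall back
    to assembly.provider_name; if neither is set, use the default.
    """
    for key in ("annotation.provider_name", "assembly.provider_name"):
        # Drop empty values and duplicates, keeping the original order.
        names = list(dict.fromkeys(v for v in meta.get(key, []) if v))
        if names:
            return names
    return [DEFAULT_PROVIDER]


def provider_comment(meta: Meta) -> str:
    """Build one (hypothetical) comment line listing all providers."""
    return "Annotated by " + ", ".join(annotation_providers(meta))


if __name__ == "__main__":
    meta = {
        "annotation.provider_name": ["Group A", "Group B"],
        "assembly.provider_name": ["Group C"],
    }
    print(provider_comment(meta))
```

With the sample metadata, both annotation providers are listed and the assembly provider is ignored, so the script prints "Annotated by Group A, Group B". Removing the annotation.provider_name entry would credit "Group C" instead, and an empty mapping yields "Annotated by Ensembl".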
Use case
For vertebrates, this change does not make much difference, because assembly.provider_name is typically not defined, so the code falls back to the generic 'Ensembl' in any case. But for non-vertebrates, which are typically annotated by non-Ensembl groups, this allows for more accurate attribution.
Benefits
Better attribution for annotation in FTP files.
Possible Drawbacks
None I can think of.
Testing
Have you added/modified unit tests to test the changes? Yes
If so, do the tests pass/fail? Pass
Have you run the entire test suite and no regression was detected? Yes