Update to annotation provider in EMBL/GenBank files
Created by: james-monkeyshines
We used to have a single meta_key, provider.name (and an associated provider.url), but a single key could not capture cases where the assembly needs to be credited to one provider and the gene annotation to another. Vertebrate databases used the field to indicate the annotation provider, and non-vertebrates used it for the assembly provider. To better represent these use cases, provider.name was replaced by two new meta_keys, assembly.provider_name and annotation.provider_name.
In the comments of EMBL and GenBank files, provider.name was formerly used to indicate the source of the annotation; this was replaced by assembly.provider_name in a previous PR (#506), but it would be more accurate to use annotation.provider_name, and only if that is undefined, fall back to
Further, since an annotation can have multiple providers, it is good to list them all, rather than select one to include in the comments.
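The lookup described above can be sketched roughly as follows. This is an illustration in Python, not the code in this PR: the function names, the dict-of-lists shape of the meta data, and the exact fallback order (annotation.provider_name, then assembly.provider_name, then the generic 'Ensembl') are assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Generic default mentioned in this description.
DEFAULT_PROVIDER = "Ensembl"


def annotation_providers(
    meta: Dict[str, List[str]],
    # Assumed fallback order; adjust to match the real dumper logic.
    fallback_keys: Sequence[str] = ("assembly.provider_name",),
) -> List[str]:
    """Return every provider to credit for the annotation (hypothetical helper)."""
    for key in ("annotation.provider_name", *fallback_keys):
        # Drop empty values so a blank meta entry counts as undefined.
        values = [v for v in meta.get(key, []) if v]
        if values:
            # Keep all providers rather than selecting just one.
            return values
    return [DEFAULT_PROVIDER]


def format_providers(meta: Dict[str, List[str]]) -> str:
    """Join all providers into one string for a file comment (hypothetical helper)."""
    return ", ".join(annotation_providers(meta))
```

With meta data that defines two annotation providers, `format_providers` lists both; with no provider keys at all, it yields the generic default.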
For vertebrates, this change does not make much difference, because assembly.provider_name is typically not defined, so the code falls back to the generic 'Ensembl' in any case. But for non-vertebrates, which are typically annotated by non-Ensembl groups, this allows for more accurate attribution.
Better attribution for annotation in FTP files.
None I can think of.
Have you added/modified unit tests to test the changes? Yes
If so, do the tests pass/fail? Pass
Have you run the entire test suite and no regression was detected? Yes