RefSeqGPFFParser update

Marek Szuba requested to merge xref_RefSeqGPFFParser into feature/xref_sprint

Created by: tgrego

Description

Update to the RefSeqGPFFParser as part of the xref sprint (see ENSCORESW-2898). The GenBank parser from ensembl-io is now used to parse the source files instead of a custom parser, so this change depends on https://github.com/Ensembl/ensembl-io/pull/69 being merged. The new parser differs from the original in a few ways. For instance, only RefSeq IDs with the prefixes defined in $refseq_sources are considered; previously, for peptide files, all other possible types were also considered and treated as RefSeq_peptide.
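The prefix filtering described above can be sketched as follows. This is a language-neutral illustration in Python, not the parser's actual Perl code: the REFSEQ_SOURCES table, its contents, and both function names are assumptions made for the example, and the real $refseq_sources mapping may differ.

```python
from typing import Optional

# Hypothetical stand-in for $refseq_sources: accession prefix -> source name.
# The entries are assumed for illustration only.
REFSEQ_SOURCES = {
    "NM": "RefSeq_mRNA",
    "XM": "RefSeq_mRNA_predicted",
    "NR": "RefSeq_ncRNA",
    "XR": "RefSeq_ncRNA_predicted",
    "NP": "RefSeq_peptide",
    "XP": "RefSeq_peptide_predicted",
}


def source_for_accession(accession: str) -> Optional[str]:
    """New behaviour (sketch): return a source only for known prefixes."""
    prefix, separator, _rest = accession.partition("_")
    if not separator:
        # No underscore, so this is not a prefixed RefSeq accession.
        return None
    return REFSEQ_SOURCES.get(prefix)


def legacy_peptide_source(accession: str) -> str:
    """Old behaviour for peptide files (sketch): unknown types fall back
    to RefSeq_peptide instead of being skipped."""
    return source_for_accession(accession) or "RefSeq_peptide"
```

Under these assumptions, an accession such as YP_009724390.1 would be skipped by source_for_accession (it returns None), whereas the legacy fallback would have stored it as RefSeq_peptide.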

Testing

No unit tests. Tested with a subset of the rat data; however, related xrefs were absent. Testing with the full dataset is ongoing.
