EntrezGene/HPA parsers update

Marek Szuba requested to merge avullo_xref_sprint into feature/xref_sprint

Created by: avullo


  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • if you change the schema you must patch the test databases as well, see Updating the schema
    • the PR must not fail unit testing


Refactoring the parsers with the following in mind:

  • consistent error handling
  • more compact, clearer code
  • NULL fields where applicable, without touching BaseParser for now, i.e. forcing a NULL description when adding xrefs in the HPA parser
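
The NULL-versus-empty-string point can be illustrated with a minimal sketch. It is written in Python with an in-memory SQLite table purely for illustration; the `xref` table layout, `null_if_empty` and `add_xref` are hypothetical names and do not reflect the BaseParser API or the actual xref schema.

```python
import sqlite3


def null_if_empty(value):
    """Return None (stored as SQL NULL) for missing or blank strings."""
    if value is None or value.strip() == "":
        return None
    return value


def add_xref(conn, accession, description=None):
    # Hypothetical helper: normalise the description before inserting,
    # so that blank values end up as NULL rather than ''.
    conn.execute(
        "INSERT INTO xref (accession, description) VALUES (?, ?)",
        (accession, null_if_empty(description)),
    )


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not the real xref schema.
conn.execute("CREATE TABLE xref (accession TEXT NOT NULL, description TEXT)")

add_xref(conn, "ACC_1", "")                  # blank description
add_xref(conn, "ACC_2", "some description")  # real description

null_count = conn.execute(
    "SELECT COUNT(*) FROM xref WHERE description IS NULL"
).fetchone()[0]
empty_count = conn.execute(
    "SELECT COUNT(*) FROM xref WHERE description = ''"
).fetchone()[0]
```

Normalising at the call site, as sketched here, keeps the change local to one parser; doing the same inside a shared base class would apply it to every source at once.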

Use case

Xref pipeline for species with EntrezGene/HPA sources


Code quality improvement

Possible Drawbacks

This does not yet fully meet the guidelines: BaseParser and the schema still need to change so that info_text is stored as NULL instead of an empty string.


No unit tests at the moment, I'm afraid. I ran the xref_parser script with both the current version and the proposed update and found no differences, except that the description attribute is now NULL for xrefs from HPA (formerly '').