Bugfix/delint Mim2Gene parser

Merged: Marek Szuba requested to merge bugfix/delint_Mim2GeneParser into feature/xref_sprint

Created by: mkszuba

Warning: as of 2018-10-30, this PR is expected to fail Travis builds because it depends on #323 and #324.

Description

Fixes bugs observed (so far) in Mim2GeneParser during the xref sprint, implements support for direct MIM xrefs, adds additional checks, and delints the code to facilitate further refactoring. See ENSCORESW-2891.

Use case

Part of the effort to improve the xref pipeline. Moreover, one of the observed bugs prevents current versions of mim2gene data from being parsed at all.

Benefits

The parser can now handle recent versions of mim2gene input. Direct xrefs are now produced wherever possible, with dependent ones used only for entries that lack an Ensembl ID but have an EntrezGene ID. Dependent xrefs are now inserted into the database using BaseParser methods, which avoids hand-rolled DBI code and, once pull request #314 has been approved, will prevent Mim2GeneParser from inserting duplicate entries upon re-runs with the same input. Some future-proofing has been added, and the code is (hopefully) easier to maintain. Most PerlCritic complaints at levels 3 and 2 have been taken care of. The parser now uses a standardised rather than hand-rolled CSV parser, with potential for a performance increase if the compiled rather than the pure-Perl version of that parser is used.
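The direct-versus-dependent decision described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the Perl code in this PR. The five-column tab-separated layout, the `FIELDS` names, the `classify` and `parse` helpers, and the sample rows are all assumptions made up for the example.

```python
import csv
import io

# Assumed column layout of a mim2gene-style file (tab-separated, with
# '#'-prefixed comment lines). These field names are hypothetical.
FIELDS = ["mim", "type", "entrez", "symbol", "ensembl"]


def classify(row):
    """Return 'direct', 'dependent', or 'skip' for one cleaned row."""
    if row["ensembl"]:
        # An Ensembl ID is present, so a direct xref can be produced.
        return "direct"
    if row["entrez"]:
        # No Ensembl ID but an EntrezGene ID: fall back to a dependent xref.
        return "dependent"
    # Neither ID is present, so there is nothing to link to.
    return "skip"


def parse(handle):
    """Yield (MIM number, xref kind) pairs using the standard csv module."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        # Short rows yield None for missing fields; normalise those to ''.
        clean = {key: (row.get(key) or "").strip() for key in FIELDS}
        yield clean["mim"], classify(clean)


# Illustrative rows only, not real mim2gene data.
SAMPLE = (
    "# MIM Number\tEntry Type\tEntrez Gene ID\tSymbol\tEnsembl Gene ID\n"
    "100001\tgene\t111\tHYPO1\tENSG00000000001\n"
    "100002\tmoved/removed\t\t\t\n"
    "100003\tgene\t333\tHYPO3\t\n"
)

results = list(parse(io.StringIO(SAMPLE)))
```

Relying on a library CSV reader rather than splitting lines by hand mirrors the motivation given above: handling of empty and short fields comes from well-tested code instead of being reimplemented in the parser.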

Possible Drawbacks

Output is less straightforward than it used to be because it now includes both direct (the vast majority) and dependent (around 10 percent as of today) xrefs.

Testing

Have you added/modified unit tests to test the changes? No.

If so, do the tests pass/fail? N/A

Have you run the entire test suite and no regression was detected? N/A. However, I have run the parser itself on both current DBASS data and some intentionally malformed input, and it appears to work correctly.
