
Add CADD parser

Marek Szuba requested to merge github/fork/at7/add_cadd_parser into master

Created by: at7

Description

I added a new parser for reading data from a tabix-indexed CADD TSV file. CADD scores are calculated genome-wide for all possible single-base changes. Each line has the tab-separated columns: seqname, start, ref, alt, score.
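To illustrate the line format described above, here is a minimal Python sketch of parsing one data line into a record. It is not the parser in this merge request. It assumes the five columns appear in the order listed, that header or comment lines start with `#`, and that any extra columns can be ignored. `CaddRecord`, `parse_cadd_line` and `parse_cadd_lines` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CaddRecord:
    """One CADD score row (hypothetical record type for illustration)."""
    seqname: str
    start: int
    ref: str
    alt: str
    score: float


def parse_cadd_line(line: str) -> CaddRecord:
    """Parse one tab-separated data line: seqname, start, ref, alt, score.

    Assumption: the first five columns are in this order; extra columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    seqname, start, ref, alt, score = fields[:5]
    return CaddRecord(seqname, int(start), ref, alt, float(score))


def parse_cadd_lines(lines: Iterable[str]) -> Iterator[CaddRecord]:
    """Yield records, skipping blank lines and lines starting with '#' (assumed headers)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_cadd_line(line)
```

For example, `list(parse_cadd_lines(["#Chrom\tPos\tRef\tAlt\tScore", "1\t10001\tT\tA\t0.12"]))` yields a single record whose `start` is `10001`.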

Use case

We want to look up scores by position to annotate variants. The plan is to read the scores directly from the file instead of storing them in a database.
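The position-based lookup described above can be sketched in Python as follows. This is a sketch under stated assumptions, not this module's API. Because every position has up to three possible single-base changes, the alt allele is part of the lookup key. The sketch assumes 1-based, inclusive coordinates. In practice a tabix region query (for example through htslib or pysam's `TabixFile.fetch`) would return only the lines in the region. Here the sketch filters an in-memory list itself so it runs without an index file. `scores_in_region` and `score_for_variant` are hypothetical names.

```python
from typing import Dict, Iterable, Optional, Tuple

# Key: (seqname, position, alt allele). The alt allele is needed because each
# position can carry up to three different single-base substitutions.
Key = Tuple[str, int, str]


def scores_in_region(lines: Iterable[str], seqname: str, start: int, end: int) -> Dict[Key, float]:
    """Collect scores for seqname:start-end (assumed 1-based, inclusive).

    `lines` stands in for the output of a tabix region query; this sketch
    filters the lines itself so that it runs without an index.
    """
    scores: Dict[Key, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and assumed '#' header lines
        chrom, pos, _ref, alt, score = line.rstrip("\n").split("\t")[:5]
        pos_i = int(pos)
        if chrom == seqname and start <= pos_i <= end:
            scores[(chrom, pos_i, alt)] = float(score)
    return scores


def score_for_variant(scores: Dict[Key, float], seqname: str, pos: int, alt: str) -> Optional[float]:
    """Return the score for one single-base change, or None if it was not loaded."""
    return scores.get((seqname, pos, alt))
```

Loading a region once and then answering many variant lookups from the dictionary avoids one file query per variant. The cost is memory proportional to the size of the region.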

Benefits

We avoid adding more data to the already very large variation_feature table (~690M rows).

Possible Drawbacks

Annotating variants across large regions could be slow.

Testing

I added a test script that tests each method of the new module. The Travis build is failing due to a DBD::mysql error.
