Changes required to support bigInteract format - ENSWEB-4751

Merged Marek Szuba requested to merge github/fork/ens-ap5/feature/bigInteract into master

Created by: ens-ap5

Description

Extend the BED/bigBed parsers to allow location data to be retrieved by column index.

Use case

UCSC have introduced another variation on the bigBed format, bigInteract, designed for use with long-range interactions. However, their specification encourages users to set their own column names in the AutoSQL for the source and target locations, so our web code cannot rely on the default column names to fetch this data. See https://genome.ucsc.edu/goldenpath/help/interact.html

We must therefore rely on the column index alone, which is not currently supported by the bed-like parsers.
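
To illustrate the idea, here is a minimal Python sketch of index-based lookup (the parsers changed in this PR are not written in Python, and this is not their code). The helper `location_by_index` and the `Location` type are hypothetical names. The column positions used (8-10 for the source, 13-15 for the target, zero-based) are an assumption based on the field order described on the UCSC interact help page and should be checked against the file's AutoSQL.

```python
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    chrom: str
    start: int
    end: int


def location_by_index(fields: Sequence[str], chrom_idx: int,
                      start_idx: int, end_idx: int) -> Location:
    """Read a location from arbitrary columns, ignoring column names."""
    return Location(fields[chrom_idx], int(fields[start_idx]), int(fields[end_idx]))


# A made-up 18-column record laid out like an interact line (illustrative data only).
record = ("chr1\t100\t900\t.\t0\t0\t.\t0"
          "\tchr1\t100\t200\tsrc\t+"
          "\tchr1\t800\t900\ttgt\t-").split("\t")

# Assumed zero-based positions of the source and target location columns.
source = location_by_index(record, 8, 9, 10)
target = location_by_index(record, 13, 14, 15)
```

Because only positions are used, the lookup still works when a data provider renames the source and target columns in their AutoSQL.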

Benefits

Enables us to support this new format in the browser.

Possible Drawbacks

Perhaps not the most elegant solution, but the changes to the BED parser are mainly to keep things DRY. We need to convert UCSC chromosome names into Ensembl ones, and zero-based start coordinates into one-based Ensembl coordinates, so being able to do this for any arbitrary column avoids repeating code in the bigBed parser.
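
A rough Python sketch of that shared conversion, applied to whichever columns the caller names by index. All function names here are hypothetical, and the chromosome-name mapping is deliberately simplified (strip a leading "chr", map "chrM" to "MT"); the real parser's mapping may differ.

```python
from typing import Sequence, Tuple


def to_ensembl_name(ucsc_name: str) -> str:
    """Simplified UCSC-to-Ensembl name conversion (an assumption, not the real mapping)."""
    if ucsc_name == "chrM":
        return "MT"
    return ucsc_name[3:] if ucsc_name.startswith("chr") else ucsc_name


def normalise_location(fields: Sequence[str], chrom_idx: int,
                       start_idx: int, end_idx: int) -> Tuple[str, int, int]:
    """Convert one location, read from arbitrary columns, to Ensembl conventions."""
    chrom = to_ensembl_name(fields[chrom_idx])
    # BED-style starts are zero-based and ends are exclusive, so only the
    # start needs shifting to get one-based, inclusive coordinates.
    start = int(fields[start_idx]) + 1
    end = int(fields[end_idx])
    return chrom, start, end


# Made-up row holding two locations in different columns.
row = ["chr1", "100", "200", "x", "chrM", "0", "50"]
first = normalise_location(row, 0, 1, 2)
second = normalise_location(row, 4, 5, 6)
```

The same function handles the main location and any extra source or target columns, which is the duplication the change aims to avoid.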

Testing

I've run the bed.t and bigbed.t tests, which passed successfully. I'm not aware of any other tests that would be affected by this change since it's specific to these formats.
