Non-coding transcripts thick start/end should equal transcript start. (!362) · Merge requests · ensembl-gh-mirror / ensembl

Marek Szuba requested to merge github/fork/andrewyatz/bugfix/bedserialiser_noncoding into master Feb 11, 2019

Created by: andrewyatz

Description

When encoding non-coding transcripts, thickStart and thickEnd should both have the same value as start. The code decremented thickStart. This change removes that line so that transcripts are serialised correctly. Added a test case to cover this from circ core.

Use case

Currently the BEDSerialiser code can convert a Transcript into the genePred format. Part of this is the normal 12 BED columns, which encode a transcript including its CDS start and end (thickStart and thickEnd). For a non-coding transcript, both should be set to the same value as the transcript's start. Otherwise the file will fail to be indexed or accessed by UCSC tools.
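As a rough illustration of the rule above, here is a minimal Python sketch. It is not the BEDSerialiser implementation (which is Perl); the function name `thick_bounds` and its parameters are hypothetical, and it assumes the coordinates passed in are already in BED's 0-based, half-open convention.

```python
from typing import Optional, Tuple


def thick_bounds(
    chrom_start: int,
    cds_start: Optional[int] = None,
    cds_end: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (thickStart, thickEnd) for one BED record.

    Hypothetical helper: all inputs are assumed to be BED coordinates
    already (0-based start, exclusive end).
    """
    if cds_start is None or cds_end is None:
        # Non-coding: both thick columns collapse onto the transcript
        # start, with no further decrement applied.
        return chrom_start, chrom_start
    # Coding: the thick region is the CDS.
    return cds_start, cds_end


print(thick_bounds(140435315))      # non-coding transcript
print(thick_bounds(100, 150, 300))  # coding transcript
```

The non-coding branch returns the start unchanged, which is the behaviour this merge request aims for.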

Benefits

Correct encoding of non-coding transcripts (see below for an example from UCSC table browser)

chr7	140435315	140435787	ENST00000489972.1	0	-	140435315	140435315	0	1	472,	0,
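The record above can be checked with a short Python sketch. The helper name `thick_matches_start` is hypothetical and not part of Ensembl or the UCSC tools; it only tests that columns 7 and 8 (thickStart, thickEnd) both equal column 2 (chromStart).

```python
def thick_matches_start(bed_line: str) -> bool:
    """Return True if thickStart and thickEnd both equal chromStart.

    Hypothetical check for a tab-separated BED line with at least
    8 columns; it does not validate the remaining columns.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated BED columns")
    chrom_start = int(fields[1])
    thick_start = int(fields[6])
    thick_end = int(fields[7])
    return thick_start == thick_end == chrom_start


example = (
    "chr7\t140435315\t140435787\tENST00000489972.1\t0\t-\t"
    "140435315\t140435315\t0\t1\t472,\t0,"
)
print(thick_matches_start(example))  # True
```

A record whose thickStart has been decremented by one would make this check return False.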

Possible Drawbacks

External code may already compensate for the old behaviour, so some users may complain about this change.

Testing

Code has been tested
Tests pass
Attempted to run the entire suite but could not, due to errors whose cause I do not understand. I think the version of MySQL on my Mac disagrees with the test suite at the moment.
