
Non-coding transcripts thick start/end should equal transcript start.

Created by: andrewyatz

Description

When encoding non-coding transcripts, thickStart and thickEnd should both have the same value as the transcript start. The code previously decremented thickStart. This change removes that line, so non-coding transcripts are now serialised correctly. Added a test case to cover this from circ core.

Use case

Currently the BEDSerialiser code can convert a Transcript into the genePred format. Part of this output is the standard 12 BED columns, which encode a transcript including its CDS start and end (the thickStart and thickEnd columns). For a non-coding transcript, both of these should be set to the same value as the transcript's start. Otherwise the file will fail to be indexed or accessed by UCSC tools.
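As a rough illustration of the rule described above, here is a minimal Python sketch of building one BED12 line. It is not the BEDSerialiser implementation; the function name `bed12_line` and its parameters are hypothetical, and it assumes the coordinates passed in are already 0-based, half-open BED coordinates.

```python
def bed12_line(chrom, start, end, name, strand, exons, cds=None, score=0, rgb="0"):
    """Build one tab-separated BED12 line (hypothetical helper, not Ensembl code).

    start, end -- transcript span, 0-based half-open
    exons      -- sorted list of (exon_start, exon_end), 0-based half-open
    cds        -- (cds_start, cds_end), or None for a non-coding transcript
    """
    if cds is None:
        # Non-coding: thickStart and thickEnd both equal the transcript start,
        # with no further adjustment.
        thick_start = thick_end = start
    else:
        thick_start, thick_end = cds

    # blockSizes and blockStarts are comma-terminated lists;
    # blockStarts are relative to the transcript start.
    block_sizes = "".join(f"{e - s}," for s, e in exons)
    block_starts = "".join(f"{s - start}," for s, _ in exons)

    fields = [chrom, start, end, name, score, strand,
              thick_start, thick_end, rgb, len(exons),
              block_sizes, block_starts]
    return "\t".join(str(f) for f in fields)


line = bed12_line("chr7", 140435315, 140435787, "ENST00000489972.1", "-",
                  exons=[(140435315, 140435787)])
print(line)
```

For a single-exon non-coding transcript this yields columns 7 and 8 equal to column 2, which is the layout this change aims for.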

Benefits

  • Correct encoding of non-coding transcripts (see below for an example from UCSC table browser)
chr7	140435315	140435787	ENST00000489972.1	0	-	140435315	140435315	0	1	472,	0,
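
The property shown in the example line can be checked mechanically. The following is a small sketch only; `thick_equals_start` is a hypothetical helper name and is not part of the test case added in this change.

```python
def thick_equals_start(bed_line):
    """Return True if thickStart and thickEnd both equal chromStart.

    Hypothetical helper: expects a tab-separated BED line with at least
    the first 8 columns present.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 BED columns")
    chrom_start = int(fields[1])
    thick_start = int(fields[6])
    thick_end = int(fields[7])
    return thick_start == thick_end == chrom_start


example = ("chr7\t140435315\t140435787\tENST00000489972.1\t0\t-\t"
           "140435315\t140435315\t0\t1\t472,\t0,")
print(thick_equals_start(example))
```

A line whose thickStart has been decremented by one would fail this check.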

Possible Drawbacks

  • External code may already compensate for the old behaviour, so users of that code may report problems after this change

Testing

  • Code has been tested
  • Tests pass
  • Attempted to run the entire suite but could not, due to errors whose cause I do not understand. I think the version of MySQL on my Mac disagrees with the test suite at the moment.
