Commit 3e9adeac authored by Ewan Birney

made documentation more contig based

parent a578038e
Ensembl is a software system to automatically analyse and maintain
the analysis on genomic DNA. It is entirely open source (BSD style
license) and is run openly from http://ensembl.ebi.ac.uk/. Please go
license) and is run openly from http://www.ensembl.org/. Please go
to the web site for the most up to date documentation, code and
discussion.
@@ -49,9 +49,15 @@ objects are as follows:
Bio::EnsEMBL::DBSQL::Obj - The Ensembl database object
Bio::EnsEMBL::DBSQL::Clone - A Clone (sequencing unit)
Bio::EnsEMBL::DBSQL::Contig - A Contig, being continuous DNA sequenced together.
A Clone has a number of contigs: finished clones
have one Contig, working draft clones have more than one.
Bio::EnsEMBL::DBSQL::RawContig -
A Contig, being continuous DNA sequenced together.
A Clone has a number of contigs: finished clones
have one Contig, working draft clones have more than one.
Bio::EnsEMBL::DB::VirtualContig - An in-software contig made from a number
of RawContigs put together due to overlap or sized gapped
information
NB. We call contigs of DNA which span across a number of clones
"CloneContigs". The Contig object above refers to what comes out of
@@ -64,19 +70,20 @@ the assembly process in working draft data.
The gene objects can hold the following cases:
a) Gene structures across pieces of DNA sequence where the intervening sequence is
not known (nor do we have to assume a length to them)
a) Gene structures across pieces of DNA sequence where the
intervening sequence is not known (nor do we have to assume a length
to them)
b) Alternative transcripts of a Gene. A distinction is made between alternative
transcripts producing unique cDNAs and alternative protein coding products. In other
words, a Gene could produce 5 unique cDNA structures, but only 3 unique protein
structures, with 3 of the 5 cDNAs making the same protein but differing in their
UTRs.
b) Alternative transcripts of a Gene. A distinction is made between
alternative transcripts producing unique cDNAs and alternative protein
coding products. In other words, a Gene could produce 5 unique cDNA
structures, but only 3 unique protein structures, with 3 of the 5
cDNAs making the same protein but differing in their UTRs.
Ensembl also reuses a number of Bioperl objects, in particular
Bio::Seq - Sequence object
Bio::AnnSeq - Annotated sequence object
Bio::PrimarySeq - Light weight, "just the sequence" object
Bio::SeqFeature::Generic - Generic seq feature base class
Bio::SeqFeature::Homol - seq feature class representing a similarity hit.
@@ -109,22 +116,31 @@ Bio::EnsEMBL::DBSQL::Clone - A Clone (sequencing unit)
@contigs = $clone->get_all_Contigs(); - All contigs in a clone
$version = $clone->version(); - Version of the clone, from Ensembl's perspective
$version = $clone->embl_version(); - Version of the data in the clone
$seq = $clone->seq - Bio::Seq of DNA data, with N's between contigs
$annseq = $clone->get_AnnSeq(); - Bio::AnnSeq, with Genes attached on Clone
Able to dump the clone in EMBL/GenBank format
Bio::EnsEMBL::DBSQL::Contig - A Contig (contiguous DNA in one sequencing unit)
-----------------------------------------------------------------------------
Available on all contigs (whether Raw or Virtual)
-------------------------------------------------
@genes = $contig->get_all_Genes() - Gets all the genes attached to this contig
$seq = $contig->seq() - The Bio::Seq of this contig
@features = $contig->get_all_SeqFeatures() - All the computed sequence features for this
contig
$length = $contig->length(); - Length of the contig (far better than $seq->seq_len)
$order = $contig->order(); - Which contig order this is thought to be
$strand = $contig->orientation(); - Which strand this contig is on
$offset = $contig->offset(); - Offset of the contig
$length = $contig->length(); - Length of the contig
Contigs also inherit from Bio::SeqI; this means that the following
methods work:
@sf = $contig->top_SeqFeatures(); # genes map to virtual genes
$seq = $contig->seq(); # sequence as a string
$seq = $contig->subseq(100,200); # sequence as a string
They can also be used to provide EMBL/GenBank flat files
$seqio = Bio::SeqIO->new('-format' => 'EMBL', -fh => \*STDOUT);
$seqio->write_seq($contig);
They can also provide extensions to the left and right
# produces a new contig 1000 base pairs to the 3' of this contig
$newcontig = $contig->extend(1000,-1000);
Bio::EnsEMBL::Gene - A gene structure
-------------------------------------
@@ -153,5 +169,14 @@ Methods inherited from Bio::SeqFeature::Generic
$start = $exon->start() - start position in bio coordinates.
$end = $exon->end() - end position in bio coordinates
$strand = $exon->strand() -
$strand = $exon->strand() - 1 or -1