diff --git a/web/user_doc/Cluster_tags.shtml b/web/user_doc/Cluster_tags.shtml
new file mode 100755
index 0000000000000000000000000000000000000000..0e904d21266caa9570c9bd9a06b1c500e7d98e06
--- /dev/null
+++ b/web/user_doc/Cluster_tags.shtml
@@ -0,0 +1,174 @@
+<!--#set var="banner" value="ZMap Feature Sets and Styles"-->
+<!--#include virtual="/perl/header"-->
+
+<!--#set var="author" value="edgrif@sanger.ac.uk" -->
+
+<style>
+pre{ width: 95%; background-color: #DDDDDD; border-style: solid; border-width: 1px; padding: 10px }
+.example{ border-color: #000000 }
+</style>
+
+
+<br/>
+<fieldset>
+<legend id="Cluster_tags">Clustering Features In Acedb</legend>
+
+
+<h3>Introduction</h3>
+
+<p>Please see the <a href="./Tag_sets.shtml">Tag Set</a> documentation for a key to
+the colouring used in Acedb tag examples.</p>
+
+
+<p>For a number of reasons we wish to cluster features together for processing in various
+ways. Obvious examples are:</p>
+
+<ul>
+ <li><p>showing matching EST read pairs.</p>
+
+ <li><p>showing which HSPs were connected by the alignment program used to find them.</p>
+
+ <li><p>showing/processing as a unit the transcript and the matches that align to the
+ exons of that transcript.</p>
+</ul>
+
+<p>Acedb has provided two main ways of clustering:</p>
+
+<ul>
+ <li><p><b>Explicit:</b> using tags in objects to "point" to other sibling/parent/child
+ objects; these usually make use of the XREF mechanism to automatically maintain lists
+ of these relationships.</p>
+ <li><p><b>Implicit:</b> certain tags, e.g. Homols, EST_n, are clustered by the code for certain
+ types of display.</p>
+</ul>
+
+<p>With more complex analysis of genetic data a more generalised way of clustering features is
+required. We need a mechanism that can specify different kinds of clustering and which can be
+used for clustering both acedb objects (e.g. transcripts) and features within those
+objects (e.g.
homologies) and any combination of those types.</p>
+
+<p>The method for clustering must be able to represent different types of clustering using a single
+generic "cluster" object. This document describes a set of tags for achieving this.</p>
+
+
+<h3>The Acedb Model for EST Read Pairs</h3>
+
+<p>Acedb has several tags in the ?Sequence class to implement clustering of EST read pairs:</p>
+
+
+<pre class="model_clases">
+?Sequence
+ Visible
+ Paired_read ?Sequence XREF Paired_read
+ Properties
+ cDNA
+ EST_5
+ EST_3
+ Show_in_reverse_orientation
+</pre>
+
+<p>The meaning of these tags is:</p>
+
+<ul>
+ <li><b>Paired_read:</b> used to cross reference to the matching read sequence object (arguably
+ the tag should be followed by UNIQUE to ensure that only two objects are ever paired).
+ <li><b>EST_5 or EST_3:</b> indicates which object is the 5' read and which is the 3' read.
+ <li><b>Show_in_reverse_orientation:</b> the 3' read is usually sequenced in reverse; this
+ flag signals that the object should be reverse complemented for display.
+</ul>
+
+<p>This system is tailored to EST read pairs but we can adapt it to a more general model as detailed
+in the following section.</p>
+
+
+
+
+<h3>Tag Sets for Clustering</h3>
+
+<p>The new tags will use some of the existing acedb tags, modify others and introduce
+some new ones.</p>
+
+
+<h4>EST tags</h4>
+
+<pre class="model_clases">
+?Sequence
+ Properties
+ cDNA
+ EST_5
+ EST_3
+</pre>
+
+
+<p>There are two choices with these tags:</p>
+
+<ol>
+ <li><b>Make them acedb "Special" tags:</b> currently the acedb code checks that read pairs consist of
+ EST_5 and EST_3 pairs, which is not necessarily a good thing.
+ <li><b>Use them as normal tags:</b> remove the "specialness" of these tags and keep them simply as information
+ for the annotator, i.e. the code does not check them at all.
+</ol>
+
+<p>If they are to be special tags then it could be argued that they should be embedded
+in the Cluster_Pair tags as they would have semantics only in that context.
If they are not
+special tags then they do not need to be changed.</p>
+
+<p>I propose they stay as is.</p>
+
+
+<h4>Show_in_reverse_orientation tag</h4>
+
+<pre class="model_clases">
+?Sequence
+ Properties
+ Show_in_reverse_orientation
+</pre>
+
+<p>This tag is a separate issue from the Cluster tags but is included because it has an
+important role in the display of read pairs: the 3' end is usually
+sequenced in reverse and so needs to be reverse complemented before display.
+We may find there are other display types we would like to apply to clustered features,
+in which case we could augment this with other tags.</p>
+
+<p>I propose this tag stays as is.</p>
+
+
+<h4>Cluster tags</h4>
+
+<p>In Acedb the Paired_read tag was used to cluster pairs of sequence objects. This should
+be replaced with a more general model. The following examples use the "Tag2" convention
+to allow arbitrary addition of different kinds of object to the tag set.</p>
+
+
+<p>To cluster pairs of objects:</p>
+
+<pre class="model_classes">
+ <font color=red>Cluster_Pair</font> <font color=green>Seq_pair</font> <font color=red>UNIQUE</font> <font color=green>?Sequence</font> <font color=red>XREF</font> <font color=green>Seq_pair</font>
+ <font color=green>XXX_type</font> <font color=red>UNIQUE</font> <font color=green>?XXX</font> <font color=red>XREF</font> <font color=green>XXX_type</font>
+</pre>
+
+<p>Because each reference is UNIQUE, an object can be paired with at most one other object.</p>
+
+
+
+<p>To cluster objects into sets/trees:</p>
+
+<pre class="model_classes">
+ <font color=red>Cluster_Tree Parent UNIQUE</font> <font color=green>Seq_parent</font> <font color=red>UNIQUE</font> <font color=green>?Sequence</font> <font color=red>XREF</font> <font color=green>Seq_children</font>
+ <font color=green>XXX_parent</font> <font color=red>UNIQUE</font> <font color=green>?XXX</font> <font color=red>XREF</font> <font color=green>XXX_children</font>
+ <font color=red>Children</font> <font color=green>Seq_children
?Sequence</font> <font color=red>XREF</font> <font color=green>Seq_parent</font>
+ <font color=green>XXX_children ?XXX</font> <font color=red>XREF</font> <font color=green>XXX_parent</font>
+</pre>
+
+<p>The tags impose the following rules:</p>
+
+<ul>
+ <li><p>There can be an arbitrary depth of child/parent clustering.</p>
+ <li><p>Each object has at most one parent, so the child/parent graph is a tree rather than a general DAG.</p>
+</ul>
+
+
+</fieldset>
+
+
+<!--#include virtual="/perl/footer"-->
diff --git a/web/user_doc/Tag_sets.shtml b/web/user_doc/Tag_sets.shtml
new file mode 100755
index 0000000000000000000000000000000000000000..3f7fde170d4c11e7644be60180b69e3f6db49a04
--- /dev/null
+++ b/web/user_doc/Tag_sets.shtml
@@ -0,0 +1,54 @@
+<!--#set var="banner" value="ZMap Feature Sets and Styles"-->
+<!--#include virtual="/perl/header"-->
+
+<!--#set var="author" value="edgrif@sanger.ac.uk" -->
+
+<style>
+pre{ width: 95%; background-color: #DDDDDD; border-style: solid; border-width: 1px; padding: 10px }
+.example{ border-color: #000000 }
+</style>
+
+
+<br/>
+<fieldset>
+<legend id="Tag_sets">Tag Sets In Acedb</legend>
+
+
+<H3>Tag Sets</H3>
+
+<P>A "tag set" is a set of tags and data that occur in a defined order and can be processed
+by acedb code regardless of the class they appear in.
+These tag sets are colour coded in this document to help
+identify the significant parts of the tag set:
+
+<P><pre><code><font color=red>feature_tag</font> <font color=green>[anonymous tag and object reference]</font> <font color=purple>[feature specific tags and data]</font>
+</code></pre>
+
+<P>Where:
+
+<P><font color=red>feature_tag</font> is the tag that the code searches for
+to find out what sort of feature it is processing. This tag must be specified <b>exactly</b>
+as given in these examples.
+
+<P><font color=green>anonymous tag and object reference</font> are sometimes included
+to allow insertion into the tag set of object references of arbitrary class (this is also known
+as the "tag2 system"). Although this anonymous tag must be present, its value is not read
+by the code and so it can have any value. Similarly, while the anonymous object reference
+must be present, the class of the object is not used by the code and so it can be any class.</p>
+
+<P><font color=purple>feature specific tags and data</font> are specific tags and data that follow
+a particular feature_tag and must be in the order and of the type specified in the tag set description.</p>
+
+<P>Some examples:
+
+<P><pre><code><font color=red>Source_Exons</font><font color=purple> Int UNIQUE Int</font>
+
+<font color=red>Homol</font> <font color=green>DNA_homol ?Sequence</font> <font color=red>XREF</font> <font color=green>DNA_homol</font> <font color=purple>?Method Float Int Int Int Int #Homol_info</font>
+</code></pre>
+
+
+
+</fieldset>
+
+
+<!--#include virtual="/perl/footer"-->