first versions

Commit f092ffe8 authored 15 years ago by edgrif
--- a/web/user_doc/Cluster_tags.shtml
+++ b/web/user_doc/Cluster_tags.shtml
+<!--#set var="banner" value="ZMap Feature Sets and Styles"-->
+<!--#include virtual="/perl/header"-->
+
+<!--#set var="author" value="edgrif@sanger.ac.uk" -->
+
+<style>
+pre{ width: 95%; background-color: #DDDDDD; border-style: solid; border-width: 1px; padding: 10px }
+.example{ border-color: #000000 }
+</style>
+
+
+<br/>
+<fieldset>
+<legend id="Cluster_tags">Clustering Features In Acedb</legend>
+
+
+<h3>Introduction</h3>
+
+<p>Please see the <a href="./Tag_sets.shtml">Tag Set</a> documentation for a key to
+colouring in Acedb tag examples.</p>
+
+
+<p>For a number of reasons we wish to cluster together features for processing in various
+ways. Obvious examples are:
+
+<ul>
+  <li><p>showing matching EST read pairs.</p>
+
+  <li><p>showing which HSPs were connected by the alignment program used to find them.</p>
+
+  <li><p>showing/processing as a unit the transcript and the matches that align to the
+      exons of that transcript.</p>
+</ul>
+
+<p>Acedb has provided two main ways of clustering:</p>
+
+<ul>
+  <li><p><b>Explicit:</b> using tags in objects to "point" to other sibling/parent/child
+      objects. These usually make use of the XREF mechanism to maintain lists
+      of these relationships automatically.</p>
+  <li><p><b>Implicit:</b> certain tags, e.g. Homols, EST_n, are clustered by the code for certain
+      types of display.</p>
+</ul>
+
+<p>With more complex analysis of genetic data a more generalised way of clustering features is
+required. We need a mechanism that can specify different kinds of clustering and which can be
+used for clustering both acedb objects (e.g. transcripts) and features within those
+objects (e.g. homologies) and any combination of those types.</p>
+
+<p>The method for clustering must be able to represent different types of clustering using a single
+generic "cluster" object. This document describes a set of tags for achieving this.</p>
+
+
+<h3>The Acedb Model for EST Read Pairs</h3>
+
+<p>Acedb has several tags in the ?Sequence class to implement clustering of EST read pairs:</p>
+
+
+<pre class="model_clases">
+?Sequence
+	  Visible
+		  Paired_read ?Sequence XREF Paired_read 
+	  Properties
+			cDNA
+			     EST_5
+			     EST_3
+			Show_in_reverse_orientation
+</pre>
+
+<p>The meaning of these tags is:</p>
+
+<ul>
+  <li><b>Paired_read:</b> used to cross reference to the matching read sequence object (arguably
+      the tag should be followed by UNIQUE to ensure that only two objects are ever paired).
+  <li><b>EST_5 or EST_3:</b> indicates which object is the 5' read and which is the 3' read.
+  <li><b>Show_in_reverse_orientation:</b> the 3' read is usually sequenced in reverse; this
+      flag signals that the object should be reverse complemented for display.
+</ul>
+
+<p>This system is tailored to EST read pairs but we can adapt it to a more general model as detailed
+in the following section.</p>
+
+
+
+
+<h3>Tag Sets for Clustering</h3>
+
+<p>The new tags will use some of the existing acedb tags, modify others and introduce
+some new ones.</p>
+
+
+<h4>EST tags</h4>
+
+<pre class="model_clases">
+?Sequence
+	  Properties
+			cDNA
+			     EST_5
+			     EST_3
+</pre>
+
+
+<p>There are two choices with these tags:</p>
+
+<ol>
+  <li><b>Make them acedb "Special" tags:</b> currently acedb code checks that read pairs consist of
+      an EST_5 and an EST_3 object, which is not necessarily a good thing.
+  <li><b>Use them as normal tags:</b> remove the "specialness" of these tags and keep them simply as information
+      for the annotator, i.e. the code does not check them at all.
+</ol>
+
+<p>If they are to be special tags then it could be argued that they should be embedded
+in the Cluster_pair tags as they would have semantics only in that context. If they are not
+special tags then they do not need to be changed.</p>
+
+<p>I propose they stay as is.</p>
+
+
+<h4>Show_in_reverse_orientation tag</h4>
+
+<pre class="model_clases">
+?Sequence
+	  Properties
+			Show_in_reverse_orientation
+</pre>
+
+<p>This tag is a separate issue from the Cluster tags but is included because it plays an
+important role in the display of read pairs: the 3' end is usually
+sequenced in reverse and so needs to be reverse complemented before display.
+We may find there are other display types we would like to apply to clustered features,
+in which case we could augment this with other tags.</p>
+
+<p>I propose this tag stays as is.</p>
+
+
+<h4>Cluster tags</h4>
+
+<p>In Acedb the Paired_read tag was used to cluster pairs of sequence objects. This should
+be replaced with a more general model. The following examples use the "Tag2" convention
+to allow arbitrary addition of different kinds of object to the tag set.</p>
+
+
+<p>To cluster pairs of objects:</p>
+
+<pre class="model_classes">
+               <font color=red>Cluster_Pair</font> <font color=green>Seq_pair</font> <font color=red>UNIQUE</font> <font color=green>?Sequence</font> <font color=red>XREF</font> <font color=green>Seq_pair</font>
+                            <font color=green>XXX_type</font> <font color=red>UNIQUE</font> <font color=green>?XXX</font>      <font color=red>XREF</font> <font color=green>XXX_type</font>
+</pre>
+
+<p>This definition ensures that objects are paired.</p>
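+<p>What UNIQUE and XREF give in combination can be pictured with a small Python sketch
+written outside acedb. The <code>PairStore</code> class and the object names are hypothetical,
+and the sketch assumes that replacing a UNIQUE value also removes the stale back reference;
+it illustrates the intended semantics and is not acedb code.</p>

```python
class PairStore:
    """Toy model of a UNIQUE, self-XREF'd pairing tag such as Seq_pair."""

    def __init__(self):
        # Maps an object name to the name of its single partner.
        self.partner = {}

    def set_pair(self, a, b):
        # UNIQUE: each object holds at most one partner, so drop old links first.
        for name in (a, b):
            old = self.partner.pop(name, None)
            if old is not None:
                # Assumed XREF behaviour: the stale back reference goes too.
                self.partner.pop(old, None)
        # XREF: writing the tag in one object fills in the other automatically.
        self.partner[a] = b
        self.partner[b] = a


store = PairStore()
store.set_pair("est1.5", "est1.3")
store.set_pair("est1.5", "est2.3")   # re-pairing replaces the old partner
print(store.partner)
```

+<p>After the second call only the new pair remains, so an object can never end up
+in two pairs at once.</p>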
+
+
+
+<p>To cluster objects into sets/trees:</p>
+
+<pre class="model_classes">
+               <font color=red>Cluster_Tree Parent UNIQUE</font> <font color=green>Seq_parent</font> <font color=red>UNIQUE</font> <font color=green>?Sequence</font>  <font color=red>XREF</font> <font color=green>Seq_children</font>
+                                          <font color=green>XXX_parent</font> <font color=red>UNIQUE</font> <font color=green>?XXX</font>       <font color=red>XREF</font> <font color=green>XXX_children</font>
+                            <font color=red>Children</font> <font color=green>Seq_children ?Sequence</font> <font color=red>XREF</font> <font color=green>Seq_parent</font>
+                                     <font color=green>XXX_children ?XXX</font>      <font color=red>XREF</font> <font color=green>XXX_parent</font>
+</pre>
+
+<p>The tags impose the following rules:</p>
+
+<ul>
+  <li><p>There can be an arbitrary depth of child/parent clustering.</p>
+  <li><p>The child/parent structure is a directed acyclic graph (DAG) in which every object has
+      at most one parent.</p>
+</ul>
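+<p>The two rules above can be pictured with a short Python sketch. The <code>ClusterTree</code>
+class and the object names are hypothetical, the automatic upkeep of the Children list stands in
+for XREF, and the sketch does not check for cycles; it is an illustration only, not acedb code.</p>

```python
class ClusterTree:
    """Toy model of the Parent/Children tags: one parent, many children."""

    def __init__(self):
        self.parent = {}     # child name to its single parent (UNIQUE)
        self.children = {}   # parent name to the set of its children

    def set_parent(self, child, parent):
        # UNIQUE parent: detach from any previous parent first.
        old = self.parent.get(child)
        if old is not None:
            self.children[old].discard(child)
        self.parent[child] = parent
        # XREF: the parent's Children list is maintained automatically.
        self.children.setdefault(parent, set()).add(child)

    def root(self, name):
        # Follow Parent links upwards; the tags do not limit the depth.
        while name in self.parent:
            name = self.parent[name]
        return name

    def members(self, name):
        # Collect the object and everything clustered below it.
        found = [name]
        for child in sorted(self.children.get(name, ())):
            found.extend(self.members(child))
        return found


tree = ClusterTree()
tree.set_parent("exon_match_1", "transcript_A")
tree.set_parent("exon_match_2", "transcript_A")
tree.set_parent("hsp_1", "exon_match_1")
print(tree.root("hsp_1"))
print(tree.members("transcript_A"))
```

+<p>Here a transcript, its exon matches and an HSP below one of those matches can be
+processed as one unit by starting from the root object.</p>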
+
+
+</fieldset>
+
+
+<!--#include virtual="/perl/footer"-->
--- a/web/user_doc/Tag_sets.shtml
+++ b/web/user_doc/Tag_sets.shtml
+<!--#set var="banner" value="ZMap Feature Sets and Styles"-->
+<!--#include virtual="/perl/header"-->
+
+<!--#set var="author" value="edgrif@sanger.ac.uk" -->
+
+<style>
+pre{ width: 95%; background-color: #DDDDDD; border-style: solid; border-width: 1px; padding: 10px }
+.example{ border-color: #000000 }
+</style>
+
+
+<br/>
+<fieldset>
+<legend id="Tag_sets">Tag Sets In Acedb</legend>
+
+
+<H3>Tag Sets</H3>
+
+<P>A "tag set" is a set of tags and data that occur in a defined order and can be processed
+by acedb code regardless of the class they appear in.
+These tag sets are colour coded in this document to help
+identify the significant parts of the tag set:
+
+<P><pre><code><font color=red>feature_tag</font> <font color=green>[anonymous tag and object reference]</font> <font color=purple>[feature specific tags and data]</font>
+</code></pre>
+
+<P>Where:
+
+<P><font color=red>feature_tag</font> is the tag that the code searches for
+to find out what sort of feature it is processing. This tag must be specified <b>exactly</b>
+as given in these examples.
+
+<P><font color=green>anonymous tag and object reference</font> are sometimes included
+to allow insertion into the tag set of object references of arbitrary class (this is also known
+as the "tag2 system"). Although this anonymous tag must be present, its value is not read
+by the code and so it can have any value. Similarly, while the anonymous object reference
+must be present, the class of the object is not used by the code and so it can be any class.</p>
+
+<P><font color=purple>feature specific tags and data</font> are specific tags and data that follow
+a particular feature_tag and must be in the order and of the type specified in the tag set description.</p>
+
+<P>Some examples:
+
+<P><pre><code><font color=red>Source_Exons</font><font color=purple> Int UNIQUE Int</font>
+
+<font color=red>Homol</font> <font color=green>DNA_homol ?Sequence</font> <font color=red>XREF</font> <font color=green>DNA_homol</font> <font color=purple>?Method Float Int Int Int Int #Homol_info</font>
+</code></pre>
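+<p>As a rough sketch of how code might consume the Homol tag set, the Python below splits one row
+into its three colour-coded parts. The function name <code>read_homol_row</code>, the sample values
+and the reading of the four Ints as two coordinate pairs are assumptions made for illustration;
+this is not the acedb implementation.</p>

```python
def read_homol_row(row):
    """Split one Homol tag-set row into its colour-coded parts."""
    feature_tag, _anon_tag, target, *data = row
    # The feature tag is what the code searches for, so it must match exactly.
    if feature_tag != "Homol":
        raise ValueError("not a Homol row")
    # _anon_tag must be present but its value is deliberately never inspected.
    # Feature specific data is read strictly in the order the tag set gives.
    method, score, i1, i2, i3, i4 = data[:6]
    return {
        "target": target,
        "method": method,
        "score": float(score),
        # Assumption: the four Ints are start/end in each of the two sequences.
        "span": (int(i1), int(i2)),
        "target_span": (int(i3), int(i4)),
    }


a = read_homol_row(["Homol", "DNA_homol", "est1.5", "BLASTN", "98.5", "100", "250", "1", "151"])
b = read_homol_row(["Homol", "Any_name", "est1.5", "BLASTN", "98.5", "100", "250", "1", "151"])
print(a == b)
```

+<p>The two rows differ only in the anonymous tag, which is why they are read identically.</p>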
+
+
+
+</fieldset>
+
+
+<!--#include virtual="/perl/footer"-->