first versions

Commit f04135a4 authored 15 years ago by edgrif
--- a/doc/Project_notes/Tools_meeting.txt
+++ b/doc/Project_notes/Tools_meeting.txt
+
+====================================================================
+Annotation Tools Meeting - 10th Sept 2009
+
+
+--------------------------------------------------------------------
+General Tools changes
+
+
+Tool monitoring
+---------------
+All the annotation tools have options that are probably either unused
+or only rarely used. We need to monitor which options are used and
+use this information to better tailor menus, shortcuts etc.
+
+Along with this it would be good to provide tools that allow annotators
+to record which activities are repetitive and/or time consuming.
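
A minimal sketch of how option usage could be counted, assuming each tool
calls a small logging helper whenever a menu item or shortcut is invoked.
The class name, method names and the tool/option strings are hypothetical
and are not part of any existing tool.

```python
from collections import Counter


class OptionUsageLog:
    """Counts how often each tool option is invoked (hypothetical helper)."""

    def __init__(self):
        self._counts = Counter()

    def register(self, tool, option):
        """Make an option known so that it is reported even if never used."""
        self._counts.setdefault((tool, option), 0)

    def record(self, tool, option):
        """Note one use of `option` in `tool`."""
        self._counts[(tool, option)] += 1

    def rarely_used(self, threshold):
        """Return known (tool, option) pairs used fewer than `threshold` times."""
        return sorted(key for key, n in self._counts.items() if n < threshold)


log = OptionUsageLog()
log.register("zmap", "export_gff")
for _ in range(5):
    log.record("zmap", "zoom_in")
print(log.rarely_used(threshold=1))  # [('zmap', 'export_gff')]
```

Options have to be registered up front, otherwise an option that is never
used would simply be absent from the counts rather than reported as unused.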
+
+
+
+Shortcuts
+---------
+There are some widely accepted standards that we should support,
+e.g. "Close window" is usually Ctrl-W, but we should also ensure
+shortcuts are common across our own tools.
+
+
+
+--------------------------------------------------------------------
+ZMap
+
+
+Show mismatches
+---------------
+Request for zmap to add bars (or other indicators) to boxes
+representing matches to show where there are nucleotide mismatches.
+This is problematic because we would need to compare the actual
+sequences, and it is not workable for zmap to store all the match
+sequences in order to do this. BUT we could add % identity to the
+data displayed or indicate the goodness of the match in some other
+way.
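
As an illustration of the % identity idea, the value could be computed once
wherever both aligned sequences are available (for example by the aligner
or the data source), so that zmap only has to store one number per match.
This is a sketch only: the function name and the convention that '-' marks
a gap and counts as a mismatch are assumptions.

```python
def percent_identity(query, subject):
    """Percentage of aligned columns where both sequences carry the same base.

    Both strings must be the same length (already aligned). A '-' marks a
    gap and counts as a mismatch. Comparison is case-insensitive.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    if not query:
        return 0.0
    matches = sum(
        1 for q, s in zip(query.upper(), subject.upper()) if q == s and q != "-"
    )
    return 100.0 * matches / len(query)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```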
+
+
+Show raw sequences
+------------------
+A recurrent request from annotators is that they wish to see the
+raw match sequences. This was doable for previous sequence sources
+(just about) but will not be possible for sources from the new
+sequencing machines because of the sheer volume of data.
+
+We need to think about a different way to display this data. Much
+of it is repetitive (i.e. redundant) and it feels like there should
+be some way to show this to the annotator in a compressed but
+still informative way.
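
One simple form of compression, sketched here as an assumption rather than
a design decision, is to collapse identical match sequences into a single
entry with a count. The function name is hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Return (sequence, count) pairs, most frequent first, ties by sequence."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(collapse_reads(["ACGT", "ACGT", "ACGA", "ACGT"]))
# [('ACGT', 3), ('ACGA', 1)]
```

This only removes exact duplicates. Near-identical sequences would need a
different grouping rule, which the notes above leave open.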
+
+
+Navigation
+----------
+We still have more we could do, e.g. shortcuts to jump up and down
+the exons of a selected transcript, split windows for transcript
+viewing in a useful way, etc.
+
+
+--------------------------------------------------------------------
+Blixem
+
+
+Navigation
+----------
+Blixem-ZMap: annotators commonly wish to move backwards and forwards
+between the two programs to view particular alignment(s). ZMap
+already has inter-program communication but this needs adding to
+blixem to allow the user to click on an alignment in ZMap and then
+move to it in blixem, and vice versa.
+
+
+Match Sequence zooming/scrolling
+--------------------------------
+Currently the font size for the actual match sequences is fixed which
+limits the length of sequence that can be viewed. By introducing text
+display in one or two other sizes the text could in effect be zoomed
+to allow the user to see more or less sequence.
+
+Several users would also like there to be scroll bars with more
+continuous and smoother scrolling than there is now.
+
+
+Match Sequence Ends Display
+---------------------------
+Currently blixem only displays the run of bases from the match string
+for the alignment block. Annotators would find it useful to be able
+to see the leading and trailing bases as well, to help with verifying
+splice sites. This requires:
+
+- setting the number of bases displayed via a config file option.
+
+- visual indication of whether extra bases are unmatched or match a
+previous/subsequent block.
+
+
+Exons in the navigation window
+------------------------------
+Several improvements are required:
+
+- provide some kind of "bumping" mechanism to allow users to see
+overlapping transcripts more clearly.
+
+- add labels to exons to identify their feature.
+
+- allow users to select which types of exons are displayed, could make
+this interactive, i.e. load several types and then allow user to filter.
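
The "bumping" item above could be approached with a greedy row-packing
pass, sketched here under the assumption that each transcript is reduced
to a (name, start, end) tuple with inclusive coordinates. The function
name and data layout are hypothetical, not blixem's.

```python
def bump(transcripts):
    """Assign each (name, start, end) transcript to the first row where it fits.

    Returns {name: row_index}. Coordinates are inclusive, so a transcript
    fits in a row only if it starts after the last end placed in that row.
    """
    row_ends = []  # last occupied coordinate in each row
    rows = {}
    for name, start, end in sorted(transcripts, key=lambda t: (t[1], t[2])):
        for i, last_end in enumerate(row_ends):
            if start > last_end:
                row_ends[i] = end
                rows[name] = i
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            rows[name] = len(row_ends) - 1
    return rows


print(bump([("t1", 100, 500), ("t2", 300, 800), ("t3", 600, 900)]))
# {'t1': 0, 't2': 1, 't3': 0}
```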
+
+
+Matches in navigation window
+----------------------------
+Currently an overview of the forward/reverse strand match alignments
+is displayed in graph form, with the y axis for both running from 0 to
+100% identity. This usually wastes considerable screen space as most
+matches lie in the 90 to 100% range. The range shown should be
+calculated dynamically so that the space allocated is minimal.
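
A sketch of one way to calculate the range dynamically: take the minimum
and maximum % identity of the matches on display and snap them outwards to
a grid. The function name and the 5% default step are assumptions.

```python
import math


def identity_axis_range(identities, step=5.0):
    """Return (low, high) axis bounds enclosing all values, snapped to `step`.

    Bounds are clamped to 0..100. With no data the full range is returned.
    """
    if not identities:
        return (0.0, 100.0)
    low = math.floor(min(identities) / step) * step
    high = math.ceil(max(identities) / step) * step
    if high == low:
        # All values sit on one grid line: widen so the axis has some height.
        if high < 100.0:
            high = low + step
        else:
            low = high - step
    return (max(0.0, low), min(100.0, high))


print(identity_axis_range([91.2, 97.5, 99.0]))  # (90.0, 100.0)
```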
+
+
+Alignment Choosing
+------------------
+Annotators would like to be able to choose which matches are shown in 
+blixem:
+
+- single match (already done)
+
+- all highlighted matches
+
+- all matches within the marked region
+
+- several different kinds (columns) of matches
+
+
+Exonerate type matches
+----------------------
+Exon-aware aligners produce one extended alignment where previously
+there were a number of independent HSPs. Currently blixem orders the
+match sequences according to properties of just those match sequences
+that can be seen in the lower window. This will need to change in
+several ways:
+
+- score information will need to be added in addition to %id
+
+- the overall %id or score of the complete alignment will need to be used
+for ordering by score, not just the score for that individual block.
+
+- the display of the main blocks for these alignments will need to 
+have a visual indication that the block forms part of a larger
+prediction.
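
To illustrate ordering by the score of the complete alignment rather than
of the individual block, here is a sketch in which each block is a dict
with hypothetical 'alignment', 'start' and 'score' keys. Summing block
scores stands in for the overall score, which is an assumption: a real
aligner would normally report that value itself.

```python
from collections import defaultdict


def order_blocks(blocks):
    """Sort blocks so that those from the best-scoring alignment come first.

    Blocks of the same alignment stay together, ordered by start coordinate.
    """
    totals = defaultdict(float)
    for block in blocks:
        totals[block["alignment"]] += block["score"]
    return sorted(
        blocks,
        key=lambda b: (-totals[b["alignment"]], b["alignment"], b["start"]),
    )


blocks = [
    {"alignment": "B", "start": 50, "score": 80.0},
    {"alignment": "A", "start": 200, "score": 40.0},
    {"alignment": "A", "start": 10, "score": 50.0},
]
print([(b["alignment"], b["start"]) for b in order_blocks(blocks)])
# [('A', 10), ('A', 200), ('B', 50)]
```

Sorting on the individual block score would put B first here, even though
alignment A scores higher overall.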
+
+
+Export of annotator identified features
+---------------------------------------
+Annotators use blixem to verify the details of matches that are hard
+to find automatically: exon variations, splice site corrections, poly-A
+features and more. They need to be able to export these using our
+usual cut buffer mechanism.
+
+
+---------------------------------------------------------------------
+Dotter
+
+
+Bad coordinates
+---------------
+The coordinate range shown by dotter is often wrong and needs fixing.
+
+
+Cut/paste of coordinates
+------------------------
+Copy/paste of display coords to/from dotter is poor or doesn't work
+currently.
+
+
+Blacking Out
+------------
+Under certain conditions dotter draws its entire sequence display
+area black; it's not clear why this is. Adam will provide test
+data.
+
+
+
+
+---------------------------------------------------------------------
+Otterlace
+
+
+Clone Boundaries
+----------------
+There are some edge effects with alignments and other features at clone
+boundaries.
+
+
+
+---------------------------------------------------------------------
+External tools
+
+LookSeq
+-------
+LookSeq was designed to display the very high volumes of sequencing
+data coming from the new machines. It has a web-based version and
+apparently a pure java version. If we could call it directly we
+could take advantage of it as a tool to display this type of data.
+This would require:
+
+- understanding the parameters/input files it requires
+
+- being able to translate our coordinates into the LookSeq
+coordinates (not necessary if they use the same underlying assembly).
+
+- annotators may wish to filter the LookSeq display; it's not known
+if this is possible.
+
+
+
+
+====================================================================
--- a/doc/Project_notes/ZMap_Acedb_Goals_2009.shtml
+++ b/doc/Project_notes/ZMap_Acedb_Goals_2009.shtml
+<!--#set var="banner" value="ZMap/Acedb Development Plans for Autumn 2009/Spring 2010"-->
<!--#include virtual="/perl/header"-->


<h2>ZMap/Acedb Development Plans for Autumn 2009/Spring 2010</h2>

<br />
<fieldset>
<legend>Introduction</legend>

<p>The current state of play is:</p>

<ul>
  <li><p>ZMap has now replaced xace as the annotation tool in havana. The code is largely stable
      and performance is mostly acceptable; most functions that annotators used in xace
      have been replicated in ZMap. In addition new functions have been added as well as
      code to support much better interactivity between Otterlace and ZMap.</p>
  <li><p>Acedb had been maintained at an acceptable level for the Worm group and agreement was
      reached to stop any development on xace at all. Development continues on the
      commandline/server code because this is needed to support Otterlace and ZMap.</p>
</ul>

<p>The next phase for ZMap is to build on the existing code to add completely new
facilities for annotation as described in this document.</p>

</fieldset>



<br />
<fieldset>
<legend>Staffing</legend>

<p>This summer saw the departure of Roy Storey, who had worked on both Otterlace and ZMap for some
years; he will be a hard act to follow. He will be replaced as soon as possible and in addition
there is funding for another person for 2 years. This is good news as it will mean that several
pieces of work that have remained as prototypes until now can be finished.</p>

<p>It is anticipated that both positions will be filled by this Autumn and this would mean
the period up to Christmas will largely be one of learning the system with proper development
beginning in Spring 2010.</p>

</fieldset>



<br />
<fieldset>
<legend>Variation Display</legend>

<p>Annotation of "canonical" organism DNA is now more than a decade old and is being superseded by
the need to annotate inter- and intra-species variation. The display of data for single sequence
annotation is increasingly challenging and the extension of this to variation data is not simple.
Whether in zmap, blixem or whatever, we don't really have good ways to present this
information currently.</p>

<p>The major challenges are:</p>


<ul>
  <li>Dealing with the huge volumes of data, in particular alignments.
  <li>Handling screen real estate in a way that is useful to the annotator.
  <li>Displaying the different types of variation data in an informative way.
</ul>


<p>Variation can be taken to include:</p>

<ul>
  <li>SNPs
  <li>Alleles
  <li>CNVs
  <li>Haplotypes
  <li>Chromosomal rearrangements
</ul>


<p>Clearly several different types of display will be required for these quite different types of
data. Currently our toolkit has two major display components:</p>

<ul>
  <li>ZMap for features display
  <li>Blixem for DNA or peptide sequence comparisons
</ul>

<p>ZMap requires enhancements to display some of this data while blixem will require some parts
to be completely rewritten.</p>


</fieldset>


<br />
<fieldset>
<legend>Improving the "Annotation Suite"</legend>

<p>Through experience with Acedb in particular it became clear that the annotation "viewer" and
the annotation "database" should be separated into separate programs. Databases all have their own
semantics, it is vital to keep these separate from the viewer program if the latter is to be a
more general tool for annotators. This is the approach taken with the Otterlace/ZMap system (OZ)
and is one being considered by other developers (e.g. Apollo, Suzy Lewis pers. comm.). OZ has a
number of component programs that must communicate with each other to give as seamless a system as
possible and this is leading to the development of protocols for annotation program
inter-communication. Currently we have three major components that must communicate together:</p>

<ul>
  <li>Otterlace editing/DB system
  <li>ZMap display system
  <li>Helper programs: Blixem, dotter, belvu and others
</ul>


<p>Communication between these components is rudimentary at the moment and the ease of use of OZ
could be considerably improved with enhancements to the current "protocols".</p>

<p>ZMap and blixem need to be very tightly linked and a better alternative would be to incorporate
blixem function into ZMap in the form of a new ZMap window. This would allow for much more
sophisticated interaction. Maybe the overview panel in blixem would not even be needed, since this
duplicates some ZMap functions. The blixem code is poorly organised, which prevents further major
development.</p>

</fieldset>


<br />
<fieldset>
<legend>Data Source/Format Support</legend>

<p>Slowly but surely a few data sources/formats are becoming "standards" for bioinformatics
e.g. GFFv3. In particular the use of ontologies (e.g. SOFA) is becoming obligatory to ensure data
integrity and interchange. Annotation at Sanger needs to change to actively use more of these
formats which requires a number of components to be augmented to support these standards.</p>

<p>Most immediately:</p>

<ul>
  <li>Add GFFv3 export to acedb
  <li>Add GFFv3 parsing/export to ZMap
  <li>Add Ensembl interface support to ZMap
</ul>


<p>Reuse of Sanger software by external groups is not widespread and adopting some of these common
standards and formats would help to change that. Adoption of our software by external users should
be an important goal for us.</p>

</fieldset>


<br />
<fieldset>
<legend>Strategic Software Decisions</legend>

<p>The OZ system relies on several major external graphical components:</p>

<ul>
  <li>Otterlace: Tk graphics package
  <li>ZMap: Gtk graphics package, foocanvas canvas
</ul>

<p>While Gtk is likely to be long lived, both Tk and foocanvas seem to be reaching the end of their
active development lives. This is a concern because these components are unlikely to be developed
further and replacing them will be very time consuming as they are integral to our systems.</p>

<p>There is a plan for ZMap to replace foocanvas with goocanvas, its more powerful successor, but
this is on hold until the Gtk consortium decide whether to adopt the goocanvas as their official
canvas widget.</p>

</fieldset>




<br />
<fieldset>
<legend>Improvements to Blixem</legend>

<p>1) Blixem</p>

<p>It's my impression that this tool is important to havana and that we could enhance it
in a number of ways:</p>

<ul>
  <li>make it deal with all combinations of strands, nucleotide and peptide
      alignments correctly (it does not do this currently)
  <li>make it more informative (show more information visually), about
      splice sites, gaps etc
  <li>make it able to interact with other programs (e.g. ZMap) to provide
      better navigation etc.
  <li>improve general display stuff like the transcript display.
  <li>make blixem able to take data in cigar and exonerate formats, the
      latter could be used to give more information about matches.
</ul>
      
<p>All of this can be done without a rewrite, which I think is preferable; otherwise there
is a danger that the 2 years could be absorbed by just reimplementing rather than
extending.</p>

<p>Along the same lines it may be that there are enhancements to dotter that would also
help you.</p>


</fieldset>







<!--#include virtual="/perl/footer"-->
\ No newline at end of file