Commit f04135a4 authored by edgrif's avatar edgrif
first versions

parent ee5173df
====================================================================
Annotation Tools Meeting - 10th Sept 2009
--------------------------------------------------------------------
General Tools changes
Tool monitoring
---------------
All the annotation tools have options that are probably either unused
or only rarely used. We need to monitor which options are used and
use this information to better tailor menus, shortcuts etc.
Along with this it would be good to provide tools that allow annotators
to record which activities are repetitive and/or time consuming.
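
As a rough illustration of the monitoring idea, the sketch below counts
how often each option is invoked and reports the ones never used. The
class name, method names and option names are all hypothetical; none of
them exist in the current tools.

```python
from collections import Counter


class UsageMonitor:
    """Counts invocations of named tool options (hypothetical helper)."""

    def __init__(self, known_options):
        # Every option the tool offers, so unused ones can be reported.
        self.known_options = set(known_options)
        self.counts = Counter()

    def record(self, option):
        """Call this each time the user triggers an option."""
        self.counts[option] += 1

    def unused(self):
        """Options that were never triggered, in alphabetical order."""
        return sorted(self.known_options - set(self.counts))

    def most_used(self, n=5):
        """The n most frequently triggered options with their counts."""
        return self.counts.most_common(n)


# Illustrative option names only.
monitor = UsageMonitor(["zoom_in", "zoom_out", "show_dna", "export_gff"])
for event in ["zoom_in", "zoom_in", "show_dna"]:
    monitor.record(event)
```

The unused list would suggest candidates for removal from menus, and the
most-used list candidates for shortcuts.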
Shortcuts
---------
There are some widely accepted standards that we should support,
e.g. "Close window" is usually Ctrl-W, but we should also ensure
shortcuts are common across our own tools.
--------------------------------------------------------------------
ZMap
Show mismatches
---------------
Request for zmap to add bars (or other indicators) to boxes
representing matches to show where there are nucleotide mismatches.
This is problematic because we would need to compare the actual
sequences and it is not workable for zmap to store all the match
sequences in order to do this. We could, however, add % identity to
the data displayed or indicate the goodness of the match in some
other way.
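
For reference, one common way to define % identity is sketched below.
It assumes the two sequences are already aligned, use "-" for gaps and
that gapped columns are excluded; other definitions exist, and in
practice the figure would more likely be supplied with the match data
than computed in zmap. The function name is hypothetical.

```python
def percent_identity(query, subject):
    """Percentage of aligned, non-gap columns in which the bases match.

    Assumes both strings are the same length and use '-' for gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must be the same length")

    # Keep only columns where neither sequence has a gap.
    columns = [
        (q, s)
        for q, s in zip(query.upper(), subject.upper())
        if q != "-" and s != "-"
    ]
    if not columns:
        return 0.0

    matches = sum(1 for q, s in columns if q == s)
    return 100.0 * matches / len(columns)
```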
Show raw sequences
------------------
A recurrent request from annotators is that they wish to see the
raw match sequences. This was doable for previous sequence sources
(just about) but will not be possible for sources from the new
sequencing machines because of the sheer volume of data.
We need to think about a different way to display this data. Much
of it is repetitive (i.e. redundant) and it feels like there should
be some way to show this to the annotator in as compressed but
still informative a way as possible.
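
One very simple form of compression, shown here only as a sketch, is to
collapse identical sequences into a single entry with a count. It
assumes exact string identity is the right notion of redundancy, which
may well be too strict for real read data; the function name is
hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical sequences into (sequence, count) pairs.

    Pairs are ordered by descending count, then alphabetically, so the
    most heavily supported sequence is listed first.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The display would then need one row per distinct sequence, with the
count shown alongside it.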
Navigation
----------
We still have more we could do, e.g. shortcuts to jump up and down
exons of a selected transcript, split windows for transcript
viewing in a useful way etc. etc.
--------------------------------------------------------------------
Blixem
Navigation
----------
Blixem-ZMap: annotators commonly wish to move backwards and forwards
between the two programs to view particular alignments. ZMap
already has inter-program communication but this needs adding to
blixem to allow the user to click on an alignment in ZMap and then
move to it in blixem, and vice versa.
Match Sequence zooming/scrolling
--------------------------------
Currently the font size for the actual match sequences is fixed which
limits the length of sequence that can be viewed. By introducing text
display in one or two other sizes the text could in effect be zoomed
to allow the user to see more or less sequence.
Several users would also like there to be scroll bars with more
continuous and smoother scrolling than there is now.
Match Sequence Ends Display
---------------------------
Currently blixem only displays the run of bases from the match string
for the alignment block. Annotators would find it useful to be able
to see the leading and trailing bases as well to help with verifying
splice sites. This requires:
- setting the number of bases displayed via a config file option.
- visual indication of whether extra bases are unmatched or match a
previous/subsequent block.
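
A minimal sketch of the extraction step follows. It assumes 1-based
inclusive block coordinates on the match sequence and a flank size that
would come from the config file; the function name and the choice of
lower case to mark the extra bases are assumptions for illustration.

```python
def with_flanks(match_seq, block_start, block_end, flank):
    """Return (leading, core, trailing) bases for an alignment block.

    block_start and block_end are 1-based and inclusive. Flanking bases
    are returned in lower case so they can be drawn differently from
    the aligned core, which is returned in upper case.
    """
    # Clamp the flanks so they never run off either end of the sequence.
    left = max(1, block_start - flank)
    right = min(len(match_seq), block_end + flank)

    leading = match_seq[left - 1:block_start - 1].lower()
    core = match_seq[block_start - 1:block_end].upper()
    trailing = match_seq[block_end:right].lower()
    return leading, core, trailing
```

Deciding whether the extra bases belong to a previous or subsequent
block would need the coordinates of the neighbouring blocks as well,
which this sketch does not attempt.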
Exons in the navigation window
------------------------------
Several improvements are required:
- provide some kind of "bumping" mechanism to allow users to see
overlapping transcripts more clearly.
- add labels to exons to identify their feature.
- allow users to select which types of exons are displayed, could make
this interactive, i.e. load several types and then allow user to filter.
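
One possible "bumping" scheme, sketched under the assumption that each
transcript can be reduced to a (start, end, name) tuple, is to place
each feature in the first row where it does not overlap anything
already placed. This is a generic greedy packing and not necessarily
how ZMap does its own bumping; the function name is hypothetical.

```python
def bump(features):
    """Assign each (start, end, name) feature to a display row.

    Features are processed in start order and put in the first row
    whose last feature ends before this one starts, so overlapping
    features end up in different rows.
    """
    row_ends = []   # end coordinate of the last feature in each row
    placement = {}  # feature name -> row index

    for start, end, name in sorted(features):
        for row, last_end in enumerate(row_ends):
            if start > last_end:
                row_ends[row] = end
                placement[name] = row
                break
        else:
            # No existing row is free, so open a new one.
            row_ends.append(end)
            placement[name] = len(row_ends) - 1

    return placement
```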
Matches in navigation window
----------------------------
Currently an overview of the forward/reverse strand match alignments
is displayed in graph form with the y axis for both running from 0 to
100% identity. This usually wastes considerable screen space as most
matches lie in the 90 to 100% range. The range shown should be
calculated dynamically so that the space allocated is minimal.
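
The range calculation could look something like the sketch below, which
rounds the observed minimum and maximum outwards to a tick interval.
The 5% interval and the function name are assumptions chosen for
illustration.

```python
import math


def identity_axis_range(identities, step=5.0):
    """Return (low, high) y-axis bounds that just enclose the data.

    Bounds are rounded outwards to a multiple of step and clamped to
    0..100. With no data the full 0..100 range is returned.
    """
    if not identities:
        return (0.0, 100.0)

    low = math.floor(min(identities) / step) * step
    high = math.ceil(max(identities) / step) * step

    # Avoid a zero-height axis when all values sit on one tick.
    if low == high:
        if low >= step:
            low -= step
        else:
            high += step

    return (max(0.0, low), min(100.0, high))
```

For matches lying between 91% and 99% this gives an axis from 90 to
100 rather than 0 to 100.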
Alignment Choosing
------------------
Annotators would like to be able to choose which matches are shown in
blixem:
- single match (already done)
- all highlighted matches
- all matches within the marked region
- several different kinds (columns) of matches
Exonerate type matches
----------------------
Exon-aware aligners produce one extended alignment where previously
there were a number of independent HSPs. Currently blixem orders the
match sequences according to properties of just those match sequences
that can be seen in the lower window. This will need to change in at
least three ways:
- score information will need to be added in addition to %id
- the overall %id or score of the complete alignment will need to be used
for ordering by score, not just the score for that individual block.
- the display of the main blocks for these alignments will need to
have a visual indication that the block forms part of a larger
prediction.
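
The ordering change can be sketched as follows, assuming each block
record carries both its own score and the overall score of the
alignment it belongs to. The field names (alignment_id, start,
block_score, alignment_score) are hypothetical and do not reflect
blixem's actual data structures.

```python
def order_blocks(blocks):
    """Order blocks by the overall score of their parent alignment.

    Blocks from the best-scoring alignment come first. Blocks of the
    same alignment stay together and are listed in coordinate order,
    regardless of their individual block scores.
    """
    return sorted(
        blocks,
        key=lambda b: (-b["alignment_score"], b["alignment_id"], b["start"]),
    )


# Illustrative data: block "B" has the best individual score, but
# alignment "A" has the better overall score.
blocks = [
    {"alignment_id": "A", "start": 500, "block_score": 90, "alignment_score": 150},
    {"alignment_id": "B", "start": 100, "block_score": 95, "alignment_score": 120},
    {"alignment_id": "A", "start": 100, "block_score": 60, "alignment_score": 150},
]
ordered = order_blocks(blocks)
```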
Export of annotator identified features
---------------------------------------
Annotators use blixem to verify the details of matches that are hard
to find automatically: exon variations, splice site corrections, poly-A
features and more. They need to be able to export these using our
usual cut buffer mechanism.
---------------------------------------------------------------------
Dotter
Bad coordinates
---------------
The coordinate range shown by dotter is often wrong and needs fixing.
Cut/paste of coordinates
------------------------
Copy/paste of display coords to/from dotter is poor or does not work
currently.
Blacking Out
------------
Under certain conditions dotter draws its entire sequence display
area black; it is not clear why this is. Adam will provide test
data.
---------------------------------------------------------------------
Otterlace
Clone Boundaries
----------------
There are some edge effects with alignments and other features at clone
boundaries.
---------------------------------------------------------------------
External tools
LookSeq
-------
LookSeq was designed to display the very high volumes of sequencing
data coming from the new machines. It has a web-based version and
apparently a pure Java version. If we could call it directly we
could take advantage of it as a tool to display this type of data.
This would require:
- understanding the parameters/input files it requires
- being able to translate our coordinates into the LookSeq
coordinates (not necessary if they use the same underlying assembly).
- annotators may wish to filter the LookSeq display; it is not known
if this is possible.
====================================================================
<!--#set var="banner" value="ZMap/Acedb Development Plans for Autumn 2009/Spring 2010"--> <!--#include virtual="/perl/header"--> <h2>ZMap/Acedb Development Plans for Autumn 2009/Spring 2010</h2> <br /> <fieldset> <legend>Introduction</legend> <p>The current state of play is:</p> <ul> <li><p>ZMap has now replaced xace as the annotation tool in havana. The code is largely stable and performance is mostly acceptable; most functions that annotators used in xace have been replicated in ZMap. In addition new functions have been added as well as code to support much better interactivity between Otterlace and ZMap.</p> <li><p>Acedb had been maintained at an acceptable level for the Worm group and agreement was reached to stop any development on xace at all. Development continues on the commandline/server code because this is needed to support Otterlace and ZMap.</p> </ul> <p>The next phase for ZMap is to build on the existing code to add completely new facilities for annotation as described in this document.</p> </fieldset> <br /> <fieldset> <legend>Staffing</legend> <p>This summer saw the departure of Roy Storey, who had worked on both Otterlace and ZMap for some years; he will be a hard act to follow. He will be replaced as soon as possible and in addition there is funding for another person for 2 years. This is good news as it will mean that several pieces of work that have remained as prototypes until now can be finished.</p> <p>It is anticipated that both positions will be filled by this Autumn and this would mean the period up to Christmas will largely be one of learning the system with proper development beginning in Spring 2010.</p> </fieldset> <br /> <fieldset> <legend>Variation Display</legend> <p>Annotation of "canonical" organism DNA is now more than a decade old and being superseded by the need to annotate inter- and intra-species variation. 
The display of data for single sequence annotation is increasingly challenging and the extension of this to variation data is not simple. Whether in zmap, blixem or elsewhere, we do not currently have good ways to present this information.</p> <p>The major challenges are:</p> <ul> <li>Dealing with the huge volumes of data, in particular alignments. <li>Handling screen real estate in a way that is useful to the annotator. <li>Displaying the different types of variation data in an informative way. </ul> <p>Variation can be taken to include:</p> <ul> <li>SNPs <li>Alleles <li>CNVs <li>Haplotypes <li>Chromosomal rearrangements </ul> <p>Clearly several different types of display will be required for these quite different types of data. Currently our toolkit has two major display components:</p> <ul> <li>ZMap for features display <li>Blixem for DNA or peptide sequence comparisons </ul> <p>ZMap requires enhancements to display some of this data while blixem will require some parts to be completely rewritten.</p> </fieldset> <br /> <fieldset> <legend>Improving the "Annotation Suite"</legend> <p>Through experience with Acedb in particular it became clear that the annotation "viewer" and the annotation "database" should be split into separate programs. Databases all have their own semantics; it is vital to keep these separate from the viewer program if the latter is to be a more general tool for annotators. This is the approach taken with the Otterlace/ZMap system (OZ) and is one being considered by other developers (e.g. Apollo, Suzy Lewis pers. comm.). OZ has a number of component programs that must communicate with each other to give as seamless a system as possible and this is leading to the development of protocols for annotation program inter-communication. 
Currently we have three major components that must communicate together:</p> <ul> <li>Otterlace editing/DB system <li>ZMap display system <li>Helper programs: Blixem, dotter, belvu and others </ul> <p>Communication between these components is rudimentary at the moment and the ease of use of OZ could be considerably improved with enhancements to the current "protocols".</p> <p>ZMap and blixem need to be very tightly linked and a better alternative would be to incorporate blixem function into ZMap in the form of a new ZMap window. This would allow for much more sophisticated interaction. Maybe the overview panel in blixem would not even be needed, since this duplicates some ZMap functions. The blixem code is poorly organised, which prevents further major development.</p> </fieldset> <br /> <fieldset> <legend>Data Source/Format Support</legend> <p>Slowly but surely a few data sources/formats are becoming "standards" for bioinformatics, e.g. GFFv3. In particular the use of ontologies (e.g. SOFA) is becoming obligatory to ensure data integrity and interchange. Annotation at Sanger needs to change to actively use more of these formats, which requires a number of components to be augmented to support these standards.</p> <p>Most immediately:</p> <ul> <li>Add GFFv3 export to acedb <li>Add GFFv3 parsing/export to ZMap <li>Add Ensembl interface support to ZMap </ul> <p>Reuse of Sanger software by external groups is not widespread and adopting some of these common standards and formats would help to change that. Adoption of our software by external users should be an important goal for us.</p> </fieldset> <br /> <fieldset> <legend>Strategic Software Decisions</legend> <p>The OZ system relies on several major external graphical components:</p> <ul> <li>Otterlace: Tk graphics package <li>ZMap: Gtk graphics package, foocanvas canvas </ul> <p>While Gtk is likely to be long lived, both Tk and foocanvas seem to be reaching the end of their active development lives. 
This is a concern because these components are unlikely to be developed further and replacing them will be very time consuming as they are integral to our systems.</p> <p>There is a plan for ZMap to replace foocanvas with goocanvas, its more powerful successor, but this is on hold until the Gtk consortium decide whether to adopt the goocanvas as their official canvas widget.</p> </fieldset> <br /> <fieldset> <legend>Improvements to Blixem</legend> <p>1) Blixem</p> <p>It's my impression that this tool is important to havana and that we could enhance it in a number of ways:</p> <ul> <li>make it deal with all combinations of strands, nucleotide and peptide alignments correctly (it does not do this currently) <li>make it more informative (show more information visually) about splice sites, gaps etc. <li>make it able to interact with other programs (e.g. ZMap) to provide better navigation etc. <li>improve general display features such as the transcript display. <li>make blixem able to take data in cigar and exonerate formats; the latter could be used to give more information about matches. </ul> <p>All of this can be done without a rewrite, which I think is preferable; otherwise there is a danger that the 2 years could be absorbed by just reimplementing rather than extending.</p> <p>Along the same lines it may be that there are enhancements to dotter that would also help you.</p> </fieldset> <!--#include virtual="/perl/footer"-->