first versions

Commit ab7d448b authored 15 years ago by edgrif
--- a/doc/Papers/Annotation_clients.txt
+++ b/doc/Papers/Annotation_clients.txt
+
+USE web/user_doc/xremote_overview.shtml as the basis for a paper.
+
+
+Paper on new annotation clients using interclient comms...
+
+Check what GMOD does.
+
+Use the Unix model: tools that have one purpose, separation of concerns, etc.
+
+Show our model and quote Suzi Lewis on what she would like to do.
+
+Talk about dynamic loading (plugins).
+
+
+
+
+Abstract
+
+Genome browsers and annotation programs have generally been tied
+to a single underlying database schema. ZMap is a genome browser
+that can be controlled by an external program, so that it can be
+integrated into an existing annotation system. ZMap separates
+the display of genome features from the database schema to provide
+a general-purpose display engine.
+
+
+
+Introduction
+
+Since the beginning of sequence analysis, visualisation of a sequence
+and its features has been, and continues to be, very important.
+Traditionally, sequence annotation software has operated on data with
+just one schema, and for good reasons: it is not currently possible to
+produce a "meta" description of all the popular database schemas that
+would allow one piece of software to annotate all databases. The
+underlying semantics differ considerably and are often not logically
+consistent.
+
+ZMap separates the viewer tasks from the task of editing schema-specific
+data.
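The sketch below makes this separation concrete. It shows a viewer driven only by display-level commands from an annotation client, so the viewer never needs to know the client's schema. It is a hypothetical illustration, not ZMap's actual remote-control protocol. The command name `zoom_to`, the `ViewerState` type and `dispatch_command()` are all invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical viewer state: only what the display needs, no schema. */
typedef struct {
    long start;   /* first visible base */
    long end;     /* last visible base */
} ViewerState;

/* A handler parses the argument text for one command; 0 = ok, -1 = error. */
typedef int (*CommandHandler)(ViewerState *viewer, const char *args);

static int handle_zoom_to(ViewerState *viewer, const char *args)
{
    long start, end;

    if (sscanf(args, "%ld %ld", &start, &end) != 2 || start > end)
        return -1;                          /* malformed request */
    viewer->start = start;
    viewer->end = end;
    return 0;
}

typedef struct {
    const char *name;
    CommandHandler handler;
} Command;

/* Hypothetical command table: one entry per display-level command. */
static const Command commands[] = {
    { "zoom_to", handle_zoom_to },
};

/* Split "name args..." at the first space and run the matching handler.
 * Unknown commands are rejected rather than guessed at. */
int dispatch_command(ViewerState *viewer, const char *request)
{
    size_t name_len, i;

    assert(viewer != NULL && request != NULL);
    name_len = strcspn(request, " ");
    for (i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strlen(commands[i].name) == name_len
            && strncmp(commands[i].name, request, name_len) == 0)
            return commands[i].handler(viewer, request + name_len);
    }
    return -1;
}
```

Adding a command means adding one handler and one table entry. The client, not the viewer, stays responsible for mapping its own database semantics onto these display commands.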
+
+
--- a/doc/Papers/ZMap.txt
+++ b/doc/Papers/ZMap.txt
+
+Paper on ZMap
+
+Abstract
+
+ZMap is a database-independent sequence display program that can be integrated into annotation
+systems running on Unix and Unix-like systems with X Windows. ZMap is written in C for
+performance and has many optimisations that allow it to handle large volumes of data under the
+control of the annotator. It is multi-threaded so that data can be loaded in the background while
+the GUI remains responsive. It currently supports acedb, GFF and DAS v1 data streams, but support
+for other formats can easily be added via a "plug-in" architecture (not actually completely
+correct currently...).
+
+
+
+Introduction
+
+In general, sequence annotation systems have been closely tied to the underlying database or
+flat-file system holding the annotation data. While this makes programming easier, it has two
+major disadvantages: the systems can only be used with a limited number of data formats, and users
+are stuck with whatever annotation semantics the system imposes.
+
+Some systems (e.g. Apollo) offered the option of adding code to support other database formats,
+while others (e.g. acedb) required users to translate their data into the supported format.
+Converting data to the supported format, or writing code to support your own data format, is
+possible but time-consuming, and neither solves the problem of differing data semantics. Different
+semantics arise for a number of reasons, including the general philosophy of the researchers,
+differences in the organisms being studied, and historical but intractable differences.
+
+A solution to these problems is to separate data display from data editing. While there are some
+differences between annotation displays (e.g. vertical vs. horizontal layout), they have largely
+converged on the same glyphs and basic layouts. This makes sense, as it lets users navigate new and
+different systems without in-depth experience, and it creates the opportunity for a
+data-independent annotation viewer that can be used with many annotation systems. ZMap is an
+attempt to produce this kind of viewer.
+
+
+Design Goals
+
+The ZMap project was initiated with the aim of significantly improving the annotation interface
+available to researchers at the Sanger Institute. To do this it had to meet a number of goals:
+
+- use a modern but portable GUI library that integrates well into different flavours of Unix
+
+- be able to display large numbers of features efficiently
+
+- use threads so that data can be loaded independently of the GUI, as modern browsers do
+
+- be independent of any one database format so that it can be used by a variety of end users.
+
+Experience has shown that interpreter-based languages (Perl, Python, Java) do not provide the
+performance needed to display large volumes of data. C was chosen for its potential for good
+performance and its portability. The GTK toolkit was used for the GUI, with the foocanvas used for
+the sequence display itself. This combination has proved robust and has provided the performance
+to display large numbers of features in very large scrollable windows. The following sections
+describe the key components of ZMap.
+
+
+Threading model
+
+Providing a responsive GUI while loading data is a problem that has been tackled in different
+ways. Before a portable threading interface was available, code had to be written so that
+long-running functions periodically allowed the GUI to update; the X Windows server is a good
+example of this. Threading makes the problem easier to tackle, because the long-running function
+can run in a separate thread without having to call back to the GUI.
+
+Certain software constraints suggest a simple model. The X Window library, while thread-safe, is
+not multi-threaded, so there is little or no gain in having more than one thread in the GUI code.
+The obvious model is therefore to have one "master" thread that runs the GUI and, in effect,
+controls the application, while "slave" threads fetch and process data for display by the GUI
+thread (see figure XX).
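Below is a minimal sketch of the master/slave arrangement using POSIX threads. It assumes a single slave per data source and a mutex-protected "mailbox" for the slave's reply. The names `SourceMailbox`, `source_start()` and `source_poll()` are hypothetical, and the data fetch is simulated. The key property is that the master thread only takes the lock briefly to check for a reply, so it never waits on the fetch itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mailbox shared by the GUI ("master") thread and one data
 * ("slave") thread; once the slave is running, fields are only touched
 * with `lock` held. */
typedef struct {
    pthread_mutex_t lock;
    int region_start;     /* request: region to fetch */
    int region_end;
    bool done;            /* reply ready? */
    int n_features;       /* reply: here just a count of features */
} SourceMailbox;

/* Slave thread: fetch and process data; never calls any GUI code. */
static void *source_thread(void *arg)
{
    SourceMailbox *box = arg;
    int start, end, count;

    pthread_mutex_lock(&box->lock);
    start = box->region_start;
    end = box->region_end;
    pthread_mutex_unlock(&box->lock);

    /* Stand-in for a slow database or file fetch: one feature per 1000 bases. */
    count = (end - start + 1) / 1000;

    pthread_mutex_lock(&box->lock);
    box->n_features = count;
    box->done = true;
    pthread_mutex_unlock(&box->lock);
    return NULL;
}

/* Master thread: hand a request to a new slave; returns 0 on success. */
int source_start(SourceMailbox *box, pthread_t *thread, int start, int end)
{
    assert(start <= end);
    pthread_mutex_init(&box->lock, NULL);
    box->region_start = start;
    box->region_end = end;
    box->done = false;
    box->n_features = 0;
    return pthread_create(thread, NULL, source_thread, box);
}

/* Master thread, called periodically (e.g. from a GUI timer): checks for a
 * reply without waiting for the fetch. Returns true once *n_features is set. */
bool source_poll(SourceMailbox *box, int *n_features)
{
    bool done;

    pthread_mutex_lock(&box->lock);
    done = box->done;
    if (done)
        *n_features = box->n_features;
    pthread_mutex_unlock(&box->lock);
    return done;
}
```

In a GTK program, `source_poll()` could be called from a timeout callback, for example one registered with GLib's `g_timeout_add()`. All drawing would then happen in the master thread once a reply arrives.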
+
+The separation of the threads also leads naturally to a "plugin" model for adding new data source
+modules. There is a single standardised interface between the GUI thread and its source threads,
+so new modules can be added without altering the interface code that provides the "bridge"
+between the threads (see the Bridge pattern in Design Patterns). See figure XXX.
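A single standardised interface can be pictured as a table of function pointers that every data-source module fills in. The sketch below is a hypothetical illustration. `SourceModule`, `find_source()` and the two stand-in modules are invented. A real interface would need many more operations, such as connecting, fetching styles and reporting errors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface every data-source module implements. */
typedef struct {
    const char *name;
    /* Fetch features in [start, end]; here it just returns how many. */
    int (*get_features)(int start, int end);
} SourceModule;

/* Two stand-in modules; real ones would parse GFF, talk to a DAS server, etc. */
static int gff_get_features(int start, int end) { return (end - start + 1) / 500; }
static int das_get_features(int start, int end) { return (end - start + 1) / 2000; }

static const SourceModule modules[] = {
    { "gff", gff_get_features },
    { "das", das_get_features },
};

/* Look a module up by name; NULL if no such module is registered. */
const SourceModule *find_source(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof modules / sizeof modules[0]; i++)
        if (strcmp(modules[i].name, name) == 0)
            return &modules[i];
    return NULL;
}
```

Code that uses a source sees only `SourceModule`, so supporting a new format means writing one more set of functions and adding one more table entry. The same table could, in principle, be populated at run time from shared libraries with POSIX `dlopen()`/`dlsym()`. That is one possible route to dynamically loaded plugins.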
+
+(should we add stuff about cancelling....currently it doesn't work that well so try it out
+first...also I think that restart doesn't work ???)
+
+
+Data display
+
+Sequence display must cope with very large coordinate systems that are then mapped to the
+screen. The foocanvas holds its "world view" in floating-point coordinates, allowing plotting at
+as fine a scale as required, even for whole-chromosome display. The canvas has the concept of a
+world view and a displayable subsection of that world view (the "scrollable area"). This is
+important both for performance and because underlying window systems usually limit the size of
+window they can display. For X Windows the biggest window that can usefully be handled is 32k by
+32k pixels. ZMap therefore limits the scrollable area to this size or less, but allows the
+scrollable area itself to be moved along the sequence, so that even at high zoom levels the user
+can quickly and easily scroll over large areas of sequence.
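The arithmetic behind limiting the scrollable area can be sketched as follows. This is a hypothetical illustration rather than ZMap's code, and it makes three assumptions:

- Zoom is expressed in pixels per base.
- The window limit is 32767 pixels, the largest signed 16-bit value, matching the roughly 32k figure above.
- The scrollable area is centred on the user's current position and slid inward at either end of the sequence.

```c
#include <assert.h>

/* Assumed limit: largest signed 16-bit coordinate. */
#define MAX_WINDOW_PIXELS 32767.0

typedef struct {
    double start;   /* world coordinates (bases), kept as doubles like the canvas */
    double end;
} Range;

/* Choose the part of [seq.start, seq.end] to make scrollable at
 * `pixels_per_base`, centred on `centre` where possible, so that its
 * width never exceeds MAX_WINDOW_PIXELS. */
Range scroll_region(Range seq, double pixels_per_base, double centre)
{
    double max_bases, half;
    Range r;

    assert(pixels_per_base > 0.0 && seq.end >= seq.start);
    max_bases = MAX_WINDOW_PIXELS / pixels_per_base;
    if (seq.end - seq.start <= max_bases)
        return seq;                         /* whole sequence fits */

    half = max_bases / 2.0;
    r.start = centre - half;
    r.end = centre + half;
    if (r.start < seq.start) {              /* slide right at the sequence start */
        r.start = seq.start;
        r.end = seq.start + max_bases;
    } else if (r.end > seq.end) {           /* slide left at the sequence end */
        r.end = seq.end;
        r.start = seq.end - max_bases;
    }
    return r;
}
```

As the user scrolls towards an edge of the current scrollable area, calling `scroll_region()` again with the new centre moves the area along the sequence without ever exceeding the window limit.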
+
+A primary design goal was to allow users to rapidly scroll around and zoom in and out of the
+sequence being annotated. In addition, as in any good text editor, users can split the view of
+the sequence an arbitrary number of times, both horizontally and vertically, allowing for instance
+the simultaneous viewing of different sections of a long transcript.
+
+The foocanvas, like other canvas packages, allows the addition of custom-written "canvas items",
+which is particularly important for a genome annotation program as there are current and emerging
+standards for the shapes used to represent different kinds of features. ZMap supports a number of
+these (see Fig xx) and it is easy to add more. They range from very simple glyphs to much more
+complex items that display a complete transcript with CDS, UTR and other parts marked up.
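A transcript glyph needs logic that splits each exon into UTR and CDS segments so they can be drawn differently. The sketch below illustrates that step under some assumptions. The `Exon`/`Part` types and `split_transcript()` are invented for this example. Coordinates are inclusive, and exons are assumed to be sorted and non-overlapping. Drawing itself, which would go through the canvas item code, is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int start, end; } Exon;      /* inclusive coordinates */

typedef enum { PART_UTR, PART_CDS } PartType;

typedef struct {
    int start, end;                            /* inclusive coordinates */
    PartType type;
} Part;

/* Split each exon into UTR and CDS pieces (at most 3 per exon) so a glyph
 * can draw them differently. Exons must be sorted and non-overlapping;
 * `out` needs room for 3 * n_exons parts. Returns the number written. */
size_t split_transcript(const Exon *exons, size_t n_exons,
                        int cds_start, int cds_end, Part *out)
{
    size_t i, n = 0;

    assert(cds_start <= cds_end);
    for (i = 0; i < n_exons; i++) {
        int s = exons[i].start, e = exons[i].end;
        int cs = s > cds_start ? s : cds_start;   /* overlap of exon and CDS */
        int ce = e < cds_end ? e : cds_end;

        if (cs > ce) {                            /* no overlap: all UTR */
            out[n++] = (Part){ s, e, PART_UTR };
            continue;
        }
        if (s < cs)                               /* part before the CDS */
            out[n++] = (Part){ s, cs - 1, PART_UTR };
        out[n++] = (Part){ cs, ce, PART_CDS };
        if (ce < e)                               /* part after the CDS */
            out[n++] = (Part){ ce + 1, e, PART_UTR };
    }
    return n;
}
```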
+
+ZMap has the concept of a "mark" region, which by default is the entire displayed sequence but can
+easily be set to encompass just the area of the window or any feature within the window. Many
+operations are limited to the mark region, which has two benefits:
+
+- better performance, since only a relatively small number of features need be manipulated
+
+- more meaningful display: ZMap offers various ways of clustering individual columns to make
+  them easier to view, and these often work best if only the features in the marked area are
+  clustered. For example, one cluster mode marks up all colinear homology matches, and this makes
+  most sense when only the homology matches in the same coordinate range as a particular
+  transcript are clustered (see fig....).
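The colinear clustering described above can be sketched as follows, restricted to the mark region. This is a hypothetical illustration, not ZMap's clustering code. It assumes the matches all come from one hit sequence on the forward strand and are sorted by position on the annotated sequence. `max_gap` is an invented tolerance for gaps in the hit sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int ref_start, ref_end;       /* position on the annotated sequence */
    int match_start, match_end;   /* position on the hit (e.g. protein/EST) sequence */
    bool colinear_with_next;      /* set by mark_colinear() */
} HomologyMatch;

static bool overlaps(int s1, int e1, int s2, int e2) { return s1 <= e2 && s2 <= e1; }

/* For matches sorted by ref_start, flag each consecutive pair that lies in
 * the mark region and is colinear, i.e. the next match continues further
 * along the hit sequence with a gap of at most `max_gap`.
 * Returns the number of pairs flagged. */
size_t mark_colinear(HomologyMatch *m, size_t n,
                     int mark_start, int mark_end, int max_gap)
{
    size_t i, flagged = 0;

    assert(mark_start <= mark_end);
    for (i = 0; i < n; i++)
        m[i].colinear_with_next = false;
    for (i = 0; i + 1 < n; i++) {
        HomologyMatch *a = &m[i], *b = &m[i + 1];

        if (!overlaps(a->ref_start, a->ref_end, mark_start, mark_end)
            || !overlaps(b->ref_start, b->ref_end, mark_start, mark_end))
            continue;                 /* only cluster features in the mark */
        if (b->match_start > a->match_end
            && b->match_start - a->match_end <= max_gap) {
            a->colinear_with_next = true;
            flagged++;
        }
    }
    return flagged;
}
```

Because pairs outside the mark are skipped, matches from elsewhere in the sequence cannot be pulled into a cluster around the transcript of interest.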
+
+
+
+
+
+Efficiency in data display
+
+Displaying all features for a large sequence can be too slow even with optimised compiled
+code. Fortunately, display can be optimised both by differential loading and by differential
+display of features.
+
+
+- differential display via zoom factors
+
+Each feature set has a minimum and maximum zoom factor controlling the range of magnifications at
+which the features are displayed. This allows the user to specify that very numerous features such
+as homologies are only displayed at higher magnifications.
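The zoom-factor test amounts to a range check per feature set. Below is a minimal sketch, assuming zoom is measured in pixels per base. `FeatureSetStyle`, `feature_set_visible()` and `count_visible()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-feature-set style; zoom is in pixels per base. */
typedef struct {
    const char *name;
    double min_zoom;   /* lowest magnification at which the set is drawn */
    double max_zoom;   /* highest magnification at which the set is drawn */
} FeatureSetStyle;

bool feature_set_visible(const FeatureSetStyle *style, double zoom)
{
    assert(style->min_zoom <= style->max_zoom);
    return zoom >= style->min_zoom && zoom <= style->max_zoom;
}

/* How many of the feature sets would be drawn at this zoom. */
size_t count_visible(const FeatureSetStyle *styles, size_t n, double zoom)
{
    size_t i, visible = 0;

    for (i = 0; i < n; i++)
        if (feature_set_visible(&styles[i], zoom))
            visible++;
    return visible;
}
```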
+
+- differential display via the "mark" region    
+
+
+- Lazy Loading
+
+Loading all features for a large sequence can consume large amounts of memory, and annotators
+usually do not need to see all features for the entire range of the sequence they are annotating;
+instead they concentrate on sub-areas of interest. Features can therefore be loaded selectively,
+under the control of the user and the system.
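One way to support lazy loading is to divide the sequence into fixed-size blocks and record which blocks have been fetched. A request for a sub-area then fetches only what is missing. The sketch below is hypothetical: `BLOCK_SIZE`, `MAX_BLOCKS`, `LoadedMap` and `blocks_to_load()` are invented, and coordinates are 0-based. It marks blocks as loaded optimistically, whereas real code would do so only once the fetch had succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 10000      /* hypothetical load granularity, in bases */
#define MAX_BLOCKS 1024       /* enough for a ~10 Mb sequence at this block size */

typedef struct {
    bool loaded[MAX_BLOCKS];  /* loaded[i]: bases [i*BLOCK_SIZE, (i+1)*BLOCK_SIZE) fetched */
} LoadedMap;

/* Work out which blocks covering [start, end] (0-based, inclusive) still
 * need fetching, write their indices to `to_fetch`, mark them as loaded and
 * return how many there are. Loaded blocks are never requested twice. */
size_t blocks_to_load(LoadedMap *map, int start, int end, int *to_fetch)
{
    int first = start / BLOCK_SIZE, last = end / BLOCK_SIZE, b;
    size_t n = 0;

    assert(0 <= start && start <= end && last < MAX_BLOCKS);
    for (b = first; b <= last; b++) {
        if (!map->loaded[b]) {
            map->loaded[b] = true;   /* optimistic: assumes the fetch succeeds */
            to_fetch[n++] = b;
        }
    }
    return n;
}
```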
+
+
+- describe threading model
+
+- describe server model
+
+- describe use of autoconf
+
+
+
+ZMap and otterlace
+
+The first annotation system with which ZMap has been used is Otterlace, a Perl/Tk-based
+application with associated pipelines that is used for vertebrate annotation at the Sanger
+Institute and elsewhere.
+
+
+
+
+
+
+Conclusions
+
+
+
+
+need acedb, otterlace, gmod, apollo refs