From ab7d448bd47281fdb33856fb0d0f61b978071ed4 Mon Sep 17 00:00:00 2001
From: edgrif <edgrif>
Date: Thu, 2 Jul 2009 13:49:29 +0000
Subject: [PATCH] first versions

---
 doc/Papers/Annotation_clients.txt |  43 ++++++++
 doc/Papers/ZMap.txt               | 170 ++++++++++++++++++++++++++++++
 2 files changed, 213 insertions(+)
 create mode 100755 doc/Papers/Annotation_clients.txt
 create mode 100755 doc/Papers/ZMap.txt

diff --git a/doc/Papers/Annotation_clients.txt b/doc/Papers/Annotation_clients.txt
new file mode 100755
index 000000000..5825bc387
--- /dev/null
+++ b/doc/Papers/Annotation_clients.txt
@@ -0,0 +1,43 @@
+
+USE web/user_doc/xremote_overview.shtml as the basis for a paper.
+
+
+Paper on new annotation clients using interclient comms...
+
+check what gmod does
+
+use the unix model...tools that have one purpose, separation of concerns etc.
+
+show our model and quote suzi lewis and what she would like to do
+
+talk about dynamic loading (plugins)...
+
+
+
+
+Abstract
+
+Genome Browsers and annotation programs have generally been linked
+to just one underlying database schema. ZMap is a genome browser
+that can be controlled by an external program so that it can be
+integrated into an existing annotation system. ZMap separates
+out the display of genome features from the database schema to provide
+a general purpose display engine.
+
+
+
+Introduction
+
+From the start of sequence analysis, visualisation of the sequence
+and the features of that sequence has been, and continues to be,
+very important. Traditionally sequence annotation
+software has operated on data with just one schema and this is
+for good reasons. It is not currently possible to produce some
+"meta" description of all the popular database schemas that would
+allow one piece of software to annotate all databases. The underlying
+semantics differ considerably and are often not logically consistent.
+
+ZMap separates out the viewer tasks from the task of editing schema-specific
+data.
+
+
diff --git a/doc/Papers/ZMap.txt b/doc/Papers/ZMap.txt
new file mode 100755
index 000000000..e1379650f
--- /dev/null
+++ b/doc/Papers/ZMap.txt
@@ -0,0 +1,170 @@
+
+Paper on zmap
+
+Abstract
+
+ZMap is a database-independent sequence display program that can be integrated into annotation
+systems that run on Unix and Unix-like systems with X-windows. ZMap is written in C for
+performance and has many optimisations that allow it to handle large volumes of data under the
+control of the annotator. It is multi-threaded to allow background loading of data while
+maintaining a responsive GUI. It currently supports acedb, GFF and DAS v1 datastreams but support
+for other formats can easily be added via a "plug-in" architecture (not actually completely
+correct currently...).
+
+
+
+Introduction
+
+In general, sequence annotation systems have been closely tied to the underlying database or
+flat-file system holding the annotation data. While this makes for easier programming it has two
+major disadvantages: the systems can only be used with a limited number of data formats and users
+are stuck with whatever annotation semantics the system imposes.
+
+Some systems (e.g. Apollo) included the option of adding code to support other database formats
+while others (e.g. acedb) required that the user translate their data into the supported
+format. While converting data to the supported format or writing code to support your own data
+format is possible, it is time consuming and does not solve problems of data semantics. Different
+semantics arise for a number of reasons including the underlying general philosophy of the
+researchers, differences in the organisms being studied, and historical but intractable
+differences.
+
+A solution to these problems is to separate data display from data editing. While there are some
+differences between annotation displays (e.g. vertical vs. horizontal layout) they have largely
+converged to use the same glyphs and basic layouts. This makes sense as it enables users to
+navigate new and different systems without in-depth experience. This creates the opportunity for a
+data-independent annotation viewer that can be used with many annotation systems. ZMap is an
+attempt to produce this kind of viewer.
+
+
+Design Goals
+
+The ZMap project was initiated with the goal of significantly improving the annotation interface
+available to researchers at the Sanger Institute. To do this it had to fulfill a number of goals:
+
+- use a modern but portable GUI library which integrates well into different flavours of Unix
+
+- be able to display large numbers of features efficiently
+
+- make use of threads to allow data loading independently of the GUI as modern browsers do
+
+- be independent of any one database format so that it could be used by a variety of end users
+
+Experience has shown that languages that are interpreter based (Perl, Python, Java) do not provide
+the performance to deal with displaying large volumes of data. The C language was chosen because
+of its potential for good performance and its portability. The GTK toolkit was used for the GUI
+with the foocanvas being used for the actual sequence display. This combination has proved robust
+and has provided the performance to display large numbers of features in very large scrollable
+windows. The following sections describe the key components of ZMap.
+
+
+Threading model
+
+Providing a responsive GUI while loading data is a problem that has been tackled in different
+ways. Prior to the introduction of a portable threading interface, code had to be written so that
+long running functions would allow periodic updates to the GUI. The X Windows server is a good
+example of this. The introduction of threading makes tackling this problem easier as it means that
+the long running function can be run in a separate thread without the need to call back to the GUI.
+
+Certain software constraints mean that a simple model presents itself. The X Window library, while
+being thread safe, is not multi-threaded, meaning that there is little or no gain to having more
+than one thread in the GUI code. The obvious model therefore is to have one "master" thread
+running the GUI and in effect controlling the application; "slave" threads are then used to fetch
+and process data for display by the GUI thread (see figure XX).
+
+The separation of the threads also naturally leads to a "plugin" model for adding new data source
+modules. There is a single standardised interface between the GUI thread and its source threads;
+new modules can be added without alteration to the interface code which provides the "bridge"
+between the threads (see bridge pattern in design patterns). See figure XXX.
+
+(should we add stuff about cancelling....currently it doesn't work that well so try it out
+first...also I think that restart doesn't work ???)
+
+
+Data display
+
+Sequence display must cope with very large coordinate systems that are then mapped to the
+screen. The foocanvas holds its "world view" in floating point coordinates, allowing plotting to
+as fine a scale as required for even whole chromosome display. The canvas has the concept of a
+world view and a subsection of that world view that is displayable (the "scrollable area"). This
+is important both for performance and because underlying window systems usually have a
+maximum limit to the size of window they can display. For X Windows the biggest window that can
+usefully be handled is 32k by 32k pixels. ZMap therefore limits the scrollable area to this size
+or less but allows scrolling of the overall scrollable area so that even at high zoom levels the
+user can quickly and easily scroll over large areas of sequence.
+
+A primary design goal was to allow users to rapidly scroll around and zoom in and out of the
+sequence being annotated. In addition, like any good text editor, they can split the view of
+the sequence an arbitrary number of times both horizontally and vertically, allowing for instance
+the simultaneous viewing of different sections of a long transcript.
+
+The foocanvas, like other canvas packages, allows the addition of custom written "canvas items",
+which is particularly important for a genome annotation program as there are current and emerging
+standards for the shapes used to represent different kinds of features. ZMap supports a number of
+these (see Fig xx) and it is easy to add more. They range from very simple glyphs to much more
+complex items that display a complete transcript with CDS, UTR and other parts marked up.
+
+ZMap has the concept of a 'mark' region which by default is the entire displayed sequence but can
+easily be set to encompass just the area of the window or any feature within the window. Many
+operations are limited to the mark region, which has two benefits:
+
+- better performance since only a relatively small number of features need be manipulated
+
+- more meaningful display: ZMap offers various ways of clustering individual columns to make
+  them easier to view and often these work best if only the features in the marked area are
+  clustered, e.g. one cluster mode marks up all colinear homology matches and this makes most
+  sense when only those homology matches that are in the same coordinate range as a particular
+  transcript are clustered (see fig....)
+
+
+
+
+
+Efficiency in data display
+
+Displaying all features for a large sequence can be too slow even with optimised compiled
+code. Fortunately it is possible to optimise display both by differential loading and differential
+display of features.
+
+
+- differential display via zoom factors
+
+Each feature set has a minimum and maximum zoom factor controlling the range of magnifications at
+which the features are displayed. This allows the user to specify that very numerous features such
+as homologies are only displayed at higher magnifications.
+
+- differential display via the "mark" region
+
+
+- Lazy Loading
+
+Features can be selectively loaded under the control of the user and sys... Loading all features for
+a large sequence can consume large amounts of memory, and annotators usually do not need to see all
+features for the entire range of the sequence they are annotating; they instead concentrate on
+sub-areas of interest.
+
+
+- describe threading model
+
+- describe server model
+
+- describe use of autoconf
+
+
+
+ZMap and otterlace
+
+The first annotation system that ZMap has been used for is the Otterlace system, which comprises a
+Perl/Tk-based application and associated pipelines and is used for vertebrate annotation
+at the Sanger Institute and elsewhere.
+
+
+
+
+
+
+Conclusions
+
+
+
+
+need acedb, otterlace, gmod, apollo refs
-- 
GitLab