Commit ab7d448b authored 15 years ago by edgrif
first versions
parent dfa26f2c
Showing 2 changed files with 213 additions and 0 deletions:
doc/Papers/Annotation_clients.txt (+43, -0)
doc/Papers/ZMap.txt (+170, -0)
doc/Papers/Annotation_clients.txt (new file, mode 0 → 100755, +43 -0)
Use web/user_doc/xremote_overview.shtml as the basis for a paper.
Paper on new annotation clients using interclient comms...
check what GMOD does
use the Unix model...tools that have one purpose, separation of concerns etc.
show our model and quote Suzi Lewis and what she would like to do
talk about dynamic loading (plugins)...
Abstract
Genome browsers and annotation programs have generally been linked
to just one underlying database schema. ZMap is a genome browser
that can be controlled by an external program so that it can be
integrated into an existing annotation system. ZMap separates
the display of genome features from the database schema to provide
a general-purpose display engine.
Introduction
From the start of sequence analysis, visualisation of the sequence
and of its features has been, and continues to be,
very important. Traditionally, sequence annotation
software has operated on data with just one schema, and this is
for good reasons. It is not currently possible to produce some
"meta" description of all the popular database schemas that would
allow one piece of software to annotate all databases. The underlying
semantics differ considerably and are often not logically consistent.
ZMap separates the viewer tasks from the task of editing schema-specific
data.
doc/Papers/ZMap.txt (new file, mode 0 → 100755, +170 -0)
Paper on zmap
Abstract
ZMap is a database-independent sequence display program that can be integrated into annotation
systems that run on Unix and Unix-like systems with the X Window System. ZMap is written in C for
performance and has many optimisations that allow it to handle large volumes of data under the
control of the annotator. It is multi-threaded to allow background loading of data while
maintaining a responsive GUI. It currently supports acedb, GFF and DAS v1 datastreams but support
for other formats can easily be added via a "plug-in" architecture (not actually completely
correct currently...).
Introduction
In general, sequence annotation systems have been closely tied to the underlying database or
flat-file system holding the annotation data. While this makes for easier programming, it has two
major disadvantages: the systems can only be used with a limited number of data formats, and users
are stuck with whatever annotation semantics the system imposes.
Some systems (e.g. Apollo) included the option of adding code to support other database formats
while others (e.g. acedb) required that the user translate their data into the supported
format. While converting data to the supported format or writing code to support your own data
format is possible, it is time consuming and does not solve problems of data semantics. Different
semantics arise for a number of reasons including the underlying general philosophy of the
researchers, differences in the organisms being studied, and historical but intractable
differences.
A solution to these problems is to separate data display from data editing. While there are some
differences between annotation displays (e.g. vertical vs. horizontal layout), they have largely
converged to use the same glyphs and basic layouts. This makes sense as it enables users to
navigate new and different systems without in-depth experience. It also creates the opportunity for a
data-independent annotation viewer that can be used with many annotation systems. ZMap is an
attempt to produce this kind of viewer.
Design Goals
The ZMap project was initiated with the goal of significantly improving the annotation interface
available to researchers at the Sanger Institute. To do this it had to fulfill a number of goals:
- use a modern but portable GUI library which integrates well into different flavours of Unix.
- be able to display large numbers of features efficiently
- make use of threads to allow data loading independently of the GUI as modern browsers do
- be independent of any one database format so that it could be used by a variety of end users.
Our experience has been that interpreter-based languages (Perl, Python, Java) do not provide
the performance needed to display large volumes of data. The C language was chosen because
of its potential for good performance and its portability. The GTK toolkit was used for the GUI
with the foocanvas being used for the actual sequence display. This combination has proved robust
and has provided the performance to display large numbers of features in very large scrollable
windows. The following sections describe the key components of ZMap.
Threading model
Providing a responsive GUI while loading data is a problem that has been tackled in different
ways. Prior to the introduction of a portable threading interface, code had to be written so that
long-running functions would allow periodic updates to the GUI. The X server is a good
example of this. The introduction of threading makes tackling this problem easier as it means that
the long-running function can be run in a separate thread without the need to call back to the GUI.
Certain software constraints mean that a simple model presents itself. The X Window library, while
being thread safe, is not multi-threaded, meaning that there is little or no gain to having more
than one thread in the GUI code. The obvious model therefore is to have one "master" thread
running the GUI and in effect controlling the application; "slave" threads are then used to fetch
and process data for display by the GUI thread (see figure XX).
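The model above can be sketched with POSIX threads. This is only an illustration under stated
assumptions, not ZMap's actual code: the names DataReply, LoadRequest, reply_poll and
load_in_background are hypothetical, the "loading" is a counting loop standing in for fetching
and parsing, and a real GUI thread would poll from an idle or timeout handler while processing
events rather than spin in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical reply slot shared between one data thread and the GUI thread. */
typedef struct {
    pthread_mutex_t lock;
    int ready;          /* set to 1 once the data thread has finished */
    int n_features;     /* stand-in for the parsed feature data */
} DataReply;

typedef struct {
    DataReply *reply;
    int n_to_load;
} LoadRequest;

/* Data thread: does the long-running work, then publishes the result. */
static void *data_thread_main(void *arg)
{
    LoadRequest *req = arg;
    int loaded = 0;

    for (int i = 0; i < req->n_to_load; i++)
        loaded++;                       /* stand-in for fetching and parsing */

    pthread_mutex_lock(&req->reply->lock);
    req->reply->n_features = loaded;
    req->reply->ready = 1;
    pthread_mutex_unlock(&req->reply->lock);
    return NULL;
}

/* Non-blocking check of the reply slot, as a GUI thread might do periodically. */
static int reply_poll(DataReply *reply, int *n_features_out)
{
    int ready;

    pthread_mutex_lock(&reply->lock);
    ready = reply->ready;
    if (ready)
        *n_features_out = reply->n_features;
    pthread_mutex_unlock(&reply->lock);
    return ready;
}

/* Starts a background load and polls for the result; returns the feature
 * count, or -1 if the thread could not be created. */
int load_in_background(int n_to_load)
{
    DataReply reply;
    LoadRequest req;
    pthread_t tid;
    int n = -1;

    assert(n_to_load >= 0);
    pthread_mutex_init(&reply.lock, NULL);
    reply.ready = 0;
    reply.n_features = 0;
    req.reply = &reply;
    req.n_to_load = n_to_load;

    if (pthread_create(&tid, NULL, data_thread_main, &req) != 0) {
        pthread_mutex_destroy(&reply.lock);
        return -1;
    }
    while (!reply_poll(&reply, &n))
        sched_yield();                  /* a GUI would handle events here */

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&reply.lock);
    return n;
}
```

The point of the sketch is that only the GUI thread ever touches the display; the data thread
communicates solely through the locked reply slot.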
The separation of the threads also naturally leads to a "plugin" model for adding new data source
modules. There is a single standardised interface between the GUI thread and its source threads, so
new modules can be added without alteration to the interface code which provides the "bridge"
between the threads (see bridge pattern in design patterns). See figure XXX.
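One common way to express such a standardised interface in C is a table of function pointers
that every source module fills in. The sketch below is an assumption about the general shape
only: SourceModule, the "demo" module, find_module and bridge_fetch are hypothetical names and
do not come from the ZMap source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface that every data source module implements. */
typedef struct {
    const char *name;
    int  (*open)(const char *url, void **state_out);
    int  (*get_features)(void *state, int start, int end, int *n_out);
    void (*close)(void *state);
} SourceModule;

/* A toy "demo" module: pretends there is one feature per 1000 bases. */
static int demo_open(const char *url, void **state_out)
{
    (void)url;
    *state_out = NULL;
    return 0;
}

static int demo_get_features(void *state, int start, int end, int *n_out)
{
    (void)state;
    if (end < start)
        return -1;
    *n_out = (end - start + 1) / 1000;
    return 0;
}

static void demo_close(void *state)
{
    (void)state;
}

/* Adding a new source means adding a row here; the bridge code is unchanged. */
static const SourceModule modules[] = {
    { "demo", demo_open, demo_get_features, demo_close },
};

static const SourceModule *find_module(const char *name)
{
    for (size_t i = 0; i < sizeof modules / sizeof modules[0]; i++)
        if (strcmp(modules[i].name, name) == 0)
            return &modules[i];
    return NULL;
}

/* The "bridge": knows only the interface, never the module internals.
 * Returns the number of features found, or -1 on any failure. */
int bridge_fetch(const char *module_name, const char *url, int start, int end)
{
    const SourceModule *module;
    void *state = NULL;
    int n = 0;

    assert(module_name != NULL && url != NULL);
    module = find_module(module_name);
    if (module == NULL || module->open(url, &state) != 0)
        return -1;
    if (module->get_features(state, start, end, &n) != 0)
        n = -1;
    module->close(state);
    return n;
}
```

For example, bridge_fetch("demo", "demo://example", 1, 5000) asks the toy module for the range
1 to 5000, while an unknown module name fails without touching any module code.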
(should we add stuff about cancelling....currently it doesn't work that well so try it out
first...also I think that restart doesn't work ???)
Data display
Sequence display must cope with very large coordinate systems that are then mapped to the
screen. The foocanvas holds its "world view" in floating point coordinates allowing plotting to
as fine a scale as required for even whole chromosome display. The canvas has the concept of a
world view and a subsection of that world view that is displayable (the "scrollable area"). This
is important both for performance and because underlying window systems usually have a
maximum limit to the size of window they can display. For X Windows the biggest window that can
usefully be handled is 32k by 32k pixels. ZMap therefore limits the scrollable area to this size
or less but allows scrolling of the overall scrollable area so that even at high zoom levels the
user can quickly and easily scroll over large areas of sequence.
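The limiting of the scrollable area can be illustrated with a small calculation. This is a
sketch under assumptions rather than ZMap's implementation: the function and type names are
hypothetical, the limit is taken as 32767 pixels (an assumed concrete value for the "32k"
figure above), and the window is simply centred on a position of interest and then shifted to
stay inside the sequence.

```c
#include <assert.h>

/* Assumed concrete value for the "32k" pixel limit described in the text. */
#define MAX_WINDOW_PIXELS 32767.0

/* A span of the sequence in world (base) coordinates. */
typedef struct {
    double start;
    double end;
} WorldSpan;

/* Returns the part of [seq_start, seq_end] to use as the scrollable area at
 * the given zoom, centred on `centre` where possible. */
WorldSpan scrollable_span(double seq_start, double seq_end,
                          double pixels_per_base, double centre)
{
    WorldSpan span = { seq_start, seq_end };
    double max_bases;

    assert(pixels_per_base > 0.0 && seq_end >= seq_start);
    max_bases = MAX_WINDOW_PIXELS / pixels_per_base;

    /* At low zoom the whole sequence fits inside one window. */
    if (seq_end - seq_start <= max_bases)
        return span;

    /* Otherwise take a window of the maximum size around the centre... */
    span.start = centre - max_bases / 2.0;
    span.end = span.start + max_bases;

    /* ...and shift it back inside the sequence if it overhangs either end. */
    if (span.start < seq_start) {
        span.start = seq_start;
        span.end = seq_start + max_bases;
    }
    if (span.end > seq_end) {
        span.end = seq_end;
        span.start = seq_end - max_bases;
    }
    return span;
}
```

At one pixel per base, for instance, a one-megabase sequence cannot be shown in a single
window, so only about 32767 bases around the centre are made scrollable at any one time.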
A primary design goal was to allow users to rapidly scroll around and zoom in and out of the
sequence being annotated. In addition, as in any good text editor, they can split the view of
the sequence an arbitrary number of times both horizontally and vertically, allowing for instance
the simultaneous viewing of different sections of a long transcript.
The foocanvas, like other canvas packages, allows the addition of custom-written "canvas items",
which is particularly important for a genome annotation program as there are current and emerging
standards for the shapes used to represent different kinds of features. ZMap supports a number of
these (see Fig xx) and it is easy to add more. They range from very simple glyphs to much more
complex items that display a complete transcript with CDS, UTR and other parts marked up.
ZMap has the concept of a 'mark' region which by default is the entire displayed sequence but can
easily be set to encompass just the area of the window or any feature within the window. Many
operations are limited to the mark region, which has two benefits:
- better performance, since only a relatively small number of features need be manipulated
- more meaningful display: ZMap offers various ways of clustering individual columns to make
them easier to view and often these work best if only the features in the marked area are
clustered, e.g. one cluster mode marks up all colinear homology matches and this makes most
sense when only those homology matches that are in the same coordinate range as a particular
transcript are clustered, see fig....
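Restricting an operation to the mark region amounts to an overlap test on coordinate ranges.
The following is a minimal sketch, assuming inclusive integer coordinates; Span and
features_in_mark are hypothetical names and not taken from ZMap.

```c
#include <assert.h>
#include <stddef.h>

/* A feature or mark extent in sequence coordinates, both ends inclusive. */
typedef struct {
    int start;
    int end;
} Span;

/* Two inclusive spans overlap if each starts before the other ends. */
static int span_overlaps(Span a, Span b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Writes the indices of the features overlapping the mark into out_indices
 * (which must have room for n_features entries) and returns how many. */
size_t features_in_mark(const Span *features, size_t n_features,
                        Span mark, size_t *out_indices)
{
    size_t n_out = 0;

    assert(features != NULL && out_indices != NULL);
    for (size_t i = 0; i < n_features; i++)
        if (span_overlaps(features[i], mark))
            out_indices[n_out++] = i;
    return n_out;
}
```

A clustering operation would then work only on the returned indices, which is where the
performance benefit comes from when the mark covers a small part of the sequence.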
Efficiency in data display
Displaying all features for a large sequence can be too slow even with optimised compiled
code. Fortunately it is possible to optimise display both by differential loading and by differential
display of features.
- differential display via zoom factors
Each feature set has a minimum and maximum zoom factor controlling the range of magnifications at
which its features are displayed. This allows the user to specify that very numerous features such
as homologies are only displayed at higher magnifications.
- differential display via the "mark" region
- lazy loading
Features can be selectively loaded under the control of the user and system. Loading all features for
a large sequence can consume large amounts of memory, and annotators usually do not need to see all
features for the entire range of the sequence they are annotating; they instead concentrate on
sub-areas of interest.
- describe threading model
- describe server model
- describe use of autoconf
ZMap and otterlace
The first annotation system that ZMap has been used with is the Otterlace system, which is a
Perl/Tk-based application and associated pipelines that is used for vertebrate annotation
at the Sanger Institute and elsewhere.
Conclusions
need acedb, otterlace, gmod, apollo refs