zmapFeature.shtml has some notes about the words used for various data items in the ZMap code (which cannot be changed without a lot of work). This section aims to define terms for things that can be configured by the user or otterlace, presented to the user or annotators, or used in an external interface (eg data servers and X-remote). This may seem pedantic, but the re-use of words to refer to different objects has caused some confusion.
ACEDB includes a number of configuration options driven by database tables, and this functionality must be supported. Beyond that, we need to be able to specify what data to display, where to request the data from, and where to display it.
We also have to specify a display style for each data item and the display columns themselves.
Featureset: a collection of data items of the same type. These are identified in a GFF file by a shared name (ie a GFF source, typically representing the output from a single analysis); each individual data item (corresponding to one line in a GFF file) is a feature.
To make the implementation of external servers easier it is also desirable to allow the featureset name to be different from the GFF source name.
Each featureset is assigned a style, which defaults to having the same name as the featureset. Featuresets with no style defined cannot be displayed. Due to the behaviour of external sources it is desirable to be able to specify a style of a different name, and this would also allow styles to be shared amongst sources.
NB: currently for a pipeServer the featureset name must be the same as the GFF source name, and the pipeServer must also be given a styles file containing a style of the same name.
Column: a display column in a window in a view. ZMap has a list of display columns which specifies, in order, which data to display across the window. Traditionally this has been inferred from the list of featuresets specified for each server, such that all the featuresets from the first server appear in order, followed by all the featuresets for the other servers as configured.
A column can be filled with any number of featuresets, each with their own style, and can also be assigned a style for the whole column which can specify things like width, display options (eg show/hide), strand specific, frame specific etc. Note that if a column is defined as strand specific then ZMap will draw two columns, one on either side of the strand separator, but for each strand the order will be as specified. I'm not clear about whether or not this is currently true, perhaps the first style in a column's list is used; however it makes sense to me to do this explicitly.
Styles: these are supplied either by a text file or from a server (eg ACEDB, DAS). A style defines how to display a feature, and combines data for the appearance with when and where to display it. Refer to Request Timing below for some notes about a third aspect of styles.
Where to request a data item from, which can only be a data server such as ACEDB, pipe, DAS, file.
We need all these to be configurable, with sensible defaults (ie the same name for a 1-1 relationship).
There are a number of competing issues:
The 'best' configuration differs between ACEDB and pipeServers, which have different perspectives: a pipeServer is intended to supply one featureset, whereas ACEDB supplies many and also provides links between them and other configuration options. However, it is likely that even pipeServers will supply multiple GFF sources, and that these can appear in different display columns - all combinations are expected in practice.
Any solution will be a compromise - what is proposed is to provide configuration options for ZMap from the perspective of ZMap rather than any particular external module. By referring to the ISO/OSI 7-layer model (try google) we hope to suggest the cleanest way to divide up the various configuration options. Other than this, the approach is to avoid too great a re-write of ZMap code and to use formats similar to what's currently in use. Similar code has to be maintained to support existing ACEDB configurations, and having only one implementation will increase reliability.
The columns to display and in which order should be defined explicitly and relate to ZMap rather than data sources or featuresets. Raw data (sets of features supplied externally) is logically at a different level from presentation and should not refer to presentation issues.
Currently the column list and ordering are inferred from server config. By defining this explicitly we avoid having to define servers in display order (and can mix & match featuresets), and can also choose columns to display without having to change server configuration.
In a simple configuration with one featureset per column a single style can be used to define column related data (eg hide/show) and feature related data (eg blue). However, for a column with many featuresets it may be cleaner to define these in separate styles - one for the column and one for each featureset. In that case the column may be defined with an optional style.
This data will take the form of a new option in the [ZMap] stanza as 'featureset:style':
[ZMap]
columns = EST_human ; Repeats ; etc
Another stanza [columns] will allow the user to specify what featuresets to fill a column with, and this allows data from more than one source to be displayed in the same column.
[columns]
Repeats = repeatmasker_line ; repeatmasker_sine
mRNA = vertebrate_mRNA ; polyA_site ; polyA_signal
By default (if not specified in the [columns] stanza) each column will be assigned data from the featureset of the same name.
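As a rough illustration of how the two stanzas and the default rule fit together, here is a minimal sketch in Python (not ZMap code). It assumes the ' ; ' separator used in the examples; the helper names split_list and read_column_featuresets are hypothetical, and 'mRNA' stands in for the 'etc' in the column list.

```python
import configparser

# Example configuration; 'mRNA' replaces the 'etc' placeholder.
CONFIG_TEXT = """
[ZMap]
columns = EST_human ; Repeats ; mRNA

[columns]
Repeats = repeatmasker_line ; repeatmasker_sine
mRNA = vertebrate_mRNA ; polyA_site ; polyA_signal
"""


def split_list(value):
    """Split a ';'-separated config value into a list of names."""
    return [item.strip() for item in value.split(";") if item.strip()]


def read_column_featuresets(text):
    """Return {column: [featureset, ...]} in display order."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of names such as 'mRNA'
    parser.read_string(text)

    result = {}
    for column in split_list(parser.get("ZMap", "columns")):
        if parser.has_option("columns", column):
            result[column] = split_list(parser.get("columns", column))
        else:
            # Default: fill the column from the featureset of the same name.
            result[column] = [column]
    return result


column_featuresets = read_column_featuresets(CONFIG_TEXT)
for column, featuresets in column_featuresets.items():
    print(column, "<-", ", ".join(featuresets))
```

Here EST_human has no entry in [columns], so it falls back to the featureset of the same name.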
Again this is logically at a different level from the raw data. Traditionally a feature's style has been linked to the feature type, but ACEDB does provide a mapping to arbitrary display styles as a separate database query. It seems logical to provide a similar function in ZMap configuration, but to continue support of existing ACEDB function we need to make this optional for ACEDB. We can default a style to be the same name as the featureset without breaking modularity.
To add the ability to specify an arbitrary style for a featureset we add another stanza for this mapping, and also another one for the column:
[featureset_style]
vertebrate_mRNA = vertRNA

[column_style]
vertebrate_mRNA = vertRNA_col
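The optional mapping with its 'same name' default could be read along these lines. This is a sketch in Python rather than ZMap code; load_section and style_name_for are hypothetical helpers, and the column lookup here only returns a candidate name, which may or may not exist as a style.

```python
import configparser

CONFIG_TEXT = """
[featureset_style]
vertebrate_mRNA = vertRNA

[column_style]
vertebrate_mRNA = vertRNA_col
"""


def load_section(parser, section):
    """Return a stanza as a plain dict, or {} if the stanza is absent."""
    if not parser.has_section(section):
        return {}
    return dict(parser[section])


def style_name_for(mapping, name):
    """Mapped style name if configured, else the item's own name."""
    return mapping.get(name, name)


parser = configparser.ConfigParser()
parser.optionxform = str  # names are case sensitive in this sketch
parser.read_string(CONFIG_TEXT)

featureset_style = load_section(parser, "featureset_style")
column_style = load_section(parser, "column_style")

print(style_name_for(featureset_style, "vertebrate_mRNA"))  # explicit mapping
print(style_name_for(featureset_style, "polyA_site"))       # default: same name
print(style_name_for(column_style, "vertebrate_mRNA"))      # column-level style
```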
We wish to define some data as 'requested on startup' or 'requested on demand'. Currently (April 2010) this is done via server config ('delayed=true/false') and this applies to all featuresets supplied by that server.
It would be possible to specify startup and delayed featuresets per server, but perhaps this is overkill: it is just as easy to configure two servers.
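To show what 'configure two servers' amounts to, the sketch below (Python, not ZMap code) partitions featuresets into startup and on-demand lists from a per-server 'delayed' flag. The stanza names server_startup and server_on_demand and the 'featuresets' key are assumptions made for illustration; only the 'delayed=true/false' option comes from the text above.

```python
import configparser

# Hypothetical server stanzas: names and the 'featuresets' key are assumed.
CONFIG_TEXT = """
[server_startup]
featuresets = EST_human ; vertebrate_mRNA
delayed = false

[server_on_demand]
featuresets = EST_rat
delayed = true
"""


def split_list(value):
    """Split a ';'-separated config value into a list of names."""
    return [item.strip() for item in value.split(";") if item.strip()]


def partition_featuresets(text):
    """Return (startup, on_demand) featureset lists from server stanzas."""
    parser = configparser.ConfigParser()
    parser.optionxform = str
    parser.read_string(text)

    startup, on_demand = [], []
    for section in parser.sections():
        names = split_list(parser.get(section, "featuresets", fallback=""))
        # The flag applies to every featureset supplied by that server.
        delayed = parser.getboolean(section, "delayed", fallback=False)
        (on_demand if delayed else startup).extend(names)
    return startup, on_demand


startup, on_demand = partition_featuresets(CONFIG_TEXT)
print("startup:", startup)
print("on demand:", on_demand)
```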
ACEDB also defines some featuresets as 'deferred' via the featureset's style, which means that they are not requested with the other features on startup. This relates to communication issues rather than display, and we propose that the style options 'deferred' and 'loaded' are removed and replaced by ZMap configuration options. (NB: There is also a third style option 'current_bump_mode' which relates to display but implies a 1-1 featureset-style mapping, and which is also a candidate for review.)
The ZMap Columns dialog lists columns that can be requested post startup - this could (or should?) be changed to include all configured display columns or featuresets.
From a user's point of view they wish to request extra features at various times after ZMap startup, and we need to define how this process works. The startup situation can be treated in an identical manner (except for the user interaction of course).
From otterlace they request a column (which implies one or more featuresets), from ZMap they also request a column (for deferred styles). When these requests are given to ZMap then they may consist of a list of featuresets. As columns may include data from several sources there can be no direct mapping from column to source; we already have column-featureset defined, in which case the obvious thing is to define a featureset to (GFF) source mapping. This is strictly a 1-1 mapping and will default to the same name.
[featureset_styles]
EST_rat = EST_rat_style
[GFF_source]
vertebrate_mRNA = vertrna
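Chaining the column-featureset and featureset-source mappings turns a column request into a list of GFF sources to ask for. The following is a sketch under the defaults stated above (same name when nothing is configured); it is written in Python for brevity and sources_for_column is a hypothetical name.

```python
# Mappings as they might look after reading the config stanzas.
COLUMNS = {"mRNA": ["vertebrate_mRNA", "polyA_site", "polyA_signal"]}
GFF_SOURCE = {"vertebrate_mRNA": "vertrna"}


def sources_for_column(column):
    """Expand a requested column into the GFF sources to request."""
    # A column with no [columns] entry holds the featureset of the same name.
    featuresets = COLUMNS.get(column, [column])
    # A featureset with no [GFF_source] entry uses its own name as the source.
    return [GFF_SOURCE.get(featureset, featureset) for featureset in featuresets]


print(sources_for_column("mRNA"))
print(sources_for_column("EST_rat"))
```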
Column styles may be defined as follows:
[column_styles]
EST_rat = align_col_style
For GFF sources we also need a description text, which was supplied by ACEDB and corresponds to the description in the load column dialog in Otterlace. This will appear in yet another stanza:
[GFF_description]
vertrna = Vertebrate messenger RNA
The alternative stanza format above will be used, as it is probably easier to read.
In the ZMap code there are a few mappings, originally derived from ACEDB, and we need to use these (from ACEDB) and provide similar data for pipeServers. Or rather, we need to implement the above configuration data in ZMap and either patch the ACEDB data into the same data structures or replace it with our own. ZMapView holds the following mappings:
ZMapFeatureContext has feature_set_names which is the 'requested featuresets' in the context. This derives from the list of featuresets specified in ZMap [server] config stanzas.
In zmapView.c/zMapViewLoadFeatures() (but not zmapView.c/zMapViewConnect()), when given a request for a GFF source the code tries to map this to a featureset (ie display column) and then tries to find the featureset in any of the configured servers. So, traditionally ZMap has requested display columns aka featuresets from ACEDB, and ACEDB has supplied several GFF sources in reply.
This has some implications for the request protocol in ZMap - ACEDB can accept requests for GFF sources directly or for the columns that they are to map to. Some OTF requests for data involve GFF sources that are not mentioned in the ZMap config but instead relate to the source_2_featureset list that is returned from ACEDB. pipeServers can only accept GFF sources as request fodder, which is OK as that is the format supplied by Otterlace, and all these can be configured without too much trouble.
It is important that all these mapping lists are merged, and we have to consider the ZMap config file, ACEDB and DAS.
The above deals with finding where to request a GFF source from; the converse is where to display GFF data when it comes back (ie what column).
zMapWindowCreateSetColumns() and set_name_create_set_columns() in zmapWindowDrawFeatures.c both call produce_column() and appear to assume that the featureset is a display column rather than a GFF source.
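The lookup described above can be pictured as follows. This is only an illustration in Python, not the actual zMapViewLoadFeatures() logic; find_server and the server names are hypothetical.

```python
def find_server(requested, source_to_featureset, server_featuresets):
    """Map a requested GFF source to a featureset, then find a server for it.

    Returns the first configured server listing that featureset, or None.
    """
    # If the request is not a known GFF source, treat it as a featureset name.
    featureset = source_to_featureset.get(requested, requested)
    for server, featuresets in server_featuresets.items():
        if featureset in featuresets:
            return server
    return None


# Hypothetical data: one ACEDB server and one pipeServer.
SOURCE_TO_FEATURESET = {"vertrna": "vertebrate_mRNA"}
SERVER_FEATURESETS = {
    "acedb": ["vertebrate_mRNA", "Repeats"],
    "pipe_est_rat": ["EST_rat"],
}

print(find_server("vertrna", SOURCE_TO_FEATURESET, SERVER_FEATURESETS))
print(find_server("EST_rat", SOURCE_TO_FEATURESET, SERVER_FEATURESETS))
```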
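One way to picture the merge is sketched below in Python. The precedence rule (the first origin to define a key wins, and disagreements are reported) is purely an assumption for illustration, since these notes do not say which origin should win; the mapping contents are made up.

```python
def merge_mappings(*sources):
    """Merge (origin, mapping) pairs; the first origin to define a key wins.

    Returns the merged mapping and a list of (key, kept_origin, ignored_origin)
    tuples for keys where a later origin disagreed.
    """
    merged = {}
    conflicts = []
    for origin, mapping in sources:
        for key, value in mapping.items():
            if key not in merged:
                merged[key] = (value, origin)
            elif merged[key][0] != value:
                conflicts.append((key, merged[key][1], origin))
    return {key: value for key, (value, _) in merged.items()}, conflicts


# Hypothetical source-to-featureset lists from the three origins.
zmap_config = {"vertrna": "vertebrate_mRNA"}
acedb = {"vertrna": "mRNA_set", "est_rat_src": "EST_rat"}
das = {"das_repeats": "Repeats"}

merged, conflicts = merge_mappings(
    ("config", zmap_config), ("ACEDB", acedb), ("DAS", das)
)
print(merged)
print(conflicts)
```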
Instead, looking at zmapGFF2parser.c we can see that the makeNewFeature() function performs the reverse source-to-featureset and source-to-style mappings: we simply have to pass this data to the pipeServer and features should end up in the correct display columns. Even with features being supplied by different servers for the same column, ZMap will merge the new context into the old, so there should be no problem.
The ZMapView structure will be modified to contain the mappings above, and the ...(continued p95). As the mappings operate as quark to quark, the config stanzas may be read in any order.
zmapWindowContainerFeatureSet.c contains code that handles all the features and styles in a column, and there are functions to extract style information from a style table associated with the column. This works by finding the first match for a style parameter in the complete list of styles. If we add a new style to match the column name (if this is not already done) then this process should continue to work as is.
When creating a column (eg in produce_column()) we will try to find a style of the same name (or as mapped in [column_style]) and add this to the list first. If it is not found then we will carry on as normal, and the source styles will be searched for column specific parameters if needed.
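The reverse lookups can be derived from the forward configuration, as in this Python sketch. It is not a transcription of makeNewFeature(); the helper names are hypothetical, and the check for duplicate sources simply enforces the 1-1 rule stated earlier.

```python
def invert_one_to_one(mapping):
    """Invert {featureset: source}; reject two featuresets sharing a source."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError(
                f"{value!r} is mapped from both {inverse[value]!r} and {key!r}"
            )
        inverse[value] = key
    return inverse


def invert_one_to_many(mapping):
    """Invert {column: [featureset, ...]} into {featureset: column}."""
    return {item: key for key, items in mapping.items() for item in items}


GFF_SOURCE = {"vertebrate_mRNA": "vertrna"}
FEATURESET_STYLE = {"vertebrate_mRNA": "vertRNA"}
COLUMNS = {"mRNA": ["vertebrate_mRNA", "polyA_site", "polyA_signal"]}

SOURCE_TO_FEATURESET = invert_one_to_one(GFF_SOURCE)
FEATURESET_TO_COLUMN = invert_one_to_many(COLUMNS)


def place_feature(source):
    """Return (featureset, style, column) for a GFF source name."""
    featureset = SOURCE_TO_FEATURESET.get(source, source)
    style = FEATURESET_STYLE.get(featureset, featureset)
    column = FEATURESET_TO_COLUMN.get(featureset, featureset)
    return featureset, style, column


print(place_feature("vertrna"))
print(place_feature("polyA_site"))
```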
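A GLib quark is an integer id for an interned string, so a mapping can refer to a name before any other stanza has mentioned it. The toy model below (Python, with a hypothetical QuarkTable class standing in for GLib's quark functions) shows why the reading order does not affect the resolved names.

```python
class QuarkTable:
    """Toy string interning table: each distinct name gets one integer id."""

    def __init__(self):
        self._ids = {}
        self._names = [None]  # id 0 is reserved, as in GLib

    def from_string(self, name):
        """Return the id for name, creating one on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def to_string(self, quark):
        return self._names[quark]


def load_mapping(quarks, pairs):
    """Store a stanza as an id-to-id mapping."""
    return {quarks.from_string(k): quarks.from_string(v) for k, v in pairs}


def resolve(quarks, mapping, name):
    """Look a name up in a mapping, defaulting to the name itself."""
    key = quarks.from_string(name)
    return quarks.to_string(mapping.get(key, key))


STANZAS = {
    "GFF_source": [("vertebrate_mRNA", "vertrna")],
    "featureset_style": [("vertebrate_mRNA", "vertRNA")],
}


def read_stanzas(order):
    """Load the stanzas in the given order and resolve one featureset."""
    quarks = QuarkTable()
    loaded = {name: load_mapping(quarks, STANZAS[name]) for name in order}
    return (
        resolve(quarks, loaded["GFF_source"], "vertebrate_mRNA"),
        resolve(quarks, loaded["featureset_style"], "vertebrate_mRNA"),
    )


print(read_stanzas(["GFF_source", "featureset_style"]))
print(read_stanzas(["featureset_style", "GFF_source"]))
```

The integer ids differ between the two runs, but the resolved names are the same.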
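The 'column style first, then first match wins' behaviour can be sketched as below. This is Python for illustration, not the code in zmapWindowContainerFeatureSet.c; the function names and the style parameters 'width' and 'colour' are made up for the example.

```python
def column_style_list(column, featuresets, column_style, featureset_style, styles):
    """Build the ordered style list for a column, column style first."""
    names = []
    # Style of the same name as the column, unless [column_style] says otherwise.
    candidate = column_style.get(column, column)
    if candidate in styles:
        names.append(candidate)
    # If not found we carry on as normal with the featureset styles.
    for featureset in featuresets:
        name = featureset_style.get(featureset, featureset)
        if name in styles and name not in names:
            names.append(name)
    return names


def lookup_parameter(style_names, styles, parameter):
    """Return the first value found for parameter, searching styles in order."""
    for name in style_names:
        if parameter in styles[name]:
            return styles[name][parameter]
    return None


# Hypothetical styles and parameters.
STYLES = {
    "vertRNA_col": {"width": 30},
    "vertRNA": {"width": 10, "colour": "blue"},
}
COLUMN_STYLE = {"mRNA": "vertRNA_col"}
FEATURESET_STYLE = {"vertebrate_mRNA": "vertRNA"}

with_column_style = column_style_list(
    "mRNA", ["vertebrate_mRNA"], COLUMN_STYLE, FEATURESET_STYLE, STYLES
)
without_column_style = column_style_list(
    "mRNA", ["vertebrate_mRNA"], {}, FEATURESET_STYLE, STYLES
)

print(lookup_parameter(with_column_style, STYLES, "width"))
print(lookup_parameter(with_column_style, STYLES, "colour"))
print(lookup_parameter(without_column_style, STYLES, "width"))
```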