Commit ad244a3d authored by mh17

more words

parent 45a1427d
<!-- $Id: featureset_col.html,v 1.8 2010-05-11 07:43:08 mh17 Exp $ -->
<!-- $Id: featureset_col.html,v 1.9 2010-05-18 09:46:14 mh17 Exp $ -->
<h2>Columns Featuresets and Data Sources</h2>
<fieldset><legend>Definitions</legend>
<p><a href="Design_notes/modules/zmapFeature.shtml#terminology">zmapFeature.shtml</a> has some notes about the words used for various data items in the ZMap code (which cannot be changed without a lot of work) and this section aims to define terms to be used to describe things that can be configured by the user/ otterlace or presented to the user/ annotators, or used in an external interface eg data servers and X-remote. This may seem pedantic, but there is some confusion caused by the re-use of words to refer to different objects.</p>
......@@ -76,22 +76,24 @@ columns = EST_human ; Repeats ; etc
This looks very similar to the existing featuresets option in the Server stanzas, but in this case does not refer directly to any featureset; it is a list of display columns and nothing more.
</p>
<p>Another stanza <b>[columns]</b> will allow the user to specify what featuresets to fill a column with, and this allows data from more than one source to be displayed in the same column.
By default (if not specified in the <b>[columns]</b> stanza) each column will be assigned data from the featureset of the same name.
</p>
<pre>
[columns]
Repeats = repeatmasker_line ; repeatmasker_sine
mRNA = vertebrate_mRNA ; polyA_site ; polyA_signal
</pre>
</p>
<p>By default (if not specifed in the <b>[columns]</b> stanza) each column will be assigned data from the featureset of the same name.
<p>If no column list is defined in the [ZMap] stanza then each server's list of featuresets will be used to define the columns to display in order of servers and their featuresets.
</p>
<p>If no column list is defined in the [ZMap] stanza then if a [columns] stanza exists that will be used to specify the column list in order, and if not then each server's list of featuresets will be used in the traditional manner.
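<p>The fallback order for choosing display columns can be sketched in Python as follows. This is an illustration only, not ZMap code: the function name <b>column_order</b> is hypothetical, and treating every stanza that has a <b>featuresets</b> key as a server stanza is an assumption made for the example.</p>

```python
import configparser


def column_order(config_text):
    """Return display columns in order, using the fallback rules described above."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep column and featureset names case sensitive
    cfg.read_string(config_text)

    def split(value):
        # Lists in the config are separated by ';'
        return [item.strip() for item in value.split(";") if item.strip()]

    # 1. An explicit column list in the [ZMap] stanza wins.
    if cfg.has_option("ZMap", "columns"):
        return split(cfg["ZMap"]["columns"])

    # 2. Otherwise use the keys of the [columns] stanza, in file order.
    if cfg.has_section("columns"):
        return list(cfg["columns"])

    # 3. Otherwise fall back to each server's featuresets, in order of servers.
    #    Assumption: any stanza with a 'featuresets' key is a server stanza.
    columns = []
    for stanza in cfg.sections():
        if cfg.has_option(stanza, "featuresets"):
            for name in split(cfg[stanza]["featuresets"]):
                if name not in columns:
                    columns.append(name)
    return columns
```

<p>With only a [columns] stanza present, the keys of that stanza (for example Repeats then mRNA) would be returned in the order they appear in the file.</p>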
<h4>Display styles</h4>
<p>Again this is logically at a different level from the raw data. Traditionally a feature's style has been linked to the feature type, but ACEDB does provide a mapping to arbitrary display styles as a separate database query. It seems logical to provide a similar function in ZMap configuration, but to continue support of existing ACEDB function we need to make this optional for ACEDB.
We can default a style to be the same name as the featureset without breaking modularity.
</p>
<p>To add in the ability to specify an arbitary style for a featureset we add another stanza for this mapping, and also another one for the column:
<p>To add in the ability to specify an arbitrary style for a featureset we add another stanza for this mapping, and also another one for the column (column specific parameters can optionally be defined separately from the individual featuresets):
<pre>
[featureset_style]
vertebrate_mRNA = vertRNA
......@@ -113,51 +115,57 @@ vertebrate_mRNA = vertRNA_col
<p>From a user's point of view, they wish to request extra features at various times after ZMap startup and we need to define how this process works. The startup situation can be treated in an identical manner (except for the user interaction of course).
</p>
<p>
From otterlace they request a column (which implies one or more featuresets), from ZMap they also request a column (for deferred styles). When these requests are given to ZMap then they may consist of a list of featuresets. As columns may include data from several sources there can be no direct mapping from column to source; we already have column-featureset defined, in which case the obvious thing is to define a featureset to (GFF) source mapping. This is strictly a 1-1 mapping and will default to the same name.
From otterlace they request a column (which implies one or more featuresets), from ZMap they also request a column (for deferred styles). When these requests are given to ZMap then they may consist of a list of featuresets. The server's 'featuresets' parameter is used to define where to request this from.
</p>
<p>
In the GFF data stream each feature is identified by a source name and this maps 1-1 to the featureset as defined in ZMap's configuration. There may be a need to provide a further mapping of GFF-source-name to featureset-name but at present this is not implemented.
</p>
</fieldset>
<fieldset><legend>Configuration format summarised</legend>
<p>
All display columns will be define thus in terms of featuresets:
All display columns will be defined in terms of featuresets (and default to a 1-1 mapping):
<pre>
[columns]
Repeats = repeatmasker_line ; repeatmasker_sine
mRNA = vertebrate_mRNA ; polyA_site ; polyA_signal
</pre>
</p>
<p>Each featureset may be given a style:
<pre>
[featureset-styles]
[featureset-style]
EST_rat = EST_rat_style
</pre>
Each featureset may be mapped to a (case sensitive) GFF source name:
<pre>
[GFF-source]
vertebrate_mRNA = vertrna
</pre>
</p>
<p>Column styles may be defined as follows:
<pre>
[column-styles]
[column-style]
EST_rat = align_col_style
</pre>
</p>
</p>
Each featureset may be mapped to a (case sensitive) GFF source name. This is the name that is displayed in the right hand status box in ZMap and it defaults to the featureset name. If desired this can be set for any of the featuresets defined as featureset name = GFF source name. Note that this will not affect how GFF files are parsed, it does not map GFF-source-name to featureset-name.
<pre>
[featureset-source]
vertebrate_mRNA = vertrna
</pre>
</p>
<p>
For GFF sources we also need a description text, which was supplied by ACEDB and corresponds to the description in the load column dialog in Otterlace. This will appear in yet another stanza:
<pre>
[GFF-description]
vertebrate_mRNA = Vertebrate messenger RNA
[featureset-description]
vertebrate_mRNA = Vertebrate Messenger RNA
</pre>
Note that the source key here is the featureset name, not the internal name defined in the [GFF_source] stanza.
</p>
<p>Each column may also be given a description:
<pre>
[Column-description]
[column-description]
vertebrate_mRNA = Vertebrate messenger RNA
</pre>
</p>
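<p>As a rough illustration of how the stanzas summarised above fit together, the following Python sketch reads them and applies the defaults described in the text (style and source both default to the featureset name). The helper names are hypothetical, the sample values are taken from the examples above, and using an empty string for a missing description is an assumption.</p>

```python
import configparser

SAMPLE = """
[columns]
mRNA = vertebrate_mRNA ; polyA_site ; polyA_signal

[featureset-style]
vertebrate_mRNA = vertRNA

[featureset-source]
vertebrate_mRNA = vertrna

[featureset-description]
vertebrate_mRNA = Vertebrate Messenger RNA
"""


def load(text):
    """Parse the config text, keeping keys case sensitive."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str
    cfg.read_string(text)
    return cfg


def lookup(cfg, stanza, key, default):
    """Return cfg[stanza][key] if present, otherwise the default."""
    if cfg.has_option(stanza, key):
        return cfg[stanza][key]
    return default


def featureset_settings(cfg, featureset):
    """Collect the per-featureset settings, applying the documented defaults."""
    return {
        # style defaults to the featureset name
        "style": lookup(cfg, "featureset-style", featureset, featureset),
        # displayed source name defaults to the featureset name
        "source": lookup(cfg, "featureset-source", featureset, featureset),
        # assumption: no description configured means an empty string
        "description": lookup(cfg, "featureset-description", featureset, ""),
    }
```

<p>For the sample text, <b>vertebrate_mRNA</b> picks up all three configured values, while <b>polyA_site</b> has no entries of its own and falls back to its own name for both style and source.</p>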
......@@ -178,13 +186,13 @@ The spec for Glib's key file functions is quite explicit in saying thay keys may
<p>The alternative stanza format above will be used, as it is probably easier to read.
</p>
<h3>Legacy issues with GFF and ACEDB etc</h3>
<p>In the ZMap code there are a few mappings, originally derived from ACEDB and we need to used these (from ACEDB) and provide similar data for pipeServers. Or rather that we need to implement the above configuration data in ZMap and either patch the ACEDB data into the same data structures or replace it with our own. <b>ZMapView</b> holds the following mappings:
<p>In the ZMap code there are a few mappings, originally derived from ACEDB, and we need to use these (from ACEDB) and provide similar data for pipeServers. Or rather, we need to implement the above configuration data in ZMap and either patch the ACEDB data into the same data structures or replace it with our own. <b>ZMapView</b> holds the following mappings:
<ul>
<li> <b>source_2_featureset</b> GFF source id (quark) to ZMapGFFSet, which contains a feature_set_id, which is really a display column
<li> <b>source_2_sourcedata</b> GFF source id (quark) to ZMapGFFSource, which contains the source id (duplicated) an the id of the style (quark) to use to display it.
<li> <b>featureset_2_stylelist</b> feature set id (display column quark) to Glist of style id's. This will be all the styles needed by all the GFF sources in the column.
<li> <b>source_2_sourcedata</b> GFF source id (quark) to ZMapGFFSource, which contains the source id (duplicated) and the id of the style (quark) to use to display it.
<li> <b>featureset_2_stylelist</b> feature set id (display column quark) to GList of style id's. This will be all the styles needed by all the GFF sources in the column.
</ul>
and <b>ZMapWindow</b> has feature_set_names (columns in display order, which turns out to be the featuresets as defined in ZMap [source] stanza config) and featureset_2_styles.
and <b>ZMapWindow</b> has feature_set_names (columns in display order, which turns out to be the featuresets as defined in ZMap [source] stanzas config) and featureset_2_styles.
</p>
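<p>The relationship between these three mappings can be sketched in Python, with plain dictionaries and strings standing in for the hash tables and quarks of the real code. This is not the ZMap implementation: the function name <b>build_mappings</b> and the shape of its inputs (a column to featureset list and a featureset to style mapping, as in the config stanzas) are assumptions, and each GFF source is taken to map 1-1 to a featureset.</p>

```python
def build_mappings(columns, featureset_style):
    """Derive the three view mappings from config-like dictionaries.

    columns:          {column name: [GFF source / featureset names]}
    featureset_style: {featureset name: style name}, style defaults to the name
    """
    source_2_featureset = {}     # GFF source -> display column
    source_2_sourcedata = {}     # GFF source -> source id and style id
    featureset_2_stylelist = {}  # display column -> styles needed by its sources

    for column, sources in columns.items():
        for source in sources:
            style = featureset_style.get(source, source)
            source_2_featureset[source] = column
            source_2_sourcedata[source] = {"source_id": source, "style_id": style}
            styles = featureset_2_stylelist.setdefault(column, [])
            if style not in styles:
                styles.append(style)

    return source_2_featureset, source_2_sourcedata, featureset_2_stylelist
```

<p>Given a Repeats column filled from repeatmasker_line and repeatmasker_sine, both sources map back to Repeats and the column's style list holds one style per source.</p>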
<p><b>ZMapFeatureContext</b> has feature_set_names which is the 'requested featuresets' in the context. This derives from the list of featuresets specified in ZMap [server] config stanzas.
</p>
......@@ -192,15 +200,15 @@ and <b>ZMapWindow</b> has feature_set_names (columns in display order, which tur
<h3>Requesting data</h3>
<p>In <b>zmapView.c/zMapViewLoadFeatures()</b> (but not <b>zmapView.c/zMapViewConnect()</b>) when given a request for a GFF source the code tries to map this to a featureset (ie display column) and then tries to find the featureset in any of the configured servers. So, traditionally ZMap has requested display columns aka featuresets from ACEDB and ACEDB has supplied several GFF sources in reply.</p>
<p>This has some implications for the request protocol in ZMap - ACEDB can accept requests for GFF sources directly or the columns that they are to map to. Some OTF requests for data involve GFF sourcfes that are not mentioned in the ZMap config but instead relate to the source_2_featureset list that is returend from ACEDB. pipeServers can only accept GFF sources as request fodder, which is OK as that is the format supplied by Otterlace, and all these can be configured without too much trouble.</p>
<p>This has some implications for the request protocol in ZMap - ACEDB can accept requests for GFF sources directly or the columns that they are to map to. Some OTF requests for data involve GFF sources that are not mentioned in the ZMap config but instead relate to the source_2_featureset list that is returned from ACEDB. pipeServers can only accept GFF sources as request fodder, which is OK as that is the format supplied by Otterlace, and all these can be configured without too much trouble.</p>
<p>It is important that all these mapping lists are merged and we have to consider the ZMap config file, ACEDB and DAS.
</p>
<h3>Displaying data received from a server</h3>
<p>The above deals with finding where to request a GFF source from, the converse is where to display GFF data when it comes back (ie what column).
<p>The above deals with finding where to request a GFF source from, the converse is where to display GFF data when it comes back from a server (ie what column to put it in).
</p>
<p><b>zMapWindowCreateSetColumns()</b> and <b>set_name_create_set_columns()</b> in <b>zmapWindowDrawFeatures.c</b> both call <b>produce_column()</b> and appears to assume that the featureset is a display column rather than a GFF source. The columns are created in <b>zmapWindowDrawFeatures.c/windowDrawContextCB()</b> which is an execute function - look at the BLOCK level. The window data (<b>window->feature_set_names</b>) holds the list of columns.
<p><b>zMapWindowCreateSetColumns()</b> and <b>set_name_create_set_columns()</b> in <b>zmapWindowDrawFeatures.c</b> both call <b>produce_column()</b> and appear to assume that the featureset is a display column rather than a GFF source. The columns are created in <b>zmapWindowDrawFeatures.c/windowDrawContextCB()</b> which is an execute function - look at the end of the BLOCK level. The window data (<b>window->feature_set_names</b>) holds the list of columns.
</p>
<p>Instead, looking at <b>zmapGFF2parser.c</b> we can see that the <b>makeNewFeature()</b> function does the reverse source to featureset and source to style mapping: we simply have to pass this data to the pipeServer and features should end up in the correct display columns. Even with features being supplied by different servers for the same column ZMap will merge in the new context to the old, so there should be no problem.
......@@ -212,10 +220,8 @@ As the mappings operate as quark to quark the config stansas may be read in an a
</p>
<h3>Extracting styles from columns data</h3>
<p><b>zmapWindowContainerFeatureSet.c</b> contains code that handles all the features and styles in a column, and there are functions to extract style information from a style table associated with the column. This works by finding the first match for a style parameter in the complete list of styles. If we add a new style to match the column name (if this is not already done) then this process should scontinue to work as is.
</p>
<p>When creating a column (eg in <b>produce_column()</b>) we will try to find a style of the same name (or as mapped in [column_style]) and add this to the list first. If not found then we will carry on as normal, and the source styles will be searched for column specific parameters if needed.
</p>
<p><b>zmapWindowContainerFeatureSet.c</b> contains code that handles all the features and styles in a column, and there are functions to extract style information from a style table associated with the column. This works by finding the first match for a style parameter in the complete list of styles. If we add a new style to match the column name (if this is not already done) then this process should continue to work as is.
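<p>The first-match lookup, with the column's own style placed at the front of the list, can be illustrated with a short Python sketch. The function names and the dictionary representation of styles are hypothetical and stand in for the style table handled by the C code.</p>

```python
def column_style_order(column, source_styles, styles, column_style=None):
    """Return the style names to search for a column, column style first.

    column_style optionally maps a column name to a style name; without an
    entry the style of the same name as the column is tried.
    """
    wanted = (column_style or {}).get(column, column)
    ordered = []
    if wanted in styles:
        ordered.append(wanted)  # column style goes first if it exists
    for name in source_styles:
        if name not in ordered:
            ordered.append(name)  # then the styles of the column's sources
    return ordered


def find_style_param(ordered, styles, param):
    """Return the first value found for param, searching styles in order."""
    for name in ordered:
        if param in styles.get(name, {}):
            return styles[name][param]
    return None
```

<p>If no style matches the column, the list contains only the source styles and the lookup carries on as before.</p>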
<h3>Where to store column styles?</h3>
<p><b>zmapWIndowUtils.c/zmapWindowFeatureSetStyles()</b> reveals that if we just add a column's style to the view->featureset_2_style list that will be enough - this data is copied to the window at some point and then used to find all the styles necessary.
<p>
......
<!-- $Id: glyph_style.html,v 1.6 2010-04-20 12:18:46 mh17 Exp $ -->
<!-- $Id: glyph_style.html,v 1.7 2010-05-18 09:46:14 mh17 Exp $ -->
<h2>Style definitions for Glyphs</h2>
<fieldset><legend>Summarised</legend>
<p>
......@@ -65,10 +65,10 @@ glyph-alt-colours = turquoise
mode = glyph
width=30.0 # must be twice the width of the shapes
width=30.0 # make enough space for left and right pointing hooks
frame-mode=only-1 # is frame specific so will use the colours
show-reverse-strand=true
glyph-strand=flip-x # on other side of vertical line
show-reverse-strand=true # (see below)
glyph-strand=flip-x # on other side of origin
glyph-score-mode=width
min-score= -2.0 # as in previous ACEDB style
......@@ -76,11 +76,23 @@ max-score = 4.0
glyph-5 = dn-hook
glyph-3 = up-hook
colours = grey # for central vertical line if we draw it
frame0-colours = red
frame1-colours = green
frame2-colours = blue
colours = normal fill grey # for central vertical line if we draw it
frame0-colours = normal fill red; normal border red
frame1-colours = normal fill red; normal border green
frame2-colours = normal fill red; normal border blue
</pre>
<h4> Handling 3F-splice markers</h4>
<p>These are a little non standard. The data is for the forwards strand only and the min and max scores can be set in the GeneFinder application and typically are -2.0 to +4.0. These scores are the log probability of there being a splice site, so -ve values (hooks on the left) are less likely than random.
</p>
<p>To provide left and right handed glyphs we pretend that a -ve value is on the reverse strand and invert the score, and the config above ensures that the glyph appears as the appropriate mirror image.
</p>
<p>The style as defined above is what gets used if you specify in [ZMap] 'legacy_styles = true', and the glyphs are defined as:
<pre>
[glyphs]
dn-hook = <0,0; 15,0; 15,10>
up-hook = <0,0; 15,0; 15,-10>
</pre>
</p>
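<p>The sign handling described above for 3F-splice markers can be sketched in Python as follows. This is a minimal illustration rather than the ZMap code: the function name <b>splice_marker</b> is hypothetical, and clamping the score to the configured min-score and max-score before flipping is an assumption.</p>

```python
MIN_SCORE = -2.0  # min-score from the style above
MAX_SCORE = 4.0   # max-score from the style above


def splice_marker(score):
    """Return (strand, score) used to draw a splice marker.

    Data is forward strand only; a negative score is treated as if it were on
    the reverse strand and its sign is inverted, so that the glyph is drawn as
    the mirror image.
    """
    # Assumption: out of range scores are clamped to the configured limits.
    clamped = max(MIN_SCORE, min(MAX_SCORE, score))
    if clamped < 0:
        return "-", -clamped
    return "+", clamped
```

<p>A score of -1.5 is therefore drawn as a reverse strand marker with score 1.5, which together with glyph-strand=flip-x puts the hook on the other side of the origin.</p>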
<h3>incomplete homology markers </h3>
<pre>
......@@ -315,10 +327,10 @@ Note that this option will only be set on ZMap startup and not on creation of a
<h4>other stuff...</h4>
<p><ul>
<li>zmapFeature.c #2831 addFeatureModeCB() appears to be bodging up some hard coded values. This function is about setting the style mode (why is it needed?) and should not be setting colours? The source of the style data (acedbServer.c in this case) should manage these defaults. Other functions zMapFeatureAnyAddModesToStyles() and zMapFeatureAnyForceModesToStyles() are implicated and it looks like a temporary fix for something has been perpetrated.
<li>zmapGFF2parser.c/makeNewFeature() set the feature type from the style, the opposite of what addFeatureModeCB() does.
<li>the style 'GF_splice' from acedb does not define a glyph type (this was hard coded in ZMap as above). Another function has been created to restore previous functionality a) for splice triangles and b) for homology markers - if glyphs are not configured then these are installed in the relevant styles on demand - the code will work using configurable glyphs so we can move forwards.
<li>zmapGFF2parser.c/makeNewFeature() sets the feature type from the style, the opposite of what addFeatureModeCB() does.
<li>the style 'GF_splice' from acedb does not define a glyph type (this was hard coded in ZMap as above). Another function has been created to restore previous functionality for homology markers - if glyphs are not configured then these are installed in the relevant styles on demand - the code will work using configurable glyphs so we can move forwards.
<b>Note that for 3-frame splice we operate on an explicit style with a unique-id of 'gf-splice' and this name must not be changed</b>.
<li>zmap_window_glyph_item_point() needs to be revisited esp re arcs
<li>zmapWindowDump.c creates PDFs from the window/ view. Glyph items are not drawn as foo canvas items so we have some special code added to bodge these up, which breaks the modularity as designed. We need to resolve this! Previous bodge-up code has been commented out (and won't print). One way would be to have feature mode glyphs as zmap canvas items which include foo shapes in the same way as normal features...
<li>zmapWindowDump.c creates PDFs from the window/ view. Glyph items are not drawn as foo canvas items so we have some special code added to bodge these up, which breaks the modularity as designed. We need to resolve this! One way would be to have feature mode glyphs as zmap canvas items which include foo shapes in the same way as normal features...
</ul></p>
</fieldset>
\ No newline at end of file
<!-- $Id: performance.html,v 1.2 2010-03-29 15:36:20 mh17 Exp $ -->
<!-- $Id: performance.html,v 1.3 2010-05-18 09:46:14 mh17 Exp $ -->
<h2>Performance: Making ZMap and Otterlace run faster</h2>
<fieldset><legend>Ideas for speeding things up</legend>
<p>
......@@ -11,7 +11,13 @@ This will cut network delays by half, but note that there is a memory problem to
</p>
</fieldset>
<fieldset><legend>Profiling ZMap</legend>
<p><b>gprof</b> is available and does all the obvious stuff. A new build directory can be created with the necessary gcc options (-pg) and run in parallel with existing builds.
<p><b>gprof</b> is available and does all the obvious stuff. A new build directory can be created with the necessary gcc options (-pg) and run in parallel with existing builds. To set this up it is necessary to check out a new version of ZMap and edit <b>scripts/build_config.sh</b> to set USE_GPROF=yes. This is better than editing your development version as it does not risk forgetting to remove the option for a live build.
</p>
<p>The man page for gprof does not mention whether or not it copes with threaded programs.
</p>
<p>Initial experiments show confusing numbers: the basic flat format output gives foo_canvas etc dominating the figures, cumulative totals give a function somewhere in the middle (processfeature()) with no mention of appMain().
</p>
<p>Click <a href="Design_notes/notes/profile.shtml">here</a> for some ideas on DIY profiling.
</p>
</fieldset>
......