Commit 966c413d authored by mh17
describes how to use the pipeserver module
parent 33aaf2f3
Scripting interface and configuration for the ZMap pipeServer module
mgh 27 Nov 2009
The pipeServer module is based on the existing fileServer and is configured with one [source] stanza per pipe in the ZMap config file. Scripts are installed locally with ZMap, and the script directory is identified in the [ZMap] stanza in ~/ZMap/ZMap.
eg:
[ZMap]
script-dir=/nfs/users/nfs_m/mh17/ZMap/scripts
Of course these could be stored centrally somewhere if desired.
Each data source must be referenced in [ZMap] by listing the source stanzas like this:
sources = acedb ; other-feed ; yet_another_one
Columns are displayed in the order given: in the example above, all the acedb features come first, then the 'other-feed' features, and so on.
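As an illustration of the ordering rule, here is a minimal sketch that reads the 'sources' list and keeps the order given. It uses Python's configparser as a stand-in for ZMap's own config reader, which is an assumption; the function name read_source_order is hypothetical.

```python
import configparser

# Example text modelled on the [ZMap] stanza shown above.
EXAMPLE_CONFIG = """
[ZMap]
script-dir=/nfs/users/nfs_m/mh17/ZMap/scripts
sources = acedb ; other-feed ; yet_another_one
"""

def read_source_order(config_text):
    """Return the source names from [ZMap] in the order they are listed."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["ZMap"]["sources"]
    # Names are separated by ';' with optional surrounding whitespace.
    return [name.strip() for name in raw.split(";") if name.strip()]

if __name__ == "__main__":
    print(read_source_order(EXAMPLE_CONFIG))
```

The returned list is in display order, so the first name corresponds to the first set of columns.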
When configured, ZMap will request data from each source in parallel, which should reduce startup time considerably. Each script will obtain and send the data 'somehow'. The scripts will replace the existing mechanism, in which Otterlace retrieves the data sequentially and adds it to ACEDB on startup.
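The idea of requesting in parallel while still displaying in the configured order can be sketched as follows. This is not how ZMap itself is implemented; it is a small Python illustration, and fetch_all and fake_fetch are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(name):
    """Stand-in for running one source's script; sleeps to simulate a slow fetch."""
    time.sleep(0.05)
    return "data from " + name

def fetch_all(sources, fetch):
    """Call fetch(name) for every source concurrently; return results in list order."""
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Executor.map yields results in input order, not completion order,
        # so the configured column order is preserved.
        return list(zip(sources, pool.map(fetch, sources)))

if __name__ == "__main__":
    for name, data in fetch_all(["acedb", "other-feed", "yet_another_one"], fake_fetch):
        print(name, "->", data)
```

With three sources the total wait is roughly that of the slowest one rather than the sum of all three.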
In each source stanza (one must exist for each data source) the syntax is the same as for existing file:// and acedb:// sources, but specifically for pipe:// sources we interpret the configuration as follows:
URLs take the form
<scheme>://[user][:password]@<host>[:port]/[url-path][;typecode][?query][#fragment]
and:
<scheme> will be 'pipe'
user:password@host are not used and if present are ignored
port is not used and if present will be ignored
url-path is the path of the script.
Note that according to http://rfc.net/rfc1738.html a single leading '/' signifies a relative path and two signifies absolute. We will interpret relative paths as relative to the ZMap scripts directory.
typecode is not used and will be ignored if present
query will be expanded into a normal argv vector
fragment is not used and will be ignored
Typically we expect a pipe:// data source to have only one (or very few) feature sets, as a major design aim is to exploit concurrent operation. Other configuration parameters will operate as normal (eg 'sequence=true' (which can only appear in one source) and 'navigator_sets=xxx,yyyy').
Here is an example for a test script that simply outputs an existing GFF file.
[source]
url = pipe://getgff?file=b0250_curated.gff
featuresets = curated_features ; curated ; genomic_canonical
styles = curated_features ; curated ; genomic_canonical
stylesfile = /nfs/users/nfs_m/mh17/zmap/styles/ZMap.b0250.file.styles
A more realistic example with an absolute path (featuresets, styles and stylesfile would still need to be specified):
[source]
url=pipe:///software/anacode/bin/get_genes?dataset=human&name=1&analysis=ccds_gene&end=161655109&csver=Otter&cs=chromosome&type=chr1-14&metakey=ens_livemirror_ccds_db&start=161542637&featuresets=CCDS:Coding;CCDS:Transcript
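To make the mapping from URL to command line concrete, here is a minimal sketch that turns a pipe:// URL into an argv vector. It follows the two example URLs above (nothing before the first '/' after pipe:// means relative to script-dir, a leading '/' means absolute); that reading is an assumption drawn from the examples, not from the ZMap source. The function name pipe_url_to_argv is hypothetical, and user, host, port and typecode handling and percent-decoding are left out.

```python
import posixpath

def pipe_url_to_argv(url, script_dir):
    """Split a pipe:// URL into [script_path, 'key=value', ...]."""
    prefix = "pipe://"
    if not url.startswith(prefix):
        raise ValueError("not a pipe:// URL: " + url)
    rest = url[len(prefix):]
    # The fragment is not used, so drop anything after '#'.
    rest = rest.split("#", 1)[0]
    path, _, query = rest.partition("?")
    if not path.startswith("/"):
        # Relative paths are taken relative to the scripts directory.
        path = posixpath.join(script_dir, path)
    # Each query item is passed through unchanged as one key=value argument.
    args = [item for item in query.split("&") if item]
    return [path] + args

if __name__ == "__main__":
    print(pipe_url_to_argv("pipe://getgff?file=b0250_curated.gff",
                           "/nfs/users/nfs_m/mh17/ZMap/scripts"))
```

For the first example this gives the getgff script inside script-dir followed by the single argument file=b0250_curated.gff.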
Script operation: some rules
A script may obtain data in any way it likes but must output valid GFF data and nothing else on STDOUT (but anything is valid in a comment).
Brief error messages may be output to STDERR and these will be appended to the ZMap log. STDERR output is intended only to alert users to some failure (eg 'warning not all data found' or 'cannot connect to database') and not as a detailed log of script activity; if that is needed, the script should maintain its own log file. A warning message consisting of the last line of STDERR will be presented to the user, and hopefully this will be enough to explain the situation without resorting to log files.
Regardless of whether an error message is sent ZMap will attempt to use the GFF data provided.
ZMap will probably read STDERR after STDOUT is closed, and only if some error is encountered.
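The STDOUT/STDERR rules above can be sketched from the caller's side. This is a Python illustration using subprocess, not ZMap's actual code; run_pipe_script is a hypothetical name, and the demo child process and its '##gff-version 2' header line are placeholders.

```python
import subprocess
import sys

def run_pipe_script(argv):
    """Run a script; return (stdout_text, warning).

    warning is the last non-empty line of STDERR, or None if STDERR was empty.
    The STDOUT text is returned regardless of whether a warning was produced.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    stderr_lines = [line for line in result.stderr.splitlines() if line.strip()]
    warning = stderr_lines[-1] if stderr_lines else None
    return result.stdout, warning

# Demo child process standing in for a real script (placeholder content).
DEMO_ARGV = [
    sys.executable, "-c",
    "import sys; print('##gff-version 2'); "
    "print('warning not all data found', file=sys.stderr); "
    "print('cannot connect to database', file=sys.stderr)",
]

if __name__ == "__main__":
    gff_text, warning = run_pipe_script(DEMO_ARGV)
    print("GFF:", gff_text.strip())
    print("warning shown to user:", warning)
```

Only the final STDERR line is surfaced, so a script should put its most useful message last.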
Arguments will be given in the format key=value with no preceding dashes, exactly as extracted from the server query string. (If people care about this we could change it...)
Extra arguments may be added subject to implementation:
zmap_start=zmap_start_coord // in zmap coordinates not bases
zmap_end=zmap_end_coord
wait=9 // delay some seconds before sending data (can be given in the query string, main use is for testing)
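Putting the rules together, a script along the lines of the getgff test example might look like the sketch below. It is a hypothetical Python stand-in, not the real getgff: it reads key=value arguments, honours 'wait', copies the file named by 'file' to STDOUT, and writes a brief message to STDERR on failure. The non-zero return value is ordinary convention only; this document does not say how ZMap treats exit codes.

```python
import sys
import time

def parse_args(argv):
    """Turn ['key=value', ...] into a dict; items without '=' are skipped."""
    args = {}
    for item in argv:
        key, sep, value = item.partition("=")
        if sep:
            args[key] = value
    return args

def main(argv, out=sys.stdout, err=sys.stderr):
    """Write the requested file to out; return 0 on success, 1 on failure."""
    args = parse_args(argv)
    delay = args.get("wait")
    if delay:
        # Optional delay in seconds, mainly useful for testing.
        time.sleep(float(delay))
    path = args.get("file")
    if path is None:
        print("no file argument given", file=err)
        return 1
    try:
        with open(path) as handle:
            for line in handle:
                out.write(line)  # GFF data only on STDOUT
    except OSError as exc:
        print("cannot read %s: %s" % (path, exc), file=err)
        return 1
    return 0

# A real script would finish with: sys.exit(main(sys.argv[1:]))
```

Arguments the script does not recognise, such as zmap_start and zmap_end, are parsed but simply unused here.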