Commit 3172e1fd authored by Matthieu Muffato

Added scripts documentation

parent 24a3a417
@@ -30,6 +30,9 @@ pre, code, kbd { font-family: Courier New, Courier, monospace; }
div, span, p, li, dd, dt, pre, code, kbd { font-size: 1em; }
th, td { font-size: 0.8em; }
div.tree { line-height: 1.1em }
div.tree span.tree {font-family: Times New Roman, Courier New, Courier, monospace; }
body.ie6 th, body.ie6 td { font-size: 13px; }
/*----------------------------------------------------------------------
@@ -17,6 +17,21 @@ The name "Hive" comes from the way pipelines are processed by a swarm
<li>Introduction to eHive: <a href="presentations/HiveWorkshop_Sept2013/">Sept. 2013 workshop</a> (parts <a href="presentations/HiveWorkshop_Sept2013/Slides_part1.pdf">1</a>, <a href="presentations/HiveWorkshop_Sept2013/Slides_part2.pdf">2</a> and <a href="presentations/HiveWorkshop_Sept2013/Slides_part3.pdf">3</a> in PDF)</li>
<li><a href="install.html">Dependencies, installation and setup</a></li>
<li><a href="hive_schema.html">Database schema</a></li>
<li class="tree">eHive scripts<br>
<div class="tree">
<span class="tree"></span><em>Execution</em><br>
<span class="tree">├── </span><a href="scripts/init_pipeline.html">init_pipeline</a><br>
<span class="tree">├── </span><a href="scripts/seed_pipeline.html">seed_pipeline</a><br>
<span class="tree">├── </span><a href="scripts/beekeeper.html">beekeeper</a><br>
<span class="tree"></span><em>Debugging</em><br>
<span class="tree">├── </span><a href="scripts/runWorker.html">runWorker</a><br>
<span class="tree">├── </span><a href="scripts/db_cmd.html">db_cmd</a><br>
<span class="tree">├── </span><a href="scripts/standaloneJob.html">standaloneJob</a><br>
<span class="tree">├── </span><a href="scripts/hoover_pipeline.html">hoover_pipeline</a><br>
<span class="tree"></span><em>Reporting</em><br>
<span class="tree">├── </span><a href="scripts/generate_graph.html">generate_graph</a><br>
<span class="tree">├── </span><a href="scripts/lsf_report.html">lsf_report</a><br>
<span class="tree">└── </span><a href="scripts/generate_timeline.html">generate_timeline</a><br>
</div>
</li>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>beekeeper.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> beekeeper.pl</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> The Beekeeper is in charge of interfacing between the Queen and a compute resource or &#39;compute farm&#39;.
Its job is to initialize/sync the eHive database (via the Queen), to ask the Queen whether it needs any workers,
and to submit the requested number of workers to the available machines via the runWorker.pl script.
It is also responsible for interfacing with the Queen to identify workers which died
unexpectedly so that she can free the dead workers and reclaim unfinished jobs.</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # Usually run after the pipeline has been created to calculate the internal statistics necessary for eHive functioning
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -sync
# Do not run any additional Workers, just check for the current status of the pipeline:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname
# Run the pipeline in automatic mode (-loop), run all the workers locally (-meadow_type LOCAL) and allow for 3 parallel workers (-total_running_workers_max 3)
beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -meadow_type LOCAL -total_running_workers_max 3 -loop
# Run in automatic mode, but restrict execution to the &#39;fast_blast&#39; analysis only
beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -logic_name fast_blast -loop
# Restrict the normal execution to one iteration only - can be used for testing a newly set up pipeline
beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -run
# Reset failed &#39;buggy_analysis&#39; jobs to &#39;READY&#39; state, so that they can be run again
beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -reset_failed_jobs_for_analysis buggy_analysis
# Do a cleanup: find and bury dead workers, reclaim their jobs
beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -dead</code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<h2 id="Connection-parameters">Connection parameters</h2>
<pre><code> -reg_conf &lt;path&gt; : path to a Registry configuration file
-reg_type &lt;string&gt; : type of the registry entry (&#39;hive&#39;, &#39;core&#39;, &#39;compara&#39;, etc - defaults to &#39;hive&#39;)
-reg_alias &lt;string&gt; : species/alias name for the Hive DBAdaptor
-url &lt;url string&gt; : url defining where hive database is located</code></pre>
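<p>For illustration, a Registry-based connection to the same database might look like this (the Registry file path and the alias below are placeholders, not files shipped with eHive):</p>
<pre><code> # Sync the pipeline, addressing the eHive database through a Registry file instead of a URL:
beekeeper.pl -reg_conf /path/to/registry_conf.pl -reg_alias my_hive_db -sync</code></pre>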
<h2 id="Looping-control">Looping control</h2>
<pre><code> -loop : run autonomously, loops and sleeps
-max_loops &lt;num&gt; : perform max this # of loops in autonomous mode
-keep_alive : do not stop when there are no more jobs to do - carry on looping
-job_id &lt;job_id&gt; : run 1 iteration for this job_id
-run : run 1 iteration of automation loop
-sleep &lt;num&gt; : when looping, sleep &lt;num&gt; minutes (default 2min)</code></pre>
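<p>A sketch combining the looping options above (the numbers are illustrative, not defaults):</p>
<pre><code> # Loop autonomously, waking up every 5 minutes, and give up after at most 10 loops:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -loop -sleep 5 -max_loops 10
# Keep looping even when there are no jobs left to do:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -loop -keep_alive</code></pre>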
<h2 id="Current-Meadow-control">Current Meadow control</h2>
<pre><code> -meadow_type &lt;string&gt; : the desired Meadow class name, such as &#39;LSF&#39; or &#39;LOCAL&#39;
-total_running_workers_max &lt;num&gt; : max # workers to be running in parallel
-submit_workers_max &lt;num&gt; : max # workers to create per loop iteration
-submission_options &lt;string&gt; : passes &lt;string&gt; to the Meadow submission command as &lt;options&gt; (formerly lsf_options)
-submit_log_dir &lt;dir&gt; : record submission output+error streams into files under the given directory (to see why some workers fail after submission)</code></pre>
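<p>A possible combination of the Meadow options (the LSF queue name and the log directory are placeholders that depend on your farm setup):</p>
<pre><code> # Submit at most 20 LSF workers per loop, cap the total at 100 and keep the submission logs:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -meadow_type LSF \
    -submit_workers_max 20 -total_running_workers_max 100 \
    -submission_options &#39;-q long&#39; -submit_log_dir /path/to/submit_logs -loop</code></pre>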
<h2 id="Worker-control">Worker control</h2>
<pre><code> -job_limit &lt;num&gt; : #jobs to run before worker can die naturally
-life_span &lt;num&gt; : life_span limit for each worker
-logic_name &lt;string&gt; : restrict the pipeline stat/runs to this analysis logic_name
-retry_throwing_jobs 0|1 : if a job dies *knowingly*, should we retry it by default?
-can_respecialize &lt;0|1&gt; : allow workers to re-specialize into another analysis (within resource_class) after their previous analysis was exhausted
-hive_log_dir &lt;path&gt; : directory where stdout/stderr of the hive is redirected
-debug &lt;debug_level&gt; : set debug level of the workers</code></pre>
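<p>For example (the values are illustrative; fast_blast is the same example analysis as in the usage examples above):</p>
<pre><code> # Let each worker claim at most 50 jobs and limit its life span, restricting the run to one analysis:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -logic_name fast_blast \
    -job_limit 50 -life_span 60 -hive_log_dir /path/to/hive_logs -loop</code></pre>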
<h2 id="Other-commands-options">Other commands/options</h2>
<pre><code> -help : print this help
-versions : report both Hive code version and Hive database schema version
-dead : detect all unaccounted dead workers and reset their jobs for resubmission
-alldead : tell the database all workers are dead (no checks are performed in this mode, so be very careful!)
-balance_semaphores : set all semaphore_counts to the numbers of unDONE fan jobs (emergency use only)
-no_analysis_stats : don&#39;t show status of each analysis
-worker_stats : show status of each running worker
-failed_jobs : show all failed jobs
-reset_job_id &lt;num&gt; : reset a job back to READY so it can be rerun
-reset_failed_jobs_for_analysis &lt;logic_name&gt;
: reset FAILED jobs of an analysis back to READY so they can be rerun
-reset_all_jobs_for_analysis &lt;logic_name&gt;
: reset ALL jobs of an analysis back to READY so they can be rerun</code></pre>
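<p>A few more sketches of the reporting and reset commands (buggy_analysis and the job id are placeholders, as in the usage examples above):</p>
<pre><code> # Show the failed jobs and the status of the running workers, without submitting anything:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -failed_jobs -worker_stats
# Reset ALL jobs of &#39;buggy_analysis&#39; (not just the failed ones) back to READY:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -reset_all_jobs_for_analysis buggy_analysis
# Reset a single job by its database id:
beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -reset_job_id 123456</code></pre>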
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>db_cmd.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> db_cmd.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> db_cmd.pl {-url &lt;url&gt; | [-reg_conf &lt;reg_conf&gt;] -reg_alias &lt;reg_alias&gt; [-reg_type &lt;reg_type&gt;] } [ -sql &lt;sql_command&gt; ] [ -extra &lt;extra_params&gt; ] [ -to_params | -verbose ]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> db_cmd.pl is a generic script that connects you interactively to your database using either a URL or the Registry, and optionally runs an SQL command.
-url and -reg_alias are mutually exclusive. -reg_type is only needed if several databases map to the given alias / species.</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> db_cmd.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/&quot; -sql &#39;CREATE DATABASE lg4_long_mult&#39;
db_cmd.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot;
db_cmd.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot; -sql &#39;SELECT * FROM analysis_base&#39; -extra=&#39;--html&#39;
eval mysqldump -t `db_cmd.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot; -to_params` worker
db_cmd.pl -reg_conf ${ENSEMBL_CVS_ROOT_DIR}/ensembl-compara/scripts/pipeline/production_reg_conf.pl -reg_alias compara_master
db_cmd.pl -reg_conf ${ENSEMBL_CVS_ROOT_DIR}/ensembl-compara/scripts/pipeline/production_reg_conf.pl -reg_alias mus_musculus -reg_type core
db_cmd.pl -reg_conf ${ENSEMBL_CVS_ROOT_DIR}/ensembl-compara/scripts/pipeline/production_reg_conf.pl -reg_alias squirrel -reg_type core -sql &#39;SELECT * FROM coord_system&#39;</code></pre>
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>generate_graph.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> generate_graph.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> ./generate_graph.pl -url mysql://user:pass@server:port/dbname -output OUTPUT_LOC [-help]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> This program will generate a graphical representation of your hive pipeline.
This includes visualising the flow of data from the different analyses, blocking
rules &amp; table writers. The graph is also coloured to indicate the stage
an analysis is at. The colours &amp; fonts used can be configured via the
hive_config.json configuration file.</code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<p><b>--url</b></p>
<pre><code> url defining where hive database is located</code></pre>
<p><b>--reg_conf</b></p>
<pre><code> path to a Registry configuration file</code></pre>
<p><b>--reg_alias</b></p>
<pre><code> species/alias name for the Hive DBAdaptor</code></pre>
<p><b>--output</b></p>
<pre><code> Location of the file to write to.
The file extension (.png, .jpeg, .dot, .gif, .ps) will define the output format.</code></pre>
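<p>Putting the options together (a sketch only; the database details, Registry file path, alias and output file names are placeholders):</p>
<pre><code> # Draw the pipeline diagram as a PNG image:
generate_graph.pl -url mysql://username:secret@hostname:port/ehive_dbname -output diagram.png
# The same pipeline, addressed through a Registry file and written out as a GraphViz .dot file:
generate_graph.pl -reg_conf /path/to/registry_conf.pl -reg_alias my_hive_db -output diagram.dot</code></pre>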
<h1 id="EXTERNAL-DEPENDENCIES">EXTERNAL DEPENDENCIES</h1>
<pre><code> GraphViz</code></pre>
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>generate_timeline.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> generate_timeline.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> generate_timeline.pl {-url &lt;url&gt; | [-reg_conf &lt;reg_conf&gt;] -reg_alias &lt;reg_alias&gt; [-reg_type &lt;reg_type&gt;] }
[-start_date &lt;start_date&gt;] [-end_date &lt;end_date&gt;]
[-top &lt;float&gt;]
[-mode [workers | memory | cores | unused_memory | unused_cores | pending_workers]]
[-n_core &lt;int&gt;] [-mem &lt;int&gt;]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> This script is used for offline examination of the allocation of workers.
Based on the command-line parameters &#39;start_date&#39; and &#39;end_date&#39;, or on the start time of the first
worker and end time of the last worker (as recorded in pipeline DB), it pulls the relevant data out
of the &#39;worker&#39; table for accurate timing.
By default, the output is in CSV format, to allow further analysis to be carried out.
You can optionally ask the script to generate an image with Gnuplot.</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # Just run it the usual way: only the top 20 analyses will be reported in CSV format
generate_timeline.pl -url mysql://username:secret@hostname:port/database &gt; timeline.csv
# The same, but getting the analyses that fill 99.5% of the global activity in a PNG file
generate_timeline.pl -url mysql://username:secret@hostname:port/database -top .995 -output timeline_top995.png
# Assuming you are only interested in a precise interval (in a PNG file)
generate_timeline.pl -url mysql://username:secret@hostname:port/database -start_date 2013-06-15T10:34 -end_date 2013-06-15T16:58 -output timeline_June15.png
# Get the required memory instead of the number of workers
generate_timeline.pl -url mysql://username:secret@hostname:port/database -mode memory -output timeline_memory.png</code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<pre><code> -help : print this help
-url &lt;url string&gt; : url defining where hive database is located
-reg_conf, -reg_type, -reg_alias : alternative connection details
-nosqlvc : Do not restrict the usage of this script to the current version of eHive
Be aware that generate_timeline.pl uses raw SQL queries that may break on different schema versions
-verbose : Print some info about the data loaded from the database
-start_date &lt;date&gt; : minimal start date of a worker (the format is ISO8601, e.g. &#39;2012-01-25T13:46&#39;)
-end_date &lt;date&gt; : maximal end date of a worker (the format is ISO8601, e.g. &#39;2012-01-25T13:46&#39;)
-top &lt;float&gt; : maximum number (&gt; 1) or fraction (&lt; 1) of analyses to report (default: 20)
-output &lt;string&gt; : output file: its extension must match one of the Gnuplot terminals. Otherwise, the CSV output is produced on stdout
-mode &lt;string&gt; : what should be displayed on the y-axis. Allowed values are &#39;workers&#39; (default), &#39;memory&#39;, &#39;cores&#39;, &#39;unused_memory&#39;, &#39;unused_cores&#39;, &#39;pending_workers&#39;
-n_core &lt;int&gt; : the default number of cores allocated to a worker (default: 1)
-mem &lt;int&gt; : the default memory allocated to a worker (default: 100Mb)</code></pre>
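<p>For example, combining several of the options above (the dates, database details and per-worker defaults are illustrative):</p>
<pre><code> # Plot the pending workers over a restricted time window, assuming 4 cores and 2000Mb per worker by default:
generate_timeline.pl -url mysql://username:secret@hostname:port/database -mode pending_workers \
    -start_date 2013-06-15T10:34 -end_date 2013-06-15T16:58 -n_core 4 -mem 2000 -output timeline_pending.png</code></pre>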
<h1 id="EXTERNAL-DEPENDENCIES">EXTERNAL DEPENDENCIES</h1>
<pre><code> Chart::Gnuplot</code></pre>
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>hoover_pipeline.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> hoover_pipeline.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> hoover_pipeline.pl {-url &lt;url&gt; | -reg_conf &lt;reg_conf&gt; -reg_alias &lt;reg_alias&gt;} [ { -before_datetime &lt;datetime&gt; | -days_ago &lt;days_ago&gt; } ]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> hoover_pipeline.pl is a script used to remove old &#39;DONE&#39; jobs from a continuously running pipeline database</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # delete all jobs that have been &#39;DONE&#39; for at least a week (default threshold) :
hoover_pipeline.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot;
# delete all jobs that have been &#39;DONE&#39; for at least a given number of days
hoover_pipeline.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot; -days_ago 3
# delete all jobs &#39;DONE&#39; before a specific datetime:
hoover_pipeline.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot; -before_datetime &quot;2013-02-14 15:42:50&quot;</code></pre>
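<p>The database can equally be addressed through a Registry file, as shown in the SYNOPSIS (the path and alias below are placeholders):</p>
<pre><code> # delete all jobs that have been &#39;DONE&#39; for at least a week, using a Registry connection:
hoover_pipeline.pl -reg_conf /path/to/registry_conf.pl -reg_alias my_hive_db</code></pre>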
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Author" content="Made by 'tree'">
<meta name="GENERATOR" content="$Version: $ tree v1.6.0 (c) 1996 - 2011 by Steve Baker, Thomas Moore, Francesc Rocher, Kyosuke Tokoro $">
<title>eHive 1.9 scripts</title>
<style type="text/css">
<!--
BODY { font-family : arial, monospace, sans-serif; }
P { font-weight: normal; font-family : arial, monospace, sans-serif; color: black; background-color: transparent;}
B { font-weight: normal; color: black; background-color: transparent;}
A:visited { font-weight : normal; text-decoration : none; background-color : transparent; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:link { font-weight : normal; text-decoration : none; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:hover { color : #000000; font-weight : normal; text-decoration : underline; background-color : yellow; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
A:active { color : #000000; font-weight: normal; background-color : transparent; margin : 0px 0px 0px 0px; padding : 0px 0px 0px 0px; display: inline; }
.VERSION { font-size: small; font-family : arial, sans-serif; }
.NORM { color: black; background-color: transparent;}
.FIFO { color: purple; background-color: transparent;}
.CHAR { color: yellow; background-color: transparent;}
.DIR { color: blue; background-color: transparent;}
.BLOCK { color: yellow; background-color: transparent;}
.LINK { color: aqua; background-color: transparent;}
.SOCK { color: fuchsia;background-color: transparent;}
.EXEC { color: green; background-color: transparent;}
-->
</style>
</head>
<body>
<h1>eHive 1.9 scripts</h1><p>
<a href=".">.</a><br>
├── <a href="./beekeeper.html">beekeeper.html</a><br>
├── <a href="./db_cmd.html">db_cmd.html</a><br>
├── <a href="./generate_graph.html">generate_graph.html</a><br>
├── <a href="./generate_timeline.html">generate_timeline.html</a><br>
├── <a href="./hoover_pipeline.html">hoover_pipeline.html</a><br>
├── <a href="./index.html">index.html</a><br>
├── <a href="./init_pipeline.html">init_pipeline.html</a><br>
├── <a href="./lsf_report.html">lsf_report.html</a><br>
├── <a href="./runWorker.html">runWorker.html</a><br>
├── <a href="./seed_pipeline.html">seed_pipeline.html</a><br>
└── <a href="./standaloneJob.html">standaloneJob.html</a><br>
<br><br>
</p>
<p>
0 directories, 11 files
<br><br>
</p>
<hr>
<p class="VERSION">
tree v1.6.0 © 1996 - 2011 by Steve Baker and Thomas Moore <br>
HTML output hacked and copyleft © 1998 by Francesc Rocher <br>
Charsets / OS/2 support © 2001 by Kyosuke Tokoro
</p>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>init_pipeline.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> init_pipeline.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> init_pipeline.pl &lt;config_module_or_filename&gt; [-help | [-analysis_topup | -job_topup] &lt;options_for_this_particular_pipeline&gt;]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> init_pipeline.pl is a generic script that is used to create+setup=initialize eHive pipelines from PipeConfig configuration modules.</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # get this help message:
init_pipeline.pl
# initialize a generic eHive pipeline:
init_pipeline.pl Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf -password &lt;yourpassword&gt;
# see what command line options are available when initializing long multiplication example pipeline
# (assuming your current directory is ensembl-hive/modules/Bio/EnsEMBL/Hive) :
init_pipeline.pl PipeConfig/LongMult_conf -help
# initialize the long multiplication pipeline by supplying not only mandatory but also optional data:
# (assuming your current directory is ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig) :
init_pipeline.pl LongMult_conf -password &lt;yourpassword&gt; -first_mult 375857335 -second_mult 1111333355556666 </code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<pre><code> -help : Gets this help message and exits
-analysis_topup : A special initialization mode in which (1) pipeline_create_commands are switched off and (2) only newly defined analyses are added to the database.
This mode is only useful in the process of putting together a new pipeline.
-job_topup : Another special initialization mode in which only jobs are created - no other structural changes to the pipeline are acted upon.
-hive_force_init : If set to 1, forces the (re)creation of the hive database even if a previous version of it is present in the server.</code></pre>
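<p>The special modes above, sketched with the long multiplication example pipeline (assuming your current directory is ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig; the password is a placeholder):</p>
<pre><code> # Add only the newly defined analyses of an edited PipeConfig to an existing pipeline database:
init_pipeline.pl LongMult_conf -password &lt;yourpassword&gt; -analysis_topup
# Only create the jobs defined by the PipeConfig, leaving the pipeline structure untouched:
init_pipeline.pl LongMult_conf -password &lt;yourpassword&gt; -job_topup
# Drop and re-create the hive database even if a previous version of it is present on the server:
init_pipeline.pl LongMult_conf -password &lt;yourpassword&gt; -hive_force_init 1</code></pre>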
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>lsf_report.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> lsf_report.pl</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> This script is used for offline examination of resources used by a Hive pipeline running on LSF
(the script is [Pp]latform-dependent).
Based on the command-line parameters &#39;start_date&#39; and &#39;end_date&#39;, or on the start time of the first
worker and end time of the last worker (as recorded in pipeline DB), it pulls the relevant data out
of LSF&#39;s &#39;bacct&#39; database, parses it and stores it in the &#39;lsf_report&#39; table.
You can join this table to the &#39;worker&#39; table USING(meadow_name,process_id) in the usual MySQL way
to filter by analysis_id, do various stats, etc.
You can optionally ask the script to dump the &#39;bacct&#39; database into a dump file,
or to fill the &#39;lsf_report&#39; table from an existing dump file (most of the time is taken by querying bacct).
Please note the script may additionally pull information about LSF processes that you ran simultaneously
with running the pipeline. It is easy to exclude them by joining to the &#39;worker&#39; table.</code></pre>
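<p>As a sketch of the join described above, run here through db_cmd.pl for convenience (the database name is the same placeholder as in the usage examples; the aggregation itself is just an illustration):</p>
<pre><code> # Count the collected LSF records per analysis by joining &#39;lsf_report&#39; to &#39;worker&#39;:
db_cmd.pl -url mysql://username:secret@hostname:port/long_mult_test \
    -sql &#39;SELECT w.analysis_id, COUNT(*) FROM lsf_report JOIN worker w USING (meadow_name, process_id) GROUP BY w.analysis_id&#39;</code></pre>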
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # Just run it the usual way: query &#39;bacct&#39; and load the relevant data into &#39;lsf_report&#39; table:
lsf_report.pl -url mysql://username:secret@hostname:port/long_mult_test
# The same, but assuming LSF user someone_else ran the pipeline:
lsf_report.pl -url mysql://username:secret@hostname:port/long_mult_test -lsf_user someone_else
# Assuming the dump file existed. Load the dumped bacct data into &#39;lsf_report&#39; table:
lsf_report.pl -url mysql://username:secret@hostname:port/long_mult_test -dump long_mult.bacct
# Assuming the dump file did not exist. Query &#39;bacct&#39;, dump the data into a file and load it into &#39;lsf_report&#39;:
lsf_report.pl -url mysql://username:secret@hostname:port/long_mult_test -dump long_mult_again.bacct</code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<pre><code> -help : print this help
-url &lt;url string&gt; : url defining where hive database is located
-dump &lt;filename&gt; : a filename for bacct dump. It will be read from if the file exists, and written to otherwise.
-lsf_user &lt;username&gt; : if the pipeline was run by somebody else, their LSF user name can be provided here
-start_date &lt;date&gt; : minimal start date of a job (the format is &#39;2012/01/25/13:46&#39;)
-end_date &lt;date&gt; : maximal end date of a job (the format is &#39;2012/01/25/13:46&#39;)</code></pre>
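<p>For example, restricting the collection to a given time window (the dates are placeholders in the format shown above):</p>
<pre><code> # Only pull &#39;bacct&#39; records for jobs that ran within the given interval:
lsf_report.pl -url mysql://username:secret@hostname:port/long_mult_test -start_date 2012/01/25/13:46 -end_date 2012/01/26/13:46</code></pre>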
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>runWorker.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> runWorker.pl</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> runWorker.pl is an eHive component script that does the work of a single Worker -
specializes in one of the analyses and starts executing jobs of that analysis one-by-one or batch-by-batch.
Most of the functionality of eHive is accessible via the beekeeper.pl script,
but feel free to run runWorker.pl directly if you think you know what you are doing :)</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # Run one local worker process in ehive_dbname and let the system pick up the analysis
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname
# Run one local worker process in ehive_dbname and let the system pick up the analysis from the given resource_class
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -rc_name low_mem
# Run one local worker process in ehive_dbname and specify the logic_name
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -logic_name fast_blast
# Run a specific job in a local worker process:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456</code></pre>
<h1 id="OPTIONS">OPTIONS</h1>
<h2 id="Connection-parameters">Connection parameters:</h2>
<pre><code> -reg_conf &lt;path&gt; : path to a Registry configuration file
-reg_alias &lt;string&gt; : species/alias name for the Hive DBAdaptor
-url &lt;url string&gt; : url defining where database is located</code></pre>
<h2 id="Task-specificaton-parameters">Task specificaton parameters:</h2>
<pre><code> -rc_id &lt;id&gt; : resource class id
-rc_name &lt;string&gt; : resource class name
-analysis_id &lt;id&gt; : pre-specify this worker in a particular analysis defined by database id
-logic_name &lt;string&gt; : pre-specify this worker in a particular analysis defined by name
-job_id &lt;id&gt; : run a specific job defined by its database id
-force 0|1 : set to 1 if you want to force running a Worker over a BLOCKED analysis or to run a specific DONE/SEMAPHORED job_id</code></pre>
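<p>A sketch combining the task-specification options (the job and analysis ids are placeholders):</p>
<pre><code> # Force a specific job to run even if it is DONE or SEMAPHORED:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456 -force 1
# Pin the worker to an analysis by its database id rather than by logic_name:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -analysis_id 12</code></pre>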
<h2 id="Worker-control-parameters">Worker control parameters:</h2>
<pre><code> -job_limit &lt;num&gt; : #jobs to run before worker can die naturally
-life_span &lt;num&gt; : number of minutes this worker is allowed to run
-no_cleanup : don&#39;t perform temp directory cleanup when worker exits
-no_write : don&#39;t write_output or auto_dataflow input_job
-hive_log_dir &lt;path&gt; : directory where stdout/stderr of the whole hive of workers is redirected
-worker_log_dir &lt;path&gt; : directory where stdout/stderr of this particular worker is redirected
-retry_throwing_jobs &lt;0|1&gt; : if a job dies *knowingly*, should we retry it by default?
-can_respecialize &lt;0|1&gt; : allow this worker to re-specialize into another analysis (within resource_class) after it has exhausted all jobs of the current one</code></pre>
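<p>For example (the limits and the log directory are illustrative only):</p>
<pre><code> # Run at most 10 jobs, keep this worker&#39;s stdout/stderr, and skip the write_output/dataflow step:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_limit 10 \
    -worker_log_dir /path/to/worker_logs -no_write</code></pre>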
<h2 id="Other-options">Other options:</h2>
<pre><code> -help : print this help
-versions : report both Hive code version and Hive database schema version
-debug &lt;level&gt; : turn on debug messages at &lt;level&gt;
-analysis_stats : show status of each analysis in hive</code></pre>
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>seed_pipeline.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> seed_pipeline.pl</code></pre>
<h1 id="SYNOPSIS">SYNOPSIS</h1>
<pre><code> seed_pipeline.pl {-url &lt;url&gt; | -reg_conf &lt;reg_conf&gt; [-reg_type &lt;reg_type&gt;] -reg_alias &lt;reg_alias&gt;} [ {-analysis_id &lt;analysis_id&gt; | -logic_name &lt;logic_name&gt;} [ -input_id &lt;input_id&gt; ] ]</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> seed_pipeline.pl is a generic script that is used to create {initial or top-up} jobs for hive pipelines</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # find out which analyses may need seeding (with an example input_id):
seed_pipeline.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot;
# seed one job into the &quot;start&quot; analysis:
seed_pipeline.pl -url &quot;mysql://ensadmin:${ENSADMIN_PSW}@localhost:3306/lg4_long_mult&quot; \
-logic_name start -input_id &#39;{&quot;a_multiplier&quot; =&gt; 2222222222, &quot;b_multiplier&quot; =&gt; 3434343434}&#39;</code></pre>
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>standaloneJob.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body style="background-color: white">
<h1 id="NAME">NAME</h1>
<pre><code> standaloneJob.pl</code></pre>
<h1 id="DESCRIPTION">DESCRIPTION</h1>
<pre><code> standaloneJob.pl is an eHive component script that
1. takes in a RunnableDB module,
2. creates a standalone job outside an eHive database by initializing parameters from command-line arguments (ARRAY- and HASH- arguments can be passed+parsed too!),
3. runs that job outside the database,
4. and can optionally dataflow into tables fully defined by URLs.
Naturally, only certain RunnableDB modules can be run using this script, and some database-related functionality will be lost.</code></pre>
<h1 id="USAGE-EXAMPLES">USAGE EXAMPLES</h1>
<pre><code> # Run a job with default parameters, specify module by its package name:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FailureTest
# Run the same job with default parameters, but specify module by its relative filename:
standaloneJob.pl RunnableDB/FailureTest.pm
# Run a job and re-define some of the default parameters:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FailureTest -time_RUN=2 -time_WRITE_OUTPUT=3 -state=WRITE_OUTPUT -value=2
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::SystemCmd -cmd &#39;ls -l&#39;
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::SystemCmd -input_id &quot;{ &#39;cmd&#39; =&gt; &#39;ls -l&#39; }&quot;
# Run a job and re-define its &#39;db_conn&#39; parameter to allow it to perform some database-related operations:
standaloneJob.pl RunnableDB/SqlCmd.pm -db_conn mysql://ensadmin:xxxxxxx@127.0.0.1:2912/lg4_compara_families_63 -sql &#39;INSERT INTO meta (meta_key,meta_value) VALUES (&quot;hello&quot;, &quot;world2&quot;)&#39;
# Run a job with given parameters, but skip the write_output() step:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FailureTest -no_write -time_RUN=2 -time_WRITE_OUTPUT=3 -state=WRITE_OUTPUT -value=2
# Run a job and re-direct its dataflow into tables:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::JobFactory -inputfile foo.txt -delimiter &#39;\t&#39; -column_names &quot;[ &#39;name&#39;, &#39;age&#39; ]&quot; \
-flow_into &quot;{ 2 =&gt; [&#39;mysql://ensadmin:xxxxxxx@127.0.0.1:2914/lg4_triggers/foo&#39;, &#39;mysql://ensadmin:xxxxxxx@127.0.0.1:2914/lg4_triggers/bar&#39;] }&quot;
# Run a Compara job that needs a connection to Compara database:
standaloneJob.pl Bio::EnsEMBL::Compara::RunnableDB::ObjectFactory -compara_db &#39;mysql://ensadmin:xxxxxxx@127.0.0.1:2911/sf5_ensembl_compara_master&#39; \
-adaptor_name MethodLinkSpeciesSetAdaptor -adaptor_method fetch_all_by_method_link_type -method_param_list &quot;[ &#39;ENSEMBL_ORTHOLOGUES&#39; ]&quot; \
-column_names2getters &quot;{ &#39;name&#39; =&gt; &#39;name&#39;, &#39;mlss_id&#39; =&gt; &#39;dbID&#39; }&quot; -flow_into &quot;{ 2 =&gt; &#39;mysql://ensadmin:xxxxxxx@127.0.0.1:2914/lg4_triggers/baz&#39; }&quot;
# Create a new job in a database using automatic dataflow from a database-less Dummy job:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::Dummy -a_multiplier 1234567 -b_multiplier 9876543 \
-flow_into &quot;{ 1 =&gt; &#39;mysql://ensadmin:xxxxxxx@127.0.0.1/lg4_long_mult/analysis?logic_name=start&#39; }&quot;</code></pre>
<h1 id="SCRIPT-SPECIFIC-OPTIONS">SCRIPT-SPECIFIC OPTIONS</h1>
<pre><code> -help : print this help
-debug &lt;level&gt; : turn on debug messages at &lt;level&gt;
-no_write : skip the execution of write_output() step this time
-reg_conf &lt;path&gt; : load registry entries from the given file (these entries may be needed by the RunnableDB itself)
-input_id &quot;&lt;hash&gt;&quot; : specify the whole input_id parameter in one stringified hash
-flow_into &quot;&lt;hash&gt;&quot; : defines the dataflow re-direction rules in a format similar to PipeConfig&#39;s - see the last example
NB: all other options will be passed to the runnable (leading dashes removed) and will constitute the parameters for the job.</code></pre>
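<p>One more sketch of the script-specific options (FailureTest and its parameters are the same as in the examples above; the values are illustrative):</p>
<pre><code> # Turn on debug messages and pass the whole parameter hash in one go via -input_id:
standaloneJob.pl Bio::EnsEMBL::Hive::RunnableDB::FailureTest -debug 1 -input_id &quot;{ &#39;time_RUN&#39; =&gt; 2, &#39;value&#39; =&gt; 2 }&quot;</code></pre>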
<h1 id="LICENSE">LICENSE</h1>
<pre><code> Copyright [1999-2014] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.</code></pre>
<h1 id="CONTACT">CONTACT</h1>
<pre><code> Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates</code></pre>
</body>
</html>