Commit 25d5645e authored by Matthieu Muffato's avatar Matthieu Muffato
Browse files

This is now in RTD

parent 79d2f3be
eHive installation, setup and usage.
1. Download and install the necessary external software:
1.1. Perl 5.10 or higher, since eHive code is written in Perl
see http://www.perl.com/download.csp
1.2. A database engine of your choice
eHive keeps its state in a database, so you will need
(1) a server installed on the machine where you want to maintain the state and
(2) clients installed on the machines where the jobs are to be executed.
At the moment, the following engines are supported:
1.2.1. MySQL 5.1 or higher
see http://dev.mysql.com/downloads/
WARNING: eHive is not compatible with MysQL 5.6.20 but is with versions 5.6.16 and 5.6.23. We suggest avoiding the 5.6.[17-22] interval
1.2.2. SQLite 3.6 or higher
see http://www.sqlite.org/download.html
1.2.3. PostgreSQL 9.2 or higher
see http://www.postgresql.org/download/
1.3. Perl DBI API version 1.6 or higher
Perl database interface that includes API to your desired database engine
see http://dbi.perl.org/
1.4. Perl libraries for visualisation (optional)
You can find them on CPAN:
- GraphViz (needed for generate_graph.pl)
- Chart::Gnuplot (needed for generate_timeline.pl)
2. Install the Hive repository from GitHub:
git clone https://github.com/Ensembl/ensembl-hive.git
This will create ensembl-hive directory with all the code and documentation.
You may find it convenient (although it is not necessary) to add "ensembl-hive/scripts"
to your $PATH variable to make it easier to run beekeeper.pl and other useful Hive scripts.
Also, if you are developing the code and not just running ready pipelines,
you may find it convenient to add "ensembl-hive/modules" to your $PERL5LIB variable.
3. Useful files and directories of the eHive repository.
3.1 In ensembl-hive/scripts we keep perl scripts used for controlling the pipelines.
Adding this directory to your $PATH may make your life easier.
* init_pipeline.pl is used to create hive databases, populate hive-specific and pipeline-specific tables and load data
* seed_pipeline.pl is used to add new jobs to the pipeline.
* beekeeper.pl is used to run the pipeline; send 'Workers' to the 'Meadow' to run the jobs of the pipeline
3.2 In ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig we keep example pipeline configuration modules that can be used by init_pipeline.pl .
A PipeConfig is a parametric module that defines the structure of the pipeline.
That is, which analyses with what parameters will have to be run and in which order.
The code for each analysis is contained in a RunnableDB module.
For some tasks bespoke RunnableDB have to be written, whereas some other problems can be solved by only using 'universal buliding blocks'.
A typical pipeline is a mixture of both.
3.3 In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB we keep 'universal building block' RunnableDBs:
* SystemCmd.pm is a parameter substitution wrapper for any command line executed by the current shell
* SqlCmd.pm is a parameter substitution wrapper for running any MySQL query or a session of linked queries
against a particular database (eHive pipeline database by default, but not necessarily)
* JobFactory.pm is a universal module for dynamically creating batches of same analysis jobs (with different parameters)
to be run within the current pipeline
3.4 In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult we keep bespoke RunnableDBs for long multiplication example pipeline.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment