
slides for an eHive introductory talk

1- Code setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1 APIs and executables needed
Perl DBI
Data::UUID (from CPAN)
bioperl
ensembl
ensembl-hive
ensembl-analysis, ensembl-compara, ensembl-pipeline (OPTIONAL, for using e! Runnables)
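A quick way to check that the CPAN prerequisites are installed (this one-liner is just a convenience, not part of eHive):

perl -MDBI -MData::UUID -e 'print "CPAN prerequisites OK\n"'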
1.2 Code checkout
bioperl code
cvs -d :ext:bio.perl.org:/home/repository/bioperl co -r branch-07 bioperl-live
core ensembl code
cvs -d :ext:cvs.sanger.ac.uk:/nfs/ensembl/cvsroot co ensembl
ensembl-analysis, ensembl-pipeline, ensembl-compara code (OPTIONAL, for using e! Runnables)
cvs -d :ext:cvs.sanger.ac.uk:/nfs/ensembl/cvsroot co ensembl-pipeline ensembl-compara ensembl-analysis
ensembl-hive code
cvs -d :ext:cvs.sanger.ac.uk:/nfs/ensembl/cvsroot co ensembl-hive
1.3 Set up PERL5LIB
(if PERL5LIB is not already set, tcsh will complain about an undefined variable;
 in that case set it to the first path without the leading ${PERL5LIB}: and append the rest)

in tcsh
setenv BASEDIR /some/path/to/modules
setenv PERL5LIB ${PERL5LIB}:${BASEDIR}/ensembl/modules
setenv PERL5LIB ${PERL5LIB}:${BASEDIR}/ensembl-hive/modules
setenv PERL5LIB ${PERL5LIB}:${BASEDIR}/ensembl-analysis/modules (OPTIONAL)
setenv PERL5LIB ${PERL5LIB}:${BASEDIR}/ensembl-compara/modules (OPTIONAL)
setenv PERL5LIB ${PERL5LIB}:${BASEDIR}/ensembl-pipeline/modules (OPTIONAL)
in bash
BASEDIR=/some/path/to/modules
PERL5LIB=${PERL5LIB}:${BASEDIR}/ensembl/modules
PERL5LIB=${PERL5LIB}:${BASEDIR}/ensembl-hive/modules
PERL5LIB=${PERL5LIB}:${BASEDIR}/ensembl-compara/modules (OPTIONAL)
PERL5LIB=${PERL5LIB}:${BASEDIR}/ensembl-analysis/modules (OPTIONAL)
PERL5LIB=${PERL5LIB}:${BASEDIR}/ensembl-pipeline/modules (OPTIONAL)
export PERL5LIB
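Once PERL5LIB is set, a quick check that the checkouts are picked up (Bio::EnsEMBL::Registry and
Bio::EnsEMBL::Hive::Worker are the module names I would expect from the checkouts above; adjust if your copies differ):

perl -MBio::EnsEMBL::Registry -e 'print "ensembl OK\n"'
perl -MBio::EnsEMBL::Hive::Worker -e 'print "ensembl-hive OK\n"'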
2- Set up an eHive database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pick a MySQL instance and create a database
mysql -h HOST -u USER -pSECRET -e "create database hive_test1"
cd ${BASEDIR}/ensembl-hive/sql
mysql -h HOST -u USER -pSECRET hive_test1 < tables.sql
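As a sanity check, the freshly loaded schema should now contain the tables used below
(analysis, analysis_job, analysis_stats, dataflow_rule, analysis_ctrl_rule, meta, ...):

mysql -h HOST -u USER -pSECRET hive_test1 -e "SHOW TABLES"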
3- (OPTIONAL) Create a location to which worker and job STDOUT/STDERR will be redirected
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) create a working directory with enough disk space to hold hive worker output
mkdir /scratch/hive_test1/
b) insert the output directory into the meta table (the variable assignment below is bash syntax)
outdir='/scratch/hive_test1/'
mysql -h HOST -u USER -pSECRET hive_test1 \
    -e "INSERT INTO meta(meta_key, meta_value) VALUES ('hive_output_dir', '$outdir')"
4a- Create pipeline graph
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) write RunnableDB modules to process data
b) configure instances of these modules by inserting rows into the analysis table
c) link the analyses (module instances) into a dataflow graph by inserting rows into the dataflow_rule table
d) insert into the analysis_ctrl_rule table any blocking rules, for cases where 'all' jobs of one
   analysis must be done before another part of the pipeline is allowed to 'unblock'
e) insert starting job(s) into the analysis_job table to kick off the pipeline (see the SQL sketch below)
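As an illustration of steps b)-e), here is a minimal sketch for a hypothetical two-analysis pipeline
('Preprocess' feeding 'SystemCmd'). The dataflow_rule and analysis_ctrl_rule column names used here
(from_analysis_id, to_analysis_url, branch_code, condition_analysis_url, ctrled_analysis_id) and the use
of a plain logic_name in the *_url columns are assumptions; check ensembl-hive/sql/tables.sql for the
exact schema in your checkout.

mysql -h HOST -u USER -pSECRET hive_test1

(b) two analyses, each an instance of a RunnableDB module:
mysql> INSERT INTO analysis (logic_name, module)
       VALUES ('Preprocess', 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd'),
              ('SystemCmd',  'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd');

(c) jobs created by 'Preprocess' flow into 'SystemCmd':
mysql> INSERT INTO dataflow_rule (from_analysis_id, to_analysis_url, branch_code)
       SELECT analysis_id, 'SystemCmd', 1 FROM analysis WHERE logic_name = 'Preprocess';

(d) 'SystemCmd' stays blocked until all 'Preprocess' jobs are done:
mysql> INSERT INTO analysis_ctrl_rule (condition_analysis_url, ctrled_analysis_id)
       SELECT 'Preprocess', analysis_id FROM analysis WHERE logic_name = 'SystemCmd';

(e) one seed job for the first analysis:
mysql> INSERT INTO analysis_job (analysis_id, input_id)
       SELECT analysis_id, 'echo preprocessing' FROM analysis WHERE logic_name = 'Preprocess';
mysql> quit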
4b- To use the eHive as a simple batch-job throttling manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) Create one analysis for the SystemCmd module
mysql -h HOST -u USER -pSECRET hive_test1
mysql> INSERT INTO analysis(logic_name, module)
VALUES ('SystemCmd', 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd');
b) Add as many jobs as needed (in this case, the command line is "echo 1")
mysql> INSERT INTO analysis_job (analysis_id, input_id)
SELECT analysis_id, 'echo 1' FROM analysis WHERE logic_name = 'SystemCmd';
mysql> quit
c) Synchronise the eHive database
beekeeper.pl -url mysql://USER:SECRET@HOST/hive_test1 -sync
d) Change the number of concurrent workers
mysql -h HOST -u USER -pSECRET hive_test1
mysql> UPDATE analysis, analysis_stats SET hive_capacity = 100
WHERE analysis.analysis_id = analysis_stats.analysis_id AND logic_name = 'SystemCmd';
mysql> quit
5- Run the hive (queen and workers) through a beekeeper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
beekeeper.pl -url mysql://USER:SECRET@HOST/hive_test1 -loop
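-loop keeps the beekeeper submitting workers until all jobs are done. For debugging it can be handy to
run a single worker in the foreground instead; runWorker.pl ships next to beekeeper.pl in
ensembl-hive/scripts (the -url option is assumed to be spelled as for beekeeper.pl, so check the
script's usage message):

perl ${BASEDIR}/ensembl-hive/scripts/runWorker.pl -url mysql://USER:SECRET@HOST/hive_test1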