DatabaseDumper.pm 9.38 KB
Newer Older
1 2 3 4
=pod 

=head1 NAME

5 6 7 8 9 10
    Bio::EnsEMBL::Hive::RunnableDB::DatabaseDumper

=head1 SYNOPSIS

    standaloneJob.pl RunnableDB/DatabaseDumper.pm -exclude_ehive 1 -exclude_list 1 \
        -table_list "['peptide_align_%']" -src_db_conn mysql://ensro@127.0.0.1:4313/mm14_compara_homology_67 -output_file ~/dump1.sql
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

=head1 DESCRIPTION

This is a Runnable to dump the tables of a database (by default,
all of them).

The following parameters are accepted:

 - src_db_conn : the connection parameters to the database to be
    dumped (by default, the current eHive database if available)

 - exclude_ehive [boolean=0] : do we exclude the eHive-specific tables
    from the dump

 - table_list [string or array of strings]: the list of tables
    to include in the dump. The '%' wildcard is accepted.

 - exclude_list [boolean=0] : do we consider 'table_list' as a list
    of tables to be excluded from the dump (instead of included)

31 32
 - output_file [string] : the file to write the dump to. If the filename
    ends with ".gz", the file is compressed with "gzip" (default parameters)
33

34 35
 - output_db [string] : URL of a database to write the dump to. In this
    mode, the Runnable acts like MySQLTransfer
36

37 38
 - skip_dump [boolean=0] : set this to 1 to skip the dump

39 40 41 42 43 44 45 46

The decision process regarding which tables should be dumped is quite complex.
The following sections explain the various scenarios.

1. eHive database

1.a. Hybrid database

47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
If "table_list" is undefined or maps to an empty list, the list
of tables to be dumped is decided accordingly to "exclude_list" (EL)
and "exclude_ehive" (EH). "exclude_list" controls the whole list of
non-eHive tables.

EL EH    List of tables to dump
0  0  => all the tables
0  1  => all the tables, except the eHive ones
1  0  => all the tables, except the non-eHive ones = only the eHive tables
1  1  => both eHive and non-eHive tables are excluded = nothing is dumped

If "table_list" is defined to non-empty list T, the table of decision is:

EL EH    List of tables to dump
0  0  => all the tables in T + the eHive tables
0  1  => all the tables in T
1  0  => all the tables, except the ones in T
1  1  => all the tables, except the ones in T and the eHive ones
65

66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
1.b. eHive-only database

The decision table can be simplified if the database only contains eHive tables.
In particular, the "exclude_list" and "table_list" parameters have no effect.

EH    List of tables to dump
0  => All the eHive tables, i.e. the whole database
1  => No eHive tables, i.e. nothing

2. non-eHive database

The "exclude_ehive" parameter is ignored.

empty "table_list":
EL    List of tables to dump
0  => all the tables
1  => all the tables are excluded = nothing is dumped

non-empty "table_list" T:
EL    List of tables to dump
0  => all the tables in T
1  => all the tables, except the ones in T


90 91
=head1 LICENSE

92 93
    Copyright [1999-2015] Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute
    Copyright [2016] EMBL-European Bioinformatics Institute
94

95 96 97 98 99 100 101 102 103 104 105
    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

         http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License
    is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and limitations under the License.

=head1 CONTACT

106
    Please subscribe to the Hive mailing list:  http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users  to discuss Hive-related questions or to be notified of our updates
107 108 109

=cut

110

111 112 113
package Bio::EnsEMBL::Hive::RunnableDB::DatabaseDumper;

use strict;
114 115 116
use warnings;

use Bio::EnsEMBL::Hive::Utils ('go_figure_dbc');
117 118 119

use base ('Bio::EnsEMBL::Hive::Process');

120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
sub param_defaults {
    return {
        # Which tables to dump. How the options are combined is explained above
        'table_list'    => undef,   # array-ref
        'exclude_ehive' => 0,       # boolean
        'exclude_list'  => 0,       # boolean

        # Input / output
        'src_db_conn'   => undef,   # URL, hash-ref, or Registry name
        'output_file'   => undef,   # String
        'output_db'     => undef,   # URL, hash-ref, or Registry name

        # Other options
        'skip_dump'     => 0,       # boolean
    }
}

137 138 139
sub fetch_input {
    my $self = shift @_;

140
    # The final list of tables
141
    my @tables = ();
142
    $self->param('tables', \@tables);
143
    my @ignores = ();
144 145
    $self->param('ignores', \@ignores);

146
    # Connection parameters
147
    my $src_db_conn  = $self->param('src_db_conn');
148
    my $src_dbc = $src_db_conn ? go_figure_dbc($src_db_conn) : $self->data_dbc;
149
    $self->param('src_dbc', $src_dbc);
150

151
    $self->input_job->transient_error(0);
152 153
    die 'Only the "mysql" driver is supported.' if $src_dbc->driver ne 'mysql';

154
    # Get the table list in either "tables" or "ignores"
155
    my $table_list = $self->_get_table_list($self->param('table_list') || '');
156
    print "table_list: ", scalar(@$table_list), " ", join('/', @$table_list), "\n" if $self->debug;
157
    my $nothing_to_dump = 0;
158

159 160
    if ($self->param('exclude_list')) {
        push @ignores, @$table_list;
161
        $nothing_to_dump = 1 if !$self->param('table_list');
162
    } else {
163
        push @tables, @$table_list;
164
        $nothing_to_dump = 1 if $self->param('table_list') and !@$table_list;
165
    }
166

167 168 169 170 171 172 173 174 175 176 177 178 179 180
    # Would be good to have this from eHive
    my @ref_ehive_tables = qw(hive_meta worker dataflow_rule analysis_base analysis_ctrl_rule job accu log_message job_file analysis_data resource_description analysis_stats analysis_stats_monitor monitor msg progress resource_class);

    ## Only eHive databases have a table named "hive_meta"
    my $meta_sth = $src_dbc->db_handle->table_info(undef, undef, 'hive_meta');
    my @ehive_tables;
    if ($meta_sth->fetchrow_arrayref) {
        # The hard-coded list is comprehensive, so some tables may not be
        # in this database (which may be on a different version)
        push @ehive_tables, @{$self->_get_table_list($_)} for @ref_ehive_tables;
    }
    $meta_sth->finish();

    # eHive tables are ignored if exclude_ehive is set
181 182
    if ($self->param('exclude_ehive')) {
        push @ignores, @ehive_tables;
183 184 185 186 187
    } elsif (@ehive_tables) {
        if (@tables || $nothing_to_dump) {
            push @tables, @ehive_tables;
            $nothing_to_dump = 0;
        }
188 189
    }

190 191
    $self->param('nothing_to_dump', $nothing_to_dump);

192 193
    # Output file / output database
    $self->param('output_file') || $self->param('output_db') || die 'One of the parameters "output_file" and "output_db" is mandatory';
194
    unless ($self->param('output_file')) {
195
        $self->param('real_output_db', go_figure_dbc( $self->param('output_db') ) );
196
        die 'Only the "mysql" driver is supported.' if $self->param('real_output_db')->driver ne 'mysql';
197 198 199 200 201 202 203 204
    }

    $self->input_job->transient_error(1);
}


# Splits a string into a list of strings
# Ask the database for the list of tables that match the wildcard "%"
205
# and also select the tables that actually exist
206
sub _get_table_list {
207
    my ($self, $table_list) = @_;
208

209
    my @newtables = ();
210
    my $dbc = $self->param('src_dbc');
211
    foreach my $initable (ref($table_list) eq 'ARRAY' ? @$table_list : split(' ', $table_list)) {
212 213 214
        if ($initable =~ /%/) {
            $initable =~ s/_/\\_/g;
        }
215 216
        my $sth = $dbc->db_handle->table_info(undef, undef, $initable, undef);
        push @newtables, map( {$_->[2]} @{$sth->fetchall_arrayref});
217 218 219 220 221 222 223 224 225 226 227 228
    }
    return \@newtables;
}


sub run {
    my $self = shift @_;

    my $src_dbc = $self->param('src_dbc');
    my $tables = $self->param('tables');
    my $ignores = $self->param('ignores');

229 230 231
    print "tables: ", scalar(@$tables), " ", join('/', @$tables), "\n" if $self->debug;
    print "ignores: ", scalar(@$ignores), " ", join('/', @$ignores), "\n" if $self->debug;

232 233 234
    my @options = qw(--skip-lock-tables);
    # Without any table names, mysqldump thinks that it should dump
    # everything. We need to add special arguments to handle this
235
    if ($self->param('nothing_to_dump')) {
236 237 238 239
        print "everything is excluded, nothing to dump !\n" if $self->debug;
        push @options, qw(--no-create-info --no-data);
        $ignores = [];  # to clean-up the command-line
    }
240 241

    # mysqldump command
242 243 244 245 246 247 248 249 250 251 252
    my $output = "";
    if ($self->param('output_file')) {
        if (lc $self->param('output_file') =~ /\.gz$/) {
            $output = sprintf(' | gzip > %s', $self->param('output_file'));
        } else {
            $output = sprintf('> %s', $self->param('output_file'));
        }
    } else {
        $output = sprintf(' | mysql %s', $self->mysql_conn_from_dbc($self->param('real_output_db')));
    };

253 254 255
    my $cmd = join(' ', 
        'mysqldump',
        $self->mysql_conn_from_dbc($src_dbc),
256
        @options,
257
        @$tables,
258
        (map {sprintf('--ignore-table=%s.%s', $src_dbc->dbname, $_)} @$ignores),
259
        $output
260
    );
261
    print "$cmd\n" if $self->debug;
262 263 264 265 266

    # We have to skip the dump
    return if ($self->param('skip_dump'));

    # OK, we can dump
267
    if(my $return_value = system(bash => (-o => 'pipefail', -c => $cmd))) {
268
        die "system( $cmd ) failed: $return_value";
269 270 271 272 273 274 275
    }
}


sub mysql_conn_from_dbc {
    my ($self, $dbc) = @_; 

276
    return '--host='.$dbc->host.' --port='.$dbc->port." --user='".$dbc->username."' --password='".$dbc->password."' ".$dbc->dbname;
277 278 279 280
}


1;