Commit 69c8e18c authored by Andreas Kusalananda Kähäri

Remove files no longer used.

parent 3a5ee200
EXTERNAL_DB UPDATES
===================
DESCRIPTION
-----------
The external_db table contains a list of all external databases which are
referenced by Ensembl. Specifically, this table is used by the xref table,
which contains lists of external database identifiers.
Formerly, all Ensembl core databases had their own external_db tables
and used an enumeration of db_names. This was difficult to maintain and
required schema updates every release. The new system requires that
every Ensembl database contains exactly the same external_db table, which
ensures consistency across all databases and makes it easier to test.
MASTER EXTERNAL_DB LIST
-----------------------
The master list of external databases is stored in the file:
${ENSEMBL_HOME}/ensembl/misc-scripts/external_db/external_dbs.txt
This file contains a tab-separated list of values which are loaded into
the external_db tables. In order, the columns are:
'external_db_id', 'db_name', 'release', 'status',
'dbprimary_acc_linkable', 'display_label_linkable', 'priority',
'db_display_name', 'type', 'secondary_db_name', 'secondary_db_table',
'description'
- external_db_id -- internal identifier for this entry; primary key.
- release -- currently not used by the software and always set to 1.
- status -- must be one of 'KNOWNXREF', 'KNOWN', 'XREF', 'PRED', 'ORTH',
  'PSEUDO'. Used by the webcode to determine which genes/transcripts can
  be considered known, unknown, etc.
- dbprimary_acc_linkable -- used by the webcode to indicate whether the
  linkable element is the internal name in the database
  (e.g. HGNC_curated_gene).
- display_label_linkable -- used by the webcode to indicate whether the
  linkable element is the name of the database (e.g. WikiGene).
- priority -- used by the website to set the display priority on the page
  (the higher the number, the closer to the top of the page).
- db_display_name -- name displayed on the website, which may differ from
  the name of the database (e.g. HGNC Symbol rather than HGNC).
- type -- indicates the kind of information the xref database offers
  (e.g. ALT_GENE is used for OTTG to indicate that this external database
  provides alternative genes from Vega).
- secondary_db_name -- not used at the moment (requested by the
  functional genomics team).
- secondary_db_table -- not used at the moment (requested by the
  functional genomics team).
- description -- free-text column describing the external database.
...
12300 HGNC_curated_gene 1 KNOWNXREF 1 0 5 HGNC (curated) MISC \N \N
12305 HGNC_automatic_gene 1 KNOWNXREF 1 0 5 HGNC (automatic) MISC \N \N
12310 Clone_based_vega_gene 1 KNOWNXREF 1 0 5 Clone-based (Vega) MISC \N \N
12315 Clone_based_ensembl_gene 1 XREF 1 0 5 Clone-based (Ensembl) MISC \N \N
12400 HGNC_curated_transcript 1 KNOWNXREF 1 0 5 HGNC (curated) MISC \N \N
...
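As an illustration of the column layout described above, the following is a minimal sketch of how one line of external_dbs.txt could be split into named fields. The helper name parse_external_db_line is hypothetical and not part of the Ensembl scripts; the sketch assumes the fields are separated by single tab characters, as the description states.

```perl
use strict;
use warnings;

# Column order as listed in the description above.
my @columns = qw(
    external_db_id db_name release status
    dbprimary_acc_linkable display_label_linkable priority
    db_display_name type secondary_db_name secondary_db_table description
);

# Hypothetical helper: turn one tab-separated line into a hash reference
# keyed by column name. Columns missing from the line are simply absent.
sub parse_external_db_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my $last = $#values < $#columns ? $#values : $#columns;
    my %row;
    @row{ @columns[ 0 .. $last ] } = @values[ 0 .. $last ];
    return \%row;
}

# One of the example rows, rebuilt with explicit tabs.
my $line = join "\t", 12300, 'HGNC_curated_gene', 1, 'KNOWNXREF',
    1, 0, 5, 'HGNC (curated)', 'MISC', '\N', '\N';

my $row = parse_external_db_line($line);
print "$row->{db_name} => $row->{db_display_name} ($row->{status})\n";
```

Because db_display_name may contain spaces, splitting on tabs rather than on whitespace is what keeps 'HGNC (curated)' in a single field.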
UPDATE PROCEDURE
----------------
The following describes the steps necessary to update the external_db table:
(1) Add new external database(s) if the appropriate database(s) are not in the
master list:
(a) Add a row to the external_dbs.txt file. The columns must be tab
separated and the external_db identifier must be unique. The
db_release should be set to 1 and the status should reflect how
xrefs from this external database are used by web.
For example, a new external_db 'Celera_Gene' could be added as
follows:
400 Celera_Gene 1 PRED 1 0 5 Celera gene MISC \N \N
(b) Commit the external_dbs.txt file using cvs commit. This ensures
that nobody else who may also be updating the file will use the
same identifier that you chose (ID 400 in the example).
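Before committing, it may help to check that the new identifier does not clash with an existing one. The following is a minimal sketch of such a check, not part of the Ensembl scripts: the helper name duplicate_ids and the row 'Some_other_db' are made up for illustration, and the lines are assumed to be tab-separated with the identifier in the first column.

```perl
use strict;
use warnings;

# Hypothetical helper: return the external_db_id values that occur more
# than once in a list of tab-separated lines. Comment lines and blank
# lines are ignored.
sub duplicate_ids {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        next if $line =~ /^#/ or $line =~ /^\s*$/;
        my ($id) = split /\t/, $line;
        $seen{$id}++;
    }
    return sort { $a <=> $b } grep { $seen{$_} > 1 } keys %seen;
}

my @lines = (
    join( "\t", 400, 'Celera_Gene', 1, 'PRED', 1, 0, 5, 'Celera gene', 'MISC', '\N', '\N' ),
    join( "\t", 12300, 'HGNC_curated_gene', 1, 'KNOWNXREF', 1, 0, 5, 'HGNC (curated)', 'MISC', '\N', '\N' ),
    # Made-up row that reuses ID 400 to show a clash.
    join( "\t", 400, 'Some_other_db', 1, 'XREF', 1, 0, 5, 'Some other db', 'MISC', '\N', '\N' ),
);

my @dups = duplicate_ids(@lines);
print "Duplicate external_db_id: @dups\n" if @dups;
```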
(2) Propagate the contents of the file to all of the core style databases
(core|cdna|vega|otherfeatures). To update all of the core style databases
for release 56:
perl update_external_dbs.pl -host ens-staging -file external_dbs.txt \
-user ensadmin -pass secret -release 56
To update the human core database:
perl update_external_dbs.pl -host ens-staging -file external_dbs.txt \
-user ensadmin -pass secret -dbnames homo_sapiens_core_56_37a
When executed, the script displays a list of the databases to which the
updates will be applied, and you will have to type 'yes' to confirm.
If the databases to be updated contain rows that are not in the file,
a warning will be given and the database in question skipped.
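The selection of core style databases for a release can be sketched as a filter over database names. The pattern below is the one used in update_external_dbs.pl from this commit; the wrapper function core_style_dbs and the sample database names are only for illustration.

```perl
use strict;
use warnings;

# Illustrative wrapper: keep only the names that look like
# <genus>_<species>_<type>_<release>_<assembly> for the given release.
sub core_style_dbs {
    my ( $release, @names ) = @_;
    return grep {
        /^[a-zA-Z]+_[a-zA-Z]+_(?:core|est|estgene|vega|otherfeatures|cdna)_${release}_\d+[A-Za-z]?$/
    } @names;
}

# Sample names, as "show databases" might return them.
my @names = qw(
    homo_sapiens_core_56_37a
    homo_sapiens_variation_56_37a
    mus_musculus_cdna_56_37i
    homo_sapiens_core_55_37
    mysql
);

my @selected = core_style_dbs( 56, @names );
print "$_\n" for @selected;
```

Only the release 56 core and cdna databases are kept here; the variation database, the release 55 database and the system database are filtered out.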
The -nonreleasemode flag is used when we want to load a database directly
from the file, e.g. to create a new database containing all the external_db
entries from the file. In this mode the script does not require -master and
forces the update.
The -force flag is used to update the databases even if there is a
difference between the master file and the databases. This should only be
used when we are sure that the data in the database is wrong or unnecessary
and we want to overwrite it with the information in the external_dbs.txt
file.
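The way these flags interact with the extra-row check can be summarised as a small decision function. This is a sketch that mirrors the logic of update_external_dbs.pl in this commit; the function name decide_action and its named arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical summary of the per-database decision.
# has_extra_rows is true when the target database holds external_db rows
# that are absent from the master list.
sub decide_action {
    my (%arg) = @_;
    return 'overwrite' if $arg{force} || $arg{nonreleasemode};
    return $arg{has_extra_rows} ? 'skip' : 'overwrite';
}

for my $case (
    { force => 0, nonreleasemode => 0, has_extra_rows => 0 },
    { force => 0, nonreleasemode => 0, has_extra_rows => 1 },
    { force => 1, nonreleasemode => 0, has_extra_rows => 1 },
    { force => 0, nonreleasemode => 1, has_extra_rows => 1 },
  )
{
    printf "force=%d nonreleasemode=%d extra_rows=%d -> %s\n",
      @{$case}{qw(force nonreleasemode has_extra_rows)},
      decide_action(%$case);
}
```

Only the second case, extra rows without -force or -nonreleasemode, leads to the database being skipped.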
In Ensembl, the release coordinator is responsible for running this script
during the release process.
use strict;
use warnings;
use DBI;
my $user = 'ecs2dadmin';
my $host = 'ecs2d';
my $pass = 'TyhRv';
my $dbname = shift;
my $dbh = DBI->connect("DBI:mysql:host=$host;dbname=$dbname;", $user, $pass,
{RaiseError => 1});
#
# Store the old external_db table in a hash
#
my $sth = $dbh->prepare('SELECT external_db_id, db_name
FROM external_db');
print STDERR "READING OLD EXTERNAL DB\n";
$sth->execute();
my %old_ext_db = map {$_->[0], $_->[1]} @{$sth->fetchall_arrayref};
$sth->finish();
#
# drop the existing external_db table and replace it with the new table
#
print STDERR "REPLACING OLD EXTERNAL DB\n";
`cat external_db.sql | mysql -h $host -u $user -p$pass $dbname`;
#
# Store the new external_db table in a hash
#
print STDERR "READING NEW EXTERNAL DB\n";
$sth->execute();
my %new_ext_db = map {$_->[1], $_->[0]} @{$sth->fetchall_arrayref};
$sth->finish();
#
# update each row in the xref table
#
print STDERR "UPDATING XREF TABLE\n";
$sth = $dbh->prepare('SELECT external_db_id, xref_id FROM xref');
my($external_db_id, $xref_id);
$sth->execute();
$sth->bind_columns(\$external_db_id, \$xref_id);
my $update_sth =
$dbh->prepare('UPDATE xref SET external_db_id = ? WHERE xref_id = ?');
my $count = 0;
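while($sth->fetch()) {
    # map the old ID to its database name, then the name to the new ID
    my $old_name = $old_ext_db{$external_db_id};
    my $id = defined($old_name) ? $new_ext_db{$old_name} : undef;
    if($id) {
        $update_sth->execute($id, $xref_id);
        $count++;
        if($count % 1_000 == 0) {
            print STDERR '.';
        }
    } else {
        $old_name = '' unless defined($old_name);
        warn("Could not convert ext_id=[$external_db_id] dbname=[$old_name]\n");
    }
}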
$sth->finish();
$dbh->disconnect();
print STDERR "COMPLETE. Converted $count xrefs.\n";
#Contact: Emmanuel Mongin (mongin@ebi.ac.uk)
use strict;
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::DBSQL::DBEntryAdaptor;
use Bio::EnsEMBL::DBEntry;
use Bio::SeqIO;
my ( $host, $dbuser, $dbname, $dbpass, $port, $filename );
my %map;
GetOptions( "host=s", \$host,
"user=s", \$dbuser,
"pass=s", \$dbpass,
"port=i", \$port,
"dbname=s", \$dbname,
"file=s", \$filename
);
if( ! $filename ) {
usage()
}
print STDERR "Connecting to $host, $dbname\n";
my $db = new Bio::EnsEMBL::DBSQL::DBAdaptor(
'-host' => $host,
'-user' => $dbuser,
'-dbname' => $dbname,
'-pass' => $dbpass,
'-port' => $port
);
my $adaptor = $db->get_DBEntryAdaptor();
print STDERR "Loading expression data\n";
open (AFFY, '<', $filename ) || die "Can't open AFFY file '$filename': $!";
while (<AFFY>) {
chomp;
my ($transl_id,$db1,$id) = split;
if ($id ne "NULL") {
my $dbentry = Bio::EnsEMBL::DBEntry->new
( -adaptor => $adaptor,
-primary_id => $id,
-display_id => $id,
-version => 1,
-release => 1,
-dbname => $db1);
$dbentry->status("XREF");
print "$transl_id\t$db1\t$id\n";
$adaptor->store($dbentry,$transl_id,"Translation");
}
}
close(AFFY);
sub usage {
print STDERR <<HELP
Usage: perl load_additional_human_affy_xrefs.pl
-host db connection detail
-user
-pass
-port
-dbname
-file filename
File with xrefs to upload
HELP
;
exit();
}
#Contact: Emmanuel Mongin (mongin@ebi.ac.uk)
use strict;
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::DBSQL::DBEntryAdaptor;
use Bio::EnsEMBL::DBEntry;
use Bio::SeqIO;
my ( $host, $dbuser, $dbname, $dbpass, $port, $filename );
my %map;
GetOptions( "host=s", \$host,
"user=s", \$dbuser,
"pass=s", \$dbpass,
"port=i", \$port,
"dbname=s", \$dbname,
"file=s", \$filename
);
if( ! $filename ) {
usage()
}
print STDERR "Connecting to $host, $dbname\n";
my $db = new Bio::EnsEMBL::DBSQL::DBAdaptor(
'-host' => $host,
'-user' => $dbuser,
'-dbname' => $dbname,
'-pass' => $dbpass,
'-port' => $port
);
my $adaptor = $db->get_DBEntryAdaptor();
print STDERR "Loading GKB mapping\n";
open (GKB, '<', $filename ) || die "Can't open GKB file '$filename': $!";
my $sth =
$db->prepare("SELECT o.ensembl_id
FROM object_xref o, xref x
WHERE x.dbprimary_acc = ? AND x.xref_id = o.xref_id");
my $db1 = "GKB";
while (<GKB>) {
chomp;
my ($sp,$id) = split;
$sth->execute($sp);
while (my $transl_id = $sth->fetchrow) {
my $dbentry = Bio::EnsEMBL::DBEntry->new
( -adaptor => $adaptor,
-primary_id => $id,
-display_id => $id,
-version => 1,
-release => 1,
-dbname => $db1);
$dbentry->status("XREF");
print STDERR "$transl_id\t$db1\t$id\n";
$adaptor->store($dbentry,$transl_id,"Translation");
}
}
close(GKB);
sub usage {
print STDERR <<HELP
Usage: perl load_additional_human_gkb_xrefs.pl
-host db connection detail
-user
-pass
-port
-dbname
-file filename
File with xrefs to upload
HELP
;
exit();
}
#!/usr/local/ensembl/bin/perl -w
#
# updates the external db tables on all of the core databases on a given host
#
use strict;
use Getopt::Long;
use DBI;
use IO::File;
my ( $host, $user, $pass, $port, @dbnames,
$file, $release_num, $master, $nonreleasemode, $force );
GetOptions( "dbhost|host=s", \$host,
"dbuser|user=s", \$user,
"dbpass|pass=s", \$pass,
"dbport|port=i", \$port,
"file=s", \$file,
"dbnames=s@", \@dbnames,
"release_num=i", \$release_num,
"master=s", \$master,
"nonreleasemode", \$nonreleasemode,
"force", \$force );
$port ||= 3306;
$file ||= "external_dbs.txt";
usage("[DIE] Need a host") if(!$host);
#release num XOR dbname are required.
usage( "[DIE] Need either a release number or "
. "database names, but not both" )
if ( ( $release_num && @dbnames ) || ( !$release_num && !@dbnames ) );
if(!$nonreleasemode){
# master database is required
usage("[DIE] Master database required") if (!$master);
}
my $dsn = "DBI:mysql:host=$host;port=$port";
my $db = DBI->connect( $dsn, $user, $pass, {RaiseError => 1} );
if($release_num) {
@dbnames = map {$_->[0] } @{ $db->selectall_arrayref( "show databases" ) };
#
# filter out all non-core databases
#
@dbnames = grep {/^[a-zA-Z]+\_[a-zA-Z]+\_(core|est|estgene|vega|otherfeatures|cdna)\_${release_num}\_\d+[A-Za-z]?$/} @dbnames;
}
my @field_names = qw(external_db_id db_name release status dbprimary_acc_linkable display_label_linkable priority db_display_name type);
my @types = qw(ARRAY ALT_TRANS MISC LIT PRIMARY_DB_SYNONYM ALT_GENE);
#
# make sure the user wishes to continue
#
print STDERR "Please make sure you've updated $file from CVS!\n";
print STDERR
"The following databases will have their external_db tables "
. "updated if necessary:\n ";
print join( "\n ", @dbnames );
print "\nContinue with update (yes/no)> ";
my $input = lc(<STDIN>);
chomp($input);
if ($input ne 'yes') {
print "external_db conversion aborted\n";
exit();
}
#
# read all of the new external_db entries from the file
#
my $fh = IO::File->new();
$fh->open($file) or die("Could not open input file $file");
my @rows;
my %bad_lines;
while (my $row = <$fh>) {
chomp($row);
next if ($row =~ /^#/); # skip comments
next if ($row =~ /^$/); # and blank lines
next if ($row =~ /^\s+$/); # and whitespace-only lines
my @a = split(/\t/, $row);
push @rows, {
'external_db_id' => $a[0],
'db_name' => $a[1],
'release' => $a[2],
'status' => $a[3],
'dbprimary_acc_linkable' => $a[4],
'display_label_linkable' => $a[5],
'priority' => $a[6],
'db_display_name' => $a[7],
'type' => $a[8] };
if ( $a[1] =~ /-/ ) {
print STDERR "Database name "
. $a[1]
. " contains '-' characters "
. "which will break Mart, "
. "please replace them with '_' until Mart is fixed\n";
exit(1);
}
# do some formatting checks
my $blank;
for (my $i=0; $i < scalar(@a); $i++) {
if ($a[$i] eq '') {
$bad_lines{$row} = $field_names[$i] . " - field blank - check all tabs/spaces in line";
}
}
if ($a[1] =~ /\s/) {
$bad_lines{$row} = "db_name field appears to contain spaces";
}
if ($a[1] =~ /^$/) {
$bad_lines{$row} = "db_name field appears to be missing";
}
if ($a[1] =~ /^\s+$/) {
$bad_lines{$row} = "db_name field appears to be blank";
}
if ($a[1] =~ /^\d+$/) {
$bad_lines{$row} = "db_name field appears to be numeric - check formatting";
}
my $type_ok;
foreach my $type (@types) {
$type_ok = 1 if ($a[8] eq $type);
}
$bad_lines{$row} = "type field is " . $a[8] . ", not one of the recognised types" if (!$type_ok);
}
$fh->close();
if (%bad_lines) {
print STDERR "Cannot parse the following line(s) from $file; check that all fields are present and are separated by one tab (not spaces). \n";
print STDERR "Name of problem field, and the error is printed in brackets first\n\n";
foreach my $row (keys %bad_lines) {
print STDERR "[". $bad_lines{$row} . "]" . " $row\n";
}
exit(1);
}
# Load into master database
if(!$nonreleasemode){
load_database($db, $master, @rows);
}
# Check each other database in turn
# Load if no extra rows in db that aren't in master
# Warn and skip if there are
foreach my $dbname (@dbnames) {
print STDERR "Looking at $dbname ... \n";
if ($force || $nonreleasemode) {
print STDERR "Forcing overwrite of external_db table in "
. "$dbname from $file\n";
load_database( $db, $dbname, @rows );
} elsif (compare_external_db($db, $master, $dbname)) {
print STDERR "$dbname has no additional rows. "
. "Overwriting external_db table from $file\n";
load_database( $db, $dbname, @rows );
} else {
print STDERR "$dbname has extra rows "
. "that are not in $file, skipping\n";
}
}
print STDERR "Updates complete\n";
sub load_database {
my ($db, $dbname, @rows) = @_;
$db->do("USE $dbname");
# Save all existing release information from the table.
my $sth = $db->prepare(
qq( SELECT external_db_id, db_release
FROM external_db) );
$sth->execute();
my %saved_release;
while ( my ( $id, $release ) = $sth->fetchrow_array() ) {
if ( defined($release) && $release ne '1' ) {
$saved_release{$id} = $release;
}
}
$sth->finish();
# Delete the existing table
$sth = $db->prepare('DELETE FROM external_db');
$sth->execute();
$sth->finish();
# Populate the table with data from the file (using the saved release
# information)
$sth = $db->prepare(
qq( INSERT INTO external_db (
external_db_id,
db_name,
db_release,
status,
dbprimary_acc_linkable,
display_label_linkable,
priority,
db_display_name,
type) VALUES (?,?,?,?,?,?,?,?,?)) );
foreach my $row (@rows) {
my $id = $row->{'external_db_id'};
$sth->execute( $id,
$row->{'db_name'}, (
exists( $saved_release{$id} )
? $saved_release{$id}
: $row->{'release'}
),
$row->{'status'},
$row->{'dbprimary_acc_linkable'},
$row->{'display_label_linkable'},
$row->{'priority'},
$row->{'db_display_name'},
$row->{'type'} );
}
$sth->finish();
}
# return true if the tables are the same, undef if not
sub compare_external_db {
my ($db, $master, $dbname) = @_;
my $same = 1;
# check each row in $dbname against each row in $master
# only compare ID and name since we're only aiming to catch extra rows in $dbname
$db->do("use $dbname");
my $sth = $db->prepare(qq {SELECT d.external_db_id, d.db_name
FROM $dbname.external_db d
LEFT JOIN $master.external_db m
ON (d.external_db_id=m.external_db_id AND d.db_name=m.db_name)
WHERE m.external_db_id IS NULL OR m.db_name IS NULL });
$sth->execute();
while (my ($id, $external_db_name) = $sth->fetchrow_array) {
print "$dbname has external_db entry for $external_db_name (ID $id) which is not present in $master\n";
$same = undef;
}
$sth->finish();
return $same;
}
sub usage {
my $error = shift;
print STDERR <<EOF;
$error
Usage: $0 options
-host hostname
-user username
-pass password
-port port_of_server optional
-master the name of the master database to load the file into
-force force update, even if there are rows in the database
that are not in the file
-release the release of the database to update used to match
database names, e.g. 13
-file the path of the tab-separated file containing
the entries of the external_db table. Default is
'external_dbs.txt'
-dbnames the names of the database to update. If not provided
all of the core databases matching the release arg
will be updated. Either -dbnames or -release must
be specified, but not both. Multiple dbnames can be
provided.
-nonreleasemode Does not require master schema and forces the update.
Examples:
# Update two databases
./update_external_dbs.pl -host ecs1c -file external_dbs.txt \\
-user ensadmin -pass secret -master master_schema_14 \\
-dbnames homo_sapiens_core_14_33 -dbnames mus_musculus_core_14_30
# Update all Core databases for release 42
./update_external_dbs.pl -host ens-staging -file external_dbs.txt \\
-user ensadmin -pass secret -release 42 -master master_schema_42
If the databases to be updated contain rows that are not in the file,
a warning will be given and the database in question skipped, unless
-force is used.
This program will not overwrite the db_release column of any table.
EOF
exit;
} ## end sub usage