Commit 1ba403e3 authored by Andreas Kusalananda Kähäri

Describe procedure for new config system.

*** HEADS UP ***

The xref_parser will now croak if used with the old
populate_metadata.sql file.
parent e23172d6
@@ -11,36 +11,35 @@ database.
Parsing the external database References
------------------------------------------------------------------------
In the directory sql you will find a file populate_metadata.sql. The
data in this file determines which files are parsed: for each species
there is a list of data files to parse. Most sources start with
'ftp://' or 'http://', which indicates that they will be downloaded
from external sites. Those starting with 'file://' (or 'LOCAL:') are
not downloaded and must be copied manually from another source.
When xref_parser.pm is run it will load this data for all species into
the database and will then download and parse all those files for a
In this directory you will find an ini-file called 'xref_config.ini'.
This file contains two types of configuration sections: source sections
and species sections. A source section defines Xref priority, order
etc. (as key-value pairs) for the source and also the URIs pointing
to the data files that the source should use. The source label is
only used to refer to the source within the ini-file (from a species
section), so it can be any text string whose meaning is easy to
understand.
A species section contains information about species aliases, taxonomy
ID and what sources to use for that species. The name of the species is
defined by the section label and will be stored in the Xref database.
For now, the script 'xref_config2sql.pl', also found in this directory,
should be used to convert the ini-file into an SQL file, which should
then replace the file 'sql/populate_metadata.sql'.
When 'xref_parser.pl' is run it will load this data for all species into
the database and will then download and parse all those files for a
given species.
If you want to add a new source you will have to add a new source line
i.e.
If you want to add a new source you will have to add a new source
section, following the pattern used by the other source sections. You
will then have to add it to the species that require the data.
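As a rough sketch only, a new source section and a species section that
uses it might look something like the fragment below. The values are
the ones used for NEWSOURCE in the populate_metadata.sql example; the
section header syntax, the labels 'NewSource' and 'homo_sapiens', the
alias, and all key names are assumptions made for illustration. Copy
the exact keys from an existing section in 'xref_config.ini' rather
than from here.

```ini
; Hypothetical sketch: header syntax and key names are assumptions,
; not copied from the real xref_config.ini.

[source NewSource]
name     = NEWSOURCE
order    = 10
priority = 1
parser   = NewSourceParser
data_uri = ftp://ftp.ebi.ac.uk/pub/databases/new_source.dat

[species homo_sapiens]
taxonomy_id = 9606
aliases     = human
source      = NewSource
```

The label after 'source' is only a handle for the species section to
refer to, which is why it can differ from the name stored for the
source.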
INSERT INTO source VALUES (10020, 'NEWSOURCE', 1, 'Y', 10, 1, "");
then for a particular species you will have to define how to get the
file and parse it, i.e.
INSERT INTO source_url
(source_id, species_id, url, checksum,
file_modified_date, upload_date, parser)
VALUES (10020, 9606,
'ftp://ftp.ebi.ac.uk/pub/databases/new_source.dat', '',
now(), now(), "NewSourceParser");
You will now also have to write the parser NewSourceParser.pm in the
XrefParser directory. You can find lots of examples of parsers in this
directory.
If the new data comes in files not previously handled by the Xref
system, you will also have to write the parser NewSourceParser.pm in
the XrefParser directory. You can find lots of examples of parsers in
this directory.
The parsing can create three types of xrefs; these are
......