Script to read wwPDB chemical component dictionary and split it to produce PDBeChem outputs
What
This is the next step once all the components covered by issues #3 (closed), #4 (closed), #6 (closed), #8 (?), #9 (closed) and #10 (closed) have been developed as classes with unit tests.
These classes will then be put together into a program that does the complete job described in %Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
How
- Anticipate using a single process initially
- Performance testing will be important:
  - how many PDB CCD entries fail?
  - how long does the process take? How can it be parallelized to use more than one processor?
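The two performance questions above (how many components fail, and how long the run takes) could be answered with a small harness like the one below. This is a minimal sketch rather than the project's implementation: `run_batch`, `process_one` and `demo_process` are hypothetical names, and `process_one` stands in for whatever function ends up writing the outputs for a single component.

```python
import time
from typing import Callable, Iterable


def run_batch(ccd_ids: Iterable[str], process_one: Callable[[str], None]) -> dict:
    """Process components one by one, recording failures and elapsed time."""
    failed = {}
    total = 0
    start = time.perf_counter()
    for ccd_id in ccd_ids:
        total += 1
        try:
            process_one(ccd_id)
        except Exception as exc:  # keep going so one bad component does not stop the run
            failed[ccd_id] = repr(exc)
    elapsed = time.perf_counter() - start
    return {"total": total, "failed": failed, "seconds": elapsed}


def demo_process(ccd_id: str) -> None:
    """Stand-in worker for illustration only: it just rejects an empty id."""
    if not ccd_id:
        raise ValueError("empty component id")
```

Because each component is handled independently, the same per-component function could later be handed to a process pool (for example `concurrent.futures.ProcessPoolExecutor`) to use more than one processor. That is a suggestion, not something this issue specifies.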
Task list of things to code
- Use description from http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/readme.htm
- and the files in /nfs/ftp/pub/databases/msd/pdbechem/
- For each CCD write a file in:
  - files/mmcif/ individual CCD.cif files for each component
  - files/sdf/ Molfile (SDF) with ideal coordinates and hydrogen atoms
  - files/sdf_nh/ Molfile (SDF) with ideal coordinates without hydrogen atoms
  - files/sdf_r/ Molfile (SDF) with representative coordinates and hydrogen atoms
  - files/sdf_r_nh/ Molfile (SDF) with representative coordinates without hydrogen atoms
  - files/pdb/ PDB with ideal coordinates
  - files/pdb_r PDB with representative coordinates
  - files/cml CML format with ideal coordinates
  - files/xyz (not mentioned in readme.htm) xyz format with ideal coordinates (see https://en.wikipedia.org/wiki/XYZ_file_format)
  - files/xyz_r the same for representative coordinates
  - images svg - 3 different images (see below)
  - images gif - convert the svg images
- Overall, write:
  - chem_comp.list a simple list of the chem_comp_id's, one per line
  - chem.xml an xml file for all chem_comps
  - readme.htm start with the existing one
  - tar.gz files for each of the subdirectories in the files and images directories
- new:
  - use a logging warning for any problematic InChIKey, like HEM and CDL.
  - a divided subdirectory where all files for a chem_component are provided in a separate directory for that component. So for ATP the divided/A/ATP/ directory will contain the .cif, four .sdf files, .... Do this using softlinks. This needs a separate issue: issue #23 (closed)
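The directory layout in the task list, together with the proposed divided tree, can be sketched as follows. This is an illustration under stated assumptions, not the agreed design. The function names, the `<id>.<ext>` file naming, the link naming and the use of relative symlinks are all hypothetical. The image directories are left out because their naming is not settled above. Only the divided/A/ATP/ example from the list fixes the first-letter subdirectory.

```python
import os
from pathlib import Path

# Subdirectory of files/ -> file extension, taken from the task list above.
FORMATS = {
    "mmcif": "cif",
    "sdf": "sdf",
    "sdf_nh": "sdf",
    "sdf_r": "sdf",
    "sdf_r_nh": "sdf",
    "pdb": "pdb",
    "pdb_r": "pdb",
    "cml": "cml",
    "xyz": "xyz",
    "xyz_r": "xyz",
}


def output_paths(root: Path, ccd_id: str) -> dict:
    """Map each files/ subdirectory to the path planned for one component.

    Assumption: every file is named <ccd_id>.<extension>.
    """
    return {sub: root / "files" / sub / f"{ccd_id}.{ext}" for sub, ext in FORMATS.items()}


def link_divided(root: Path, ccd_id: str) -> Path:
    """Create divided/<first letter>/<ccd_id>/ with relative symlinks to existing outputs.

    Assumption: links are named <ccd_id>_<subdir>.<extension> so that the four
    SDF variants do not collide inside a single directory.
    """
    target_dir = root / "divided" / ccd_id[0] / ccd_id
    target_dir.mkdir(parents=True, exist_ok=True)
    for sub, source in output_paths(root, ccd_id).items():
        if not source.exists():
            continue  # skip formats that were not generated
        link = target_dir / f"{ccd_id}_{sub}{source.suffix}"
        if link.is_symlink() or link.exists():
            link.unlink()  # refresh links on re-runs
        link.symlink_to(os.path.relpath(source, start=target_dir))
    return target_dir
```

Relative links are used here so that the whole tree could be moved or mirrored without breaking them. Whether that matters for the FTP area is a question for issue #23.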
Edited by Oliver Smart