Script to read wwPDB chemical component dictionary and split it to produce PDBeChem outputs

What

This is the next step once all components in issues: #3 (closed) #4 (closed) #6 (closed) #8 (?) #9 (closed) #10 (closed) have been developed as classes with unit tests.

All of this will be put together to write a program to do the complete job described in %Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

How

  • Anticipate using a single process initially
  • Performance Testing will be important
    • How many PDB CCD components fail?
    • How long does the process take? How can it be parallelized to use more than one processor?
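
A minimal sketch of how the two performance questions above could be measured, using only the Python standard library. The names `process_component`, `_safe_process` and `run_batch` are hypothetical, and the body of `process_component` is a placeholder for the real per-component work, which is not defined in this issue. With `workers=1` everything runs in a single process, matching the initial plan; a larger value hands the work to a `ProcessPoolExecutor`.

```python
import logging
import time
from concurrent.futures import ProcessPoolExecutor

logger = logging.getLogger(__name__)


def process_component(ccd_id):
    """Hypothetical placeholder for the real per-component work."""
    if not ccd_id.isalnum():
        raise ValueError(f"invalid component id: {ccd_id!r}")
    return ccd_id


def _safe_process(ccd_id):
    """Run one component and report (id, succeeded) instead of raising."""
    try:
        process_component(ccd_id)
        return ccd_id, True
    except Exception:
        logger.warning("component %s failed", ccd_id, exc_info=True)
        return ccd_id, False


def run_batch(ccd_ids, workers=1):
    """Process all ids; return (failed_ids, elapsed_seconds)."""
    start = time.perf_counter()
    if workers <= 1:
        # Single-process path: the simplest baseline for timing.
        results = [_safe_process(i) for i in ccd_ids]
    else:
        # Multi-process path: each component is independent of the others.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(_safe_process, ccd_ids, chunksize=100))
    failed = [i for i, ok in results if not ok]
    return failed, time.perf_counter() - start
```

Calling `run_batch(ids)` and then `run_batch(ids, workers=4)` on the same input would give a rough comparison of the single-process and parallel timings. On platforms that start worker processes with spawn, the multi-process call should sit under an `if __name__ == "__main__":` guard.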

task list of things to code

  • files/mmcif/ individual CCD.cif files for each component
  • files/sdf/ Molfile (SDF) with ideal coordinates and hydrogen atoms
  • files/sdf_nh/ Molfile (SDF) with ideal coordinates without hydrogen atoms
  • files/sdf_r/ Molfile (SDF) with representative coordinates and hydrogen atoms
  • files/sdf_r_nh/ Molfile (SDF) with representative coordinates without hydrogen atoms
  • files/pdb/ PDB with ideal coordinates
  • files/pdb_r PDB with representative coordinates
  • files/cml CML format ideal coordinates
  • files/xyz (not mentioned in readme.html) XYZ format ideal coordinates (see https://en.wikipedia.org/wiki/XYZ_file_format)
  • files/xyz_r same for representative coordinates.
  • images svg - 3 different images (see below)
  • images gif - convert the svg images.
  • overall write:
    • chem_comp.list: a simple list of the chem_comp_ids, one per line
    • chem.xml: an XML file for all chem_comps
    • readme.htm: start with the existing file
    • tar.gz files for each of the subdirectories in the files and images directories
  • new:
    • Use logging warnings for any component with a problematic InChIKey, such as HEM and CDL.
    • A divided subdirectory, where all files for a chem_comp are provided in a separate directory for that component. So for ATP the divided/A/ATP/ directory will contain the .cif, four .sdf files, .... Do this using softlinks. Need a separate issue for this: issue #23 (closed)
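
A minimal sketch of how the divided layout could be built with softlinks, assuming the flat files are named `<chem_comp_id>.<ext>` inside each subdirectory of files. The function name `link_divided` and the link naming scheme `<id>_<subdir><ext>` are assumptions made here so that the four .sdf files do not collide in one directory. The actual naming is left to issue #23.

```python
import os
from pathlib import Path


def link_divided(files_dir, divided_dir, ccd_id):
    """Symlink every files/<subdir>/<ccd_id>.* into divided/<X>/<ccd_id>/."""
    files_dir = Path(files_dir)
    # e.g. divided/A/ATP for the component ATP
    target_dir = Path(divided_dir) / ccd_id[0] / ccd_id
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for subdir in sorted(p for p in files_dir.iterdir() if p.is_dir()):
        for source in sorted(subdir.glob(f"{ccd_id}.*")):
            # Assumed naming scheme: ATP_sdf_nh.sdf, ATP_mmcif.cif, ...
            link = target_dir / f"{ccd_id}_{subdir.name}{source.suffix}"
            if link.is_symlink() or link.exists():
                link.unlink()  # replace a stale link from an earlier run
            # A relative target keeps the whole tree relocatable.
            link.symlink_to(os.path.relpath(source, target_dir))
            created.append(link)
    return created
```

For example, `link_divided("files", "divided", "ATP")` would populate divided/A/ATP/ with one link per flat file found for ATP. Creating symbolic links may need extra privileges on some platforms, such as Windows.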
Edited by Oliver Smart