Split the wwPDB chemical component dictionary file into a separate cif file for each component
- The wwPDB chemical component dictionary (CCD) is released each week as a single large file (around 215 MB).
- download link ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif
- details https://www.wwpdb.org/data/ccd
- this needs to be split into individual CCD files - one for each chemical component.
- these then need to be processed to sdf/pdb and images.
- It might be sensible to do this by parsing the complete file with a cif parser; each component could then be processed using the tools developed to write sdf and pdb files and images. If so, issue #10 (closed) could be addressed at the same time.
- Or it might be necessary to split the file into individual small files in a separate program, without using a cif parser.
  - each component starts with a line data_ABC, where ABC is the chem_comp.id (aka residue name).
  - for instance http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/mmcif/001.cif

        data_001
        #
        _chem_comp.id 001
        _chem_comp.name "1-[2,2-DIFLUORO-2-(3,4,5-TRIMETHOXY-PHENYL)-ACETYL]-PIPERIDINE-2-CARBOXYLIC ACID 4-PHENYL-1-(3-PYRIDIN-3-YL-PROPYL)-BUTYL ESTER"
        _chem_comp.type NON-POLYMER
        _chem_comp.pdbx_type HETAIN
        _chem_comp.formula "C35 H42 F2 N2 O6"
- could simply read the file and look for lines starting with data_... When such a line is found, close the previous file and open a new one for that component code.
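
The line-based option could look roughly like the sketch below. It is only an illustration, not an agreed design: the function name `split_components`, the output naming `<id>.cif`, and the handling of semicolon-delimited multi-line text fields (so that a value line that happens to start with data_ is not mistaken for a new component) are all assumptions that are not part of this issue.

```python
from pathlib import Path
from typing import Iterable, List


def split_components(lines: Iterable[str], out_dir: Path) -> List[str]:
    """Write each data_ block found in `lines` to out_dir/<id>.cif.

    Hypothetical helper: returns the component ids in the order seen.
    Lines before the first data_ line are dropped.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    ids: List[str] = []
    out = None
    in_text_field = False  # inside a ';'-delimited multi-line CIF value
    try:
        for line in lines:
            if line.startswith(";"):
                # A ';' in column one opens or closes a multi-line text field.
                in_text_field = not in_text_field
            elif line.startswith("data_") and not in_text_field:
                # New component: close the previous file, open the next one.
                if out is not None:
                    out.close()
                comp_id = line[len("data_"):].strip()
                ids.append(comp_id)
                out = open(out_dir / f"{comp_id}.cif", "w", encoding="utf-8")
            if out is not None:
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return ids
```

Because the function takes any iterable of lines, the downloaded components.cif can be passed as an open file handle and is streamed line by line, so the whole file never has to be held in memory. Only one output file is open at a time.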