pdbe issues
https://gitlab.ebi.ac.uk/groups/pdbe/-/issues
2017-09-20T07:56:31Z

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/25
Rethink on 2D chemical diagram images
2017-09-20T07:56:31Z
Oliver Smart

* original aim was to replace images in the ftp area (3 gifs: a `large`, a `small` and `hydrogen` for each ccd).
* in addition produce labelled and unlabelled svg images
* but on reconsideration this is not wise - as the small and hydrogen images are not very good and the ftp area images are not used on the PDBe page
* just produce the svg images.

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/24
improve fragment searching method & library
2017-10-02T14:38:24Z
Oliver Smart

* issue #11 developed fragment matching code.
* accepted the fragment file from the original prototype.
* there are some issues with this file:
- [x] file format - switch to using a normal multimolecule SMILES .smi file format.
- [x] it would be useful to generate pictures of fragment library molecules
- [x] the single cif command line script would be very useful to test what fragments are in an individual ccd cif. Need issue #19 to be done!
- [x] The tools need to be usable by other people #18 needs to be done.
- [x] needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to tsv with columns `name, smarts?, query, comment`.
- [ ] *peptide* fragment is not a peptide bond - is it an attempt to search for amino acid? replace with Daylight tutorial *amide*
- [x] *steroid* needs to pick up all steroids.
- [x] *deoxyribose* fragment needs to not pick up ribose.
- [ ] *pyranose* fragment is wrong
- [ ] A.M. wants to add additional fragments - provide tools for him to be able to take over the work.
## it would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.
* This would help with developing and checking the fragment library, but is a reasonably big task.
* thinking of a simple interactive command line 'server'
* that processes the components.cif holding all PDBCCD in memory
* then waits for user to enter a SMILES or SMARTS string (or fragment name) on keyboard
* does a substructure search against the PDBCCD
* then waits for the next string or to switch mode - S=SMILES T=SMARTS F=fragment
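
The interactive loop outlined above might be sketched as follows. This is only a sketch under assumptions: `run_search_loop` and the `search` callable are hypothetical names, and the actual substructure search against the in-memory PDBCCD (for example with RDKit) is deliberately left as a pluggable function.

```python
# Mode letters taken from the issue text: S=SMILES T=SMARTS F=fragment
MODES = {"S": "SMILES", "T": "SMARTS", "F": "fragment"}


def run_search_loop(lines, search, output=print):
    """Read one query per line and dispatch it to `search`.

    `lines` is any iterable of strings (e.g. sys.stdin). `search` is a
    hypothetical callable taking (mode, query) and returning a list of
    matching chem_comp ids. A line holding just S, T or F switches mode.
    Caveat: a bare "S" or "F" is also a valid SMILES, so this sketch cannot
    search for those single atoms.
    """
    mode = "SMILES"
    results = []
    for raw in lines:
        query = raw.strip()
        if not query:
            continue
        if query in MODES:
            mode = MODES[query]
            output("mode: " + mode)
            continue
        hits = search(mode, query)
        results.append((mode, query, hits))
        output("{} {!r}: {}".format(mode, query, ", ".join(hits) or "no hits"))
    return results
```

Passing the input as an iterable (rather than calling `input()` directly) keeps the loop unit-testable; an interactive session would call `run_search_loop(sys.stdin, search)`.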
## Question: do we really want SMILES substrings to define fragments?
* currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. [Daylight>SMARTS Examples](http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html) **27-Sept-2017**

Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/23
PDBeChem ftp Output: Supply divided directory
2018-10-13T09:14:48Z
Oliver Smart

* From issue #13
* Currently it is difficult to navigate the ftp site because it takes around 40 seconds to see http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/images/large/ as it contains 24936 files.
* Instead provide a divided directory with all the files for an individual chemical component.
* so for ATP directory `divided/A/ATP` would contain:
```
divided/A/ATP/ATP.cif
divided/A/ATP/coordinates/ideal/ATP.sdf
divided/A/ATP/coordinates/ideal/ATP.pdb
divided/A/ATP/coordinates/ideal/ATP_no_hydrogen.sdf
divided/A/ATP/coordinates/ideal/ATP.xml
divided/A/ATP/coordinates/model/ATP.sdf
divided/A/ATP/coordinates/model/ATP.pdb
divided/A/ATP/coordinates/model/ATP_no_hydrogen.sdf
divided/A/ATP/2Dimages/with_labels/ATP.xml
divided/A/ATP/2Dimages/with_labels/ATP.png
divided/A/ATP/2Dimages/without_labels/ATP.xml
divided/A/ATP/2Dimages/without_labels/ATP.png
```
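
The layout listed above can be generated programmatically. The following is a minimal sketch, assuming the first character of the component id is used as the intermediate directory (as in `divided/A/ATP`); the function names are hypothetical and the behaviour for ids that start with a digit is an assumption.

```python
import posixpath


def divided_dir(ccd_id):
    """Return the per-component directory, e.g. 'divided/A/ATP' for ATP."""
    return posixpath.join("divided", ccd_id[0], ccd_id)


def divided_paths(ccd_id):
    """List the relative paths expected for one chemical component."""
    base = divided_dir(ccd_id)
    paths = [posixpath.join(base, ccd_id + ".cif")]
    # ideal coordinates: sdf, pdb, hydrogen-free sdf and xml
    for ext in (".sdf", ".pdb", "_no_hydrogen.sdf", ".xml"):
        paths.append(posixpath.join(base, "coordinates", "ideal", ccd_id + ext))
    # model coordinates: same formats except xml
    for ext in (".sdf", ".pdb", "_no_hydrogen.sdf"):
        paths.append(posixpath.join(base, "coordinates", "model", ccd_id + ext))
    # 2D images with and without atom labels
    for labels in ("with_labels", "without_labels"):
        for ext in (".xml", ".png"):
            paths.append(posixpath.join(base, "2Dimages", labels, ccd_id + ext))
    return paths
```

Calling `divided_paths("ATP")` reproduces the twelve entries of the listing, in the same order.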
* this will enable users to quickly find what they want (hopefully?)

Milestone: PDBeChem Backend Processing: get into preproduction

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/22
chem.xml add useful rdkit descriptors like number of rotatable bonds
2018-11-22T10:02:11Z
Oliver Smart

* from issue #10
* good idea to output information from RDKit like number of rotatable bonds (wrote temporary jiffy for this [temp_jiffy_number_rotatable_bonds_jiffy.py](temp_jiffy_number_rotatable_bonds_jiffy.py))
* there are many other descriptors available - see http://www.rdkit.org/Python_Docs/rdkit.Chem.Descriptors-module.html

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/21
write xyz format file for PDB-CCD
2017-08-25T15:59:49Z
Oliver Smart

PDBeChem currently provides xyz format files
https://en.wikipedia.org/wiki/XYZ_file_format
not sure how useful they are, but they are easy to write so just implement.
# current files:
* ideal coordinates
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/EOH.xyz
9
EOH
C 0.0070 -0.5690 0.0000
C -1.2850 0.2500 -0.0000
O 1.1300 0.3150 -0.0000
H 0.0390 -1.1970 0.8900
H 0.0390 -1.1970 -0.8900
H -1.3170 0.8780 0.8900
H -1.3170 0.8780 -0.8900
H -2.1420 -0.4240 0.0000
H 1.9860 -0.1370 0.0000
```
* model coordinates:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz_r/EOH.xyz
9
EOH
C 15.2120 49.1980 7.4910
C 16.0690 50.3860 7.1040
O 15.8610 48.1850 8.2560
H 14.3750 49.5790 8.0940
H 14.8580 48.7310 6.5600
H 15.4670 51.0980 6.5200
H 16.4420 50.8800 8.0130
H 16.9200 50.0420 6.4980
H 15.2440 47.4880 8.4470
```
* Currently elements are upper case:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/FES.xyz
4
FES
FE 0.0000 -0.2130 -1.5310
FE 0.0000 -0.2130 1.5310
S 1.4610 0.3720 0.0000
S -1.4610 0.3720 0.0000
```
but better if the iron atoms are written as Fe.
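
A minimal sketch of such a writer, including the `FE` to `Fe` conversion, could look like this. The function names and the column widths are assumptions, not the format of the existing PDBeChem files.

```python
def element_symbol(raw):
    """Convert an upper-case CCD type symbol such as 'FE' to 'Fe'."""
    return raw.strip().capitalize()


def xyz_block(ccd_id, atoms):
    """Format atoms as XYZ text: atom count, comment line, one atom per line.

    `atoms` is an iterable of (element, x, y, z) tuples.
    """
    atoms = list(atoms)
    lines = [str(len(atoms)), ccd_id]
    for element, x, y, z in atoms:
        # column widths are an assumption; XYZ readers only need whitespace
        lines.append("{:<2s} {:9.4f} {:9.4f} {:9.4f}".format(
            element_symbol(element), x, y, z))
    return "\n".join(lines) + "\n"
```

Single-letter elements such as `S` are left unchanged by `capitalize()`, so only two-letter symbols are affected.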
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/20
Proof of concept: Mogul report on PDB-CCD ideal coordinates
2017-11-18T16:36:27Z
Oliver Smart

# idea
* For a given wwPDB chemical components definition (PDB-CCD) run the [Mogul tool](https://www.ccdc.cam.ac.uk/solutions/csd-system/components/mogul/using) using the chemical description from PDB-CCD
* Produce an html format report for the PDB-CCD ideal coordinates
* Use [Global Phasing Buster-Report](https://www.globalphasing.com/buster/wiki/index.cgi?BusterReport) detailed ligand report [example](http://grade.globalphasing.org/tut/erice_workshop/introtutorial/buster/00_MapOnly.report/ligand/detailedreport_A_501_.html#atableBOND) as a starting point - want coloured 2D diagrams to show which bonds/angles/torsions/rings are outliers.
* use ccd_utils with rdkit code
* for Mogul use the [CSD Python API](https://downloads.ccdc.cam.ac.uk/documentation/API/) rather than directly running Mogul

Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/19
Develop utility command line script to read single cif and write coordinate files/images etc.
2017-09-27T13:21:16Z
Oliver Smart

## What
* Want a command line script so a user can read in any ccd cif and
* write a sdf if they want to - with options for ideal/model coordinates, hydrogen, alias on off
* write a pdb file with options ...
* write an image with options ....
* Display properties about the molecule - lipinski things - num rings, rotatable bonds etc. etc.
## How
* The command line arguments and the help text for each must be proposed as a comment on this page. **The proposal must be agreed to before any coding is done!**
* Script is to use argparse
* All points except 2 in https://ajminich.com/2013/08/01/10-things-i-wish-every-python-script-did/ must be followed.
* Can you unit test a command line script?
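
On the unit-testing question: one common approach is to keep the argparse set-up and the work in functions that take the argument list explicitly, so tests can call them without spawning a process. The sketch below is only illustrative; the option names are hypothetical placeholders (the real ones are to be agreed on this page first), and no actual cif processing is shown.

```python
import argparse
import os
import sys


def create_parser():
    """Build the argument parser; the options here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="read a single PDB-CCD cif file and write outputs")
    parser.add_argument("cif", help="input PDB-CCD mmcif file")
    parser.add_argument("--sdf", help="write an sdf file to this path")
    return parser


def main(argv=None):
    """Run the script and return an exit status instead of exiting.

    Returning the status lets a unit test assert on it directly.
    """
    args = create_parser().parse_args(argv)
    if not os.path.isfile(args.cif):
        sys.stderr.write(
            "error: cannot read {} as file not found\n".format(args.cif))
        return 1
    # real processing (parse the cif, write sdf/pdb/images) would go here
    return 0
```

The script itself would then end with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard, and a test can simply check `main(["missing.cif"]) == 1`.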
* If there are exceptions (file does not exist etc.), catch them, produce a sensible error message for the user and call `sys.exit(1)` to indicate to the calling process that there was a problem.

Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/18
Package code so it can be installed with pip
2017-09-19T19:06:24Z
Oliver Smart

* https://python-packaging.readthedocs.io/en/latest/index.html
* https://stackoverflow.com/questions/8247605/configuring-so-that-pip-install-can-work-from-github

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/17
process_components_cif script to read complete components.cif and produce PDBeChem ftp area
2017-09-27T08:53:24Z
Oliver Smart

Issue #13 has got the basic functioning of process_components_cif_cli.py working with a simple test with 4 chemical components.
Now exercise script on the real thing. We have not worried about edge cases so far - what happens if there are no ideal coordinates. Do so now:
* make sure the script does not fall over on problems but logs the error and continues
* for each problem deal with it (adding unit tests where possible).
-------------------
12 September 2017
# Summary of progress and outstanding issues.
* Have got script that produces required output in a reasonable way.
* needs some clean up and further work of:
- [x] need to look into inchi mismatch observation.
- [x] improve command line options.
- [ ] logging output - would be good to list number of unsuccessful sdf, pdb, images etc.
- [x] how gif images are produced (avoid svg conversion)
- [x] need to look into RDKit Invariant Violation
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/16
Test fails in directories other than ccd_utils
2017-09-21T14:18:05Z
Ijaz Ahmad

The tests work in ccd_utils, but if we go up a directory, then we get problems.
```
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ nosetests test_pdb_chemical_components.py
....................................
----------------------------------------------------------------------
Ran 36 tests in 0.090s
OK
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ cd ..
(my-rdkit-env) [qyuan@ch-qyuan-z440 pdbe]$ nosetests ccd_utils/test_pdb_chemical_components.py
ERROR: Failure: ValueError (cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/qyuan/anaconda2/envs/my-rdkit-env/lib/python2.7/site-packages/nose/loader.py", line 251, in generate
for test in g():
File "/home/qyuan/pdbe/ccd_utils/test_pdb_chemical_components.py", line 99, in test_load_hem_from_cif
hem = PdbChemicalComponents(file_name=cif_filename('HEM'), cif_parser=cif_parser)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 80, in __init__
self.read_ccd_from_cif_file(file_name)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 317, in read_ccd_from_cif_file
raise ValueError('cannot read PDB chemical components from {} as file not found'.format(file_name))
ValueError: cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found
```

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/14
Document the current installation procedure
2017-09-21T14:18:05Z
Oliver Smart

* Currently the ccd_utils project needs a parallel check out of the PDBeCIF project: https://github.com/glenveegee/PDBeCIF.git
* How to do this should be explained in the [README.md](README.md) file.
* Add a section *Installation instructions*

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/13
Script to read wwPDB chemical component dictionary and split it to produce PDBeChem outputs
2017-09-08T09:57:00Z
Oliver Smart

# What
This is the next step once all components in issues: #3 #4 #6 #8 (?) #9 #10 have been developed as classes with unit tests.
All of this will be put together to write a program to do the complete job described in %1
# How
* Anticipate using a single process initially
* Performance Testing will be important
* how many PDB CCD fail?
* how long does the process take? How can it be parallelized to use more than one processor?
# task list of things to code
* Use description from http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/readme.htm
* and files in `/nfs/ftp/pub/databases/msd/pdbechem/`
* For each CCD write a file in:
- [x] `files/mmcif/` individual CCD.cif files for each component
- [x] `files/sdf/` Molfile (SDF) with ideal coordinates and hydrogen atoms
- [x] `files/sdf_nh/` Molfile (SDF) with ideal coordinates without hydrogen atoms
- [x] `files/sdf_r/` Molfile (SDF) with representative coordinates and hydrogen atoms
- [x] `files/sdf_r_nh/` Molfile (SDF) with representative coordinates without hydrogen atoms
- [x] `files/pdb/` PDB with ideal coordinates
- [x] `files/pdb_r` PDB with representative coordinates.
- [x] `files/cml` CML format ideal coordinates
- [x] `files/xyz` (not mentioned in `readme.htm`) xyz format ideal (see https://en.wikipedia.org/wiki/XYZ_file_format)
- [x] `files/xyz_r` same for representative coordinates.
- [x] images svg - 3 different images (see below)
- [x] images gif - convert the svg images.
* overall write
- [x] `chem_comp.list` a simple list of the chem_comp_id's one per line
- [x] `chem.xml` an xml file for all chem_comps
- [x] `readme.htm` start with existing
- [x] tar.gz files for each of the subdirectories in `files` and `images` directories.
* new:
- [x] use logging warn for any problematic inchikey like HEM and CDL.
- [ ] `divided` subdirectory where all files for a chem_component are provided in a separate directory for that component. So for ATP the `divided/A/ATP/` directory will contain .cif, four .sdf files, .... Do this using softlinks. *Need a separate issue for this.* issue #23

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/12
Add DMSO and Sildenafil to list of test CCD cif files
2017-09-21T14:18:05Z
Oliver Smart

* In previous work I found that SOx groups could cause problems.
* Please look up [DMSO](https://en.wikipedia.org/wiki/Dimethyl_sulfoxide) and [Sildenafil (Viagra)](https://en.wikipedia.org/wiki/Sildenafil) in PDBeChem and add to directory with test cif files.
* How do the molecules perform in current tests?
* When committing the files you must include where the files were obtained (exact url) in the commit message.
* Include reference to this issue number in the commit message. See https://docs.gitlab.com/ee/user/project/issues/automatic_issue_closing.html

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/11
Tool for finding "fragments" in each CCD molecule.
2017-09-14T15:35:12Z
Oliver Smart

## What
Requirements From PDBe confluence: **16 June 2016**
SV wants a tool to produce a file that lists the fragments present in each of the chemical compounds:
* read in the chemical component definition cif file.
* read in file smi.txt that contains lines like:
```
cyclopentane:C1CCCC1
cyclopropane:C1CC1
cytosine:C1=CNC(NC1N)O
```
* use rdkit to find which of the fragments is in the ccd.
* write results as a csv format file with the contents:
3 letter code eg. "ATP", fragment name from smi.txt, atom names comma delimited e.g. "C1,C2,C3,C4,O5"
* Producing this tool is a priority.
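
The file handling around the search can be sketched without RDKit. The following is a minimal sketch, assuming the `name:SMILES` line format shown above; the function names are hypothetical and the substructure matching itself is not shown.

```python
import csv
import io


def parse_fragment_lines(lines):
    """Parse 'name:SMILES' lines into a list of (name, smiles) pairs."""
    fragments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first colon only, since ':' can occur within SMILES
        name, smiles = line.split(":", 1)
        fragments.append((name.strip(), smiles.strip()))
    return fragments


def matches_to_csv(matches):
    """Format (ccd_id, fragment_name, atom_names) rows as csv text.

    atom_names is a list; it is joined with commas into a single field,
    which the csv module then quotes automatically.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for ccd_id, fragment_name, atom_names in matches:
        writer.writerow([ccd_id, fragment_name, ",".join(atom_names)])
    return buffer.getvalue()
```

Using the `csv` module rather than manual string joins matters here because the atom-name field itself contains commas and therefore has to be quoted.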
* name for tool ccd_find_fragments.py (provisional)

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/10
Create chem.xml for all components
2017-08-29T17:00:52Z
Oliver Smart

# What
Currently http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/chem.xml lists in xml format information for every PDB chemical component. For instance for ATP and ATQ:
```xml
<chemComp>
<id>ATP</id>
<name>ADENOSINE-5'-TRIPHOSPHATE</name>
<formula>C10 H16 N5 O13 P3</formula>
<systematicName>[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxy-oxolan-2-yl]methoxy-hydroxy-phosphoryl] phosphono hydrogen phosphate</systematicName>
<stereoSmiles>Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P@](O)(=O)O[P@@](O)(=O)O[P](O)(O)=O)[C@@H](O)[C@H]3O</stereoSmiles>
<nonStereoSmiles>Nc1ncnc2n(cnc12)[CH]3O[CH](CO[P](O)(=O)O[P](O)(=O)O[P](O)(O)=O)[CH](O)[CH]3O</nonStereoSmiles>
<InChi>InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1</InChi>
</chemComp>
<chemComp>
<id>ATQ</id>
<name>2-AMINOTHIAZOLINE</name><formula>C3 H6 N2 S</formula>
<systematicName>4,5-dihydro-1,3-thiazol-2-amine</systematicName>
<stereoSmiles>NC1=NCCS1</stereoSmiles>
<nonStereoSmiles>NC1=NCCS1</nonStereoSmiles>
<InChi>InChI=1S/C3H6N2S/c4-3-5-1-2-6-3/h1-2H2,(H2,4,5)</InChi>
</chemComp>
```
the process developed needs to be able to produce this file.
N.B. file starts:
```
<chemCompList>
<chemComp>
<id>000</id>
<name>methyl hydrogen carbonate</name>
<formula>C2 H4 O3</formula>
<systematicName>methyl hydrogen carbonate</systematicName>
<stereoSmiles>COC(O)=O</stereoSmiles>
<nonStereoSmiles>COC(O)=O</nonStereoSmiles>
<InChi>InChI=1S/C2H4O3/c1-5-2(3)4/h1H3,(H,3,4)</InChi>
</chemComp>
```
and ends with ```</chemCompList>```
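
A structure like this can be produced with the standard library. The sketch below assumes each component is available as a dict keyed by the element names seen in the existing file; `build_chem_comp_list` is a hypothetical name, and whitespace and indentation of the real `chem.xml` are not reproduced.

```python
import xml.etree.ElementTree as ET

# child element names, in the order used by the existing chem.xml
FIELDS = ("id", "name", "formula", "systematicName",
          "stereoSmiles", "nonStereoSmiles", "InChi")


def build_chem_comp_list(components):
    """Build a <chemCompList> element from an iterable of dicts."""
    root = ET.Element("chemCompList")
    for component in components:
        chem_comp = ET.SubElement(root, "chemComp")
        for field in FIELDS:
            # missing values become empty elements
            ET.SubElement(chem_comp, field).text = component.get(field, "")
    return root
```

The resulting tree can be serialised with `ET.tostring(root, encoding="unicode")` or written with `ET.ElementTree(root).write(path)`; ElementTree escapes characters such as `&` and `<` in names automatically.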
# How
* related to issue #9 if the cif file could be processed and split creating `chem.xml` this would be good.
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/9
Split wwPDB chemical component dictionary file into separate cif file for each component
2017-09-11T14:57:41Z
Oliver Smart

## What
* The wwPDB chemical component dictionary is available as a single big (around 215MB) file each week.
* download link ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif
* details https://www.wwpdb.org/data/ccd
* this needs to be split into individual CCD files - one for each chemical component.
* these then need to be processed to sdf/pdb and images.
## How
* It might be sensible to do this by parsing the complete file using a cif parser - could then process each component using the tools developed to write sdf, pdb files and images. If so could address issue #10 at same time.
* Or it might be necessary to split the file into individual small files in a separate program without using a cif parser. Each component starts with a line `data_ABC` where `ABC` is the chem_comp.id (aka residue name).
  * for instance http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/mmcif/001.cif
```
data_001
#
_chem_comp.id 001
_chem_comp.name "1-[2,2-DIFLUORO-2-(3,4,5-TRIMETHOXY-PHENYL)-ACETYL]-PIPERIDINE-2-CARBOXYLIC ACID 4-PHENYL-1-(3-PYRIDIN-3-YL-PROPYL)-BUTYL ESTER"
_chem_comp.type NON-POLYMER
_chem_comp.pdbx_type HETAIN
_chem_comp.formula "C35 H42 F2 N2 O6"
```
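
Splitting on the `data_` lines without a cif parser could be sketched as below. `split_components` is a hypothetical name, and the sketch assumes that no line inside a multi-line text field starts with `data_`.

```python
def split_components(lines):
    """Yield (chem_comp_id, block_lines) for each data_ block in the input.

    `lines` is any iterable of text lines, e.g. an open components.cif file,
    so the whole file never has to be held in memory.
    """
    comp_id, block = None, []
    for line in lines:
        if line.startswith("data_"):
            # a new component starts: hand back the previous one first
            if comp_id is not None:
                yield comp_id, block
            comp_id, block = line[len("data_"):].strip(), []
        if comp_id is not None:
            block.append(line)
    if comp_id is not None:
        yield comp_id, block
```

Each yielded block can then be written to its own file, for example `001.cif` for the block that starts with `data_001`.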
* could simply read the file and look for lines starting `data_...`; when such a line is found, close the previous file and open a new one for the code `...`

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/8
Write CML files: *needs to be checked*
2017-10-01T19:13:29Z
Oliver Smart

* PDBeChem produces CML files.
* CML http://www.xml-cml.org/ might be a bit unpopular
* Should be fairly easy to write using standard xml library?
* If it proves difficult we could drop CML, but it should be easy.

Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/6
Write 2D images of PDB CCD molecule
2017-10-01T19:13:29Z
Oliver Smart

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/5
Handling heme
2017-09-29T08:35:50Z
Oliver Smart

There are problems with HEM. See #2 HEM (heme): initial reaction _I think the CCD definition is wrong and pubchem defines it correctly https://pubchem.ncbi.nlm.nih.gov/compound/444098 with Fe2+ and the two nitrogen atoms as N-_
RDKit produces a warning line when parsing HEM:
```
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] WARNING: Accepted unusual valence(s): N(4); Metal was disconnected; Proton(s) added/removed
ccd_utils.test_write_pdb.test_inchikey_match_for_all_sample_cifs('FEDYMSUPMFCVOD-UJJXFSCMSA-N', 'KABFMIBPWCXCRK-RGGAHWMASA-L', 'check inchikeys match for HEM') ... FAIL
```
The initial image created by Qi's test is:
![HEM.img_withH.svg](/uploads/e25842c4fce19867ed1765fe4862831f/HEM.img_withH.svg)
The pubchem inchikey is KABFMIBPWCXCRK-UHFFFAOYSA-L so the initial part of the one from RDKit agrees but the last part does not.

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/4
Write pdb files for PDB CCD
2017-09-21T14:18:05Z
Oliver Smart

# What
* See #3 for sdf files
* in addition we need method to write old style PDB format files.
# How
* RDKit includes code to write PDB files
* Note that it is important that the PDB files produced are well formed with correct atom names and residue names.
* In addition the occupancy and temperature factors should be well formed.
* probably also necessary to include a dummy CRYST1 card
* Testing should include comparison to existing files at PDBeChem as well as loading in coot, pymol, litemol.

Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart