# Development documentation: RDKit molecule from PDB CCD definition

* *25 May 2017*: got an initial version working in commit [26111ee5 First version of PdbChemicalComponentsRDKit that works (a bit!)](https://gitlab.com/pdbe/ccd_utils/commit/26111ee59f745bea16cdf37013d84540b752bf01). The test checks that the InChIKey from the RDKit molecule matches the one stored in the cif file, for 14 example CCD cif files in the [/data/cif/](https://gitlab.com/pdbe/ccd_utils/tree/master/data/cif) directory. The test results (in the commit message) show that:
  * InChIKeys match for 007, FES, V55 and EOH.
  * in the other cases RDKit gives `WARNING: Omitted undefined stereo`, so most of the remaining problems are with chirality. In that case the first 14 characters of the InChIKeys (the part that does not depend on stereochemistry) would be expected to match. This is true for 009, ATP, GLC, GLU, MAN, SAC and SEH.
  * CMO, NAD and HEM have a complete mismatch:
    * for CMO (carbon monoxide) the likely problem is that the charges are not processed; the SMILES is `[C-]#[O+]`.
    * for HEM (heme) I think the CCD definition is wrong and PubChem defines it correctly https://pubchem.ncbi.nlm.nih.gov/compound/444098 with Fe2+ and the two nitrogen atoms as N-.
    * NAD: in the test my rdkit got :pen_fountain:
  * Sameer's prototype code [sameer_prototype_chem.py](https://gitlab.com/pdbe/ccd_utils/blob/master/sameer_prototype_chem.py) has a procedure for setting chirality from the `chem_comp_atom.pdbx_stereo_config` value.
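
The three outcomes listed above (full match, match on the first 14 characters only, complete mismatch) can be told apart with plain string comparison. This is a minimal sketch: the function name `compare_inchikeys` and its return labels are made up for illustration and are not taken from ccd_utils. The example keys are those of ethanol and of L-glutamic acid, plus the glutamic acid key with no stereo defined.

```python
def compare_inchikeys(key_from_rdkit, key_from_cif):
    """Classify how two InChIKeys agree.

    Hypothetical helper for illustration, not part of ccd_utils.
    """
    if key_from_rdkit == key_from_cif:
        return "match"
    # The first 14 characters do not depend on stereochemistry, so agreement
    # there points to a stereo (chirality) problem only.
    if key_from_rdkit[:14] == key_from_cif[:14]:
        return "first-14-only"
    return "mismatch"


# Illustrative keys: ethanol, L-glutamic acid, and glutamic acid with
# no stereo defined.
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
GLU_L = "WHUUTDBJXJRKMK-VKHMYHEASA-N"
GLU_NO_STEREO = "WHUUTDBJXJRKMK-UHFFFAOYSA-N"

print(compare_inchikeys(ETHANOL, ETHANOL))
print(compare_inchikeys(GLU_NO_STEREO, GLU_L))
print(compare_inchikeys(ETHANOL, GLU_L))
```

A "first-14-only" result corresponds to the cases where RDKit warned about omitted undefined stereo.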
* *27 May 2017*:
  * Tried the procedure from [sameer_prototype_chem.py](https://gitlab.com/pdbe/ccd_utils/blob/master/sameer_prototype_chem.py). Results:
    * commit [d4cd4046](https://gitlab.com/pdbe/ccd_utils/commit/d4cd40468c2fc999be8a92bd4fc095f5b27491bb) "Adapt procedure from sameer_prototype_chem.py to set the chirality of the rdkit atoms."
    * failure: the [commit message](https://gitlab.com/pdbe/ccd_utils/commit/d4cd40468c2fc999be8a92bd4fc095f5b27491bb) includes details.
    * the change sorts the InChIKey match for GLU and SAC, but GLC and MAN remain problematic, with the end of the InChIKey not matching. GLU and SAC are simple, with one chiral centre each.
    * the likely reason is that `pdbx_stereo_config` is based on a different atom order from RDKit.
  * Could try to understand pdbx_chirality, or randomly iterate through the chiral centres flipping them until they match, but neither is a good idea! Instead:
## Try using ~~assignChiralTypesFrom3D~~ AssignAtomChiralTagsFromStructure

* https://sourceforge.net/p/rdkit/mailman/message/35774134/ has Greg Landrum's suggestion:
  * sanitizeMol
  * Kekulize(mol);
  * assignChiralTypesFrom3D(mol);
  * assignStereochemistry(mol,true,true)
  * ~~compute2DCoords(mol)~~
* this should also get rid of the `WARNING: Omitted undefined stereo` message, as RDKit would do its own stereo centre assignment.
* will need the ideal coordinates from the CCD for this, done in [Commit da94277f](https://gitlab.com/pdbe/ccd_utils/commit/da94277fb9900b4d878ed9985edd160601ecde8f)
* **done** and it works:
  * found that the Python method is `AssignAtomChiralTagsFromStructure` rather than `assignChiralTypesFrom3D`
  * [commit 70e50a91](https://gitlab.com/pdbe/ccd_utils/commit/70e50a914d9549e8739b753dbfce182250129389#970d1f070e53958f9808344f5522ca910295c71f_16_16) - message includes test results.
  * summary: ATP and the sugars GLC and MAN now give consistent InChIKeys; only CMO, HEM and NAD still have issues.
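
The sequence above, written with the Python names, might look roughly like the sketch below. It is not the code from the commits. As a stand-in for a CCD component with ideal coordinates it embeds a 3D conformer for L-alanine (an assumption made only to keep the example self-contained), throws away the chiral tags, and then recovers them from the coordinates. The helper `cip_labels` is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def cip_labels(mol):
    """Return {atom index: 'R' or 'S'} for the assigned stereo centres."""
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    return {
        atom.GetIdx(): atom.GetProp("_CIPCode")
        for atom in mol.GetAtoms()
        if atom.HasProp("_CIPCode")
    }


# Stand-in for a CCD component with ideal coordinates: L-alanine with an
# embedded 3D conformer (assumption; the real code reads the CCD cif).
mol = Chem.AddHs(Chem.MolFromSmiles("N[C@@H](C)C(=O)O"))
status = AllChem.EmbedMolecule(mol, randomSeed=42)  # 0 on success, -1 on failure
expected = cip_labels(mol)

# Remove the chiral tags so that only the 3D coordinates carry the chirality,
# as when atoms and bonds are added one by one from the cif.
for atom in mol.GetAtoms():
    atom.SetChiralTag(Chem.ChiralType.CHI_UNSPECIFIED)
without_tags = cip_labels(mol)

# The suggested steps, using the Python names.
Chem.SanitizeMol(mol)
Chem.Kekulize(mol)
Chem.AssignAtomChiralTagsFromStructure(mol)
recovered = cip_labels(mol)  # calls AssignStereochemistry(cleanIt=True, force=True)

print(expected, without_tags, recovered)
```

If the coordinates carry the chirality correctly, `recovered` should equal `expected`, while `without_tags` is empty.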
## Supply the formal charges from the CCD to RDKit

* The CCD includes a formal (integer) charge on each atom. Setting this charge should (hopefully) sort the issue for carbon monoxide.
* get the charges from the CCD for this, done in [Commit 00435dee](https://gitlab.com/pdbe/ccd_utils/commit/00435deef5bc07053b08679fe64999a3a24109a7)
* Then supply these to RDKit:
  * [Commit 1df92212](https://gitlab.com/pdbe/ccd_utils/commit/1df92212d35ce7a41389900296261d5a3ebb6cdf)
  * the commit message shows the encouraging result that all the test cases other than HEM now work (so CMO and NAD are fixed).
* Can now use Greg Landrum's recommendation of `Chem.SanitizeMol` to avoid problems downstream, [Commit 360eca41](https://gitlab.com/pdbe/ccd_utils/commit/360eca41aa95b89616ea9a8a135afec215aa0497)
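
To illustrate why the charges matter for carbon monoxide, here is a small sketch that builds the two-atom molecule by hand with and without formal charges. It is not the code from the commits above; the helper `build_diatomic` is hypothetical, and the charges (C -1, O +1) are taken from the SMILES `[C-]#[O+]` quoted earlier.

```python
from rdkit import Chem


def build_diatomic(symbols, charges, bond_type):
    """Build a two-atom RDKit molecule from explicit atoms, charges and one bond."""
    mol = Chem.RWMol()
    for symbol, charge in zip(symbols, charges):
        atom = Chem.Atom(symbol)
        atom.SetFormalCharge(charge)  # integer formal charge per atom
        mol.AddAtom(atom)
    mol.AddBond(0, 1, bond_type)
    return mol


# Carbon monoxide with the charges set: C(-1) and O(+1), triple bond.
cmo = build_diatomic(["C", "O"], [-1, 1], Chem.BondType.TRIPLE)
Chem.SanitizeMol(cmo)
smiles = Chem.MolToSmiles(cmo)

# The same atoms and bond with the charges left at zero do not sanitize,
# because a neutral oxygen cannot carry a triple bond.
uncharged = build_diatomic(["C", "O"], [0, 0], Chem.BondType.TRIPLE)
try:
    Chem.SanitizeMol(uncharged)
    uncharged_ok = True
except Exception:
    uncharged_ok = False

print(smiles, uncharged_ok)
```

In this sketch the charged molecule passes `Chem.SanitizeMol`, while the uncharged one raises a valence error for the oxygen.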
## Conclusion

*28 May 2017*

* First version of pdb_chemical_components_rdkit.py written and tested.
* It deals with the basic setup of the RDKit molecule.
* Tested by comparing the InChIKey from RDKit with that from the PDB CCD cif. This works for the limited number of test cases used so far, with the exception of HEM.
* Next stage is to extend the class to do something: in the first instance write an sdf file, see [use RDKit to write sdf file for PDB CCD](rdkit-sdf-for-ccd)

Content moved to issue #2.