pdbe issues: https://gitlab.ebi.ac.uk/groups/pdbe/-/issues (2019-02-05T14:50:42Z)

**Display long name of ligands**
https://gitlab.ebi.ac.uk/pdbe/web-components/ligand-env/-/issues/11 (2019-02-05T14:50:42Z, author: Lukas Pravda)
@mvaradi said: "Perhaps you could also show the long ligand names on tooltip? HET code might not be clear for some more obscure ligands"
Assignee: Lukas Pravda

**Handling heme**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/5 (2017-09-29T08:35:50Z, author: Oliver Smart)
There are problems with HEM. See #2 HEM (heme): initial reaction _I think the CCD definition is wrong and pubchem defines it correctly https://pubchem.ncbi.nlm.nih.gov/compound/444098 with Fe2+ and the two nitrogen atoms as N-_
RDKit produces a warning line when parsing HEM:
```
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] WARNING: Accepted unusual valence(s): N(4); Metal was disconnected; Proton(s) added/removed
ccd_utils.test_write_pdb.test_inchikey_match_for_all_sample_cifs('FEDYMSUPMFCVOD-UJJXFSCMSA-N', 'KABFMIBPWCXCRK-RGGAHWMASA-L', 'check inchikeys match for HEM') ... FAIL
```
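The failing check compares two InChIKeys. A standard InChIKey has three hyphen-separated blocks: a 14-character hash of the connectivity, a 10-character block (an 8-character hash of further layers such as stereochemistry and isotopes, followed by the standard flag and version characters) and a single protonation character. A minimal sketch of a helper that reports which blocks agree; the name `compare_inchikeys` is hypothetical and not part of ccd_utils. The example uses the second key from the failing test and the PubChem key for HEM quoted in this issue:

```python
from typing import Dict


def compare_inchikeys(key_a: str, key_b: str) -> Dict[str, bool]:
    """Report which of the three InChIKey blocks agree between two keys.

    Block 1: 14-character hash of the connectivity (skeleton).
    Block 2: 8-character hash of further layers (stereo, isotopes, ...)
             followed by the standard flag and version characters.
    Block 3: single protonation character.
    """
    names = ("skeleton", "stereo_isotope_flags", "protonation")
    parts_a = key_a.strip().split("-")
    parts_b = key_b.strip().split("-")
    if len(parts_a) != 3 or len(parts_b) != 3:
        raise ValueError("an InChIKey has exactly three hyphen-separated blocks")
    return {name: a == b for name, a, b in zip(names, parts_a, parts_b)}


if __name__ == "__main__":
    # Second key from the failing HEM test, then the PubChem key for HEM.
    print(compare_inchikeys("KABFMIBPWCXCRK-RGGAHWMASA-L",
                            "KABFMIBPWCXCRK-UHFFFAOYSA-L"))
    # {'skeleton': True, 'stereo_isotope_flags': False, 'protonation': True}
```

For HEM only the middle block differs. `UHFFFAOY` is the hash seen whenever those extra layers are empty, so the PubChem key carries no stereo or isotope layers while the other key does.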
The initial image created by Qi's test is:
![HEM.img_withH.svg](/uploads/e25842c4fce19867ed1765fe4862831f/HEM.img_withH.svg)
The PubChem InChIKey is KABFMIBPWCXCRK-UHFFFAOYSA-L, so the initial part of the key from RDKit agrees but the last part does not.
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**Script to read wwPDB chemical component dictionary and split it to produce PDBeChem outputs**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/13 (2017-09-08T09:57:00Z, author: Oliver Smart)
# What
This is the next step once all the components in issues #3, #4, #6, #8 (?), #9 and #10 have been developed as classes with unit tests.
All of this will be put together to write a program that does the complete job described in %1.
# How
* Anticipate using a single process initially.
* Performance testing will be important.
* How many PDB CCD entries fail?
* How long does the process take? How can it be parallelized to use more than one processor?
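Because each chemical component is processed independently, the parallelization question has a simple first answer: one task per component in a process pool. A minimal sketch using the standard library's `concurrent.futures.ProcessPoolExecutor`; `process_component()` is a hypothetical stand-in for the real per-component work, and the `workers=1` path keeps a single-process mode for initial use and debugging:

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Iterable


def process_component(ccd_id: str) -> bool:
    """Hypothetical stand-in for writing every output file for one component.

    Returns True on success and False on failure so the caller can count
    how many PDB CCD entries fail. Placeholder logic: empty ids fail.
    """
    return bool(ccd_id)


def process_all(ccd_ids: Iterable[str], workers: int = 4) -> Dict[str, bool]:
    """Process components independently, optionally across several processes."""
    ids = list(ccd_ids)
    if workers <= 1:
        # single-process mode: simplest to debug and to time
        return {ccd_id: process_component(ccd_id) for ccd_id in ids}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() keeps input order, so results line up with ids; chunksize
        # batches many small jobs to reduce inter-process overhead
        results = pool.map(process_component, ids, chunksize=64)
        return dict(zip(ids, results))
```

Timing `process_all(ids, workers=1)` against a few larger worker counts would answer the "how long" question, and `sum(not ok for ok in result.values())` gives the number of failures.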
# task list of things to code
* Use description from http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/readme.htm
* and files in `/nfs/ftp/pub/databases/msd/pdbechem/`
* For each CCD write a file in:
- [x] `files/mmcif/` individual CCD.cif files for each component
- [x] `files/sdf/` Molfile (SDF) with ideal coordinates and hydrogen atoms
- [x] `files/sdf_nh/` Molfile (SDF) with ideal coordinates without hydrogen atoms
- [x] `files/sdf_r/` Molfile (SDF) with representative coordinates and hydrogen atoms
- [x] `files/sdf_r_nh/` Molfile (SDF) with representative coordinates without hydrogen atoms
- [x] `files/pdb/` PDB with ideal coordinates
- [x] `files/pdb_r` PDB with representative coordinates.
- [x] `files/cml` CML format ideal coordinates
- [x] `files/xyz` (not mentioned in `readme.htm`) xyz format ideal coordinates (see https://en.wikipedia.org/wiki/XYZ_file_format)
- [x] `files/xyz_r` same for representative coordinates.
- [x] images svg - 3 different images (see below)
- [x] images gif - convert the svg images.
* overall write
- [x] `chem_comp.list` a simple list of the chem_comp_id's one per line
- [x] `chem.xml` an xml file for all chem_comps
- [x] `readme.htm` start with existing
- [x] tar.gz files for each of the subdirectories in `files` and `images` directories.
* new:
- [x] use a logging warning for any problematic InChIKey, such as HEM and CDL.
- [ ] `divided` subdirectory where all the files for a chem_component are provided in a separate directory for that component. So for ATP the `divided/A/ATP/` directory will contain the .cif, four .sdf files, ... Do this using softlinks. *Need a separate issue for this:* issue #23
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**Write CML files:** *needs to be checked*
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/8 (2017-10-01T19:13:29Z, author: Oliver Smart)
* PDBeChem produces CML files.
* CML (http://www.xml-cml.org/) might be a bit unpopular.
* Should be fairly easy to write using the standard XML library?
* If it is difficult we could drop CML, but it should be easy.
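The standard library does make this straightforward. A minimal sketch with `xml.etree.ElementTree`, assuming the component has already been reduced to lists of atoms `(id, element, x, y, z)` and bonds `(id1, id2, order)`. The function name and input layout are hypothetical, and the element and attribute names (`molecule`, `atomArray`, `atom` with `elementType`/`x3`/`y3`/`z3`, `bondArray`, `bond` with `atomRefs2`/`order`) follow common CML usage but should be checked against the existing PDBeChem CML files:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CML_NS = "http://www.xml-cml.org/schema"
Atom = Tuple[str, str, float, float, float]  # (atom id, element, x, y, z)
Bond = Tuple[str, str, int]                  # (atom id 1, atom id 2, bond order)


def _q(tag: str) -> str:
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def component_to_cml(ccd_id: str, atoms: List[Atom], bonds: List[Bond]) -> str:
    """Serialize one chemical component as a CML <molecule> element."""
    ET.register_namespace("", CML_NS)  # write CML as the default namespace
    molecule = ET.Element(_q("molecule"), id=ccd_id)
    atom_array = ET.SubElement(molecule, _q("atomArray"))
    for atom_id, element, x, y, z in atoms:
        ET.SubElement(atom_array, _q("atom"), id=atom_id,
                      elementType=element.capitalize(),  # 'FE' -> 'Fe'
                      x3=f"{x:.4f}", y3=f"{y:.4f}", z3=f"{z:.4f}")
    bond_array = ET.SubElement(molecule, _q("bondArray"))
    for atom1, atom2, order in bonds:
        ET.SubElement(bond_array, _q("bond"),
                      atomRefs2=f"{atom1} {atom2}", order=str(order))
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    # EOH heavy atoms (ideal coordinates from issue #21); atom ids are illustrative
    atoms = [("C1", "C", 0.007, -0.569, 0.0), ("C2", "C", -1.285, 0.25, 0.0),
             ("O", "O", 1.13, 0.315, 0.0)]
    bonds = [("C1", "C2", 1), ("C1", "O", 1)]
    print(component_to_cml("EOH", atoms, bonds))
```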
Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

**Change pointer based on the context**
https://gitlab.ebi.ac.uk/pdbe/web-components/ligand-env/-/issues/12 (2019-02-05T17:10:36Z, author: Lukas Pravda)
@mvaradi said: "When going one level below the main (i.e. clicking on something that is part of the ligand), it would be nice to have a quick and obvious way of going back. It was not clear to me that the home button would do the trick. I also didn't know what happened when I suddenly got down to the sub-level after merrily clicking around :) (I.e. I first clicked nodes that didn't take me anywhere, and then suddenly I clicked on a node that did, and I was like "damn, how do I get back")"
Assignee: Lukas Pravda

**Develop utility command line script to read single cif and write coordinate files/images etc.**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/19 (2017-09-27T13:21:16Z, author: Oliver Smart)
## What
* Want a command line script so a user can read in any ccd cif and
* write an SDF file if they want to, with options for ideal/model coordinates, hydrogens, and aliases on/off
* write a pdb file with options ...
* write an image with options ....
* Display properties of the molecule: Lipinski-type properties, number of rings, rotatable bonds etc.
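As a starting point for the proposal this issue asks for (not a replacement for it), the options above could map onto `argparse` roughly as follows. All option names are hypothetical and the actual reading and writing is left out:

```python
import argparse
import os
import sys
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command line interface sketch; every option name is a proposal."""
    parser = argparse.ArgumentParser(
        description="Read one PDB-CCD mmCIF file and write coordinate files, "
                    "images and molecular properties.")
    parser.add_argument("cif", help="input PDB chemical component mmCIF file")
    parser.add_argument("--sdf", metavar="FILE", help="write an SDF file")
    parser.add_argument("--pdb", metavar="FILE", help="write a PDB file")
    parser.add_argument("--image", metavar="FILE", help="write a 2D diagram (SVG)")
    parser.add_argument("--coords", choices=("ideal", "model"), default="ideal",
                        help="coordinate set to write (default: %(default)s)")
    parser.add_argument("--no-hydrogens", action="store_true",
                        help="leave hydrogen atoms out of the output")
    parser.add_argument("--alias", action="store_true",
                        help="write atom name aliases in the SDF file")
    parser.add_argument("--properties", action="store_true",
                        help="print properties such as ring count and rotatable bonds")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Return 0 on success and 1 on a user-facing error."""
    args = build_parser().parse_args(argv)
    if not os.path.isfile(args.cif):
        print(f"error: input file '{args.cif}' does not exist", file=sys.stderr)
        return 1
    # Reading the component and writing the requested outputs would go here,
    # with each failure reported the same way and mapped to a return code of 1.
    return 0
```

Run as a script, the entry point would be `sys.exit(main())`, which hands the return code to the calling process; because `main()` also accepts an explicit `argv` list, it can be called directly from unit tests.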
## How
* The command line arguments and the help text for each must be proposed as a comment on this page. **The proposal must be agreed to before any coding is done!**
* Script is to use argparse
* All points except 2 in https://ajminich.com/2013/08/01/10-things-i-wish-every-python-script-did/ must be followed.
* Can you unit test a command line script?
* If there are exceptions (file does not exist etc.), catch them, produce a sensible error message for the user and call `sys.exit(1)` to indicate to the calling process that there was a problem.
Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

**test PDB cif directory name should be clearer**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/15 (2017-09-21T14:18:05Z, author: Oliver Smart)
## What
* currently the directory with test PDB CCD mmcif files is called `data/cif`
* let's rename it to `data/pdb_ccd_mmcif_test_files`
* Consider how many times the directory name appears in the code.
* Read wikipedia page https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
* Please improve the code.
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version (2017-05-31)

**rethink border color of the graphical elements**
https://gitlab.ebi.ac.uk/pdbe/web-components/ligand-env/-/issues/13 (2019-03-21T17:02:34Z, author: Lukas Pravda)
@mvaradi said: "I'm not sure why some circles have grey borders, and others are black - the legends don't mention reasons for this"
Assignee: Lukas Pravda

**Test fails in directories other than ccd_utils**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/16 (2017-09-21T14:18:05Z, author: Ijaz Ahmad)
The test works in ccd_utils, but if we go up a directory, then we get problems:
```
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ nosetests test_pdb_chemical_components.py
....................................
----------------------------------------------------------------------
Ran 36 tests in 0.090s
OK
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ cd ..
(my-rdkit-env) [qyuan@ch-qyuan-z440 pdbe]$ nosetests ccd_utils/test_pdb_chemical_components.py
ERROR: Failure: ValueError (cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/qyuan/anaconda2/envs/my-rdkit-env/lib/python2.7/site-packages/nose/loader.py", line 251, in generate
for test in g():
File "/home/qyuan/pdbe/ccd_utils/test_pdb_chemical_components.py", line 99, in test_load_hem_from_cif
hem = PdbChemicalComponents(file_name=cif_filename('HEM'), cif_parser=cif_parser)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 80, in __init__
self.read_ccd_from_cif_file(file_name)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 317, in read_ccd_from_cif_file
raise ValueError('cannot read PDB chemical components from {} as file not found'.format(file_name))
ValueError: cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found
```
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

**make sure text does not clip the border of elements**
https://gitlab.ebi.ac.uk/pdbe/web-components/ligand-env/-/issues/14 (2019-02-05T17:10:33Z, author: Lukas Pravda)
@mvaradi said: "I would try to ensure that the text never clips with the shape outline - currently with circle shaped nodes you have text overlapping with the borders"
Assignee: Lukas Pravda

**process_components_cif script to read complete components.cif and produce PDBeChem ftp area**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/17 (2017-09-27T08:53:24Z, author: Oliver Smart)
Issue #13 got the basic functioning of process_components_cif_cli.py working, with a simple test on 4 chemical components.
Now exercise the script on the real thing. So far we have not worried about edge cases, for example what happens if there are no ideal coordinates. Do so now:
* make sure the script does not fall over on problems but logs the error and continues
* deal with each problem (adding a unit test where possible).
-------------------
12 September 2017
# Summary of progress and outstanding issues.
* Have got a script that produces the required output in a reasonable way.
* Needs some clean-up and further work on:
- [x] need to look into the InChI mismatch observation.
- [x] improve command line options.
- [ ] logging output - would be good to list number of unsuccessful sdf, pdb, images etc.
- [x] how gif images are produced (avoid svg conversion)
- [x] need to look into RDKit Invariant Violation
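The two requirements above, keep going after a failure and report how many sdf, pdb, image, ... outputs were unsuccessful, fit in one loop. A minimal sketch, assuming hypothetical writer functions keyed by output type; this is not the actual process_components_cif_cli.py code:

```python
import logging
from collections import Counter
from typing import Callable, Dict, Iterable

logger = logging.getLogger("process_components_cif")


def write_outputs(ccd_ids: Iterable[str],
                  writers: Dict[str, Callable[[str], None]]) -> Counter:
    """Run every writer for every component, logging and counting failures.

    `writers` maps an output type ("sdf", "pdb", "image", ...) to a function
    that writes that output for one chemical component id.
    """
    failures: Counter = Counter()
    for ccd_id in ccd_ids:
        for output_type, writer in writers.items():
            try:
                writer(ccd_id)
            except Exception:  # broad on purpose: one bad component must not stop the run
                logger.exception("%s: failed to write %s output", ccd_id, output_type)
                failures[output_type] += 1
    for output_type in writers:
        logger.info("%s: %d unsuccessful", output_type, failures[output_type])
    return failures


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def write_sdf(ccd_id: str) -> None:
        if ccd_id == "BAD":  # hypothetical id simulating missing ideal coordinates
            raise ValueError("no ideal coordinates")

    def write_pdb(ccd_id: str) -> None:
        pass

    write_outputs(["ATP", "BAD", "HEM"], {"sdf": write_sdf, "pdb": write_pdb})
```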
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**Package code so it can be installed with pip**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/18 (2017-09-19T19:06:24Z, author: Oliver Smart)
* https://python-packaging.readthedocs.io/en/latest/index.html
* https://stackoverflow.com/questions/8247605/configuring-so-that-pip-install-can-work-from-github
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**Proof of concept: Mogul report on PDB-CCD ideal coordinates**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/20 (2017-11-18T16:36:27Z, author: Oliver Smart)
# idea
* For a given wwPDB chemical components definition (PDB-CCD), run the [Mogul tool](https://www.ccdc.cam.ac.uk/solutions/csd-system/components/mogul/using) on the chemical description from the PDB-CCD
* Produce an html format report for the PDB-CCD ideal coordinates
* Use [Global Phasing Buster-Report](https://www.globalphasing.com/buster/wiki/index.cgi?BusterReport) detailed ligand report [example](http://grade.globalphasing.org/tut/erice_workshop/introtutorial/buster/00_MapOnly.report/ligand/detailedreport_A_501_.html#atableBOND) as a starting point - want coloured 2D diagrams to show which bonds/angles/torsions/rings are outliers.
* use ccd_utils with rdkit code
* for Mogul, use the [CSD Python API](https://downloads.ccdc.cam.ac.uk/documentation/API/) rather than running Mogul directly
Assignee: Oliver Smart

**write xyz format file for PDB-CCD**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/21 (2017-08-25T15:59:49Z, author: Oliver Smart)
PDBeChem currently provides xyz format files
https://en.wikipedia.org/wiki/XYZ_file_format
Not sure how useful they are, but they are easy to write, so just implement them.
# current files:
* ideal coordinates
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/EOH.xyz
9
EOH
C 0.0070 -0.5690 0.0000
C -1.2850 0.2500 -0.0000
O 1.1300 0.3150 -0.0000
H 0.0390 -1.1970 0.8900
H 0.0390 -1.1970 -0.8900
H -1.3170 0.8780 0.8900
H -1.3170 0.8780 -0.8900
H -2.1420 -0.4240 0.0000
H 1.9860 -0.1370 0.0000
```
* model coordinates:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz_r/EOH.xyz
9
EOH
C 15.2120 49.1980 7.4910
C 16.0690 50.3860 7.1040
O 15.8610 48.1850 8.2560
H 14.3750 49.5790 8.0940
H 14.8580 48.7310 6.5600
H 15.4670 51.0980 6.5200
H 16.4420 50.8800 8.0130
H 16.9200 50.0420 6.4980
H 15.2440 47.4880 8.4470
```
* Currently elements are upper case:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/FES.xyz
4
FES
FE 0.0000 -0.2130 -1.5310
FE 0.0000 -0.2130 1.5310
S 1.4610 0.3720 0.0000
S -1.4610 0.3720 0.0000
```
but it would be better if the iron atoms were written as Fe.
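A minimal sketch of a writer that reproduces the layout above (atom count, the component id as the comment line, then one atom per line) while writing element symbols with normal capitalization. The function name and the `(element, x, y, z)` input tuples are hypothetical:

```python
from typing import Iterable, List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol as stored in the CCD, x, y, z)


def to_xyz(ccd_id: str, atoms: Iterable[Atom]) -> str:
    """Return xyz file text: atom count, comment line (the CCD id), then atoms."""
    atom_list: List[Atom] = list(atoms)
    lines = [str(len(atom_list)), ccd_id]
    for element, x, y, z in atom_list:
        # str.capitalize() turns 'FE' into 'Fe' and leaves 'S' or 'C' unchanged
        lines.append(f"{element.capitalize():<2} {x:10.4f} {y:10.4f} {z:10.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # FES ideal coordinates as listed above
    fes = [("FE", 0.0, -0.213, -1.531), ("FE", 0.0, -0.213, 1.531),
           ("S", 1.461, 0.372, 0.0), ("S", -1.461, 0.372, 0.0)]
    print(to_xyz("FES", fes), end="")
```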
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**chem.xml add useful rdkit descriptors like number of rotatable bonds**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/22 (2018-11-22T10:02:11Z, author: Oliver Smart)
* from issue #10
* good idea to output information from RDKit such as the number of rotatable bonds (wrote a temporary jiffy for this: [temp_jiffy_number_rotatable_bonds_jiffy.py](temp_jiffy_number_rotatable_bonds_jiffy.py))
* there are many other descriptors available - see http://www.rdkit.org/Python_Docs/rdkit.Chem.Descriptors-module.html

**improve fragment searching method & library**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/24 (2017-10-02T14:38:24Z, author: Oliver Smart)
* issue #11 developed fragment matching code.
* accepted the fragment file from the original prototype.
* there are some issues with this file:
- [x] file format - switch to using a normal multimolecule SMILES .smi file format.
- [x] it would be useful to generate pictures of fragment library molecules
- [x] the single cif command line script would be very useful to test what fragments are in an individual ccd cif. Need issue #19 to be done!
- [x] The tools need to be usable by other people: #18 needs to be done.
- [x] needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to tsv with columns `name, smarts?, query, comment`.
- [ ] *peptide* fragment is not a peptide bond - is it an attempt to search for an amino acid? Replace with the Daylight tutorial *amide*.
- [x] *steroid* needs to pick up all steroids.
- [x] *deoxyribose* fragment needs to not pick up ribose.
- [ ] *pyranose* fragment is wrong
- [ ] A.M. wants to add additional fragments - provide tools for him to be able to take over the work.
## it would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.
* This would help with developing and checking the fragment library, but it is a reasonably big task.
* thinking of a simple interactive command line 'server'
* that processes the components.cif holding all PDBCCD in memory
* then waits for user to enter a SMILES or SMARTS string (or fragment name) on keyboard
* does a substructure search against the PDBCCD
* then waits for the next string or to switch mode - S=SMILES T=SMARTS F=fragment
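The interactive 'server' described above is mostly a small command loop. A minimal sketch of the mode handling, assuming the components have already been loaded into a dict of RDKit molecules keyed by CCD id. The function names, the assumption that named fragments are stored as SMARTS, and passing the query builder in as a parameter (so the loop can be tested without RDKit) are all hypothetical choices; `Chem.MolFromSmiles`, `Chem.MolFromSmarts` and `HasSubstructMatch` are the RDKit calls used for the real searches:

```python
import sys
from typing import Any, Callable, Dict, Iterable, List, Optional, TextIO

MODES = {"S": "SMILES", "T": "SMARTS", "F": "fragment"}


def rdkit_query(mode: str, text: str, fragments: Dict[str, str]) -> Optional[Any]:
    """Build an RDKit query molecule; returns None if it cannot be built."""
    from rdkit import Chem  # imported lazily so the loop can be tested without RDKit
    if mode == "F":
        if text not in fragments:
            return None
        text, mode = fragments[text], "T"  # assumes fragments are stored as SMARTS
    return Chem.MolFromSmiles(text) if mode == "S" else Chem.MolFromSmarts(text)


def search_loop(lines: Iterable[str], components: Dict[str, Any],
                make_query: Callable[[str, str, Dict[str, str]], Optional[Any]],
                fragments: Dict[str, str], out: TextIO = sys.stdout) -> List[List[str]]:
    """Substructure-search each input line; a bare S, T or F switches mode."""
    mode = "S"
    all_hits: List[List[str]] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text in MODES:
            mode = text
            print(f"mode: {MODES[mode]}", file=out)
            continue
        query = make_query(mode, text, fragments)
        if query is None:
            print(f"could not parse {text!r} as {MODES[mode]}", file=out)
            continue
        hits = [ccd_id for ccd_id, mol in components.items()
                if mol.HasSubstructMatch(query)]
        print(f"{len(hits)} hits: {' '.join(hits)}", file=out)
        all_hits.append(hits)
    return all_hits
```

In the real tool `lines` would be `sys.stdin` and `components` would be built once by parsing components.cif at start-up. One wrinkle the sketch exposes: `S` and `F` are also valid SMILES (sulfur and fluorine), so here a bare letter always switches mode and a lone sulfur atom would have to be searched as the SMARTS `[#16]`.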
## Question: do we really want SMILES substrings to define fragments?
* currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. [Daylight>SMARTS Examples](http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html) **27-Sept-2017**
Milestone: PDBeChem Backend Processing: get into preproduction
Assignee: Oliver Smart

**Rethink on 2D chemical diagram images**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/25 (2017-09-20T07:56:31Z, author: Oliver Smart)
* original aim was to replace images in the ftp area (3 gifs: a `large`, a `small` and a `hydrogen` image for each ccd).
* in addition produce labelled and unlabelled svg images
* but on reconsideration this is not wise, as the small and hydrogen images are not very good and the ftp area images are not used on the PDBe page
* just produce the svg images.
Milestone: Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Assignee: Oliver Smart

**Document the existing chemistry-related release processes**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/29 (2017-09-28T11:40:32Z, author: Oliver Smart)
With Stephen, prepare a Confluence page documenting the current weekly release process for ligands:
* script(s) that run the process
* what programs are run
* what the outcome is - what files are produced to which directories, what goes into the database.
* timings
Milestone: PDBeChem Backend Processing: get into preproduction

**Agree exact aims for the project and timecourse**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/30 (2017-09-28T11:40:32Z, author: Oliver Smart)
* exactly which processes are to be replaced.
* requires issue #29 to be done first.
Milestone: PDBeChem Backend Processing: get into preproduction

**fragment searching library: check current fragments and add any new**
https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/33 (2017-10-17T10:19:31Z, author: Oliver Smart)
In previous work, issue #24 improved the fragment searching library to allow the use of SMARTS and connectivity-only (LIKE) SMILES, and got the porphin, porphin-like, steroid and amide fragments working.
This issue is to check the fragment library sufficiently to get it ready for initial production.
The current fragment searching library is a tsv file, which can be found at:
[fragment_library.tsv](pdbeccdutils/data/fragment_library.tsv)
It includes some rather strange entries:
* [ ] `peptide` is not a peptide at all. Do we want an `amino acid`?
* [ ] `prosto` - do not know what this is meant to be. There are no hits with the current PDBeChem fragment search.
* [ ] `acridone` - 2 hits with the current PDBeChem, but it does not work in ccd_utils.
* [ ] `pyranose` and `furanose` vs ribose???
* [ ] additional
It would be worth checking that each fragment in the tsv file works. There is a column `comment` that is currently `unchecked`.
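A first step towards checking every entry is to list which fragments are still marked `unchecked`. A minimal sketch using the standard `csv` module, assuming the tsv has a header row with `name` and `comment` columns as proposed in #24 (the exact header names in fragment_library.tsv may differ):

```python
import csv
from typing import Dict, List, TextIO


def read_fragment_library(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated fragment library whose first row is a header."""
    # QUOTE_NONE: SMILES/SMARTS strings are taken literally, never as quoted fields
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def unchecked_fragments(rows: List[Dict[str, str]]) -> List[str]:
    """Names of fragments whose comment column still says 'unchecked'."""
    return [row["name"] for row in rows
            if (row.get("comment") or "").strip().lower() == "unchecked"]
```

The real file would be opened with `newline=""`, as the `csv` documentation recommends. Actually validating each query needs RDKit as well: `Chem.MolFromSmarts` returns `None` for a SMARTS it cannot parse, so such rows could be flagged in the same pass.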
# Questions
* what are the named fragments **for**? Currently they are used in PDBeChem search but they could be used for a number of other things:
* probes for understanding interactions
* to produce images naming the different parts of a PDB-CCD
* Can one set of fragments fulfil all the different purposes? For instance, the current `amide` SMARTS (see #24) matches carboxamide side chains such as those of ASN and GLN as well as peptide bonds. If one is interested in probes, then do we want `C(=O)[NH2]`?
Milestone: PDBeChem Backend Processing: get into preproduction