pdbe issues
https://gitlab.ebi.ac.uk/groups/pdbe/-/issues
2018-03-28T06:56:43Z

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/1
Class to parse PDB chemical component cif definitions: initial version
2018-03-28T06:56:43Z
Oliver Smart
Requirement is to produce a class that parses PDB chemical components cif files.
It should not be coupled to any one cif parser (PDBeCIF or the one used in OneDep) but should instead work with any of them and provide a user interface that does not vary, so that the same code can be shared between the validator and PDBeChem processes.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/2
Initial version of class to create RDKit molecule from a PDB CCD definition
2017-05-29T14:47:22Z
Oliver Smart
The idea is to create a daughter class of [pdb_chemical_components.py](pdb_chemical_components.py) that parses the CCD cif file and afterwards creates an RDKit molecule. Keeping the functionality separate means the original [pdb_chemical_components.py](pdb_chemical_components.py) can still be used without importing RDKit if desired.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/3
Use RDKit to write sdf file for PDB CCD
2018-03-28T06:56:43Z
Oliver Smart
## What
* mol and the related sdf file formats are the standard way to pass molecules with chemical information between applications
* Currently pdbechem provides 4 different molfile/sdf for each chemical component (with/without hydrogen atoms and ideal/model coordinates), see http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/readme.htm
* This feature request is to get RDKit to write each of the 4 different sdf files.
* It is possible to include atom names and other information in sdf files. This may be useful. For an example see rcsb:
## How
* should be a relatively easy method in [pdb_chemical_components_rdkit.py](pdb_chemical_components_rdkit.py)
* Note that adding/removing hydrogen atoms should not affect the main self.rdkit_mol object in the class
## tasks
This is a big issue so let's separate it out into a checklist:
- [x] write sdf ideal with hydrogen atoms and alias
- [x] write different sdf model and ideal
- [x] write sdf without hydrogen atoms
- [x] turn on/off atom alias in the sdf.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/4
Write pdb files for PDB CCD
2017-09-21T14:18:05Z
Oliver Smart
# What
* See #3 for sdf files
* in addition we need method to write old style PDB format files.
# How
* RDKit includes code to write PDB files
* Note that it is important that the PDB files produced are well formed with correct atom names and residue names.
* In addition the occupancy and temperature factors should be well formed.
* probably also necessary to include a dummy CRYST1 card
* Testing should include comparison to existing files at PDBeChem as well as loading in coot, pymol, litemol.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/5
Handling heme
2017-09-29T08:35:50Z
Oliver Smart
There are problems with HEM. See #2 HEM (heme): initial reaction _I think the CCD definition is wrong and pubchem defines it correctly https://pubchem.ncbi.nlm.nih.gov/compound/444098 with Fe2+ and the two nitrogen atoms as N-_
RDKit produces a warning line when parsing HEM:
```
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] Explicit valence for atom # 39 N, 4, is greater than permitted
[10:52:01] WARNING: Accepted unusual valence(s): N(4); Metal was disconnected; Proton(s) added/removed
ccd_utils.test_write_pdb.test_inchikey_match_for_all_sample_cifs('FEDYMSUPMFCVOD-UJJXFSCMSA-N', 'KABFMIBPWCXCRK-RGGAHWMASA-L', 'check inchikeys match for HEM') ... FAIL
```
The initial image created by Qi's test is:
![HEM.img_withH.svg](/uploads/e25842c4fce19867ed1765fe4862831f/HEM.img_withH.svg)
The pubchem inchikey is KABFMIBPWCXCRK-UHFFFAOYSA-L, so the first part of the inchikey from RDKit agrees but the last part does not.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/6
Write 2D images of PDB CCD molecule
2017-10-01T19:13:29Z
Oliver Smart
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/9
Split wwPDB chemical component dictionary file into separate cif file for each component
2017-09-11T14:57:41Z
Oliver Smart
## What
* The wwPDB chemical component dictionary is available as a single big (around 215MB) file each week.
* download link ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif
* details https://www.wwpdb.org/data/ccd
* this needs to be split into individual CCD files - one for each chemical component.
* these then need to be processed to sdf/pdb and images.
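The split described above can be sketched without a cif parser. This is a minimal illustration, not the project's implementation: it assumes every component begins with a line starting `data_`, and that no line inside a multi-line cif text field happens to start with `data_` (a real splitter would need to guard against that). The function name `split_components` is hypothetical.

```python
from pathlib import Path


def split_components(lines, out_dir):
    """Write one <id>.cif file per `data_<id>` block and return the ids seen.

    `lines` is any iterable of text lines, e.g. an open components.cif file.
    Assumption: a line starting with `data_` always begins a new component.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ids = []
    current = None  # handle of the component file currently being written
    try:
        for line in lines:
            if line.startswith("data_"):
                # New component: close the previous file, open the next one.
                if current is not None:
                    current.close()
                comp_id = line[len("data_"):].strip()
                ids.append(comp_id)
                current = open(out_dir / f"{comp_id}.cif", "w")
            if current is not None:
                current.write(line)
    finally:
        if current is not None:
            current.close()
    return ids
```

Because the input is streamed line by line, the roughly 215MB file never has to be held in memory.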
## How
* It might be sensible to do this by parsing the complete file using a cif parser - could then process each component using the tools developed to write sdf, pdb files and images. If so could address issue #10 at same time.
* Or it might be necessary to split the file into individual small files in a separate program without using a cif parser. Each component starts with a line `data_ABC` where `ABC` is the chem_comp.id (aka residue name).
** for instance http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/mmcif/001.cif
```
data_001
#
_chem_comp.id 001
_chem_comp.name "1-[2,2-DIFLUORO-2-(3,4,5-TRIMETHOXY-PHENYL)-ACETYL]-PIPERIDINE-2-CARBOXYLIC ACID 4-PHENYL-1-(3-PYRIDIN-3-YL-PROPYL)-BUTYL ESTER"
_chem_comp.type NON-POLYMER
_chem_comp.pdbx_type HETAIN
_chem_comp.formula "C35 H42 F2 N2 O6"
```
* could simply read the file and look for lines starting `data_...`; when such a line is found, close the previous file and open a new one for the code `...`
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/10
Create chem.xml for all components
2017-08-29T17:00:52Z
Oliver Smart
# What
Currently http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/chem.xml lists in xml format information for every PDB chemical component. For instance for ATP and ATQ:
```xml
<chemComp>
<id>ATP</id>
<name>ADENOSINE-5'-TRIPHOSPHATE</name>
<formula>C10 H16 N5 O13 P3</formula>
<systematicName>[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxy-oxolan-2-yl]methoxy-hydroxy-phosphoryl] phosphono hydrogen phosphate</systematicName>
<stereoSmiles>Nc1ncnc2n(cnc12)[C@@H]3O[C@H](CO[P@](O)(=O)O[P@@](O)(=O)O[P](O)(O)=O)[C@@H](O)[C@H]3O</stereoSmiles>
<nonStereoSmiles>Nc1ncnc2n(cnc12)[CH]3O[CH](CO[P](O)(=O)O[P](O)(=O)O[P](O)(O)=O)[CH](O)[CH]3O</nonStereoSmiles>
<InChi>InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1</InChi>
</chemComp>
<chemComp>
<id>ATQ</id>
<name>2-AMINOTHIAZOLINE</name><formula>C3 H6 N2 S</formula>
<systematicName>4,5-dihydro-1,3-thiazol-2-amine</systematicName>
<stereoSmiles>NC1=NCCS1</stereoSmiles>
<nonStereoSmiles>NC1=NCCS1</nonStereoSmiles>
<InChi>InChI=1S/C3H6N2S/c4-3-5-1-2-6-3/h1-2H2,(H2,4,5)</InChi>
</chemComp>
```
the process developed needs to be able to produce this file.
N.B. file starts:
```
<chemCompList>
<chemComp>
<id>000</id>
<name>methyl hydrogen carbonate</name>
<formula>C2 H4 O3</formula>
<systematicName>methyl hydrogen carbonate</systematicName>
<stereoSmiles>COC(O)=O</stereoSmiles>
<nonStereoSmiles>COC(O)=O</nonStereoSmiles>
<InChi>InChI=1S/C2H4O3/c1-5-2(3)4/h1H3,(H,3,4)</InChi>
</chemComp>
```
and ends with ```</chemCompList>```
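A file with this shape could be produced with the standard library alone. The sketch below is only an illustration under assumptions: the element names are copied from the examples above, while `build_chem_xml` and the dict-based input are hypothetical and not part of the existing code. It does not attempt to reproduce the exact whitespace of the current file.

```python
import xml.etree.ElementTree as ET

# Child elements of <chemComp>, in the order shown in the existing chem.xml.
FIELDS = ("id", "name", "formula", "systematicName",
          "stereoSmiles", "nonStereoSmiles", "InChi")


def build_chem_xml(components):
    """Return chem.xml text for an iterable of dicts keyed by FIELDS."""
    root = ET.Element("chemCompList")
    for comp in components:
        node = ET.SubElement(root, "chemComp")
        for field in FIELDS:
            # Missing values become empty elements rather than being dropped.
            ET.SubElement(node, field).text = comp.get(field, "")
    return ET.tostring(root, encoding="unicode")
```

Using `ElementTree` rather than string concatenation means characters such as `&` or `<` in component names are escaped automatically.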
# How
* related to issue #9: it would be good if `chem.xml` could be created while the cif file is being processed and split.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/11
Tool for finding "fragments" in each CCD molecule.
2017-09-14T15:35:12Z
Oliver Smart
## What
Requirements From PDBe confluence: **16 June 2016**
SV wants a tool to produce a file that lists the fragments present in each of the chemical compounds:
* read in the chemical component definition cif file.
* read in file smi.txt that contains lines like:
```
cyclopentane:C1CCCC1
cyclopropane:C1CC1
cytosine:C1=CNC(NC1N)O
```
* use rdkit to find which of the fragments is in the ccd.
* write results as a csv format file with the contents:
3 letter code eg. "ATP", fragment name from smi.txt, atom names comma delimited e.g. "C1,C2,C3,C4,O5"
* Producing this tool is a priority.
* name for tool ccd_find_fragments.py (provisional)
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/12
Add DMSO and Sildenafil to list of test CCD cif files
2017-09-21T14:18:05Z
Oliver Smart
* In previous work I found that SOx groups could cause problems.
* Please look up [DMSO](https://en.wikipedia.org/wiki/Dimethyl_sulfoxide) and [Sildenafil (Viagra)](https://en.wikipedia.org/wiki/Sildenafil) in PDBeChem and add to directory with test cif files.
* How do the molecules perform in current tests?
* When committing the files you must include where the files were obtained (exact url) in the commit message.
* Include reference to this issue number in the commit message. See https://docs.gitlab.com/ee/user/project/issues/automatic_issue_closing.html
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/13
Script to read wwPDB chemical component dictionary and split it to produce PDBeChem outputs
2017-09-08T09:57:00Z
Oliver Smart
# What
This is the next step once all components in issues: #3 #4 #6 #8 (?) #9 #10 have been developed as classes with unit tests.
All of this will be put together to write a program to do the complete job described in %1
# How
* Anticipate using a single process initially
* Performance Testing will be important
* how many PDB CCD fail?
* how long does the process take? How can it be parallelized to use more than one processor?
# task list of things to code
* Use description from http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/readme.htm
* and files in `/nfs/ftp/pub/databases/msd/pdbechem/`
* For each CCD write a file in:
- [x] `files/mmcif/` individual CCD.cif files for each component
- [x] `files/sdf/` Molfile (SDF) with ideal coordinates and hydrogen atoms
- [x] `files/sdf_nh/` Molfile (SDF) with ideal coordinates without hydrogen atoms
- [x] `files/sdf_r/` Molfile (SDF) with representative coordinates and hydrogen atoms
- [x] `files/sdf_r_nh/` Molfile (SDF) with representative coordinates without hydrogen atoms
- [x] `files/pdb/` PDB with ideal coordinates
- [x] `files/pdb_r` PDB with representative coordinates.
- [x] `files/cml` CML format ideal coordinates
- [x] `files/xyz` (not mentioned in `readme.html`) xyz format ideal (see https://en.wikipedia.org/wiki/XYZ_file_format)
- [x] `files/xyz_r` same for representative coordinates.
- [x] images svg - 3 different images (see below)
- [x] images gif - convert the svg images.
* overall write
- [x] `chem_comp.list` a simple list of the chem_comp_id's one per line
- [x] `chem.xml` an xml file for all chem_comps
- [x] `readme.htm` start with existing
- [x] tar.gz files for each of the subdirectories in `files` and `images` directories.
* new:
- [x] use logging warn for any problematic inchikey like HEM and CDL.
- [ ] `divided` subdirectory where all files for a chem_component are provided in a separate directory for that component. So for ATP the `divided/A/ATP/` directory will contain .cif, four .sdf files, .... Do this using softlinks. *Need a separate issue for this.* issue #23
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/14
Document the current installation procedure
2017-09-21T14:18:05Z
Oliver Smart
* Currently the ccd_utils project needs a parallel checkout of the PDBeCIF project: https://github.com/glenveegee/PDBeCIF.git
* How to do this should be explained in the [README.md](README.md) file.
* Add a section *Installation instructions*
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/15
test PDB cif directory name should be clearer
2017-09-21T14:18:05Z
Oliver Smart
## What
* currently the directory with test PDB CCD mmcif files is called `data/cif`
* let's rename it to `data/pdb_ccd_mmcif_test_files`
* Consider how many times the directory name appears in the code.
* Read wikipedia page https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
* Please improve the code.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
2017-05-31

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/16
Test fails in directories other than ccd_utils
2017-09-21T14:18:05Z
Ijaz Ahmad
The test works in ccd_utils, but if we go up a directory we get problems.
```
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ nosetests test_pdb_chemical_components.py
....................................
----------------------------------------------------------------------
Ran 36 tests in 0.090s
OK
(my-rdkit-env) [qyuan@ch-qyuan-z440 ccd_utils]$ cd ..
(my-rdkit-env) [qyuan@ch-qyuan-z440 pdbe]$ nosetests ccd_utils/test_pdb_chemical_components.py
ERROR: Failure: ValueError (cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/qyuan/anaconda2/envs/my-rdkit-env/lib/python2.7/site-packages/nose/loader.py", line 251, in generate
for test in g():
File "/home/qyuan/pdbe/ccd_utils/test_pdb_chemical_components.py", line 99, in test_load_hem_from_cif
hem = PdbChemicalComponents(file_name=cif_filename('HEM'), cif_parser=cif_parser)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 80, in __init__
self.read_ccd_from_cif_file(file_name)
File "/home/qyuan/pdbe/ccd_utils/pdb_chemical_components.py", line 317, in read_ccd_from_cif_file
raise ValueError('cannot read PDB chemical components from {} as file not found'.format(file_name))
ValueError: cannot read PDB chemical components from data/pdb_ccd_mmcif_test_files/HEM.cif as file not found
```
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/17
process_components_cif script to read complete components.cif and produce PDBeChem ftp area
2017-09-27T08:53:24Z
Oliver Smart
Issue #13 got the basic functioning of process_components_cif_cli.py working with a simple test with 4 chemical components.
Now exercise the script on the real thing. We have not worried about edge cases so far, for example what happens if there are no ideal coordinates. Do so now:
* make sure the script does not fall over on problems but logs the error and continues
* for each problem deal with it (adding a unit test where possible).
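The "log the error and continue" behaviour can be sketched with the standard `logging` module. This is an illustration under assumptions, not the actual script: `process_all`, the logger name and the `writers` mapping (output name to a callable that writes one file) are all hypothetical. It also keeps a per-output failure count, which would make it easy to report the number of unsuccessful sdf, pdb and image writes.

```python
import logging
from collections import Counter

logger = logging.getLogger("process_components_cif")  # hypothetical name


def process_all(components, writers):
    """Run every writer on every component, logging failures and carrying on.

    components -- iterable of (comp_id, data) pairs
    writers    -- dict mapping an output name (e.g. "sdf") to a callable
                  taking (comp_id, data); both are assumptions of this sketch
    Returns a Counter of failures per output name.
    """
    failures = Counter()
    for comp_id, data in components:
        for name, write in writers.items():
            try:
                write(comp_id, data)
            except Exception:
                # Record the problem with its traceback, then move on to
                # the next output instead of stopping the whole run.
                failures[name] += 1
                logger.exception("%s: failed to write %s", comp_id, name)
    for name in writers:
        logger.info("%s: %d unsuccessful", name, failures[name])
    return failures
```

Catching `Exception` broadly is deliberate here: one malformed component should not stop the remaining components from being processed.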
-------------------
12 September 2017
# Summary of progress and outstanding issues.
* Have got script that produces required output in a reasonable way.
* needs some clean up and further work of:
- [x] need to look into inchi mismatch observation.
- [x] improve command line options.
- [ ] logging output - would be good to list number of unsuccessful sdf, pdb, images etc.
- [x] how gif images are produced (avoid svg conversion)
- [x] need to look into RDKit Invariant Violation
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/18
Package code so it can be installed with pip
2017-09-19T19:06:24Z
Oliver Smart
* https://python-packaging.readthedocs.io/en/latest/index.html
* https://stackoverflow.com/questions/8247605/configuring-so-that-pip-install-can-work-from-github
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/21
write xyz format file for PDB-CCD
2017-08-25T15:59:49Z
Oliver Smart
PDBeChem currently provides xyz format files
https://en.wikipedia.org/wiki/XYZ_file_format
Not sure how useful they are, but they are easy to write so just implement them.
# current files:
* ideal coordinates
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/EOH.xyz
9
EOH
C 0.0070 -0.5690 0.0000
C -1.2850 0.2500 -0.0000
O 1.1300 0.3150 -0.0000
H 0.0390 -1.1970 0.8900
H 0.0390 -1.1970 -0.8900
H -1.3170 0.8780 0.8900
H -1.3170 0.8780 -0.8900
H -2.1420 -0.4240 0.0000
H 1.9860 -0.1370 0.0000
```
* model coordinates:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz_r/EOH.xyz
9
EOH
C 15.2120 49.1980 7.4910
C 16.0690 50.3860 7.1040
O 15.8610 48.1850 8.2560
H 14.3750 49.5790 8.0940
H 14.8580 48.7310 6.5600
H 15.4670 51.0980 6.5200
H 16.4420 50.8800 8.0130
H 16.9200 50.0420 6.4980
H 15.2440 47.4880 8.4470
```
* Currently elements are upper case:
```
cat /nfs/ftp/pub/databases/msd/pdbechem/files/xyz/FES.xyz
4
FES
FE 0.0000 -0.2130 -1.5310
FE 0.0000 -0.2130 1.5310
S 1.4610 0.3720 0.0000
S -1.4610 0.3720 0.0000
```
but better if the iron atoms are written as Fe.
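A writer along these lines is short enough to sketch in full. The code below is an illustration only: `element_symbol` and `format_xyz` are hypothetical names, the column widths are an assumption and are not guaranteed to match the existing PDBeChem files byte for byte, and the element capitalisation implements the suggestion that `FE` be written as `Fe`.

```python
def element_symbol(raw):
    """Normalise an element symbol, e.g. 'FE' -> 'Fe', 's' -> 'S'."""
    return raw.strip().capitalize()


def format_xyz(comp_id, atoms):
    """Return xyz file text for a component.

    atoms -- iterable of (element, x, y, z) tuples
    The first line is the atom count, the second the component id,
    followed by one line per atom.
    """
    atoms = list(atoms)
    lines = [str(len(atoms)), comp_id]
    for element, x, y, z in atoms:
        lines.append(
            f"{element_symbol(element):<2} {x:10.4f} {y:10.4f} {z:10.4f}"
        )
    return "\n".join(lines) + "\n"
```

The same function serves both the ideal and the model coordinates; only the coordinate tuples passed in differ.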
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/23
PDBeChem ftp Output: Supply divided directory
2018-10-13T09:14:48Z
Oliver Smart
* From issue #13
* Currently it is difficult to navigate the ftp site because it takes around 40 seconds to see http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/images/large/ as it contains 24936 files.
* Instead provide a divided directory with all the files for an individual chemical component.
* so for ATP directory `divided/A/ATP` would contain:
```
divided/A/ATP/ATP.cif
divided/A/ATP/coordinates/ideal/ATP.sdf
divided/A/ATP/coordinates/ideal/ATP.pdb
divided/A/ATP/coordinates/ideal/ATP_no_hydrogen.sdf
divided/A/ATP/coordinates/ideal/ATP.xml
divided/A/ATP/coordinates/model/ATP.sdf
divided/A/ATP/coordinates/model/ATP.pdb
divided/A/ATP/coordinates/model/ATP_no_hydrogen.sdf
divided/A/ATP/2Dimages/with_labels/ATP.xml
divided/A/ATP/2Dimages/with_labels/ATP.png
divided/A/ATP/2Dimages/without_labels/ATP.xml
divided/A/ATP/2Dimages/without_labels/ATP.png
```
* this will enable users to quickly find what they want (hopefully?)
PDBeChem Backend Processing: get into preproduction

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/24
improve fragment searching method & library
2017-10-02T14:38:24Z
Oliver Smart
* issue #11 developed fragment matching code.
* accepted the fragment file from the original prototype.
* there are some issues with this file:
- [x] file format - switch to using a normal multimolecule SMILES .smi file format.
- [x] it would be useful to generate pictures of fragment library molecules
- [x] the single cif command line script would be very useful to test what fragments are in an individual ccd cif. Need issue #19 to be done!
- [x] The tools need to be usable by other people #18 needs to be done.
- [x] needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to tsv with columns `name, smarts?, query, comment`.
- [ ] *peptide* fragment is not a peptide bond - is it an attempt to search for amino acid? replace with Daylight tutorial *amide*
- [x] *steroid* needs to pick up all steroids.
- [x] *deoxyribose* fragment needs to not pick up ribose.
- [ ] *pyranose* fragment is wrong
- [ ] A.M. wants to add additional fragments - provide tools for him to be able to take over the work.
## it would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.
* This would help in developing and checking the fragment library but is a reasonably big task.
* thinking of a simple interactive command line 'server'
* that processes the components.cif holding all PDBCCD in memory
* then waits for user to enter a SMILES or SMARTS string (or fragment name) on keyboard
* does a substructure search against the PDBCCD
* then waits for the next string or to switch mode - S=SMILES T=SMARTS F=fragment
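The control flow of such a 'server' can be sketched independently of the chemistry. Everything named here is hypothetical: `run_search_loop`, the `fragments` mapping and in particular the `matcher` callable, which in a real tool would wrap an RDKit substructure test (for example `HasSubstructMatch` on a query built with `Chem.MolFromSmiles` or `Chem.MolFromSmarts`). The sketch only shows the mode switching and the search over molecules held in memory.

```python
def run_search_loop(lines, molecules, fragments, matcher):
    """Yield (query, matching_ids) for each query read from `lines`.

    lines     -- iterable of user input strings (e.g. sys.stdin)
    molecules -- dict: component id -> structure held in memory
    fragments -- dict: fragment name -> (mode, query), mode "S" or "T"
    matcher   -- callable(structure, query, mode) -> bool (an assumption;
                 a real version would delegate to RDKit)
    """
    mode = "S"  # start in SMILES mode
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text in ("S", "T", "F"):
            # Mode switch. Caveat: "S" and "F" are also valid one-atom
            # SMILES, so a real tool may prefer a different switch syntax.
            mode = text
            continue
        if mode == "F":
            if text not in fragments:
                yield text, []  # unknown fragment name: no matches
                continue
            query_mode, query = fragments[text]
        else:
            query_mode, query = mode, text
        hits = [comp_id for comp_id, structure in molecules.items()
                if matcher(structure, query, query_mode)]
        yield text, hits
```

Writing the loop as a generator over an iterable of lines keeps it testable without a keyboard; passing `sys.stdin` would give the interactive behaviour.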
## Question: do we really want SMILES substrings to define fragments?
* currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. [Daylight>SMARTS Examples](http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html) **27-Sept-2017**
PDBeChem Backend Processing: get into preproduction
Oliver Smart

https://gitlab.ebi.ac.uk/pdbe/ccdutils/-/issues/25
Rethink on 2D chemical diagram images
2017-09-20T07:56:31Z
Oliver Smart
* original aim was to replace images in the ftp area (3 gifs: a `large`, a `small` and `hydrogen` for each ccd).
* in addition produce labelled and unlabelled svg images
* but on reconsideration this is not wise - as the small and hydrogen images are not very good and the ftp area images are not used on the PDBe page
* just produce the svg images.
Improve PDBe Chemical Components backend infrastructure using RDKit: beta test version
Oliver Smart