improve fragment searching method & library
- issue #11 (closed) developed fragment matching code.
- accepted the fragment file from the original prototype.
- there are some issues with this file:
file format - switch to using a normal multimolecule SMILES .smi file format.
it would be useful to generate pictures of fragment library molecules
the single-CIF command line script would be very useful for testing which fragments are in an individual CCD CIF. Needs issue #19 to be done!
The tools need to be usable by other people: #18 (closed) needs to be done.
needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to TSV with columns
name, smarts?, query, comment.
peptide fragment is not a peptide bond - is it an attempt to search for an amino acid? Replace with the amide from the Daylight tutorial.
steroid needs to pick up all steroids.
deoxyribose fragment needs to not pick up ribose.
pyranose fragment is wrong
A.M. wants to add additional fragments - provide tools for him to be able to take over the work.
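
A minimal sketch of how the proposed TSV fragment library could be read, using only the Python standard library. The header spelling (`smarts` rather than `smarts?`), the `Y`/`N` flag values, the `Fragment` class, the `load_fragments` helper and the two example rows are all assumptions made for illustration, not the project's actual format:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    is_smarts: bool  # True: query is SMARTS, False: query is SMILES
    query: str
    comment: str


def load_fragments(handle):
    """Read fragments from a tab-separated file that has a header row."""
    # QUOTE_NONE so that no character in a query is treated as a quote mark.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    fragments = []
    for row in reader:
        fragments.append(
            Fragment(
                name=row["name"],
                is_smarts=row["smarts"].strip().upper() == "Y",
                query=row["query"],
                comment=row.get("comment") or "",
            )
        )
    return fragments


# Hypothetical library content, for illustration only.
EXAMPLE_TSV = (
    "name\tsmarts\tquery\tcomment\n"
    "amide\tY\t[NX3][CX3](=[OX1])[#6]\tillustrative amide pattern\n"
    "benzene\tN\tc1ccccc1\tplain SMILES query\n"
)

fragments = load_fragments(io.StringIO(EXAMPLE_TSV))
for fragment in fragments:
    kind = "SMARTS" if fragment.is_smarts else "SMILES"
    print(f"{fragment.name}: {kind} {fragment.query}")
```

A plain TSV with a header row stays editable in a spreadsheet or text editor, which may make it easier for someone else to take over maintaining the library.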
it would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.
- This would help with developing and checking the fragment library but is a reasonably big task.
- thinking of a simple interactive command line 'server'
- that reads components.cif and holds the whole PDBCCD in memory
- then waits for the user to enter a SMILES or SMARTS string (or fragment name) on the keyboard
- does a substructure search against the PDBCCD
- then waits for the next string or a mode switch: S=SMILES, T=SMARTS, F=fragment
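
The loop described above could be sketched roughly as follows. This only shows the read/dispatch logic: the `run_search_loop` function, the `Q` quit key and the `fake_search` stand-in are hypothetical, and the real substructure search over components.cif is left as an injected callable:

```python
import io

# Mode keys as proposed in this issue.
MODES = {"S": "SMILES", "T": "SMARTS", "F": "fragment"}


def run_search_loop(search, lines, out):
    """Read one entry per line, switch mode or run a search, report hits.

    search(kind, text) is expected to return a list of matching CCD ids.
    Returns the mode key that was active when the loop ended.
    """
    mode = "S"
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        key = text.upper()
        if key in ("Q", "QUIT"):  # assumed quit command, not in the issue
            break
        if key in MODES:
            mode = key
            print(f"mode: {MODES[mode]}", file=out)
            continue
        hits = search(MODES[mode], text)
        print(f"{len(hits)} hit(s) for {MODES[mode]} {text!r}: {' '.join(hits)}", file=out)
    return mode


def fake_search(kind, text):
    # Stand-in for a real search over the in-memory CCD; canned demo result.
    demo = {("SMILES", "c1ccccc1"): ["BNZ", "PHE"]}
    return demo.get((kind, text), [])


out = io.StringIO()
final_mode = run_search_loop(fake_search, ["c1ccccc1\n", "T\n", "[CH2]\n", "q\n"], out)
print(out.getvalue(), end="")
```

For interactive use the same function could be called with `sys.stdin` and `sys.stdout`. One design point to settle: a bare `S` or `F` is itself a valid SMILES (sulfur, fluorine), so single-letter mode keys collide with single-atom queries; a prefix such as `:S` might be safer.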
Question: do we really want SMILES substrings to define fragments?
- currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. Daylight > SMARTS Examples, 27-Sept-2017
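
To make the deoxyribose question concrete, here is a small sketch that assumes RDKit as the toolkit (the issue does not say which toolkit the matching code uses). The SMILES strings are stereo-free furanose forms written for this example, and the SMARTS pattern is only one possible way to require that C2' is a CH2; none of it is taken from the existing fragment file:

```python
from rdkit import Chem

# Stereo-free furanose forms, written for this illustration.
ribose = Chem.MolFromSmiles("OCC1OC(O)C(O)C1O")
deoxyribose = Chem.MolFromSmiles("OCC1OC(O)CC1O")

# Deoxyribose used as a plain SMILES query: hydrogen counts are not
# enforced, so it is found inside ribose as well.
smiles_query = Chem.MolFromSmiles("OCC1OC(O)CC1O")

# SMARTS query: [CH2] in the ring demands exactly two hydrogens on C2'.
# Atom order: O5' C5' C4' O4' C1' C2' C3' O3'.
smarts_query = Chem.MolFromSmarts("[OX2]C[CH1]1O[CH1][CH2][CH1]1[OX2]")

for name, mol in (("ribose", ribose), ("deoxyribose", deoxyribose)):
    print(
        name,
        "SMILES query:", mol.HasSubstructMatch(smiles_query),
        "SMARTS query:", mol.HasSubstructMatch(smarts_query),
    )
```

In this sketch the SMILES query matches both sugars, while the SMARTS query matches only deoxyribose, because ribose has no ring carbon carrying two hydrogens.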