improve fragment searching method & library
- issue #11 (closed) developed fragment matching code.
- accepted the fragment file from the original prototype.
- there are some issues with this file:
- file format - switch to using a normal multimolecule SMILES .smi file format.
- it would be useful to generate pictures of fragment library molecules
- a single-CIF command line script would be very useful to test which fragments are in an individual CCD CIF. Needs issue #19 to be done!
- The tools need to be usable by other people, so #18 (closed) needs to be done.
The fragment library needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to TSV with columns
name, smarts?, query, comment.
- peptide fragment is not a peptide bond. Is it an attempt to search for an amino acid? Replace with the amide from the Daylight tutorial.
- steroid needs to pick up all steroids.
- deoxyribose fragment needs to not pick up ribose.
- pyranose fragment is wrong
- A.M. wants to add additional fragments - provide tools for him to be able to take over the work.
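A minimal sketch of how the proposed TSV fragment library (columns name, smarts?, query, comment) might be read, using only the Python standard library. Everything here is an assumption for illustration: the header spelling `smarts`, the Y/N flag values, the `Fragment` class, the `load_fragments` helper and the two example rows are hypothetical and are not the real library entries.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One row of the (hypothetical) TSV fragment library."""
    name: str
    is_smarts: bool  # True: query is SMARTS, False: query is SMILES
    query: str
    comment: str


def load_fragments(handle):
    """Read fragments from an open TSV file with a header row.

    Assumed header: name, smarts, query, comment.
    QUOTE_NONE stops the csv module from giving special meaning to
    quote characters that might appear in a query string.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    fragments = []
    for row in reader:
        flag = row["smarts"].strip().upper()
        fragments.append(
            Fragment(
                name=row["name"].strip(),
                is_smarts=flag in {"Y", "YES", "TRUE", "1"},
                query=row["query"].strip(),
                comment=(row.get("comment") or "").strip(),
            )
        )
    return fragments


# Placeholder rows only, to show the layout.
EXAMPLE_TSV = (
    "name\tsmarts\tquery\tcomment\n"
    "amide\tN\tC(=O)N\tplaceholder SMILES query\n"
    "methylene\tY\t[CH2]\tplaceholder SMARTS query\n"
)

fragments = load_fragments(io.StringIO(EXAMPLE_TSV))
for fragment in fragments:
    kind = "SMARTS" if fragment.is_smarts else "SMILES"
    print(f"{fragment.name}\t{kind}\t{fragment.query}")
```

Keeping the SMARTS/SMILES flag as its own column means whoever adds a fragment only has to add one line to the file, and the search code can choose the matching parser per row.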
it would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.
- This would help with developing and checking the fragment library but is a reasonably big task.
- thinking of a simple interactive command line 'server'
- that processes the components.cif holding all PDBCCD in memory
- then waits for user to enter a SMILES or SMARTS string (or fragment name) on keyboard
- does a substructure search against the PDBCCD
- then waits for the next string or for a mode switch: S=SMILES, T=SMARTS, F=fragment
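The read/search/switch-mode loop described above could be sketched roughly as follows. This is only an outline of the control flow under stated assumptions: `serve`, `SearchFn` and `stub_search` are hypothetical names, the substructure search over the in-memory CCD is passed in as a callable rather than implemented, and the component ids in the stub are made up.

```python
from typing import Callable, Iterable, Iterator, List

# Mode letters as proposed: S=SMILES, T=SMARTS, F=fragment name.
MODES = {"S": "SMILES", "T": "SMARTS", "F": "fragment"}

# Hypothetical search callable: (mode letter, query) -> matching component ids.
SearchFn = Callable[[str, str], List[str]]


def serve(lines: Iterable[str], search: SearchFn, mode: str = "S") -> Iterator[str]:
    """Yield one reply per non-empty input line.

    A line that is exactly S, T or F switches mode; any other line is
    passed to `search` as a query in the current mode.
    """
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if text in MODES:
            mode = text
            yield f"mode: {MODES[mode]}"
            continue
        hits = search(mode, text)
        yield f"{MODES[mode]} {text}: {len(hits)} hit(s) {' '.join(hits)}".rstrip()


def stub_search(mode: str, query: str) -> List[str]:
    """Stand-in for the real CCD search; the ids are invented."""
    table = {
        ("S", "CCO"): ["AAA"],
        ("F", "amide"): ["AAA", "BBB"],
    }
    return table.get((mode, query), [])


replies = list(serve(["CCO", "T", "[CH2]", "", "F", "amide"], stub_search))
for reply in replies:
    print(reply)
```

For interactive use the same generator could be fed from the keyboard, for example `for reply in serve(sys.stdin, real_search): print(reply)`. One design point to settle: `S` and `F` on their own are also valid SMILES (sulfur and fluorine), so this sketch cannot search for them; a prefixed command such as `:S` would avoid the clash.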
Question: do we really want SMILES substrings to define fragments?
- currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. Daylight > SMARTS Examples 27-Sept-2017