improve fragment searching method & library

  • Issue #11 (closed) developed the fragment matching code.
  • That work accepted the fragment file from the original prototype.
  • There are some issues with this file:
  • File format: switch to a normal multi-molecule SMILES (.smi) file.
  • It would be useful to generate pictures of the fragment library molecules.
  • The single-CIF command-line script would be very useful for testing which fragments are found in an individual CCD CIF. This needs issue #19 to be done!
  • The tools need to be usable by other people; #18 (closed) needs to be done.
  • The library needs to be improved to support SMARTS as well as SMILES: alter the fragment library file to TSV with columns name, smarts?, query, comment.
  • The peptide fragment is not a peptide bond. Is it an attempt to search for an amino acid? Replace it with the amide from the Daylight tutorial.
  • The steroid fragment needs to pick up all steroids.
  • The deoxyribose fragment must not pick up ribose.
  • The pyranose fragment is wrong.
  • A.M. wants to add additional fragments; provide tools so that he can take over the work.
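
The TSV layout proposed above (name, smarts?, query, comment) could be read with the standard library alone. The following is a minimal sketch, not the project's loader: the `Fragment` class, the `load_fragments` function, the header spelling `smarts` (without the question mark), the yes/no encoding of that column and the two example rows are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One row of the (hypothetical) fragment library TSV."""
    name: str
    is_smarts: bool   # True: query is SMARTS, False: query is SMILES
    query: str
    comment: str


def load_fragments(handle):
    """Parse a tab-separated fragment library with a header row."""
    # QUOTE_NONE so that characters inside SMILES/SMARTS are never treated as quotes.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    fragments = []
    for row in reader:
        flag = (row["smarts"] or "").strip().lower()
        fragments.append(
            Fragment(
                name=row["name"].strip(),
                is_smarts=flag in {"y", "yes", "true", "1"},
                query=row["query"].strip(),
                comment=(row.get("comment") or "").strip(),
            )
        )
    return fragments


# Illustrative content only; these are not entries from the real library.
EXAMPLE_TSV = (
    "name\tsmarts\tquery\tcomment\n"
    "amide\tyes\t[NX3][CX3](=[OX1])[#6]\tillustrative SMARTS entry\n"
    "benzene\tno\tc1ccccc1\tillustrative SMILES entry\n"
)

fragments = load_fragments(io.StringIO(EXAMPLE_TSV))
```

Keeping an explicit SMARTS flag per row means existing SMILES entries can stay unchanged while individual fragments are migrated to SMARTS.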

It would be useful to run a search for a SMILES or SMARTS fragment against the complete CCD.

  • This would help with developing and checking the fragment library, but it is a reasonably big task.
    • thinking of a simple interactive command-line 'server'
    • that processes components.cif, holding the whole PDB CCD in memory
    • then waits for the user to enter a SMILES or SMARTS string (or a fragment name) on the keyboard
    • does a substructure search against the PDB CCD
    • then waits for the next string or for a mode switch: S=SMILES, T=SMARTS, F=fragment
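
The interactive loop described above might be sketched as follows. This is only an outline under stated assumptions: `run_search_loop` and its `search` callback are hypothetical names, and the callback stands in for the real work of loading components.cif and running the substructure search, which is not shown here.

```python
MODES = {"S": "SMILES", "T": "SMARTS", "F": "fragment"}


def run_search_loop(search, lines=None, out=None):
    """Read one query per line; a bare S, T or F switches the mode.

    search -- hypothetical callback search(kind, text) returning a list of
              matching CCD ids; kind is "SMILES", "SMARTS" or "fragment".
    lines  -- iterable of input lines (defaults to standard input).
    out    -- file-like object for the replies (defaults to standard output).
    """
    import sys

    mode = "S"  # start in SMILES mode
    source = sys.stdin if lines is None else lines
    for raw in source:
        text = raw.strip()
        if not text:
            continue  # ignore empty lines
        if text in MODES:
            # Single-letter command: switch mode instead of searching.
            mode = text
            print(f"mode: {MODES[mode]}", file=out)
            continue
        hits = search(MODES[mode], text)
        print(f"{text}: {len(hits)} hit(s) {' '.join(hits)}".rstrip(), file=out)
```

One design point to settle: a bare `S` or `F` is itself a valid SMILES string (sulfur, fluorine), so with the letters as given those atoms cannot be searched on their own. A prefixed command such as `:S` would avoid the clash.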

Question: do we really want SMILES substrings to define fragments?

  • Currently the deoxyribose fragment matches every ribose because deoxyribose is a substructure of ribose. Can SMILES be used to say that C2' is CH2? Does it matter? SMARTS could be used. Daylight>SMARTS Examples 27-Sept-2017
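
The difference can be illustrated with a small sketch. It assumes RDKit as the toolkit, which this issue does not name, and uses simplified furanose SMILES without stereochemistry. The two query strings are illustrative and are not proposed library entries.

```python
# Assumption: RDKit is the cheminformatics toolkit; the issue does not name one.
try:
    from rdkit import Chem
except ImportError:  # keep the sketch importable where RDKit is not installed
    Chem = None

RIBOSE = "OCC1OC(O)C(O)C1O"      # furanose ring, C2' carries an OH
DEOXYRIBOSE = "OCC1OC(O)CC1O"    # same ring, C2' is CH2

# The same skeleton written two ways (illustrative queries only).
SMILES_QUERY = "OCC1OC(O)CC1O"        # parsed as SMILES: hydrogen counts are not enforced
SMARTS_QUERY = "OCC1OC(O)[CH2]C1O"    # SMARTS: C2' must carry exactly two hydrogens


def matches(target_smiles, query, is_smarts):
    """Return True if the query is found as a substructure of the target."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this sketch")
    target = Chem.MolFromSmiles(target_smiles)
    if is_smarts:
        pattern = Chem.MolFromSmarts(query)
    else:
        pattern = Chem.MolFromSmiles(query)
    return target.HasSubstructMatch(pattern)
```

With RDKit's default matching, the SMILES query should hit both sugars, because ribose contains every atom and bond of the deoxyribose skeleton. The SMARTS query should hit only deoxyribose, since no ring carbon in ribose has two hydrogens.
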
Edited by Oliver Smart