Commit b5a0c0df authored by Lukas Pravda

Merge branch 'dev'

parents 78248a8a 4b551193
......@@ -27,7 +27,7 @@
* Once you have installed RDKit as described above, install pdbeccdutils using pip:
```console
pip install git+https://github.com/PDBeurope/ccdutils.git
pip install pdbeccdutils
```
## Features
......
......@@ -4,21 +4,27 @@
# Introduction
The `pdbeccdutils` is presently under development and new functionality is added regularly as well as its functionality is being revised and updated. When properly installed all the code should have documentation. All the *public* methods do have [static typing](http://mypy-lang.org/) introduced in Python 3.5. All the interfaces should be well documented.
`pdbeccdutils` is under active development: new functionality is added regularly and existing functionality is revised and updated. When properly installed, all the code should have documentation. All *public* methods carry [static typing](http://mypy-lang.org/) annotations (type hints, introduced in Python 3.5). All interfaces should be well documented.
## Installation
The `pdbeccdutils` can be presently obtained from the [EBI Gitlab](https://gitlab.ebi.ac.uk/pdbe/ccdutils.git) using the following command:
`pdbeccdutils` can presently be obtained from [PyPI](https://pypi.org/project/pdbeccdutils/) using the following command:
```console
pip install git+https://gitlab.ebi.ac.uk/pdbe/ccdutils.git
pip install pdbeccdutils
```
Alternatively, you can install the repository from [GitHub](https://github.com/PDBeurope/ccdutils) using the following command:
```console
pip install git+https://github.com/PDBeurope/ccdutils.git@master#egg=pdbeccdutils
```
If you want to contribute to the project, please fork it first and then open a pull request.
# Getting started
The centerpoint of the `pdbecccdutils` package is a `Component` object, which is a wrapper around the default `rdkit.Chem.rdchem.Mol` object (object property `,mol`) providing most of the functionality and access to its properties. All the conformers are stored in the `rdkit.Chem.rdchem.Mol` with the exception of 2D depiction, as this one does not contain hydrogen atoms. `pdbeccdutils.core.modes.ConformerType` object allows accessing all of them.
The centerpiece of the `pdbeccdutils` package is the `Component` object, a wrapper around the default `rdkit.Chem.rdchem.Mol` object (object property `mol`) that provides most of the functionality and access to its properties. All the conformers are stored in the `rdkit.Chem.rdchem.Mol`, with the exception of the 2D depiction, as this one does not contain explicit hydrogens. The `pdbeccdutils.core.modes.ConformerType` object allows accessing all of them.
Below you can find a few typical use cases.
......@@ -35,7 +41,8 @@ rdkit_mol = component.mol
The `rdkit.Chem.rdchem.Mol` object is sanitized already.
### Chemical component dictionary
### Chemical component dictionary
The chemical component dictionary can be read in a single command and `rdkit.Chem.rdchem.Mol` representations obtained immediately. The `ccd_reader.read_pdb_components_file` function returns a `Dict[str, CCDReaderResult]` keyed on component ID as provided by the `data_XXX` element in the respective mmCIF file; each result exposes the parsed `Component` through its `component` field.
```python
......
__version__ = '0.5.4'
__version__ = '0.5.6'
......@@ -59,13 +59,14 @@ class CCDReaderResult(NamedTuple):
component: Component
def read_pdb_cif_file(path_to_cif: str) -> CCDReaderResult:
def read_pdb_cif_file(path_to_cif: str, sanitize: bool = True) -> CCDReaderResult:
"""
Read in single wwPDB CCD CIF component and create its internal
representation.
Args:
path_to_cif (str): Path to the cif file
sanitize (bool): Whether or not the component should be sanitized. Defaults to True.
Raises:
ValueError: if file does not exist
......@@ -79,10 +80,10 @@ def read_pdb_cif_file(path_to_cif: str) -> CCDReaderResult:
cif_dict = list(MMCIF2Dict().parse(path_to_cif).values())[0]
return _parse_pdb_mmcif(cif_dict)
return _parse_pdb_mmcif(cif_dict, sanitize)
def read_pdb_components_file(path_to_cif: str) -> Dict[str, CCDReaderResult]:
def read_pdb_components_file(path_to_cif: str, sanitize: bool = True) -> Dict[str, CCDReaderResult]:
"""
Process multiple compounds stored in the wwPDB CCD
`components.cif` file.
......@@ -90,6 +91,8 @@ def read_pdb_components_file(path_to_cif: str) -> Dict[str, CCDReaderResult]:
Args:
path_to_cif (str): Path to the `components.cif` file with
multiple ligands in it.
sanitize (bool): Whether or not the components should be sanitized
Defaults to True.
Raises:
ValueError: if the file does not exist.
......@@ -113,12 +116,14 @@ def read_pdb_components_file(path_to_cif: str) -> Dict[str, CCDReaderResult]:
# region parse mmcif
def _parse_pdb_mmcif(cif_dict):
def _parse_pdb_mmcif(cif_dict, sanitize=True):
"""
Create internal representation of the molecule from mmcif format.
Args:
cif_dict (dict): mmcif category
sanitize (bool): Whether or not the rdkit component should
be sanitized. Defaults to True.
Returns:
CCDReaderResult: internal representation with the results
......@@ -143,7 +148,7 @@ def _parse_pdb_mmcif(cif_dict):
descriptors += _parse_pdb_descriptors(identifiers_dict, 'identifier')
properties = _parse_pdb_properties(properties_dict)
comp = Component(mol.GetMol(), cif_dict, properties, descriptors)
comp = Component(mol.GetMol(), cif_dict, properties, descriptors, sanitize=sanitize)
reader_result = CCDReaderResult(warnings=warnings, errors=errors, component=comp)
return reader_result
......
......@@ -33,6 +33,7 @@ from pdbeccdutils.utils import config
from rdkit import Chem, Geometry
from rdkit.Chem import AllChem, rdCoordGen
from scipy.spatial import KDTree
from pdbeccdutils.helpers.rdkit_fixtures import fix_conformer
class DepictionManager:
......@@ -108,7 +109,11 @@ class DepictionManager:
results.sort(key=lambda l: (l.score, l.source))
if results:
return results[0]
to_return = results[0]
fix_conformer(to_return.mol.GetConformer(0))
return to_return
return DepictionResult(source=DepictionSource.Failed, template_name='', mol=None, score=1000)
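The selection logic in this hunk (sort candidates by score, break ties by source, fall back to a failure sentinel) can be sketched on its own. This is an illustration only, not the `DepictionManager` code: the `Result` tuple, the integer `source` ranking, and `pick_best` are hypothetical stand-ins for `DepictionResult` and `DepictionSource`.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    source: int          # hypothetical ranking: lower value = preferred source
    template_name: str
    score: float         # lower score = better depiction


# Sentinel returned when no candidate exists (mirrors the score=1000 fallback).
FAILED = Result(source=99, template_name="", score=1000)


def pick_best(results: List[Result]) -> Result:
    """Return the candidate with the lowest (score, source) pair, or FAILED."""
    ordered = sorted(results, key=lambda r: (r.score, r.source))
    return ordered[0] if ordered else FAILED


candidates = [
    Result(source=2, template_name="a", score=0.5),
    Result(source=1, template_name="b", score=0.5),
    Result(source=1, template_name="c", score=0.9),
]
best = pick_best(candidates)
```

Sorting on a tuple key means the source is consulted only when two candidates have the same score.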
......
......@@ -161,7 +161,7 @@ def convert_svg(svg_string, ccd_id, mol: rdkit.Chem.Mol):
kd_tree = KDTree(atom_centers)
for bond_svg in bond_elem:
if 'class' not in bond_svg.attrib:
if 'class' not in bond_svg.attrib or 'bond-selector' in bond_svg.attrib['class']:
continue
bond_id_str = re.search(r'\d+', bond_svg.attrib['class']).group(0)
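The effect of the extra `bond-selector` guard can be shown in isolation. The sketch below is not the library's `convert_svg`: the SVG fragment and the `bond_ids` helper are hypothetical, and it assumes that drawable bonds carry a class such as `bond-3` while selector overlays additionally carry `bond-selector`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical SVG fragment; real RDKit output will differ in detail.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path class="bond-0" d="M 0,0 L 1,1"/>
  <path class="bond-selector bond-0" d="M 0,0 L 1,1"/>
  <path d="M 2,2 L 3,3"/>
  <path class="bond-12" d="M 4,4 L 5,5"/>
</svg>"""


def bond_ids(svg_string):
    """Return bond indices of drawable bond paths, skipping selector overlays."""
    root = ET.fromstring(svg_string)
    ids = []
    for elem in root.iter("{http://www.w3.org/2000/svg}path"):
        css_class = elem.attrib.get("class")
        # Same guard as in the hunk: no class, or a selector overlay, is skipped.
        if css_class is None or "bond-selector" in css_class:
            continue
        # The first run of digits in the class string is taken as the bond index.
        ids.append(int(re.search(r"\d+", css_class).group(0)))
    return ids


result = bond_ids(SVG)
```

Without the second half of the guard, the overlay element would be parsed as a second copy of bond 0.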
......
#!/usr/bin/env python
# software from PDBe: Protein Data Bank in Europe; https://pdbe.org
#
# Copyright 2018 EMBL - European Bioinformatics Institute
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on
# an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import numpy as np
from rdkit import Geometry


def fix_conformer(conformer):
    """Fix an RDKit conformer in place.

    In certain cases the resulting conformer (mainly from the depiction
    process) can contain invalid atom coordinates [NaN, NaN, NaN]. This
    causes errors in downstream processes, so it is easier to fix the
    problem where it occurs.

    Args:
        conformer (rdkit.Chem.rdchem.Conformer): RDKit conformer
    """
    positions = conformer.GetPositions()

    for index, pos in enumerate(positions):
        # Only atoms whose x, y and z are all NaN are moved to the origin.
        if all(np.isnan(pos)):
            # A bare `import rdkit` does not load submodules, so import Geometry explicitly.
            new_pos = Geometry.Point3D(0, 0, 0)
            conformer.SetAtomPosition(index, new_pos)
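The repair rule used by `fix_conformer` can be tried without RDKit installed. The following is a simplified sketch under that assumption: `fix_positions` is a hypothetical helper that operates on a plain list of coordinate tuples instead of an RDKit conformer.

```python
import math


def fix_positions(positions):
    """Replace coordinates that are entirely NaN with the origin, in place."""
    for index, pos in enumerate(positions):
        # Mirrors `all(np.isnan(pos))`: every component must be NaN.
        if all(math.isnan(value) for value in pos):
            positions[index] = (0.0, 0.0, 0.0)
    return positions


nan = float("nan")
coords = [(1.0, 2.0, 3.0), (nan, nan, nan), (nan, 0.5, 0.5)]
fix_positions(coords)
```

Because the check uses `all`, a point with only some NaN components (the third entry here) is left untouched; only fully undefined positions are reset.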
......@@ -33,7 +33,7 @@ import logging
import os
import sys
import traceback
from typing import List
from typing import Dict, Optional
import rdkit
......@@ -58,12 +58,12 @@ class PDBeChemManager:
Args:
logger (logging.Logger, optional): Defaults to None. Application log
"""
self.compounds: List[ccd_reader.CCDReaderResult] = [] # processed compounds
self.compounds: Dict[str, ccd_reader.CCDReaderResult] = {} # processed compounds
self.ligands_to_process: int = 0 # no. ligands to process
self.output_dir: str = "" # where the results will be written
self.depictions: DepictionManager = None # helper class to get nice depictions
self.pubchem: PubChemDownloader = None # helper class to download templates if needed
self.fragment_library: FragmentLibrary = None # Fragments library to get substructure matches
self.depictions: Optional[DepictionManager] = None # helper class to get nice depictions
self.pubchem: Optional[PubChemDownloader] = None # helper class to download templates if needed
self.fragment_library: Optional[FragmentLibrary] = None # Fragments library to get substructure matches
self.logger = (
logger if logger is not None else logging.getLogger(__name__)
) # log of the application
......
import rdkit
import numpy as np
from pdbeccdutils.helpers.rdkit_fixtures import fix_conformer
class TestRDKitFixtures:
@staticmethod
def test_conformer_is_broken_ion():
mol = rdkit.Chem.RWMol()
atom = rdkit.Chem.Atom('H')
mol.AddAtom(atom)
conformer = rdkit.Chem.Conformer(1)
atom_position = rdkit.Chem.rdGeometry.Point3D(np.nan, np.nan, np.nan)  # np.NaN alias was removed in NumPy 2.0
conformer.SetAtomPosition(0, atom_position)
mol.AddConformer(conformer, assignId=True)
m = mol.GetMol()
c = m.GetConformer(0)
fix_conformer(c)
assert c.GetAtomPosition(0).x == 0.0
assert c.GetAtomPosition(0).y == 0.0
assert c.GetAtomPosition(0).z == 0.0
@staticmethod
def test_conformer_has_broken_atom():
mol = rdkit.Chem.RWMol()
o = rdkit.Chem.Atom('O')
h = rdkit.Chem.Atom('H')
mol.AddAtom(o)
mol.AddAtom(h)
mol.AddBond(0, 1, rdkit.Chem.rdchem.BondType(1))
conformer = rdkit.Chem.Conformer(1)
o_position = rdkit.Chem.rdGeometry.Point3D(1, 2, 3)
h_position = rdkit.Chem.rdGeometry.Point3D(np.nan, np.nan, np.nan)  # np.NaN alias was removed in NumPy 2.0
conformer.SetAtomPosition(0, o_position)
conformer.SetAtomPosition(1, h_position)
mol.AddConformer(conformer, assignId=True)
m = mol.GetMol()
c = m.GetConformer(0)
fix_conformer(c)
assert c.GetAtomPosition(0).x != 0.0
assert c.GetAtomPosition(0).y != 0.0
assert c.GetAtomPosition(0).z != 0.0
assert c.GetAtomPosition(1).x == 0.0
assert c.GetAtomPosition(1).y == 0.0
assert c.GetAtomPosition(1).z == 0.0
from setuptools import setup, find_namespace_packages
import pdbeccdutils
import os
from setuptools import find_namespace_packages, setup
def read(fname):
return open(os.path.join(os.path.dirname(__file__), fname)).read()
def get_version(rel_path):
for line in read(rel_path).splitlines():
if line.startswith('__version__'):
delim = '"' if '"' in line else "'"
return line.split(delim)[1]
else:
raise RuntimeError("Unable to find version string.")
setup(
name="pdbeccdutils",
version=pdbeccdutils.__version__,
version=get_version('pdbeccdutils/__init__.py'),
description="Toolkit to deal with wwPDB chemical components definitions for small molecules.",
long_description=read("README.md"),
long_description_content_type='text/markdown',
project_urls={
"Source code": "https://github.com/PDBeurope/ccdutils",
"Documentation": "https://pdbeurope.github.io/ccdutils/",
......@@ -13,14 +29,16 @@ setup(
author_email="pdbehelp@ebi.ac.uk",
license="Apache License 2.0.",
keywords="PDB CCD wwPDB small molecule",
url="http://pypi.python.org/pypi/pdbeccdutils/",
packages=find_namespace_packages(),
zip_safe=False,
include_package_data=True,
python_requires='>=3.6',
install_requires=[
"Pillow",
"scipy",
"numpy",
"pdbecif @ git+https://github.com/PDBeurope/pdbecif.git@master#egg=pdbecif",
"pdbecif>=1.5",
],
tests_require=["pytest"],
entry_points={
......@@ -29,4 +47,21 @@ setup(
"setup_pubchem_library=pdbeccdutils.scripts.setup_pubchem_library_cli:main",
]
},
classifiers=[
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
"Operating System :: Unix",
"Operating System :: MacOS",
"Operating System :: POSIX",
"Intended Audience :: Science/Research",
"Intended Audience :: Developers",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"Development Status :: 5 - Production/Stable",
],
)
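The `get_version` helper in this `setup.py` reads `__version__` from the source text rather than importing the package, so installation no longer requires the package's own dependencies to be importable. A minimal sketch of the same parsing, operating on a string instead of a file (`get_version_from_text` is a hypothetical name used only for this illustration):

```python
def get_version_from_text(text):
    """Extract the value of the first `__version__ = '...'` line in `text`."""
    for line in text.splitlines():
        if line.startswith("__version__"):
            # Accept either quoting style used for the version string.
            delim = '"' if '"' in line else "'"
            return line.split(delim)[1]
    raise RuntimeError("Unable to find version string.")


single = get_version_from_text("# header\n__version__ = '0.5.6'\n")
double = get_version_from_text('__version__ = "0.5.6"\n')
```

Both calls return the string `0.5.6`; text without a `__version__` line raises `RuntimeError`.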