ChEBI ETL | Calculate WURCS for Carbohydrates
Carbohydrates that have a structure (molfile) need their WURCS identifier calculated and stored in the new wurcs table. This was previously done manually, but it can be automated. To know whether a molecule is a carbohydrate we need to look at the ontology: if a molecule has an 'is a' relation, either directly or through its ancestors, with carbohydrate (id 16646), then its WURCS needs to be calculated.
Here is a simple function that checks whether a molecule is a carbohydrate, to be used as a reference:
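import os

from sqlalchemy import create_engine, text

# The password is read from the environment instead of being hard-coded;
# CHEBI_DB_PASSWORD is an assumed variable name.
engine = create_engine(
    f"postgresql://chebi:{os.environ['CHEBI_DB_PASSWORD']}"
    "@pgsql-hlvm-067.ebi.ac.uk:5432/chmchebipro?client_encoding=utf8"
)

CARBOHYDRATE_ID = 16646  # carbohydrate chebi_id

def is_carbohydrate(cpd_id, debug=False):
    # :cpd_id is a bound parameter, so the id is never interpolated into the SQL
    query = text("""
        select
            init_id,
            cpd.name,
            cpd.parent_id -- some relations might be linked to its parent_id
        from
            development.relation rel,
            development.compounds cpd
        where
            rel.init_id = cpd.id
            and rel.relation_type_id = 5 -- 'is a' relation
            and rel.status_id = 1
            and rel.final_id = :cpd_id""")
    with engine.connect() as conn:
        res = conn.execute(query, {"cpd_id": cpd_id}).all()
    for r in res:
        if debug:
            print(r)
        if r[0] == CARBOHYDRATE_ID or r[2] == CARBOHYDRATE_ID:
            return True
        # Recurse into each parent in turn so that every 'is a' branch is checked
        if is_carbohydrate(r[2] if r[2] else r[0], debug=debug):
            return True
    return False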
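
As a rough illustration of the same ontology walk without a database, here is a minimal sketch that works on data already loaded in memory. It assumes the 'is a' relations have been read into a dict mapping each compound id to its direct parents, and that molfiles are available as a dict keyed by compound id. The names is_carbohydrate_in_memory, compounds_needing_wurcs, toy_parents and toy_molfiles are hypothetical, and every id in the toy data is made up except 16646. It does not calculate the WURCS itself; it only selects the compounds that would need one.

```python
from collections import deque

CARBOHYDRATE_ID = 16646  # carbohydrate chebi_id, as given above


def is_carbohydrate_in_memory(cpd_id, parents):
    """Return True if cpd_id reaches CARBOHYDRATE_ID through 'is a' edges.

    `parents` is a hypothetical dict: compound id -> list of the ids it
    has an 'is a' relation with (its direct ontology parents).
    """
    seen = {cpd_id}
    queue = deque([cpd_id])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent == CARBOHYDRATE_ID:
                return True
            if parent not in seen:  # visit each ancestor only once
                seen.add(parent)
                queue.append(parent)
    return False


def compounds_needing_wurcs(molfiles, parents):
    """Sorted ids that have a non-empty molfile and are carbohydrates."""
    return sorted(
        cpd_id
        for cpd_id, molfile in molfiles.items()
        if molfile and is_carbohydrate_in_memory(cpd_id, parents)
    )


# Toy data: ids are invented for illustration, except 16646.
toy_parents = {
    1: [2, 3],              # 1 'is a' 2 and 1 'is a' 3
    2: [4],
    3: [CARBOHYDRATE_ID],   # 3 'is a' carbohydrate
    5: [4],
}
toy_molfiles = {1: "molfile for 1", 3: "", 5: "molfile for 5"}

print(compounds_needing_wurcs(toy_molfiles, toy_parents))  # [1]
```

Loading the relations once and walking them in memory avoids one query per ancestor, which may matter if the check has to run over every compound in the ETL. Compound 3 is a carbohydrate in the toy data but is skipped because it has no molfile.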