Fix formulas of compounds whose R groups are encoded as carbon aliases
Based on the mismatches we are seeing in the chemical data, summarized in #44 (closed), we need to make some changes in the ETL, specifically in the `chemical_data` table, as follows:
- Update the `librdchebi` library.
- Validate which compounds have problems with R groups using the function `no_r_group_and_alias`, which returns `True` if it detects the potential issue.
- Fix the wrong compounds found in the previous step using the function `transform_alias_to_r`, which returns a new molfile with the error fixed. Use this new molfile to calculate the extra chemical data (e.g. mass, charge, and formula). Then, for the wrong compounds, insert the new molfile into the PostgreSQL database.
- Finally, once the compounds have been fixed with the process above, generate the reports shown in #44 (closed). The number of wrong compounds should be lower than before.
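The validate, fix, and recompute steps could be sketched as a small Python function. This is a hedged sketch, not the ETL's actual code. The `Compound`, `FixedCompound`, and `FixReport` types and the `fix_r_group_aliases` function are hypothetical. The check, the fix, and the property calculation are passed in as callables, so the sketch makes no assumption about libRDChEBI's import paths or signatures. It also re-runs the check on each new molfile, so any compounds that are still wrong can be counted in the report.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Compound:
    chebi_id: int  # hypothetical identifier field
    molfile: str


@dataclass
class FixedCompound:
    chebi_id: int
    molfile: str  # corrected molfile, to be written back to PostgreSQL
    props: Dict[str, Any]  # mass, charge, formula, ... recomputed from the new molfile


@dataclass
class FixReport:
    fixed: List[FixedCompound] = field(default_factory=list)
    untouched: List[int] = field(default_factory=list)
    still_wrong: List[int] = field(default_factory=list)


def fix_r_group_aliases(
    compounds: Iterable[Compound],
    has_alias_issue: Callable[[str], bool],  # e.g. no_r_group_and_alias (assumed)
    fix_molfile: Callable[[str], str],  # e.g. transform_alias_to_r (assumed)
    compute_props: Callable[[str], Dict[str, Any]],  # hypothetical property calculator
) -> FixReport:
    report = FixReport()
    for compound in compounds:
        if not has_alias_issue(compound.molfile):
            # No R-group/alias problem detected: keep the existing data as-is
            report.untouched.append(compound.chebi_id)
            continue
        new_molfile = fix_molfile(compound.molfile)
        # The extra chemical data must come from the fixed molfile, not the original
        props = compute_props(new_molfile)
        report.fixed.append(FixedCompound(compound.chebi_id, new_molfile, props))
        # Re-run the check so the report shows whether the fix actually worked
        if has_alias_issue(new_molfile):
            report.still_wrong.append(compound.chebi_id)
    return report
```

In the ETL, `has_alias_issue` and `fix_molfile` would presumably be `no_r_group_and_alias` and `transform_alias_to_r`, and `compute_props` whatever code currently fills the `chemical_data` table. Writing `report.fixed` to PostgreSQL is left out of the sketch.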