Skip to content
GitLab
Projects Groups Topics Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in
  • C Chebi ETL
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributor statistics
    • Graph
    • Compare revisions
    • Locked files
  • Issues 5
    • Issues 5
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 0
    • Merge requests 0
  • Deployments
    • Deployments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Terraform modules
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • ChEMBLChEMBL
  • ChEBI
  • ChEBI-2.0ChEBI-2.0
  • Chebi ETL
  • Issues
  • #17
Closed
Open
Issue created Nov 02, 2022 by Carlos Andres Moreno Velez@carlosm🌳Owner

Chebi ETL | One structure has more than 1 wurcs

In the migration process of wurcs table, we realized that there are 19 structures in CHEZTST database with more than 1 wurcs name. The idea is to migrate only the wurcs name corresponding to the molfile. For example, this compound has two WURC in the database:

  • WURCS=2.0/3,6,5/[a1122h-1x_1-5][a1122h-1a_1-5][a1122h-1b_1-5]/1-2-2-2-3-2/a2-b1_b2-c1_c3-d1_c6-f1_d2-e1
  • WURCS=2.0/2,6,5/[a1122h-1x_1-5][a1122h-1a_1-5]/1-2-2-2-2-2/a2-b1_b2-c1_c3-d1_c6-f1_d2-e1

But checking its molfile, we only get the first one:

  • WURCS=2.0/3,6,5/[a1122h-1x_1-5][a1122h-1a_1-5][a1122h-1b_1-5]/1-2-2-2-3-2/a2-b1_b2-c1_c3-d1_c6-f1_d2-e1

So, the idea is to migrate only the above. Finally, this was the SQL sentence used to get the structure ids with more than 1 wurcs

-- Gettings the structure ids
SELECT
    s.id,
    --n.name AS wurcs,
    COUNT(*)
FROM names n
    INNER JOIN structures s ON s.compound_id = n.compound_id
        AND n.status = s.status
WHERE
    s.TYPE = 'mol'
    AND n.type = 'SYNONYM'
    AND n.name LIKE 'WURCS%'
GROUP BY s.id
HAVING COUNT(*) > 1;

-- Checking the wurcs names of the above ids.
SELECT 
    s.id AS structure_id,
    s.COMPOUND_ID,
    n.name AS wurcs
FROM names n
    INNER JOIN structures s ON s.compound_id = n.compound_id
        AND s.status = n.status
WHERE n.type = 'SYNONYM'
    AND n.name LIKE 'WURCS%'
    AND s.type = 'mol'
    AND s.id IN (3151748, 3141532, 3141937, 3150376, 3129897, 3135232, 3151379, 3148262, 3152216, 3130245, 3140566, 3129414, 3134433, 3141118,3151664, 3138920, 3150556, 3155468, 3137171);

See this slack message for additional information

Edited Nov 03, 2022 by Carlos Andres Moreno Velez
Assignee
Assign to
Time tracking