Viability of creating a ChEBI Database Management Repository
Taking advantage of the dbt framework, and given the need to centralize our database schema migrations in one repository, we should create a new repository called chebi-data-management. The goal is to avoid duplicating SQL code across the other data repositories (chebi-dumps, chebi-ontology-generator, chebi-elasticsearch) that need to read from or write to the ChEBI PostgreSQL database as part of their workflows.
This also keeps business logic (validations of public data) in one place, and lets us run data quality checks on the database schemas before generating a new release.
- Separate the chebi-dumps project into two projects:
  - chebi-database-management, with all the dbt-related code. This project will be responsible for updating the different schemas in PostgreSQL and running data quality checks against the database. The idea is that the Docker image built in this repository can be used to perform the necessary actions in the CI/CD of the other repositories.
  - chebi-dumps, with only the code necessary to generate data dumps (SDF, PostgreSQL, flat files, etc.).
- chebi-ontology-generator needs to incorporate these changes into the release workflow in some way.
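
As a rough illustration of how another repository's CI/CD could use the shared dbt project as a release gate, here is a minimal Python sketch. It assumes that schema updates are expressed as dbt models and that data quality checks are dbt tests, so the gate is simply `dbt run` followed by `dbt test`. The project path `/opt/chebi-dbt` and the function names `dbt_command`, `release_gate`, and `subprocess_runner` are hypothetical and not part of any existing repository.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Hypothetical location of the dbt project inside the shared Docker image.
DBT_PROJECT_DIR = "/opt/chebi-dbt"


def dbt_command(action: str, select: Optional[str] = None,
                project_dir: str = DBT_PROJECT_DIR) -> List[str]:
    """Build the argument list for a dbt CLI call ("run" or "test")."""
    if action not in {"run", "test"}:
        raise ValueError(f"unsupported dbt action: {action}")
    cmd = ["dbt", action, "--project-dir", project_dir]
    if select:
        # Restrict the call to a subset of models or tests.
        cmd += ["--select", select]
    return cmd


def release_gate(runner: Callable[[Sequence[str]], int],
                 select: Optional[str] = None) -> bool:
    """Run the models, then the data quality tests.

    Returns True only if both steps exit with code 0.
    Stops at the first failing step.
    """
    for action in ("run", "test"):
        if runner(dbt_command(action, select)) != 0:
            return False
    return True


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Execute the command as a child process and return its exit code."""
    return subprocess.run(list(cmd)).returncode
```

A release job would call `release_gate(subprocess_runner)` and abort the release when it returns `False`. Passing the runner as a parameter keeps the gate logic testable without dbt or a database being available.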