Fix combine CLI producing empty output with invalid data

Florent Yvon requested to merge fix-65 into main

Created by: nebfield

When processing a single file, if the combine CLI encountered an invalid variant it would silently fail to write out any variants (the only sign of trouble was a misleading log statement). Some investigation notes:

  • The "invalid" data wasn't actually invalid: some variants had failed harmonisation and were missing mandatory fields
    • Update the pydantic models to support this case properly
  • The CLI wasn't re-raising exceptions correctly because I had a return statement inside a finally block 😬
    • Add tests to make sure badly harmonised variants do create an output file and invalid variants do throw a ValidationError exception
    • Add a check to the CLI that an output file actually exists
  • Re-test this branch on the entire Catalog and check for exceptions
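For anyone unfamiliar with the `finally` pitfall mentioned above, here is a minimal sketch of how a `return` inside a `finally` block discards an in-flight exception. The function names and the `None` stand-in for an invalid variant are hypothetical and are not the actual combine CLI code.

```python
def combine_buggy(variants):
    """Hypothetical stand-in for the old behaviour."""
    written = []
    try:
        for variant in variants:
            if variant is None:  # None stands in for an invalid variant
                raise ValueError("invalid variant")
            written.append(variant)
    finally:
        # A return here replaces the pending exception, so the caller
        # never sees the ValueError raised above.
        return written


def combine_fixed(variants):
    """Same logic, but the return happens after the try block."""
    written = []
    try:
        for variant in variants:
            if variant is None:
                raise ValueError("invalid variant")
            written.append(variant)
    finally:
        # Cleanup only (closing files and so on); no return here.
        pass
    return written
```

With the buggy version the caller gets a partial result and no error; with the fixed version the exception propagates as expected.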

Closes #55 (closed)

Test results

22 (older) scoring files contain invalid rsIDs:

pgscatalog.core.cli.combine_cli: 2024-10-18 14:43:12 CRITICAL PGS000019 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 14:52:23 CRITICAL PGS000042 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000212 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000213 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000214 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000215 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000216 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000310 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000311 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:06 CRITICAL PGS000317 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:36:01 CRITICAL PGS000330 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000332 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000333 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000344 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000345 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000346 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000347 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:14:11 CRITICAL PGS000727 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000728 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000729 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:22:39 CRITICAL PGS000754 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:33:12 CRITICAL PGS000867 contains invalid data, stopping and exploding

The fix is to relax the rsID check when harmonisation goes wrong. No other ValidationErrors were thrown.
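As a rough illustration of what relaxing the check could look like, the sketch below skips rsID validation when a variant is flagged as having failed harmonisation. Everything here is an assumption made for illustration: the `check_rsid` function, the `harmonisation_failed` flag, and the simplified `rs<digits>` pattern are hypothetical, and plain `ValueError` stands in for the pydantic `ValidationError` used in the real models.

```python
import re

# Simplified pattern, assumed for illustration only
RSID_PATTERN = re.compile(r"^rs\d+$")


def check_rsid(rsid, harmonisation_failed=False):
    """Return the rsID unchanged if acceptable, else raise ValueError."""
    if harmonisation_failed:
        # Relaxed path: a badly harmonised variant is passed through
        # so that it still ends up in the output file.
        return rsid
    if rsid is None or not RSID_PATTERN.match(rsid):
        raise ValueError(f"invalid rsID: {rsid!r}")
    return rsid
```

The design choice in this sketch is that a failed harmonisation is treated as a known, recorded condition rather than a data error, so only variants that are not flagged go through the strict check.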
