Fix combine CLI producing empty output with invalid data
Created by: nebfield
When processing a single file, the combine CLI would silently write no variants at all if it encountered an invalid variant (the only sign was a misleading log statement). Some investigation notes:
- The invalid data wasn't invalid (some variants failed harmonisation and were missing mandatory fields)
  - Update the pydantic models to support this case properly
- The CLI wasn't re-raising exceptions correctly because I had a return statement inside a finally block 😬
  - Add tests to make sure badly harmonised variants do create an output file and invalid variants do throw a ValidationError exception
  - Add a check to the CLI that an output file actually exists
- Re-test this branch on the entire Catalog and check for exceptions
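For anyone wondering how a return inside a finally block hides errors, here is a minimal sketch. The function names (combine_buggy, combine_fixed) and the ValueError are hypothetical stand-ins, not the actual CLI code or the pydantic ValidationError. The point is only that returning from finally discards the in-flight exception, so the caller sees success and no output file.

```python
import tempfile
from pathlib import Path


def combine_buggy(variants, out_path):
    """Hypothetical stand-in for the old behaviour."""
    try:
        for variant in variants:
            if variant is None:
                raise ValueError("invalid variant")
        Path(out_path).write_text("\n".join(variants))
    finally:
        # Returning here discards any exception raised in the try block
        return 0


def combine_fixed(variants, out_path):
    """Same logic, but errors propagate and the output file is checked."""
    for variant in variants:
        if variant is None:
            raise ValueError("invalid variant")
    Path(out_path).write_text("\n".join(variants))
    if not Path(out_path).exists():
        raise FileNotFoundError(f"{out_path} was not written")
    return 0


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "combined.txt"

    # The buggy version reports success although nothing was written
    buggy_status = combine_buggy(["rs1", None], out)
    buggy_wrote_file = out.exists()

    # The fixed version lets the exception reach the caller
    try:
        combine_fixed(["rs1", None], out)
        fixed_raised = False
    except ValueError:
        fixed_raised = True
```

The explicit existence check at the end of combine_fixed is a belt-and-braces guard: even if some other code path swallows an error, a missing output file is still reported.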
Closes #55
Test results
22 (older) scoring files contain invalid rsIDs:
pgscatalog.core.cli.combine_cli: 2024-10-18 14:43:12 CRITICAL PGS000019 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 14:52:23 CRITICAL PGS000042 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000212 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000213 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000214 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000215 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000216 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000310 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000311 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:06 CRITICAL PGS000317 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:36:01 CRITICAL PGS000330 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000332 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000333 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000344 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000345 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000346 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000347 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:14:11 CRITICAL PGS000727 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000728 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000729 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:22:39 CRITICAL PGS000754 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:33:12 CRITICAL PGS000867 contains invalid data, stopping and exploding
The fix is to relax the rsID check when harmonisation goes wrong. No other ValidationError exceptions were thrown.
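A rough sketch of what relaxing the check could look like, written in plain Python rather than as the real pydantic model. Everything here is an assumption for illustration: the Variant dataclass, the harmonised_ok flag, the rs-plus-digits pattern, and the use of ValueError in place of ValidationError. The idea is that the rsID is only checked strictly when harmonisation succeeded.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern for illustration: "rs" followed by digits
RSID_PATTERN = re.compile(r"^rs\d+$")


@dataclass
class Variant:
    rsid: Optional[str]
    harmonised_ok: bool  # hypothetical flag: did harmonisation succeed?


def validate_rsid(variant: Variant) -> Variant:
    """Check the rsID strictly only for successfully harmonised variants."""
    if not variant.harmonised_ok:
        # Relaxed path: keep the variant so it still reaches the output file
        return variant
    if variant.rsid is None or not RSID_PATTERN.match(variant.rsid):
        raise ValueError(f"invalid rsID: {variant.rsid!r}")
    return variant


good = validate_rsid(Variant(rsid="rs123", harmonised_ok=True))
relaxed = validate_rsid(Variant(rsid="not an rsid", harmonised_ok=False))

try:
    validate_rsid(Variant(rsid="not an rsid", harmonised_ok=True))
    strict_raised = False
except ValueError:
    strict_raised = True
```

In the real models this logic would presumably live in a pydantic validator, which would raise ValidationError rather than ValueError at the call site.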