bugfix: "returncode" is never set, so get the exit status from `waitpid`
Created by: muffato
Use case
(reported by @tweep on the eHive-users mailing-list) When running an eHive command through Docker, the container exits with code 0 even when the command fails.
$ docker run -it ensemblorg/ensembl-hive seed_pipeline.pl --url mysql://user@blst.abc
Use of uninitialized value in concatenation (.) or string at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 324.
Could not connect to database as user user using [DBI:mysql:host=blst.abc;port=3306] as a locator:
DBI connect('host=blst.abc;port=3306','user',...) failed: Unknown MySQL server host 'blst.abc' (0) at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 317.
Use of uninitialized value in concatenation (.) or string at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 333.
DB(mysql://user@blst.abc:3306/) Could not connect to database as user user using [DBI:mysql:host=blst.abc;port=3306] as a locator:
DBI connect('host=blst.abc;port=3306','user',...) failed: Unknown MySQL server host 'blst.abc' (0) at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 317.
at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 333.
Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::connect(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBConnection.pm line 139
eval {...} called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBConnection.pm line 141
Bio::EnsEMBL::Hive::DBSQL::DBConnection::connect(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 736
Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::db_handle(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 976
Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::__ANON__(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780), undef, undef, "hive_meta") called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 243
Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::_table_info_loader(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 187
Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::column_set(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 628
Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::AUTOLOAD(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830), "hive_sql_schema_version") called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm line 128
eval {...} called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm line 128
Bio::EnsEMBL::Hive::DBSQL::DBAdaptor::new("Bio::EnsEMBL::Hive::DBSQL::DBAdaptor", "-reg_alias", undef, "-reg_type", undef, "-url", "mysql://user\@blst.abc", "-no_sql_schema_version_check", undef, ...) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/HivePipeline.pm line 181
Bio::EnsEMBL::Hive::HivePipeline::new("Bio::EnsEMBL::Hive::HivePipeline", "-url", "mysql://user\@blst.abc", "-reg_conf", undef, "-reg_type", undef, "-reg_alias", undef, ...) called at /repo/ensembl-hive/scripts/seed_pipeline.pl line 82
main::main() called at /repo/ensembl-hive/scripts/seed_pipeline.pl line 152
$ echo $?
0
Description
It seems that main_cmd.returncode is never set (it is still None). According to the Python docs, I may have to call main_cmd.wait(), but that didn't help. I suppose it's because the process has already been reaped (see the waitpid call above).
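To illustrate the behaviour described above, here is a minimal sketch (not the code from this PR) showing that `returncode` stays `None` when the child is reaped with `os.waitpid` instead of through the `Popen` object. The helper name `reap_directly` is hypothetical, and the child is a throwaway Python one-liner rather than an eHive script.

```python
import os
import subprocess
import sys


def reap_directly(exit_code):
    """Start a child that exits with exit_code, then reap it with os.waitpid.

    Hypothetical helper for illustration only.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    # Reap the child behind Popen's back: the Popen object is never told
    # about the status, so its returncode attribute is left untouched.
    _, status = os.waitpid(child.pid, 0)
    return child.returncode, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(reap_directly(3))  # prints (None, 3)
```

The real exit code is only available in the status returned by `waitpid`, which is why the fix reads it from there.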
So my fix is to capture the main process's return code in wait_for_all_processes as well (alongside the other children's return codes) and to return the first failure, with the main process taking priority.
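The idea can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual patch: only the name `wait_for_all_processes` comes from this PR, while its signature, the helper `exit_code_from_status`, and the `128 + signal` convention for children killed by a signal are hypothetical choices. "Priority" is interpreted here as: return the main process's code if it failed, otherwise the first non-zero code among the other children, in the order they were reaped.

```python
import os
import subprocess
import sys


def exit_code_from_status(status):
    """Convert a raw waitpid() status into a shell-style exit code (hypothetical helper)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Assumption: mimic the shell convention for signal deaths.
        return 128 + os.WTERMSIG(status)
    return 1


def wait_for_all_processes(main_pid):
    """Reap every child and return the exit code the container should report."""
    codes = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            break  # no children left to reap
        codes[pid] = exit_code_from_status(status)
    # The main process has priority over the other children.
    main_code = codes.pop(main_pid, 0)
    if main_code:
        return main_code
    # Otherwise report the first failure among the remaining children, if any.
    return next((code for code in codes.values() if code), 0)


if __name__ == "__main__":
    main_cmd = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    sys.exit(wait_for_all_processes(main_cmd.pid))
```

Because the loop reaps with `waitpid(-1, ...)`, every exit status passes through one place, so nothing depends on `Popen.returncode` being filled in afterwards.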
Possible Drawbacks
Some applications may rely on eHive containers always returning 0.
Testing
Have you added/modified unit tests to test the changes?
We don't have any tests for the Docker image, so I've tested it locally:
docker run -it -v $PWD/scripts/dev/:/repo/ensembl-hive/scripts/dev/ ensemblorg/ensembl-hive seed_pipeline.pl --url mysql://user@blst.abc; echo $?
Have you run the entire test suite and no regression was detected?
The rest of the source code is not impacted