bugfix: "returncode" is never set, so get the exit status from `waitpid`

Marek Szuba requested to merge bugfix/docker_exit_code into version/2.5

Created by: muffato

Use case

(Reported by @tweep on the eHive-users mailing list.) When an eHive command is run through Docker, the container exits with code 0 even if the command fails.

$ docker run -it ensemblorg/ensembl-hive seed_pipeline.pl --url mysql://user@blst.abc

Use of uninitialized value in concatenation (.) or string at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 324.
Could not connect to database  as user user using [DBI:mysql:host=blst.abc;port=3306] as a locator:
DBI connect('host=blst.abc;port=3306','user',...) failed: Unknown MySQL server host 'blst.abc' (0) at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 317.
Use of uninitialized value in concatenation (.) or string at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 333.
DB(mysql://user@blst.abc:3306/) Could not connect to database  as user user using [DBI:mysql:host=blst.abc;port=3306] as a locator:
DBI connect('host=blst.abc;port=3306','user',...) failed: Unknown MySQL server host 'blst.abc' (0) at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 317.
 at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 333.
	Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::connect(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBConnection.pm line 139
	eval {...} called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBConnection.pm line 141
	Bio::EnsEMBL::Hive::DBSQL::DBConnection::connect(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 736
	Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::db_handle(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/CoreDBConnection.pm line 976
	Bio::EnsEMBL::Hive::DBSQL::CoreDBConnection::__ANON__(Bio::EnsEMBL::Hive::DBSQL::DBConnection=HASH(0x1a10780), undef, undef, "hive_meta") called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 243
	Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::_table_info_loader(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 187
	Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::column_set(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830)) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm line 628
	Bio::EnsEMBL::Hive::DBSQL::BaseAdaptor::AUTOLOAD(Bio::EnsEMBL::Hive::DBSQL::MetaAdaptor=HASH(0xa5e830), "hive_sql_schema_version") called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm line 128
	eval {...} called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm line 128
	Bio::EnsEMBL::Hive::DBSQL::DBAdaptor::new("Bio::EnsEMBL::Hive::DBSQL::DBAdaptor", "-reg_alias", undef, "-reg_type", undef, "-url", "mysql://user\@blst.abc", "-no_sql_schema_version_check", undef, ...) called at /repo/ensembl-hive/modules/Bio/EnsEMBL/Hive/HivePipeline.pm line 181
	Bio::EnsEMBL::Hive::HivePipeline::new("Bio::EnsEMBL::Hive::HivePipeline", "-url", "mysql://user\@blst.abc", "-reg_conf", undef, "-reg_type", undef, "-reg_alias", undef, ...) called at /repo/ensembl-hive/scripts/seed_pipeline.pl line 82
	main::main() called at /repo/ensembl-hive/scripts/seed_pipeline.pl line 152
$ echo $?
0

Description

It seems that `main_cmd.returncode` is never set (it is still `None`). According to the Python docs, calling `main_cmd.wait()` should set it, but that didn't help. I suppose that's because the process has already been reaped (see the `waitpid` call above). So my fix is to capture the main process's return code in `wait_for_all_processes` as well (alongside the other children's return codes) and to return the first failure, with the main process taking priority.
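
For illustration, here is a minimal sketch of the idea, not the actual patch. It assumes a POSIX system and that the wrapper reaps children with `os.waitpid(-1, 0)`. The helpers `exit_code_from_status`, `first_failure` and `run_and_get_exit_code` are hypothetical names, and the signature of `wait_for_all_processes` shown here is an assumption.

```python
import os
import subprocess
import sys


def exit_code_from_status(status):
    """Convert a raw waitpid() status into a shell-style exit code (hypothetical helper)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Same convention as most shells: 128 + signal number
        return 128 + os.WTERMSIG(status)
    return 1


def wait_for_all_processes(main_pid):
    """Reap every child; return (main_code, other_codes). Signature is assumed."""
    main_code = None
    other_codes = []
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            break  # no children left to reap
        code = exit_code_from_status(status)
        if pid == main_pid:
            main_code = code
        else:
            other_codes.append(code)
    return main_code, other_codes


def first_failure(main_code, other_codes):
    """Return the first non-zero code; the main process has priority."""
    if main_code:
        return main_code
    for code in other_codes:
        if code:
            return code
    return 0


def run_and_get_exit_code(argv):
    """Start the main command and derive the exit code from waitpid only."""
    main_cmd = subprocess.Popen(argv)
    # main_cmd.returncode is deliberately not consulted: the child is
    # reaped by waitpid below, so Popen never sees its real status.
    main_code, other_codes = wait_for_all_processes(main_cmd.pid)
    return first_failure(main_code, other_codes)
```

The wrapper would then end with something like `sys.exit(run_and_get_exit_code(sys.argv[1:]))`, so that `docker run ...; echo $?` reports the failure.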

Possible Drawbacks

Some applications may be used to eHive containers always returning 0, and would now see non-zero exit codes on failure.

Testing

Have you added/modified unit tests to test the changes?

We don't have any tests for the Docker image, so I've tested it locally:

docker run -it -v $PWD/scripts/dev/:/repo/ensembl-hive/scripts/dev/ ensemblorg/ensembl-hive seed_pipeline.pl --url mysql://user@blst.abc; echo $?

Have you run the entire test suite and no regression was detected?

The rest of the source code is not impacted