New distributed Queen system. Queen/hive updates its state in an incremental
and distributed manner as it interacts with the workers over the course of its life. When a runWorker.pl script starts and asks a queen to create a worker the queen has a list of known analyses which are 'above the surface' where full hive analysis has been done and the number of needed workers has been calculated. Full synch requires joining data between the analysis, analysis_job, analysis_stats, and hive tables. When this reached 10e7 jobs, 10e4 analyses, 10e3 workers a full hard sync took minutes and it was clear this bit of the system wasn't scaling and wasn't going to make it to the next order of magnitude. This occurred in the compara blastz pipeline between mouse and rat. Now there are some analyses 'below the surface' that have partial synchronization. These analyses have been flagged as having 'x' new jobs (AnalysisJobAdaptor updating analysis_stats on job insert). If no analysis is found to asign to the newly created worker, the queen will dip below the surface and start checking the analyses with the highest probablity of needing the most workers. This incremental sync is also done in Queen::get_num_needed_workers When calculating ahead a total worker count, this routine will also dip below the surface until the hive reaches it's current defined worker saturation. A beekeeper is no longer a required component for the system to function. If workers can get onto cpus the hive will run. The beekeeper is now mainly a user display program showing the status of the hive. There is no longer any central process doing work and one hive can potentially scale beyond 10e9 jobs in graphs of 10e6 analysis nodes and 10e6 running workers.
Please register or sign in to comment