In version 19.1.1, it appears that an array job no longer enters an execution queue unless that queue's limits allow all of the array's subjobs from the outset. This was definitely not the case in version 18.1.3. Has something changed in how array jobs are handled in version 19? I’ve looked through the manuals but can’t find any clues there. Perhaps there is a new queue configuration setting that I need to set for array jobs?
Here is an example. An array job consisting of 100 subjobs, each requesting 2 CPUs and 1 GB of memory, enters the routing queue “workq”. The “workq” queue is configured to route all jobs requesting 2 CPUs and 1 GB of memory to an execution queue called “solo”. Limits on the “solo” queue allow each user to run no more than 50 jobs in this queue at any given time.

In version 18.1.3, 50 subjobs from the array job would enter the “solo” queue and begin running. As they completed, additional subjobs from the array job would trickle in from the routing queue until all subjobs were complete. At no time would the user have more than 50 jobs running in the “solo” queue.

In version 19.1.1, the entire array job remains queued in the routing queue and never makes it to the execution queue. I have found two workarounds. I can force the array job to start by manually issuing a move command, e.g. “qmove jobid solo”, at which point 50 subjobs start without issue and the others trickle in as expected. Alternatively, I can increase the limits on the “solo” queue to allow at least 100 jobs per user, and then the array job starts immediately without the need for a “qmove”.
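
To make the difference between the two behaviors concrete, here is a rough toy model in Python. It is only a sketch of what is described above, not how PBS actually implements routing: the function name `simulate`, its parameters, and the `all_or_nothing` flag are all made up for illustration, and it assumes exactly one subjob finishes per step.

```python
def simulate(total_subjobs, per_user_limit, all_or_nothing):
    """Toy model of an array job moving from a routing queue to an exec queue.

    all_or_nothing=False mimics the behavior described for 18.1.3:
    subjobs are admitted up to the per-user limit and trickle in as
    earlier ones finish.
    all_or_nothing=True mimics the behavior observed in 19.1.1: the
    array is only admitted if every subjob fits under the limit.
    These are assumptions drawn from the description, not PBS internals.
    """
    if per_user_limit <= 0:
        raise ValueError("per_user_limit must be positive")

    # Observed 19.1.1 behavior: the whole array stays in the routing queue.
    if all_or_nothing and total_subjobs > per_user_limit:
        return {"finished": 0, "peak_running": 0, "stuck": True}

    waiting = total_subjobs   # subjobs not yet admitted to the exec queue
    running = 0               # subjobs currently running
    finished = 0
    peak_running = 0

    while finished < total_subjobs:
        # Admit as many waiting subjobs as the per-user limit allows.
        admit = min(waiting, per_user_limit - running)
        waiting -= admit
        running += admit
        peak_running = max(peak_running, running)

        # Assume one running subjob completes per step.
        running -= 1
        finished += 1

    return {"finished": finished, "peak_running": peak_running, "stuck": False}


if __name__ == "__main__":
    print("trickle-in, limit 50:   ", simulate(100, 50, all_or_nothing=False))
    print("all-or-nothing, limit 50:", simulate(100, 50, all_or_nothing=True))
    print("all-or-nothing, limit 100:", simulate(100, 100, all_or_nothing=True))
```

In this model, the first call finishes all 100 subjobs without ever exceeding 50 running, the second never starts, and the third starts only because the limit was raised to 100, which matches the second workaround above.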