We are trying out fairshare with queue priority, and I’m wondering how “large” jobs will be affected by this policy. The Admin Guide says:
“At each scheduling cycle, the scheduler attempts to run as many jobs as possible. It selects the most deserving job, runs it if it can, then recalculates to find the next most deserving job, runs it if it can, and so on.”
I’m concerned about the situation where the most-deserving entity submits a “large” job on a heavily-utilized cluster, followed by less-deserving entities submitting large numbers of “small” jobs. Based on the description above, it seems the scheduler would consider the large job first, see that there aren’t enough resources available, and then proceed to run the small jobs instead. If small jobs continue to be submitted, they would keep consuming whatever resources free up, so the large job would be starved for resources and never run.
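To make the scenario concrete, here is a toy sketch of the behavior I’m worried about. This is not the actual PBS scheduler logic, just a greedy “most deserving first, skip if it doesn’t fit” loop with made-up job records and core counts:

```python
# Toy model of the starvation scenario (hypothetical names/numbers;
# NOT the real scheduler implementation).

def schedule_cycle(queue, free_cores):
    """Greedy pass: walk jobs in descending fairshare priority and
    start every job that fits in the currently free cores."""
    started = []
    for job in sorted(queue, key=lambda j: j["priority"], reverse=True):
        if job["cores"] <= free_cores:
            free_cores -= job["cores"]
            started.append(job)
    for job in started:
        queue.remove(job)
    return started

# Most-deserving entity submits one large job on a busy cluster
# where only 16 cores are free; the large job needs 48.
queue = [{"name": "large", "priority": 100, "cores": 48}]

for cycle in range(5):
    # Each cycle, less-deserving entities submit two more small jobs,
    # and the small jobs from the previous cycle have already finished,
    # so 16 cores are free again.
    queue += [{"name": f"small-{cycle}-{i}", "priority": 10, "cores": 8}
              for i in range(2)]
    started = schedule_cycle(queue, free_cores=16)
    print(cycle, [j["name"] for j in started])

# The large job is always considered first but never fits, so the
# small jobs soak up the free cores every cycle and the large job
# remains queued indefinitely.
```

Under these assumptions the large job never appears in any cycle’s started list, which is exactly the starvation I’m asking about: nothing in the greedy loop ever reserves cores for the large job.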
Is this a real concern with the fairshare policy in the situation I described, or does the scheduler have some way of holding less-deserving jobs until the large job is able to run? It sounds like backfilling and help_starving_jobs are not options when fairshare is in use.