How to aggregate heterogeneous entire clusters into “one big cluster” for cross-cluster job submission


#1

How can I aggregate heterogeneous, entire clusters into “one big cluster” for cross-cluster job submission, as mentioned in PBS Works?
Thanks for your attention.


#2

Please check

  1. Peer Scheduling, if you would like to do it at the PBS level using PBS Pro OSS
    See section 4.9.31, "Peer Scheduling", in https://www.pbsworks.com/pdfs/PBS18.2_BigBook.pdf

  2. PBS Access Suite (a commercial product), a portal to submit jobs across multiple PBS Pro clusters

  3. Connect all the compute nodes from the respective clusters to one PBS Pro server and manage the jobs with queues and policies.

Note: you cannot mix and match compute nodes running Windows and Linux under one PBS Server hosted on a Linux or Windows system. A Linux PBS Server will serve Linux compute nodes, and the same applies for Windows.
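For option 1, peer scheduling is configured in the pulling cluster's scheduler configuration file. A minimal sketch is below; the queue and server names are placeholders, and the full setup (including granting the pulling server access on the remote side) is described in section 4.9.31 of the Big Book:

```
# PBS_HOME/sched_priv/sched_config on the cluster that pulls jobs:
# pull jobs queued in workq on the remote server into the local workq
peer_queue: "workq workq@serverB.example.com"
```

After editing sched_config, restart or HUP the scheduler so it re-reads the file.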

If these answers do not address your queries, please explain your requirement in a bit more detail.


#3

Hi adarsh,
Thank you for your reply and attention. This is the answer I wanted; I will try it. Thank you very much.


#4

Hello adarsh,
I have another question, about the PBS database. PBS Pro’s job submission records, scheduling status, and other information are stored in the PostgreSQL database. However, I found that the database does not immediately synchronize the job information each time a job is submitted, and sometimes the job information is not saved in the job_attr table. I want to develop a cluster management system, including job management, resource management, etc. How do I get the job information in real time during development?
Thank you for your attention.


#5

The information displayed by qmgr -c "p s", pbsnodes -av, qstat -fx, and qstat -anws1 is held in memory.
Any other attribute you change on any object, for example from qmgr, is stored in the database immediately.
Node states are not updated in the database; since they are transient in nature, they are not stored there.

  1. You can get real-time job information by running qstat -fx, tracejob, or pbs_dtj.
  2. You can use the PBS libraries and develop code that suits your requirement; please follow the guide at https://www.pbsworks.com/pdfs/PBSProgramGuide18.2.pdf
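As a sketch of item 1: recent PBS Pro versions can emit qstat output as JSON (qstat -f -F json), which is much easier to parse from a management system than the plain-text form. A minimal Python sketch follows; the sample JSON is an illustrative stand-in for real qstat output, and the exact field set may differ by version:

```python
import json

def parse_qstat_json(text):
    """Parse `qstat -f -F json` output into {job_id: attributes}.
    The top-level "Jobs" key is assumed from recent PBS Pro output."""
    data = json.loads(text)
    return data.get("Jobs", {})

# Hedged sample of what `qstat -f -F json` might print; values are illustrative.
sample = """
{
  "timestamp": 1555555555,
  "pbs_server": "headnode",
  "Jobs": {
    "123.headnode": {
      "Job_Name": "test_job",
      "job_state": "R",
      "queue": "workq"
    }
  }
}
"""

jobs = parse_qstat_json(sample)
for job_id, attrs in jobs.items():
    print(job_id, attrs["job_state"], attrs["Job_Name"])
```

In a real deployment the `sample` string would be replaced by the captured stdout of the qstat command.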

Most cluster management systems have PBS Pro OSS integrated. Could you please explain what you would like to achieve, so that other experienced community members can contribute their suggestions?


#6

In the development of our cluster management system, we use PBS Pro as the tool for job management. How do we get job information in real time, through the database or through command-line queries?

Thank you for your reply. We want to develop a job management system based on PBS, with monitoring functions similar to Compute Manager, and we use Java to implement the back-end logic, but we do not know how to get job information, for example job ID, name, state, and so on.

Apart from getting real-time job information by running qstat -fx, tracejob, or pbs_dtj and then parsing the query results, is there another way?

We are currently trying to get job information from the PBS PostgreSQL database, but we found that not all submitted jobs are saved to the tables “job”, “job_attr”, and “job_scr”; in particular, in most cases the job attributes are not saved, so we cannot get complete job information from the PBS database. Under what conditions are job attributes saved in “job_attr”? If a job is not saved to the PostgreSQL database, how is that job kept in memory instead? There is also a delay in saving to the database: why are jobs submitted through the command line not saved to the database immediately?


#7

Thank you for the information.

The goal of the PBS datastore is not to store the entirety of the cluster’s information (historical and real-time). It is also not a good idea to connect to the PBS datastore to get information about the cluster; the datastore was never intended to support such an activity (integration with a cluster management system). Connecting to the PBS datastore while the PBS Server service is active is not recommended, as it might have unknown consequences for the integrity of the datastore.

  • Use the PBS Pro API calls to achieve this in an optimised way.
  • Use the PBS libraries (C, Python, Java) to interact with the system from within your cluster manager.
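Until the PBS library bindings are wired into your back end, one possible interim approach is a small poller that shells out to qstat and reports job-state changes, instead of touching the datastore. This is only a sketch: the -F json flag is assumed to be available in your PBS Pro version, and the stub runner below stands in for the real qstat binary so the demo runs offline:

```python
import json
import subprocess
import time

def fetch_jobs(runner=subprocess.check_output):
    """Run qstat and return {job_id: attributes}. The JSON output
    format (-F json) is assumed available in the installed version."""
    out = runner(["qstat", "-f", "-F", "json"])
    return json.loads(out).get("Jobs", {})

def watch_jobs(runner, interval=5, cycles=1):
    """Poll qstat and yield (job_id, old_state, new_state) for each change."""
    previous = {}
    for cycle in range(cycles):
        for job_id, attrs in fetch_jobs(runner).items():
            new_state = attrs.get("job_state")
            if previous.get(job_id) != new_state:
                yield job_id, previous.get(job_id), new_state
            previous[job_id] = new_state
        if cycle + 1 < cycles:
            time.sleep(interval)

# Offline demo: a stub runner stands in for the real qstat binary.
def fake_qstat(cmd):
    return json.dumps({"Jobs": {"7.head": {"job_state": "R"}}})

changes = list(watch_jobs(fake_qstat))
print(changes)  # [('7.head', None, 'R')]
```

For a Java back end the same pattern applies: invoke qstat with ProcessBuilder, parse the JSON, and diff against the previous snapshot.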

#8

Thank you very much. If we want to collect the load information of all execution hosts (not the vnodes) and show it in a pie chart on the management system homepage, is there any advice? Does PBS provide a way to get execution host load information? Thanks again.


#9

You can use an exechost_periodic hook, which collects the load information of the compute nodes at regular intervals. Similarly, all health-check scripts can be implemented in the same way.

The exechost_periodic hook provides a framework within which any kind of information can be collected periodically.
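A minimal sketch of such a hook is below. It assumes PBS Pro's Python hook environment (the pbs module only exists when the script runs inside PBS), and "sysload" is a hypothetical custom host-level resource you would create beforehand, e.g. with qmgr -c "create resource sysload type=float,flag=h":

```python
# Sketch of an exechost_periodic hook that publishes the node's
# 1-minute load average into a custom resource ("sysload" is a
# hypothetical resource name; create it first via qmgr).

def read_loadavg(text):
    """Extract the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])

try:
    import pbs  # only importable inside the PBS hook environment
    e = pbs.event()
    with open("/proc/loadavg") as f:
        load = read_loadavg(f.read())
    vnode = e.vnode_list[pbs.get_local_nodename()]
    vnode.resources_available["sysload"] = load
    e.accept()
except ImportError:
    # Running outside PBS: demonstrate only the parsing helper.
    print(read_loadavg("0.42 0.37 0.31 1/123 4567"))
```

The hook would then be created with qmgr, its event set to exechost_periodic and its freq attribute to the polling interval, and the script imported into it; see the Hooks chapter of the PBS Pro Administrator's Guide for the exact commands. Your management system can then read the values with pbsnodes -av and aggregate them per execution host for the pie chart.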

Another option:

  1. Ganglia

Thank you for these queries