PP-660: Add support to PBS init script to act on any PBS daemon individually


#1

Hi @developers,

Please review a proposed design change that enhances the PBS init script so that start, stop, restart, and status can act on individual PBS daemons. Currently these actions apply to all daemons together.

https://pbspro.atlassian.net/wiki/display/PD/PP-660%3A+Add+support+to+PBS+init+script+to+act+on+any+PBS+daemon+individually
Please do let me know of any comments or suggestions.

Thanks,
Sarita


#2

Hello Sarita, a few initial comments:

  1. I’d think we’d concentrate on adding this functionality to the systemd unit file rather than the init scripts. While we currently support non-systemd systems (RHEL 6, SLES 10, CentOS 6), systemd systems are clearly the future of our supported platforms. In reality it is all tied together behind the scenes: at least today, the pbs.service unit file ultimately calls /opt/pbs/libexec/pbs_init.d, which is the same as the init script that ends up in /etc/init.d. That may not always be the case, so the design should be written either generically or with specific interfaces for systemd unit files. Syntax examples of how this would be used with systemctl also need to be included.

  2. The EDD currently says “The existing behaviour would be the same of init script”. I assume that means the actual method of stopping (qterm, kill, etc.) or starting (pbs_mom with no -p, etc.) the daemon would be the same, correct? If so, I think it should be a bit more explicit about this.


#3

Could you change “as done currently” to “as done by default”?
Please also elaborate on what you mean by “The existing behaviour would be the same of init script”. How are you planning to start|stop|restart a single daemon, given that qterm also kills the server if you give -s or -m?


#4

Could you explain why the restart interface should be restricted to restarting only a single service? Why shouldn’t we be able to (e.g.) restart both server and scheduler with a single command-line invocation?


#5

@saritakh, why are we supporting only a single daemon name? Code-wise, I believe we can support two daemon names as well, though I am not sure whether such a use case would be useful.

Whether or not we support multiple daemon names, the proposed design should specify any new error or notification messages that the script will output.


#6

Hi Scott,

Reviving this old discussion as we have a PTL issue to address.

The init script currently supports parameters; systemd does not support them yet.

Given this constraint, I feel we can have the /etc/init.d interface start individual daemons, while systemd keeps using the current interface, which would be the default. The systemd interface would be unchanged, and PTL can use the /etc/init.d interface to start individual daemons.

What do you think?

-R

Jayadev C R


#7

There was a bug in this that was fixed as part of the docker integration last year. It’s now possible to pass the PBS_START_XXXX environment variables to the script and override what is in pbs.conf. Previously, PTL had to hack around this by creating a temporary pbs.conf file that only had the correct PBS_START_XXXX variables in it. This had other issues, since that daemon’s pbs.conf differed from the one used by the other daemons. It’s possible now to do something like:

PBS_START_MOM=1 PBS_START_SERVER=0 PBS_START_SCHED=0 PBS_START_COMM=0 /etc/init.d/pbs restart

This restarts just the mom. You can just as easily export these variables; this example simply puts them on the command line. PTL, for instance, could set the variables in os.environ prior to calling the init script.
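
As a rough illustration of how a Python caller such as PTL might do this, here is a minimal sketch. The helper names `build_daemon_env` and `run_init_action` are hypothetical and are not part of PTL; the only things taken from this thread are the four PBS_START_* variable names and the /etc/init.d/pbs path.

```python
import os
import subprocess

# The four PBS_START_* variables named in the command-line example.
START_VARS = {
    "server": "PBS_START_SERVER",
    "sched": "PBS_START_SCHED",
    "mom": "PBS_START_MOM",
    "comm": "PBS_START_COMM",
}


def build_daemon_env(daemon, base_env=None):
    """Return a copy of the environment that enables only `daemon`."""
    if daemon not in START_VARS:
        raise ValueError(f"unknown daemon: {daemon!r}")
    env = dict(os.environ if base_env is None else base_env)
    for name, var in START_VARS.items():
        env[var] = "1" if name == daemon else "0"
    return env


def run_init_action(daemon, action="restart", init_script="/etc/init.d/pbs"):
    """Hypothetical helper: run the init script with only `daemon` enabled."""
    return subprocess.run(
        [init_script, action],
        env=build_daemon_env(daemon),
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
```

Under these assumptions, `run_init_action("mom")` would be the equivalent of the command line shown earlier. Passing a copied environment to the child process, rather than modifying os.environ in place, keeps the overrides from leaking into later calls.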

None of this changes what you said. I just thought I’d provide more information.

Bhroam


#8

Sorry, just noticed this. Does the information Bhroam has provided solve the problem for PTL, or are further changes needed?


#9

Indeed, it solves the PTL requirement. Thanks a lot, @bhroam, for the information.

@scc Just thinking in the context of PBS users/customers: since the PTL requirement is fulfilled, should we drop this idea, or should we continue the discussion, as this might also help others who need to control an individual PBS service?


#10

I’d say we should drop this idea unless there is other demand for it (which I have not seen).


#11

@scc’s response to this thread caught my attention. I agree with @scc’s first response in Feb 2017.

Can we move away from using (and installing) the INIT script (/etc/init.d/pbs)?

I have already seen customers run into trouble modifying the INIT script (/etc/init.d/pbs), and systemd does NOT call /etc/init.d/pbs (instead, /usr/lib/systemd/system/pbs.service references /opt/pbs/libexec/pbs_init.d).

If the test team needs to automate the starting and stopping of specific daemons, then perhaps we should be considering individual services?

Having separate services for the PBS daemons would be more applicable in the external HA configurations anyway.


#12

Doing a little more investigation, I found a website on “Controlling a Multi-Service Application with systemd”. So, do we really need to continue the INIT script?


#13

Hey @scott, we have a systemd service file that gets installed by pbs_postinstall here: https://github.com/PBSPro/pbspro/blob/master/src/cmds/scripts/pbs.service.in

We also have tests for verifying the start and stop of services using systemd here: https://github.com/PBSPro/pbspro/blob/master/test/tests/functional/pbs_systemd.py

When pbs_postinstall creates the /etc/init.d/pbs script it simply makes a copy of PBS_EXEC/libexec/pbs_init.d.

It would be possible to test for the existence of systemd on a live system, but it may be more complicated to try to do the same for a boot image being assembled in an alternate root. We still support platforms that have not adopted systemd due to their age, so I don’t think we should abandon the init script just yet.


#14

@mkaro, you are totally right :smile: Professional supports old/legacy systems.

Could be… not 100% certain. It looks like if we can determine that

[root@master ~]# file /sbin/init
/sbin/init: symbolic link to `…/lib/systemd/systemd’

then we would know the image/host will be running systemd. I’ll leave it alone for now.
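
A minimal Python sketch of that check follows. The function name `init_is_systemd` is hypothetical, and the check is only a heuristic, in line with the "not 100% certain" caveat: it reports whether the given init path resolves to a file named systemd and nothing more. For an image in an alternate root, a relative symlink resolves inside that root, but an absolute one would be resolved against the live system.

```python
import os


def init_is_systemd(init_path="/sbin/init"):
    """Heuristic: True if init_path resolves to a file named 'systemd'."""
    # realpath follows symlinks; relative targets are resolved against
    # the directory that contains the link.
    resolved = os.path.realpath(init_path)
    return os.path.basename(resolved) == "systemd"
```

For an image assembled under some other root directory, the path inside that root can be passed as `init_path` instead of the default.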

I have reviewed https://github.com/PBSPro/pbspro/blob/master/src/cmds/scripts/pbs.service.in and noticed that we copy PBS_EXEC/libexec/pbs_init.d to /etc/init.d/pbs. I do have concerns about migration/downgrading/running multiple versions of PBS on the system… but this is a different topic and I don’t want to derail.

The title of this discussion - “Add support to PBS init script to act on any PBS daemon individually” - caught my attention as well. Has the team looked at the EOL of the OSes and considered investing the effort in supporting Multi-Service Applications with systemd and letting the old OS sysvinit stay as-is?

I don’t believe PBS Professional has a steadfast decision on making ALL new features work on ALL platforms. :wink: In addition, bhroam’s comment seems to suggest it is no longer necessary to make changes to the INIT script.

So, I would prefer more focus on improving systemd support and external HA configurations.


#15

I completely agree, @scott. I think that’s the correct approach. We already have the ability to start individual services using the method @bhroam described.


#16

Just sharing some analysis I did as part of PP-996 to have a better systemd integration.

  • systemd should be able to monitor the process it creates, not the children that an init script forks. pbs.service should start the daemon directly from the service file instead of invoking an init.d script.
  • We can use a unit file template to start any daemon directly through the service initialization program,
    such as:
    systemctl start pbs@server pbs@sched pbs@mom pbs@comm
    (or any one of them to start an individual service)
  • There could be another target file (along with the template) that starts all the services upon request.
    Ex: systemctl start pbs (as we do today)
  • PBS_START_XXX values in pbs.conf would become irrelevant, as you can control daemons directly.
  • The init script could be decoupled and run as part of pbs_habitat, which would be run using the “ExecStartPre=” directive in the unit file. Any post-invocation activities can be handled using “ExecStartPost=”.
  • The pbs init script can be a combination of the pbs_habitat script and the ExecStartPost script, and will be placed in /etc/init.d/ only when systemd is not found.
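
To make the proposed interface concrete for a Python caller such as a test harness, here is a small sketch that only builds the systemctl argument list. The `pbs@<daemon>` instance names come from the proposal in this post and do not exist in the current pbs.service; `systemctl_command` is a hypothetical helper, not an existing PTL function.

```python
# Daemon instance names used in the proposal above.
PBS_DAEMONS = ("server", "sched", "mom", "comm")


def systemctl_command(action, daemons=None):
    """Build a systemctl argv for the proposed pbs@<daemon> instances.

    With no daemons given, target the aggregate 'pbs' unit, as today.
    """
    if not daemons:
        return ["systemctl", action, "pbs"]
    unknown = [d for d in daemons if d not in PBS_DAEMONS]
    if unknown:
        raise ValueError(f"unknown daemon(s): {unknown}")
    return ["systemctl", action] + [f"pbs@{d}" for d in daemons]
```

The returned list could be handed to subprocess.run; building the argv separately from executing it keeps the naming scheme easy to check without a systemd host.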