PP-838 Support for logging via Syslog in PBS


#1

I have written down use cases for the logging via syslog functionality present in PBS. We are reverse engineering the EDD for logging via syslog.
Please review the use cases here: Logging via syslog

Please review the design here: Logging via syslog

Please review the Test Scenarios here: Logging via syslog


#2

Hey Tejas,
Thanks for writing this up. Syslog tends to be one of the more forgotten pieces of functionality in PBS. Spending some time on it is a good thing.

My comments:
Use case 1: You say logging via all daemons and then including objects like jobs and reservations. The objects are not daemons. What did you mean by referring to daemons and objects together like that?
Use case 2: PBSEVENT_DEBUG/PBSEVENT_DEBUG2 are not syslog log levels (they’re pbs log levels). There is a different log levels for syslog. Some examples are LOG_DEBUG or LOG_WARNING.
Use case 4: I don’t see how the user will be setting up syslog in a failover environment. I thinks this is just admin
Use case 6: This seems a little weird. Part of this is use case 5 (turning on both local and syslog together). The other part talks about running tracejob. How do these meet together? From reading it as it is, I think you’re saying you want to be able to run tracejob on either a local log or syslog. If you are truly just reverse engineering the syslog functionality and not adding anything new, then you currently can’t run tracejob on syslog logs. This might be a good enhancement to tracejob, but it can’t do it now.
Use case 7: I’m not sure we need this use case. This sounds like something completely in syslog’s corner. We don’t get into the business of how long logs are kept around


#3

Hi Bhroam,
Thanks a lot for the review.
I have edited usecase 1 to remove the reference of PBS objects and daemons together as it is confusing.
I have also updated usecases 2 and 4 according to the comments.
For 6, I have added this as tracejob will be a nice enhancement to have in future. I have added a note now saying this functionality is currently not supported. This note should be documented so that community knows that tracejob does not work with only syslog enabled.
I have removed use case 7.


#4

Hey Tejas,
Thanks for making the changes. I am happy with the use cases now. They cover the PBS syslog functionality.

Bhroam


#5

I have edited the first comment on this topic to have the link to design.
Here is the link
Please review design here: logging via syslog


#6
  1. Check if the datetime format logged in syslog file by default is same on every Linux distribution.
    Check if we can read the datetime format from syslogconf file and use that when reading logs

  2. If we can directly read the format from syslogconf file, the design should change to include using the datetime format according to what is set in the config file.
    Check if by default all linux distrubutions stores syslog file in same place under same name.
    Check if can find read this file path for syslog file from syslogconf file

  3. If we can directly read from syslogconf file, the design should change to include using the date format according to what is set in the config file.
    Research if we can use https://pypi.python.org/pypi/syslog-rfc5424-parser to read the syslog file. This parser uses the pyparsing module

  4. I have to verify if this parser can make it easier for us to read from syslog file

  5. Check for other Open source syslog parsers

  6. Analyze if changes are needed in pbs_loganalyzer file in PTL.

  7. Add method for reading compressed file in more detail- design decompress/read directly from .bz, .gz file

  8. Expand the design to include multiple combinations of PBS_SYSLOG and PBS_LOCALLOG settings

  9. Expand the design to include an attribute to log_match() for specifying whether to read from syslog / local_logs


#7

Hi Tejas,

Thanks for making updates on the EDD at https://pbspro.atlassian.net/wiki/spaces/PD/pages/52789715/PP-838+Support+for+logging+via+Syslog+in+PBS.

I have the following comments:

  1. Please refer https://pbspro.atlassian.net/wiki/spaces/DG/pages/293076995/PBS+Pro+Design+Document+Guidelines. Some sections like “Overview”, “Glossary” mentioned in the guidelines are good to add here.
  2. Objective: If the scope of work now is to only to address the design for syslog support in PTL we should mention that clearly. Are we addressing other PBS enhancements related to syslog now?
  3. Design : 1) syslog_match is a new function ? Where and who is using this function.Please be clear.
  4. Typo : “Things to be taken intoo consideration”
  5. “Wiki page for syslog. This can be referred to for understanding basics of syslog” . Can this be placed in “Glossary”section, may be
  6. log_config_values(): In the table, be consistent with “True/False” or “0 and 1”. The function above the title seems to be setting the syslog attribute “True/False”
  7. log_config_values(): What error will be thrown ? Is it not good to be silent and do nothing ?
  8. Should the name of the function be something like “get_log_config_values”?
  9. I feel it will be useful if you say at what point this function is called and updated with the syslog config information
  10. I could not quite understand the return value mentioned in the table . Should it not be the return value (1 ,2 or 3)?
  11. The font style (bold and italics) are not consistent
  12. log_syslog_lines(): Again, should the name of the function start with “get” ?
  13. log_syslog_lines(): Item 1 in Description . Should it be “get the type of syslog utility?
  14. log_syslog_lines(): “list of priorites[] “ Should it have underscores ?
  15. log_syslog_lines():” list_of_syslog_files[] is the list of paths to the files right ?
  16. log_syslog_lines(): “if there is no list of files returned, throw PTL error) get_rsyslog_files(), get_syslogng_priorites()” . Here, should it be get_syslogng_files() not get_syslogng_priorites()
  17. get_rsyslog_priorites: Input Should it be severity and “facility” (priority is mentioned)
  18. get_syslogng_priorites: same as above
  19. Please explicitly mention that the functions in bold are retuning these values
  20. If we are not supporting syslogng now, please mention it somewhere
  21. get_syslogng_priorites: No description of the function . Consciously missed ? Are we using syslog specific methods inside ?
  22. get_syslog_lines(): Description says about a parameter “name” but the prototype fo the function does not have it
  23. get_syslog_lines(): Summary should be “… last n blocks of lines”… n is missing here
  24. Do we have any high mark for the max number of lines ? Will there be any performance hits for huge numbers as there seems to be sorting and combining of text? You may want to unit test this for huge number of lines and see if there are any performance hits
  25. log_match(): The heading “Execution steps” is not consistent with other sections. Input , Return Titles not added explicitly’
  26. log_match(): “If syslog fails there is no need to check in local lo” Please rephrase this . I guess you are trying to say that if the match is found in syslog there is no need to check the local logs .Correct?
  27. Test Scenarios: You may want to check the performance when matching for a log messages from a huge file
  28. “Changes to existing methods are made in the following files:- “ Please also add a heading explicitly for the new functions added to have better consistency and clarity
  29. convert_date_time: fmt=syslog_format . What is this format and how do you get it ?
  30. Test Scenarios: “Test that logging via sys logging can be set for only a particular daemon.” How can we do this ? In pbs.conf?

#8

Hi Tejas,

Thanks for making the changes. Some comments on the fixes.

  1. I see that you have updated the Title of the EDD. Thanks. Please update the same in the overview section as well so that its clear that the scope of current work is only to “add syslog support in PTL inorder to test logging via syslog in PBS”.
  2. The new class “syslog_utils” does not follow the naming convention. Please change it.
  3. get_log_config_values: In the table, I am still not clear why the last column is required under this function. If you feel that this information has to go somewhere, please explicitly explain here, else move it to a more appropriate place. It is bit confusing to the reader. The line “ Note: Return value is the value that will be by _log_match().“ , it is not clear which return value we are talking about here.
  4. get_log_syslog_lines :In the description section, #3, Should it be get_syslogng_files().
    Do we need the still need to add descriptions about def get_syslogng_priorites() and get_syslogng_files() as you have mentioned that we are only supporting rsyslog. Also the summary of this function get_syslogng_priorites wrongly mentions about rsyslog.conf. Please remove it if we are not using it.
  5. log_match() function: Description: #1. the class name and function names mentioned don’t match. Shouldn’t they be updated with the latest names?(syslog_config.log_config_values)
  6. _log_match() - Typo match_msh()
  7. def log_lines(): Description: “self.syslogUtils.log_syslog_lines” Update class name and function names accordingly

#9

Thanks Latha, I have integrated your comments in the document. I have added syslogng methods so that it will be easier to integrate it later with the same EDD.


#10
  • BY default pbs.conf does not have PBS_SYSLOG & PBS_LOCALLOG what will be behavior of get_log_config_values()?
  • In the tabble can make ues 1/0 insetad of set/unset it will be more readable.
  • In the table row 7 why we need to read twice if the Syslog_attribute=false?
  • get_rsyslog_priorites() & get_syslogng_priorites() can you give an example what exactly this function return ?

#11

Point 1- BY default pbs.conf does not have PBS_SYSLOG & PBS_LOCALLOG what will be behavior of get_log_config_values()
Ans: In this case we will set default values to PBS_SYSLOG & PBS_LOCALLOG.
a) First check if the PBS_SYSLOG & PBS_LOCALLOG are set in the pbs.conf file
b) If PBS_SYSLOG is not set, we will set locally in code variable PBS_SYSLOG=0
c) If PBS_LOCALLOG is not set, we will set locally in code variable PBS_LOCALLOG=1
Therefore this will check into the local logs only if the values are not set, which is default for PBS

I have updated the EDD accordingly

Point 2- I have set PBS_SYSLOG=0 where it was unset. The rest is shown as set because the values can be between 0 to 9

Point 3- This will help us run existing tests against syslog functionality without any changes to the tests, we will have to manually set the values in pbs.conf before starting tests.

Point 4- Output of get_rsyslog_priorites() would be something like- [’*.info’, ‘local7.debug’]
This list is a combination of facility.severity. In PBS we can set facility as- LOG_DAEMON and LOG_LOCAL0 to LOG_LOCAL7.
PS- I have changed the description of get_log_syslog_lines() point 2, this notes where we get the severity value.

Here is an example of rsyslog.conf where the info and debug messages file path is set-
*.info /var/log/messages
local7.debug /var/log/debug

On basis of returned list[] we will then get the of files where the log messages may be logged. The messages can be logged between multiple files. For eg- in the above setup info messages will go to /var/log/messages and debug messages to /var/log/debug
I have removed syslog-ng functions as we will not be adding support for it currently. This is also mention in the EDD. We will add it later once rsyslogd support is completed.


#12

Please update the title of this post to say " PTL support to test for logging via Syslog in PBS" instead of “Support for logging via Syslog in PBS”. I also don’t understand the “Use Cases” section, it mentions use cases for syslog facility in PBSPro, which is not what this is about, this is about adding PTL support for an existing functionality right? So, I think the only use case here is “User wants to test PBSPRo’s syslog functionality”, which doesn’t really need to be mentioned explicitly.

Comments on the design:

  • "For eg - in test bed machine pbsroot shoud be able to read from syslog file. ", this is internal to our testbed systems, you shouldn’t mention it here.
  • in the flow chart, there’s a branch that says “both local log and syslog messages to be read”, how will that happen? I thought when the syslog argument is set, log_match() will just use syslog.

Will review more soon.


#13

More comments:

  • “pbs_syslog_utils”: please rename this to be upper camelcase
  • " serverity = // PBS_SYSLOGSEVR from pbs.conf (by default NONE)": I think you misspelled “severity”
  • If the return value of the function get_log_config_values() is which logs to read, local, syslog or both, then I think you should rename the function to something like “get_log_type()” or “check_log_type()”
  • I’m not sure I understand the purpose of the ‘syslog’ argument to log_match(). Even if ‘syslog’ is False, if pbs.conf’s ‘PBS_SYSLOG’ is set, it will parse syslog. So, why not just only read the pbs.conf file to determine if PTL should read syslog or not? I understand that you might want a ‘syslog’ argument to __log_match() so to tell it to either read from syslog or local logs, but I don’t it’s needed in the externally visible log_match().
  • get_log_syslog_lines() and get_syslog_lines() look very similar, why not combine them?
  • get_syslog_lines() doesn’t accept start and end time, which is something that log_match() does accept. So, what will happen if somebody doesn’t set the ‘n’ argument to log_match(), but they set the starttime and endtime arguments?
  • “In row 7 where we have to match in both syslog and local log, we would have to call _log_match() twice.” If there’s no common code that both modes can utilize, then just have a different function called __syslog_log_match(). If, on the other hand, there’s common code inside __log_match() which both modes of operation can utilize, then why not change the syslog argument to something like ‘log_type’, where you can pass 1, 2 or 3, 3 meaning read from both. You could also add 2 arguments to log_match(), ‘syslog’ and ‘locallog’, locallog set to True by default, syslog set to False, and then you can set them both to True if both logs need to be read.

#14

Cannot change the name of topic here.
The use cases were to help us create the test cases for syslog. Since we did not have an EDD we were reverse engineering the from use cases. Currently this section is extra info for the PTL support. I can remove this section if you want

Moved this

This will happen when both the PBS_SYSLOG and PBS_LOCALLOG are set in pbs.conf and syslog attribute is false/ not given. We need this in case we want to run previous tests against syslog.


#15

Done - renamed to PBSSyslogUtils

Done

Done- renamed to get_log_type()

This parameter helps when syslog argument is set to true. In that case PTL will read only from syslog and not from local logs irrespective of the value (PBS_LOCALOG) in pbs.conf.

get_syslog_lines() is separated for better modularity, easy reading of code. Having a large function makes the code more complex to read, so I prefer having shorter functions wherever I can. I can combine this if you think its better.

n has a default value for syslog support in get_syslog_lines() same as the current log_match(). The start_time and end_time are handled by the function match_msg() in PBSLogUtils class. So this will be automatically handled with the current PTL functionality for syslog too.


#16

I see, thanks for clarifying, it makes sense to keep them then.

By the same token, if the syslog argument is false, then have the function NOT read syslog irrespective of pbs.conf’s PBS_SYSLOG value. It is confusing to have this argument override one pbs.conf variable but not the other. So, I’d recommend that you either remove this argument and then test writers will have to set the correct pbs.conf variables to enable/disable syslog, or make this argument override pbs.conf variables so that this argument is the method to enable/disable syslog.

That’s a noble goal, but in this case, there seems to be too much of an overlap. If the ‘filename’ and ‘logval’ arguments of get_syslog_lines() are None, then get_syslog_lines becomes get_log_syslog_lines. So, yes, I think you should merge the two, I think it’ll avoid code duplication.

Ah, ok, I was assuming that these functions fill be called internally only by log_match() and other externally visible PTL functions, but these are externally visible functions themselves, so they can be called directly by the end user. Thanks for clarifying.


#17

I have changed this. Now if syslog argument is set as true, we will enable syslog and disable local log in pbs.conf. The syslog argument will not be passed to PBSSyslogUtils.get_log_type()

I have combined these two into one - get_syslog_lines()

In log_match instead of calling _log_match() twice we will be be have two if statements
if sylog or both:
read syslog

if locallog or both:
read locallog


#18

Cool, I think you might wanna update your table then, it still considers a syslog parameter along with the pbs.conf variables

So, if I’m understanding correctly, now if the syslog argument is set to True, PTL will change pbs.conf so that PBS logs to syslog, but if it’s false, it will not change pbs.conf, instead it will read existing variables from it and decide whether to use syslog or not? If that’s correct, then that’s still inconsistent. Also, this is an argument to log_match() right? Then people will likely use it to say whether they want PTL to match logs from syslog or local logs, they probably don’t want log_match() to affect how PBS logs messages in syslog vs local logs. If they want to turn syslog on, they can either set the pbs.conf variable for it themselves in the test, or there could be a PTL interface for it, but log_match() is not the correct interface to turn syslog on/off.

Thanks for being patient!


#19

You are right about this. Here is the change I am doing-

  1. By default the value of syslog in def get_log_type(syslog=None) is change to None from false.
  2. In case the syslog value is false, only local_log messages will be read. (changed in table)
  3. I have also added a extra two conditions in the table where syslog=None will be handled

We will need to keep the attribute syslog and not have PTL directly read from pbs.conf. This will give the user flexibility to read from which logs he wants.
There might also be a case like this where it might fail if we read only from pbs.conf-

  1. In pbs.conf, syslog parameter is set and syslog severity is set to only alert level messages.
  2. In this case if the user runs log_match() on some debug level message the test will fail when PTL tries to find the message in syslog files.

#20

Some comments Tejas:

  • Now that get_log_type() doesn’t take any arguments, you should remove the text:
    " Input

    syslog - by default None
    

"

  • If get_log_type() is meant to be only used by log_match() then please make it an internal function to the class which contains log_match and prefix it with underscores.

  • I think the following bullets are not needed as they get explained by the table:
    "
    b) If PBS_SYSLOG is not set, we will locally in code set variable PBS_SYSLOG=0

     c) If  PBS_LOCALLOG is not set, we will locally in code set variable PBS_LOCALLOG=1
    

"

  • In the 3rd and 4th rows of the table, you’ve mentioned the value of PBS_SYSLOG as ‘set’, please change it to ‘1’.
  • I think you should add another row which explains what happens if both of them are unset, so the row will look something like “unset unset Local_log local_log(x,y)”
  • I also think that the column ‘Return Value for the log_match()’ is confusing. That table is explaining the behavior of the function get_log_type(), so it’s confusing to see what the function log_match() will return here. So, I’d recommend deleting this column and instead just explain in log_match() that it will return log lines from the local log if it matches from both.
  • If get_syslog_lines() is only intended to be used by LogUtils.log_lines(), then please make it an internal function of PBSLogUtils class and prefix it with underscores.
  • Seems like get_rsyslog_priorites() and get_rsyslog_files() are internal functions which get used by get_syslog_lines(), so please rename them with leading underscores.