pbs_snapshot --obfuscate: how to deal with certain files?

#1

Hi,

pbs_snapshot --obfuscate is supposed to anonymize sensitive information, but today it doesn’t anonymize everything, so I’ve been working on fixing that. However, I’m not sure how to handle certain files that pbs_snapshot currently captures:

.JB files in mom_priv/jobs/: These are binary files, but grepping them for sensitive job attribute values still finds matches, so those attributes are stored inside them. Should we bother with them? Should we leave them as they are, or drop them from the snapshot altogether when --obfuscate is provided?
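Whichever option is chosen, a quick way to confirm that nothing sensitive survives in an --obfuscate snapshot is to repeat the grep check over every captured file, binary or not. This is a minimal sketch; `contains_sensitive` and `files_with_leaks` are hypothetical helpers, not part of pbs_snapshot, and the list of sensitive values (usernames, hostnames, job names) is assumed to be supplied by the caller:

```python
import os
from typing import Iterable, List


def contains_sensitive(data: bytes, sensitive: Iterable[str]) -> bool:
    """Return True if any sensitive value appears in the raw bytes (binary .JB files included)."""
    # Compare byte-for-byte, the way grep still finds matches inside binary files.
    return any(value.encode("utf-8") in data for value in sensitive if value)


def files_with_leaks(snapshot_dir: str, sensitive: Iterable[str]) -> List[str]:
    """Walk a snapshot directory and list files that still contain a sensitive value."""
    values = list(sensitive)  # materialize so the values can be reused for every file
    leaks = []
    for dirpath, _dirnames, filenames in os.walk(snapshot_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Reads each file whole; fine for a sketch, chunk it for very large outputs.
            with open(path, "rb") as fh:
                if contains_sensitive(fh.read(), values):
                    leaks.append(path)
    return sorted(leaks)
```

Running `files_with_leaks` on the snapshot directory with the site's real usernames and hostnames would flag any file, .JB or otherwise, that the obfuscation missed.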

System files:

  • output of ps aux
  • output of ps -leaf
  • etc.

These can also contain hostnames, job names, usernames, etc. Do we go through the effort of anonymizing them, or is it okay to just not capture them when --obfuscate is provided?

@scc, it would be great if you could share some insight.

Thanks,
Ravi


#2

Hi @agrawalravi90, good questions. Here are my thoughts:

For .JB files, maybe when --obfuscate is used we could run printjob on the files, obfuscate its output, and not collect the .JB files themselves? I feel this would still be helpful for troubleshooting. Typically these files are looked at to answer questions like “what substate was the job in on the host vs. what the server thought?” or “was the session ID actually being tracked by the mom?”, so I think it is worth doing.
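A minimal sketch of that idea, assuming printjob is on the PATH and accepts the path of a .JB file as its argument, and assuming the caller already knows which values (usernames, hostnames, job names) are sensitive. The names `obfuscate_text` and `printjob_obfuscated` and the `OBF<n>` token format are hypothetical:

```python
import re
import subprocess
from typing import Dict, Iterable


def obfuscate_text(text: str, sensitive: Iterable[str], mapping: Dict[str, str]) -> str:
    """Replace each sensitive value with a stable token, recording it in `mapping`."""
    # Longest values first, so "alice2" is matched whole rather than as "alice" + "2".
    values = sorted({v for v in sensitive if v}, key=len, reverse=True)
    if not values:
        return text
    pattern = re.compile("|".join(re.escape(v) for v in values))

    def replace(match):
        value = match.group(0)
        if value not in mapping:
            # Same value -> same token everywhere, so cross-references survive.
            mapping[value] = "OBF{}".format(len(mapping) + 1)
        return mapping[value]

    # A single regex pass means tokens already inserted are never re-scanned.
    return pattern.sub(replace, text)


def printjob_obfuscated(jb_path: str, sensitive: Iterable[str], mapping: Dict[str, str]) -> str:
    """Run printjob on a .JB file and return its obfuscated output instead of the file."""
    result = subprocess.run(["printjob", jb_path], capture_output=True, text=True, check=True)
    return obfuscate_text(result.stdout, sensitive, mapping)
```

Sharing one `mapping` across every file in the snapshot keeps tokens consistent, so OBF1 in the printjob output refers to the same user as OBF1 in the other obfuscated files; the mapping itself would presumably stay at the site rather than go into the snapshot.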

For system outputs like ps, it is probably safest to simply not collect them when obfuscating, since we don’t know everything that COULD be sensitive (process names, arguments to the processes, etc.). But I think we only collect ps -leaf output, correct? If so, we’d only really have to obfuscate the UID and CMD fields, and for CMD, what if we just obfuscate the whole thing as one replacement token? For example, “XXYY” could map to the entirety of something like “/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only”. Would that still be a lot of effort?
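A minimal sketch of the UID/CMD approach, assuming the usual `ps -elf`-style header (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD) with CMD as the last column and no whitespace inside any other field. The function name and the `USER<n>`/`CMD<n>` tokens are hypothetical; each whole command line becomes one token, as suggested:

```python
from typing import Dict


def obfuscate_ps_leaf(output: str, users: Dict[str, str], cmds: Dict[str, str]) -> str:
    """Replace the UID column and the whole CMD column of `ps -leaf` output with tokens."""
    lines = output.splitlines()
    if not lines:
        return output
    header = lines[0].split()
    if header[-1] != "CMD":
        raise ValueError("expected CMD as the last column")
    ncols = len(header)
    uid_idx = header.index("UID")  # raises ValueError if there is no UID column
    result = [lines[0]]
    for line in lines[1:]:
        # Split at most ncols - 1 times so the full command line stays in the last field.
        fields = line.rstrip().split(None, ncols - 1)
        if len(fields) != ncols:
            continue  # unparseable row: drop it rather than risk leaking it
        fields[uid_idx] = users.setdefault(fields[uid_idx], "USER{}".format(len(users) + 1))
        fields[-1] = cmds.setdefault(fields[-1], "CMD{}".format(len(cmds) + 1))
        # Fields are re-joined with single spaces, so the original column alignment is lost.
        result.append(" ".join(fields))
    return "\n".join(result)
```

The `cmds` dict is the “XXYY → full command line” lookup table: the snapshot only carries tokens, while whoever holds the dict can translate them back.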


#3

Thanks for the input, @scc. For system outputs, we capture the following:

  • ps aux
  • ps -leaf
  • cat /etc/hosts
  • cat /etc/nsswitch.conf
  • lsof
  • vmstat
  • df -h
  • dmesg

Some of them could be pretty big, so is it really worth the effort to obfuscate all of them?
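One way to keep the effort bounded is to skip each captured command under --obfuscate until someone writes an obfuscator for its output, and then add obfuscators one at a time where they pay off. A minimal sketch of such a registry; the names are hypothetical, and treating vmstat as safe to pass through unchanged is an assumption (its default output is numeric counters only):

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry: command -> function that obfuscates its output.
# None means no obfuscator exists yet, so the command is skipped under --obfuscate.
SYSTEM_COMMANDS: Dict[str, Optional[Callable[[str], str]]] = {
    "ps aux": None,
    "ps -leaf": None,
    "cat /etc/hosts": None,
    "cat /etc/nsswitch.conf": None,
    "lsof": None,
    # Assumption: vmstat prints only numeric counters, so it is passed through unchanged.
    "vmstat": lambda text: text,
    "df -h": None,
    "dmesg": None,
}


def commands_to_collect(obfuscate: bool,
                        registry: Dict[str, Optional[Callable[[str], str]]]) -> List[str]:
    """Return the commands to run: all of them normally, only obfuscatable ones otherwise."""
    if not obfuscate:
        return list(registry)
    return [cmd for cmd, obfuscator in registry.items() if obfuscator is not None]
```

With this shape, the large outputs (lsof, dmesg) simply drop out of obfuscated snapshots, and a command such as ps -leaf moves into the collected set as soon as an obfuscator is registered for it.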
