dkms issue with "ps -o lstart" under SLES9
Silacci, Lucas
Lucas.Silacci at Teradata.Com
Mon Aug 20 19:32:58 CDT 2007
Hello,
I just ran into an issue during an rpm upgrade of my dkms driver package
and I was wondering if anybody else had seen this. During the upgrade a
"dkms remove" in my package's "%preun" was executed even though it was
called with "--rpm_safe_upgrade".
For background, I discovered this on SLES9 SP3 with dkms-2.0.13-1
installed (although it looks like the latest version would be
susceptible to the same issue).
I looked at the code and was able to get a debug trace of what happened.
Basically, dkms uses the output of "ps -o lstart" to help determine that
a "dkms remove" command is coming from the same rpm process that just
did a "dkms add" during an rpm upgrade. It writes the output of
"ps -o lstart" into a lock file and later reads that lock file back to
see whether we are in an "rpm_safe_upgrade" situation.
However, it turns out that the output of that command is not guaranteed
to be identical for the same process from one call to the next under
SLES9: the reported start time can shift by a second. I'm not sure about
other distros, but it's clearly not safe to use in this case. Here's an
example with a random process that I picked:
samoa2:~ # ps -o lstart 24135
STARTED
Thu Aug 16 10:13:20 2007
samoa2:~ # ps -o lstart 24135
STARTED
Thu Aug 16 10:13:19 2007
samoa2:~ # ps -o lstart 24135
STARTED
Thu Aug 16 10:13:19 2007
samoa2:~ # ps -o lstart 24135
STARTED
Thu Aug 16 10:13:19 2007
samoa2:~ # ps -o lstart 24135
STARTED
Thu Aug 16 10:13:20 2007
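To check whether another system shows the same behavior, a small sampling loop is enough. This is a hypothetical helper, not part of dkms: the names `sample_lstart` and `count_distinct` are made up for illustration, and it assumes bash, procps `ps`, and a `sleep` that accepts fractional seconds (as GNU coreutils does).

```bash
#!/bin/bash
# Hypothetical helpers: sample "ps -o lstart" for one PID several times
# and count how many distinct start times come back.

# Print the lstart value for PID, COUNT times (default 20), one per line.
# The short sleep spreads the samples across sub-second boundaries.
sample_lstart() {
  local pid=$1 count=${2:-20} i
  for ((i = 0; i < count; i++)); do
    ps -o lstart --no-headers -p "$pid" 2>/dev/null
    sleep 0.1
  done
}

# Read lines on stdin and print how many distinct values there were.
count_distinct() {
  sort -u | wc -l
}
```

For example, `sample_lstart 24135 20 | count_distinct` printing anything above 1 would reproduce the flip-flopping start time shown above.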
So I was wondering if anyone else has seen this issue and whether
there are any plans for a change to dkms to address it.
Thanks,
-Lucas
Here are the dirty details...
There are two relevant pieces of code in dkms:
#1 (where the lock file gets created):
# Do stuff for --rpm_safe_upgrade
if [ -n "$rpm_safe_upgrade" ]; then
    local pppid=`sed -ne 's/PPid:[ \t]*//p' /proc/$PPID/status`
    local temp_dir_name=`mktemp $tmp_location/dkms_rpm_safe_upgrade_lock.$pppid.XXXXXX 2>/dev/null`
    echo "$module-$module_version" >> $temp_dir_name
    ps -o lstart --no-headers -p $pppid 2>/dev/null >> $temp_dir_name
fi
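Piece #1 reads the `PPid:` line with `sed`, while piece #2 uses `cat | grep | awk`. They agree in the debug traces, but a single shared helper would guarantee that both sides derive the lock-file PID (the rpm process) the same way. A minimal sketch; `ppid_of` is a hypothetical name, not an existing dkms function.

```bash
#!/bin/bash
# Hypothetical shared helper: print the PPid: value from /proc/<pid>/status.
# Returns non-zero if the file is missing or has no PPid: line.
ppid_of() {
  awk '$1 == "PPid:" { print $2; found = 1 } END { exit !found }' \
    "/proc/$1/status" 2>/dev/null
}
```

Both pieces could then start with `local pppid=$(ppid_of $PPID)`.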
#2 (where we jump out for a safe upgrade):
# Do --rpm_safe_upgrade check (exit out and don't do remove if inter-release RPM upgrade scenario occurs)
if [ -n "$rpm_safe_upgrade" ]; then
    local pppid=`cat /proc/$PPID/status | grep PPid: | awk {'print $2'}`
    local time_stamp=`ps -o lstart --no-headers -p $pppid 2>/dev/null`
    for lock_file in `ls $tmp_location/dkms_rpm_safe_upgrade_lock.$pppid.* 2>/dev/null`; do
        lock_head=`head -n 1 $lock_file 2>/dev/null`
        lock_tail=`tail -n 1 $lock_file 2>/dev/null`
        if [ "$lock_head" == "$module-$module_version" ] && [ "$lock_tail" == "$time_stamp" ] && [ -n "$time_stamp" ]; then
            echo $""
            echo $"DKMS: Remove cancelled because --rpm_safe_upgrade scenario detected."
            rm -f $lock_file
            exit 0
        fi
    done
fi
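One possible change, sketched here as an assumption rather than a tested dkms patch: record field 22 of `/proc/<pid>/stat` (`starttime`, the process start time counted in ticks since boot) instead of the formatted `lstart` string. The kernel sets this integer once when the process is created, so repeated reads should return the same value. `ps`, by contrast, has to turn it into wall-clock time using a computed boot time, which is a plausible (unverified) source of the one-second jitter. `proc_starttime`, `write_lock`, and `check_lock` are hypothetical names. Only equality is tested, so the units (jiffies on older kernels, clock ticks on newer ones) don't matter.

```bash
#!/bin/bash
# Sketch: key the --rpm_safe_upgrade lock on the kernel's starttime value
# rather than "ps -o lstart" text. All function names are hypothetical.

# Print field 22 (starttime) of /proc/<pid>/stat.
proc_starttime() {
  local pid=$1 line rest
  local -a fields
  line=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
  # comm (field 2) is in parentheses and may contain spaces, so drop
  # everything up to the last ") "; field 3 is then index 0, field 22 index 19.
  rest=${line##*") "}
  read -r -a fields <<< "$rest"
  [ -n "${fields[19]}" ] || return 1
  echo "${fields[19]}"
}

# Write side. Args: lock directory, module-version string, rpm (grandparent) PID.
write_lock() {
  local dir=$1 modver=$2 pppid=$3 stamp lock
  stamp=$(proc_starttime "$pppid") || return 1
  lock=$(mktemp "$dir/dkms_rpm_safe_upgrade_lock.$pppid.XXXXXX") || return 1
  printf '%s\n%s\n' "$modver" "$stamp" > "$lock"
}

# Check side: return 0 and delete the lock if a matching one exists.
check_lock() {
  local dir=$1 modver=$2 pppid=$3 stamp lock_file
  stamp=$(proc_starttime "$pppid") || return 1
  for lock_file in "$dir"/dkms_rpm_safe_upgrade_lock."$pppid".*; do
    [ -f "$lock_file" ] || continue   # glob matched nothing
    if [ "$(head -n 1 "$lock_file")" = "$modver" ] &&
       [ "$(tail -n 1 "$lock_file")" = "$stamp" ]; then
      rm -f "$lock_file"
      return 0
    fi
  done
  return 1
}
```

Inside dkms this would amount to replacing the `ps -o lstart ... >> $temp_dir_name` line in piece #1 with `proc_starttime $pppid >> $temp_dir_name`, and computing `time_stamp` in piece #2 with `proc_starttime $pppid`.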
You can see the problem with "ps -o lstart" in the following debug
output...
Debug output from a good run:
#1 (lstart is saved here):
+ '[' -n true ']'
++ sed -ne 's/PPid:[ \t]*//p' /proc/25854/status
+ local pppid=25852
++ mktemp /tmp/dkms_rpm_safe_upgrade_lock.25852.XXXXXX
+ local temp_dir_name=/tmp/dkms_rpm_safe_upgrade_lock.25852.R25991
+ echo e1000-7.5.5-1
+ ps -o lstart --no-headers -p 25852
#2 (lstart is compared here and everything is fine):
+ '[' -n true ']'
++ cat /proc/26683/status
++ grep PPid:
++ awk '{print $2}'
+ local pppid=25852
++ ps -o lstart --no-headers -p 25852
+ local 'time_stamp=Mon Aug 20 15:07:16 2007'
++ ls /tmp/dkms_rpm_safe_upgrade_lock.25852.R25991
++ head -n 1 /tmp/dkms_rpm_safe_upgrade_lock.25852.R25991
+ lock_head=e1000-7.5.5-1
++ tail -n 1 /tmp/dkms_rpm_safe_upgrade_lock.25852.R25991
+ lock_tail=Mon Aug 20 15:07:16 2007
+ '[' e1000-7.5.5-1 == e1000-7.5.5-1 ']'
+ '[' 'Mon Aug 20 15:07:16 2007' == 'Mon Aug 20 15:07:16 2007' ']'
+ '[' -n 'Mon Aug 20 15:07:16 2007' ']'
+ echo ''
+ echo 'DKMS: Remove cancelled because --rpm_safe_upgrade scenario detected.'
DKMS: Remove cancelled because --rpm_safe_upgrade scenario detected.
+ rm -f /tmp/dkms_rpm_safe_upgrade_lock.25852.R25991
+ exit 0
Debug output from a bad run:
#1 (grabs lstart here):
+ '[' -n true ']'
++ sed -ne 's/PPid:[ \t]*//p' /proc/28496/status
+ local pppid=28494
++ mktemp /tmp/dkms_rpm_safe_upgrade_lock.28494.XXXXXX
+ local temp_dir_name=/tmp/dkms_rpm_safe_upgrade_lock.28494.y28633
+ echo e1000-7.5.5-1
+ ps -o lstart --no-headers -p 28494
#2 (compares incorrectly here):
+ '[' -n true ']'
++ cat /proc/29326/status
++ grep PPid:
++ awk '{print $2}'
+ local pppid=28494
++ ps -o lstart --no-headers -p 28494
+ local 'time_stamp=Mon Aug 20 15:10:32 2007'
++ ls /tmp/dkms_rpm_safe_upgrade_lock.28494.y28633
++ head -n 1 /tmp/dkms_rpm_safe_upgrade_lock.28494.y28633
+ lock_head=e1000-7.5.5-1
++ tail -n 1 /tmp/dkms_rpm_safe_upgrade_lock.28494.y28633
+ lock_tail=Mon Aug 20 15:10:31 2007
+ '[' e1000-7.5.5-1 == e1000-7.5.5-1 ']'
+ '[' 'Mon Aug 20 15:10:31 2007' == 'Mon Aug 20 15:10:32 2007' ']'
Since the times don't match exactly, my dkms driver gets removed.
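If changing the lock-file contents is undesirable, a cruder stopgap is to compare the two `lstart` strings as timestamps and accept a small difference. This sketch assumes GNU `date`, whose `-d` option parses the `lstart` format. `lstart_close` is a hypothetical name, and the one-second default tolerance is a guess based on the traces above, not a guarantee.

```bash
#!/bin/bash
# Hypothetical stopgap: treat two "ps -o lstart" strings as the same start
# time when they differ by at most TOLERANCE seconds (default 1).
# Assumes GNU date, which parses strings like "Mon Aug 20 15:10:31 2007".
lstart_close() {
  local tolerance=${3:-1} a b diff
  # Mirror the original [ -n "$time_stamp" ] guard: empty input never matches.
  [ -n "$1" ] && [ -n "$2" ] || return 1
  a=$(date -d "$1" +%s 2>/dev/null) || return 1
  b=$(date -d "$2" +%s 2>/dev/null) || return 1
  diff=$(( a > b ? a - b : b - a ))
  (( diff <= tolerance ))
}
```

In piece #2, the `[ "$lock_tail" == "$time_stamp" ]` test would become `lstart_close "$lock_tail" "$time_stamp"`.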
Posted to the DKMS-devel mailing list.