dkms issue with "ps -o lstart" under SLES9
Silacci, Lucas
Lucas.Silacci at Teradata.Com
Mon Aug 27 13:41:55 CDT 2007
Matt,
I've given some thought to this, and here's what I've come up with...
From what I can tell, dkms is currently using the 3-tuple of
(rpm_process_id, module_name-module_version, lstart) for the
"rpm_safe_upgrade" lock file.
The rpm_process_id and lstart pieces of that tuple tell us that the very
same rpm process executed both the "dkms add" and the "dkms remove",
including covering for process id wrap. But it seems to me that the only
reason we need to worry about the process id wrap is because of the fact
that we're stuck with the lock files hanging and not getting cleaned up
until the very next reboot.
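For reference, here is a rough sketch of how the pid and lstart pieces of the current tuple can be gathered. The function names parent_of and start_time_of are made up for illustration and are not part of dkms; the sketch assumes a Linux /proc filesystem and the procps "ps" that dkms already calls.

```shell
#!/bin/bash
# Hypothetical helpers, not dkms code.

# Print the parent pid of the process given in $1, read from /proc.
parent_of() {
    awk '/^PPid:/ {print $2}' "/proc/$1/status"
}

# Print the start time of the process given in $1, using the same
# ps invocation that dkms uses today for the lstart field.
start_time_of() {
    ps -o lstart --no-headers -p "$1" 2>/dev/null
}

# dkms is run by the rpm scriptlet shell, so the rpm process itself is
# the parent of our parent:
#   pppid=$(parent_of "$PPID")
```

The lstart value is the part that does not stay constant on SLES9, which is why the rest of this mail tries to do without it.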
My idea for an answer to this is to be a bit more tidy with the lock
files. If we can clean them up a bit more aggressively, then they won't
hang around to cause us issues in the future. There are two times that I
feel we can be aggressive about cleaning up "stale" lock files: 1) When
we enter the "rpm_safe_upgrade" code and the pppid for the file does not
match our current pppid and 2) When we are in the "rpm_safe_upgrade" on
a "dkms remove", and the pppid and module name match the module we are
trying to remove.
Case 1 - "rpm_safe_upgrade" with a different pppid:
Since only one rpm command can be running at a time, we know that any
lockfile with a different pppid than the current one is automatically
"stale" and can be removed.
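A minimal sketch of the Case 1 cleanup, assuming lock files are named dkms_rpm_safe_upgrade_lock.<pid>.<suffix> as in the existing mktemp template. The function name clean_stale_locks is hypothetical. It uses a shell glob rather than ls and grep, so an empty directory produces no error output.

```shell
#!/bin/bash
# Hypothetical helper, not dkms code.
# $1 = directory holding the lock files, $2 = pid of the current rpm process.
clean_stale_locks() {
    local tmp_location=$1 pppid=$2 lock_file
    for lock_file in "$tmp_location"/dkms_rpm_safe_upgrade_lock.*; do
        [ -e "$lock_file" ] || continue               # glob matched nothing
        case "${lock_file##*/}" in
            dkms_rpm_safe_upgrade_lock."$pppid".*) ;; # belongs to this rpm run, keep
            *) rm -f "$lock_file" ;;                  # different pid, stale
        esac
    done
}
```

Note that the pid is matched with a trailing dot, so a lock file for pid 1000 is not mistaken for one belonging to pid 100.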
Case 2 - "rpm_safe_upgrade" during "dkms remove":
Here if we find a lockfile that matches the module name (regardless of
version) then we know that the lockfile is potentially going to cause a
safe upgrade situation. Regardless of that fact, there is no other "dkms
remove" that is going to happen for that same module, so we can go ahead
and remove the lockfile.
With the addition of these two lockfile cleanup changes, I think we can
have the lockfile use the 3-tuple of (rpm_process_id, module_name,
module_version) and get rid of lstart completely.
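A sketch of the remove-time check under the new format, assuming each lock file holds the module name on line 1 and the module version on line 2. The function name check_safe_upgrade and its return-code convention are made up for illustration; the real code exits from dkms directly.

```shell
#!/bin/bash
# Hypothetical helper, not dkms code.
# $1 = lock directory, $2 = rpm pid, $3 = module name, $4 = module version.
# Returns 0 if the remove should be cancelled (safe upgrade detected),
# 1 otherwise. Any lock file for this module is deleted either way.
check_safe_upgrade() {
    local tmp_location=$1 pppid=$2 module=$3 module_version=$4
    local lock_file lock_module lock_version
    for lock_file in "$tmp_location"/dkms_rpm_safe_upgrade_lock."$pppid".*; do
        [ -f "$lock_file" ] || continue
        lock_module=$(sed -n '1p' "$lock_file")
        lock_version=$(sed -n '2p' "$lock_file")
        [ "$lock_module" = "$module" ] || continue
        rm -f "$lock_file"    # no other remove for this module will follow
        if [ "$lock_version" = "$module_version" ]; then
            return 0          # same module and version: cancel the remove
        fi
    done
    return 1
}
```

A caller would do something like "check_safe_upgrade $tmp_location $pppid $module $module_version && exit 0" before going on with the remove.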
-Lucas
Here's the patch I created to do this:
--- dkms.orig 2007-08-22 09:26:27.011184088 -0700
+++ dkms 2007-08-22 10:43:04.339283984 -0700
@@ -840,9 +840,16 @@
# Do stuff for --rpm_safe_upgrade
if [ -n "$rpm_safe_upgrade" ]; then
local pppid=`sed -ne 's/PPid:[ \t]*//p' /proc/$PPID/status`
+ # Clean up stale lock files
+ local lockfiles=`ls $tmp_location/dkms_rpm_safe_upgrade_lock.* | \
+ grep -v -e "^$tmp_location/dkms_rpm_safe_upgrade_lock\.$pppid\."`
+ for lock_file in $lockfiles
+ do
+ rm -f $lock_file
+ done
local temp_dir_name=`mktemp $tmp_location/dkms_rpm_safe_upgrade_lock.$pppid.XXXXXX 2>/dev/null`
- echo "$module-$module_version" >> $temp_dir_name
- ps -o lstart --no-headers -p $pppid 2>/dev/null >> $temp_dir_name
+ echo "$module" >> $temp_dir_name
+ echo "$module_version" >> $temp_dir_name
fi
# Check that this module-version hasn't already been added
@@ -1656,15 +1663,17 @@
# Do --rpm_safe_upgrade check (exit out and don't do remove if inter-release RPM upgrade scenario occurs)
if [ -n "$rpm_safe_upgrade" ]; then
local pppid=`cat /proc/$PPID/status | grep PPid: | awk {'print $2'}`
- local time_stamp=`ps -o lstart --no-headers -p $pppid 2>/dev/null`
for lock_file in `ls $tmp_location/dkms_rpm_safe_upgrade_lock.$pppid.* 2>/dev/null`; do
lock_head=`head -n 1 $lock_file 2>/dev/null`
lock_tail=`tail -n 1 $lock_file 2>/dev/null`
- if [ "$lock_head" == "$module-$module_version" ] && [ "$lock_tail" == "$time_stamp" ] && [ -n "$time_stamp" ]; then
- echo $""
- echo $"DKMS: Remove cancelled because --rpm_safe_upgrade scenario detected."
- rm -f $lock_file
- exit 0
+ if [ "$lock_head" == "$module" ]; then
+ if [ "$lock_tail" == "$module_version" ]; then
+ echo $""
+ echo $"DKMS: Remove cancelled because --rpm_safe_upgrade scenario detected."
+ rm -f $lock_file
+ exit 0
+ fi
+ rm -f $lock_file
fi
done
fi
> -----Original Message-----
> From: Matt Domsch [mailto:Matt_Domsch at dell.com]
> Sent: Friday, August 24, 2007 7:31 PM
> To: Silacci, Lucas
> Cc: DKMS-devel at dell.com
> Subject: Re: dkms issue with "ps -o lstart" under SLES9
>
> On Tue, Aug 21, 2007 at 01:01:48PM -0400, Silacci, Lucas wrote:
> > As a follow-up to my previous post, I understand that the root issue
> > seems to be a problem with SLES (lstart not staying constant), but I'm
> > curious as to why the timestamp check is necessary to begin with.
> >
> > Why not just use the module name/version and process id of the rpm
> > process?
>
> When that was written, we wanted to be sure we were in the same RPM
> transaction doing the safe upgrade. I personally always hated the
> temp file, but it was all we could come up with at the time, and to
> make it somewhat more difficult to be fooled, we used both the pid
> and the lstart time of the parent. That it fails on SLES9 is a new
> bug, so we'll have to come up with something else. I'm open to
> advice.
>
> (pid space isn't all that large, hence wanting to add something more
> likely to be unique: pid + lstart was thought to have the right
> property).
>
> Thanks,
> Matt
>
>
> --
> Matt Domsch
> Linux Technology Strategist, Dell Office of the CTO
> linux.dell.com & www.dell.com/linux
>