any advice to find root cause of "Falling back to HPET" ?

Bond Masuda bond.masuda at jlbond.com
Sat May 22 14:30:04 CDT 2010


Hello,

I'd appreciate any help/advice anyone can provide regarding our issue. I've
run out of ideas on this one...

We have two identical PowerEdge 2950 servers, one called s7 and the other s8.
Both are web servers running Apache and PHP. We first noticed the problem
because our benchmarking showed drastically different results between the
two servers. With s7, we were able to get 180 requests/sec, while on s8 we
only got 35 requests/sec (and now only 15 requests/sec; more on that below).
After this, we became aware that almost all tasks on s8 were slower than on
s7. Whether CPU bound or I/O bound, everything we tried was slower on s8
(untar'ing archives, running md5 hashes, etc.).

I started digging around. Both servers are identical in terms of software
and configuration (other than things like hostname and IP addresses). Both
servers are RHEL4U8, kernel-2.6.9-89.0.25.ELsmp, x86_64, exact same packages
and exact same versions. I even ran 'rpm --verify' on all packages and
didn't find anything unusual on either s7 or s8.
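
For anyone wanting to double-check a claim like "exact same packages", here
is a minimal Python sketch that compares two saved package listings. It
assumes each host's list was saved to a text file, one package per line
(for example from 'rpm -qa | sort'). The function names parse_rpm_list and
diff_packages are hypothetical and only for illustration.

```python
def parse_rpm_list(text):
    """Return the set of package strings, one per non-empty line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_packages(list_a, list_b):
    """Return (only_in_a, only_in_b) as sorted lists of package strings."""
    a = parse_rpm_list(list_a)
    b = parse_rpm_list(list_b)
    return sorted(a - b), sorted(b - a)
```

Two empty lists back would mean the listings match. Because a set is used,
two lines with identical text count once, so on x86_64 it probably helps to
include the architecture in each line so that i386 and x86_64 builds of the
same package are told apart.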

The ONLY thing I'm seeing that is unique to s8 is the following set of
messages in dmesg:

Losing some ticks... checking if CPU frequency changed.
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip __do_softirq+0x4d/0xd0
Falling back to HPET
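
Since the kernel message suggests that "some driver is hogging interupts",
one way to look into that is to compare two snapshots of /proc/interrupts
taken a few seconds apart and see which interrupt lines grew the most. The
following is a minimal Python sketch, not a definitive tool. It assumes the
usual layout of /proc/interrupts (a header row of CPU names, then one row
per interrupt), and the names parse_interrupts and busiest_irqs are
hypothetical.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Consume up to ncpu leading numeric fields as per-CPU counters.
        for field in fields:
            if len(counts) < ncpu and field.isdigit():
                counts.append(int(field))
            else:
                break
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest_irqs(before, after, top=5):
    """Return the IRQs whose counters grew the most between two snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas = []
    for label, (count, description) in new.items():
        previous = old.get(label, (0, description))[0]
        deltas.append((count - previous, label, description))
    deltas.sort(reverse=True)
    return deltas[:top]
```

To use it, read /proc/interrupts into a string, wait a few seconds, read it
again, and pass both strings to busiest_irqs. Running the same comparison on
s7 and s8 under similar load might show whether one device on s8 is
generating far more interrupts than its counterpart on s7.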

Some google searching found:

https://bugzilla.redhat.com/show_bug.cgi?id=429010

which refers to:

https://bugzilla.redhat.com/show_bug.cgi?id=248488

But that seems to refer to problems with virtualization. This is on real
hardware.

What we don't understand is that s7 does *not* exhibit any slowness or the
messages above, only s8. Again, both are identical.

So, thinking this might be a hardware issue, we asked our hosting company to
pull the drives out of s8 and replace the entire chassis. After replacing
the entire chassis of s8, we are still getting the above messages in dmesg.
Not only that, things have gotten worse... our benchmarking (using 'ab') now
shows the server can only do 15 requests/sec (all these tests were run
locally on loopback to avoid any network-related issues).

Since the chassis was swapped, we feel that it probably isn't a hardware
issue. But we have s7 which is configured identically to s8 that doesn't
have this issue, so it is hard to say that it is a software issue.

Any advice? What can I do to find the root cause?

TIA,
-Bond

More information about the Linux-PowerEdge mailing list