PowerEdge R510 hangs on uptime > 200 days

Sven Ulland sveniu at opera.com
Sun Mar 4 04:26:38 CST 2012


On 03/04/2012 11:18 AM, Ruediger Gunreben wrote:
> But now, about 200 days later, these machines (and also other
> machines which already had BIOS version 1.6.3) crash without
> any notification in the logfiles (we also have the DRAC log, which
> contains nothing beyond "unknown system halt" or similar).

Which kernel version are you running? We've had lots of crashed
servers where the common issue seemed to be 200+ days of uptime.
Did you manage to pull out a kernel backtrace?

It may be fixed in 3.2, but since the problem only shows up after
200+ days of uptime, we haven't been able to verify that yet. See the
following:

[1]: lkml: 2.6.32.21 - uptime related crashes?
http://marc.info/?l=linux-kernel&m=132287065112937&w=2

...which points to:

[2]: sched, x86: Avoid unnecessary overflow in sched_clock
http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9
(Committed in v3.2-rc1-314-g4cecf6d)
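
For context on why the threshold sits near 200 days: as far as I
understand the commit in [2], the x86 sched_clock converts TSC cycles to
nanoseconds as (cycles * scale) >> 10, and the 64-bit product wraps once
the result reaches 2^54 ns, roughly 208.5 days. A minimal Python sketch
of that arithmetic follows. The 2.4 GHz clock rate and the function names
are made up for illustration, and the shift of 10 is my reading of the
kernel's CYC2NS_SCALE_FACTOR, not something taken from this thread.

```python
# Sketch of the cycles-to-nanoseconds overflow addressed by [2].
# Assumptions (not from this thread): a shift of 10, a 2.4 GHz TSC,
# and all function names below.
SHIFT = 10                      # assumed CYC2NS_SCALE_FACTOR
MASK64 = (1 << 64) - 1          # emulate unsigned 64-bit wraparound
CPU_KHZ = 2_400_000             # hypothetical 2.4 GHz TSC
SCALE = (10**6 << SHIFT) // CPU_KHZ   # ns per cycle, scaled by 2**SHIFT


def naive_cycles_to_ns(cyc):
    """Multiply first, then shift; the product is truncated to 64 bits."""
    return ((cyc * SCALE) & MASK64) >> SHIFT


def split_cycles_to_ns(cyc):
    """Split cyc into quotient and remainder so the product stays small."""
    quot, rem = cyc >> SHIFT, cyc & ((1 << SHIFT) - 1)
    return (quot * SCALE + ((rem * SCALE) >> SHIFT)) & MASK64


def wrap_point_days():
    """Clock value, in days, at which the naive 64-bit product wraps."""
    return 2 ** (64 - SHIFT) / 1e9 / 86400


if __name__ == "__main__":
    cyc = 210 * 86400 * CPU_KHZ * 1000   # TSC reading after 210 days
    print(round(wrap_point_days(), 1))
    print(naive_cycles_to_ns(cyc), split_cycles_to_ns(cyc))
```

In this sketch the naive conversion jumps back to a small value once
the product wraps, while the split form keeps counting, which would be
consistent with the scheduler misbehaving shortly after 200 days.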

There's also:

[3]: divide by zero bug in find_busiest_group (actually inlined update_sg_lb_stats)
https://bugzilla.kernel.org/show_bug.cgi?id=16991

Sven


