RHEL 6.0 + PowerEdge issues galore

Daryl Herzmann akrherz at gmail.com
Mon Jan 23 08:46:43 CST 2012


Howdy,

Let's bump this again to see how things are going :)

On Wed, Aug 24, 2011 at 9:04 AM, Daryl Herzmann <akrherz at gmail.com> wrote:
> Hello,
>
> Figured I would follow up on these issues for posterity's sake...
>
> On Fri, May 6, 2011 at 3:40 PM, Daryl Herzmann <akrherz at gmail.com> wrote:
>> Hi all,
>>
>> I dunno what to do :( I've been a Linux admin for 15 years now, have a
>> RHCE and have been mostly using PowerEdge equipment the entire time.
>> Things have been mostly great.  I've been migrating to RHEL 6.0 and have
>> had nothing but problems.  I dunno what else to do other than whine on an
>> email list and see if anybody has suggestions on what to do.  Maybe once
>> RHEL 6.1 comes out next week, things will magically start working.
>> Anyway, here is the laundry list...
>>
>> - T410 Locks up at seemingly random times with messages like:
>>    CPU#2 Stuck for 67s
>>  I filed this in Red Hat's Bugzilla
>>  https://bugzilla.redhat.com/show_bug.cgi?id=674427
>
> still happening...

I don't think I have seen this in a while.

>> - R510 has a kernel panic at random times and with messages like:
>>  invalid opcode: 0000 [#1] SMP
>>  last sysfs file:
>>  /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map CPU 6
>>  https://bugzilla.redhat.com/show_bug.cgi?id=702456
>
> knock on wood, I haven't seen this in 4 weeks.

Saw it last week again with current RHEL6.2 kernel :(

>> - T410's will not install BIOS upgrades via OMSA6.5 64bit
>>  (update_firmware).  I have to manually
>>  download PET410_BIOS_LX_1.6.3.BIN  and install some 32bit packages to
>>  make it work.  I tested it on RHEL5 and it worked just fine there...
>
> Still happens with 6.5.1

I have not tested with 6.5.3, as I upgraded all mine manually.

>> - PE1900 that has brutal IO performance and would crash with any attempt
>>  to do heavy IO (backups) over NFS.
>>  https://bugzilla.redhat.com/show_bug.cgi?id=678766
>
> Still happens, Red Hat did triage the bug.

Still seeing a kernel dump, but it doesn't crash the system anymore.
Will have to look into it more.

>> - PE2950 will not recognize non-Dell 2 TB drive in caddy #1 (drive2) at BIOS
>>  post, but the drive magically appears after 10-60 minutes of OS system
>>  boot.  I whined about this issue before on here:
>>
>> http://lists.us.dell.com/pipermail/linux-poweredge/2011-April/044705.html
>>
>>  The worse problem is that this system will now crash after periods of
>>  heavy IO and now will crash when I attempt to resync the array.  Oye.
>
> Gave up and stuck in smaller drives.  No issues.

I am starting to wonder about my choice of using RHEL's supplied hard
drive controller kernel modules (for example megaraid or mptlinux).
Downloading the drivers from the support website and using those
modules has been a much better experience.  Perhaps I am being naughty
for having OpenManage update my firmware, but still using old RHEL
kernel modules.
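
For anyone wanting to check which driver version is actually loaded
before and after swapping modules, here is a minimal sketch. It assumes
the driver exports its version under /sys/module/<name>/version, which
not every module does. The module names megaraid_sas and mptsas are my
guess at what megaraid and mptlinux show up as on a given box, and
loaded_module_version is just a hypothetical helper name.

```python
from pathlib import Path


def loaded_module_version(name, sysfs_root="/sys/module"):
    """Return the version string a loaded kernel module exports, or None.

    None means the module is not loaded or does not export a version.
    """
    version_file = Path(sysfs_root) / name / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed module names; adjust to whatever lsmod shows on your system.
    for module in ("megaraid_sas", "mptsas"):
        version = loaded_module_version(module)
        print(module, version or "not loaded or no version exported")
```

Comparing that output against the version listed for the driver on the
support website should show whether the stock RHEL module or the
downloaded one is in use.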

>> - I have 3 R510s that have each lost one or two of their internal 2.5" HD
>>  drives. 2 of the machines lost both drives at the same time, the other
>>  just lost one.  And now a machine where I replaced the two drives has
>>  lost another :(  Anyway, when this happens, the OS locks up.  Augh.
>
> Lost one more drive, but the rate has slowed down a bunch!

Knocking on wood here, still okay.

Then there's the infamous Nehalem (HPET/cstates/other terms I don't
understand) timer bug.  The bug report is private, and I can't find
updated information on it besides the knowledge base article.

https://bugzilla.redhat.com/show_bug.cgi?id=710265

Fun never stops! :)

daryl


