server hardware issues

Scott R. Ehrlich scott at MIT.EDU
Fri Aug 15 09:51:23 CDT 2008


Also check the logs for SMART errors.  I have yet to see a high-end
server or Linux distro on which SMART is unavailable or disabled.
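
As a rough illustration of that log check, here is a minimal Python sketch
that filters syslog lines for smartd warnings. It assumes smartd tags its
entries as "smartd[PID]:", and the keyword list is my own shortlist rather
than a complete set. The sample lines are made up for the example. Behind a
hardware RAID controller the member disks may not be visible to smartd
directly, so treat an empty result with caution.

```python
import re

# Assumption: smartd entries in syslog look like "... smartd[1234]: <message>".
SMART_LINE = re.compile(r"smartd\[\d+\]:\s*(?P<msg>.*)")

# Assumed shortlist of phrases worth a closer look; extend as needed.
WARN_WORDS = ("unreadable", "uncorrectable", "failed", "failure", "error")


def smart_warnings(lines):
    """Return the smartd messages that contain one of the warning keywords."""
    hits = []
    for line in lines:
        match = SMART_LINE.search(line)
        if match and any(w in match.group("msg").lower() for w in WARN_WORDS):
            hits.append(match.group("msg").strip())
    return hits


# Hypothetical sample lines, only to show the call.
sample = [
    "Aug 15 09:00:01 host smartd[2512]: Device: /dev/sda, 8 Currently unreadable (pending) sectors",
    "Aug 15 09:00:02 host kernel: eth0: link up",
]
for message in smart_warnings(sample):
    print(message)
```

To run it against a real log, pass an open file object instead of the sample
list, for example the system log your distro writes syslog messages to.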

Scott

On Fri, 15 Aug 2008, Brad Hein wrote:

> Forgive me for not reading the whole message, but you mentioned an inode
> error, and that means Linux can't read from a spot on the disk. If I see a
> single inode error I immediately go into "back up everything right now
> because it might not be there much longer" mode. You should also consider
> going RAID-1 for redundancy, and either configure the DRAC (if you have one)
> to send you failure alerts or use OpenManage. I wrote some scripts that
> interface between Nagios and command-line OpenManage that serve that purpose
> nicely. I'll share with anyone who asks.
> Also note that if you do have a DRAC card, get in there and take a look at
> the log; there may be useful information. On the other hand, the inode
> errors are almost certainly an indication of drive (hardware) failure.
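
For anyone who wants to try the Nagios idea before asking for those scripts,
here is a minimal sketch of the approach. It is not Brad's code. It assumes
the physical disk report from command-line OpenManage is "Key : Value" text
with ID, Status and Failure Predicted fields; check the field names against
your own output. The sample text is illustrative only. The exit codes 0/1/2/3
are the standard Nagios plugin convention.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pdisks(report):
    """Map a 'Key : Value' physical disk report to (exit_code, message)."""
    disks = []
    current = None
    for line in report.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "ID":  # assumed: each disk record starts with its ID
            current = {"ID": value}
            disks.append(current)
        elif current is not None:
            current[key] = value

    if not disks:
        return UNKNOWN, "UNKNOWN: no physical disks found in report"

    critical = [d["ID"] for d in disks
                if d.get("Status", "").lower() == "critical"]
    warning = [d["ID"] for d in disks
               if d["ID"] not in critical
               and (d.get("Failure Predicted", "").lower() == "yes"
                    or d.get("Status", "ok").lower() != "ok")]
    if critical:
        return CRITICAL, "CRITICAL: disk(s) " + ", ".join(critical)
    if warning:
        return WARNING, ("WARNING: disk(s) " + ", ".join(warning)
                         + " degraded or predicted to fail")
    return OK, "OK: %d disk(s) healthy" % len(disks)


# Illustrative report text; the real layout may differ.
SAMPLE = """\
ID                : 0:0:0
Status            : Non-Critical
Failure Predicted : Yes
ID                : 0:0:1
Status            : Ok
Failure Predicted : No
"""
code, message = check_pdisks(SAMPLE)
print(code, message)
```

A real plugin would capture the report from the command-line tool, print the
message, and exit with the returned code so Nagios can alert on it.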
>
>
> On Fri, Aug 15, 2008 at 9:47 AM, Paul A <razor at meganet.net> wrote:
>
>> Hi, first thanks for taking the time to read this.
>>
>> I'm hoping someone here can point me in the right direction. I have a PE
>> 2950 that's being used as a mail scanner, running RAID 5 with a hot-swap
>> drive.
>>
>> This server ran great for about a year, but within the last 4 months I have
>> had to manually reboot it a couple of times because it just locks up. The
>> first time it happened I noticed a yellow light on drive 0, and on the
>> console I was getting a message like "can't read inode #" over and over
>> again.
>>
>> When I rebooted the server, drive 0 started working again and the server
>> was fine for about a month. Since then the server has had to be rebooted
>> two more times; however, the yellow light on drive 0 is not coming on. But
>> on the console I'm still getting "can't read inode #"...
>>
>> When I previously looked at the logs I saw the controller complaining
>> about the firmware version I was using, so I upgraded it using
>> FRMW_LX_R169302.BIN, thinking this would solve the issue. It hasn't,
>> and now I'm getting a firmware error in the logs (see below) that I wasn't
>> getting before.
>>
>> From the logs below I'm assuming that drive 0 is on its way out, correct?
>> If so, what should I do: just let it die and replace it?
>>
>> Why am I getting firmware errors, and what should I do to fix that issue?
>>
>> Is the main problem drive 0, or is there a bigger controller issue going on
>> here, and what should I do?
>>
>>
>> 2335 Thu Aug 14 21:34:44 2008 Storage Service Controller event log:
>> Predictive failure: PD 00(e1/s0): Controller 0 (PERC 5/i Integrated)
>>
>> 2334 Thu Aug 14 21:34:43 2008 Storage Service Controller event log:
>> Unexpected sense: PD 00(e1/s0), CDB: 28 00 0e 9f 9f 6d 00 00 08 00,
>> Sense: 70 00 01 00 00 00 00 0a 00 00 00 00 5d 02 00 00 00 0:
>> Controller 0 (PERC 5/i Integrated)
>>
>> 2336 Thu Aug 14 21:34:45 2008 Storage Service Controller event log: Fatal
>> firmware error: Driver detected possible FW hang, halting FW. :
>> Controller 0 (PERC 5/i Integrated)
>>
>> 2336 Thu Aug 14 21:34:45 2008 Storage Service Controller event log: Fatal
>> firmware error: Line 205 in ../../raid/mfihw.c : Controller 0 (PERC 5/i
>> Integrated)
>>
>> Fri Aug 15 09:00:54 2008 Storage Service SCSI sense data Sense key: 5
>> Sense code: 24 Sense qualifier: 0: Physical Disk 0:0:0 Controller 0,
>> Connector 0
>>
>> 2095 Fri Aug 15 09:00:54 2008 Storage Service SCSI sense data Sense key: 5
>> Sense code: 24 Sense qualifier: 0: Physical Disk 0:0:1 Controller 0,
>> Connector 0
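
The CDB and sense bytes in the "Unexpected sense" entry above can be decoded
by hand. Below is a minimal Python sketch, assuming the bytes follow the
standard SCSI layouts: a 10-byte READ(10) command (opcode 0x28) and
fixed-format sense data (response code 0x70). The function names are mine.
With the bytes from the log, the sense key comes out as 1 (recovered error)
and the additional sense code as 0x5D, which is the SCSI "failure prediction
threshold exceeded" family. That is consistent with the predictive failure
event logged one second later.

```python
# Sense key names from the SCSI standard (subset).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def parse_hex(text):
    """Turn a string such as '28 00 0e' into bytes."""
    return bytes(int(token, 16) for token in text.split())


def decode_read10(cdb):
    """Decode a READ(10) CDB into (starting LBA, number of blocks)."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


def decode_fixed_sense(sense):
    """Decode fixed-format sense data into sense key, ASC and ASCQ."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F                      # low nibble of byte 2
    return {
        "key": key,
        "key_name": SENSE_KEYS.get(key, "other"),
        "asc": sense[12],                      # additional sense code
        "ascq": sense[13],                     # additional sense code qualifier
    }


# Bytes copied from the log entry quoted above.
lba, blocks = decode_read10(parse_hex("28 00 0e 9f 9f 6d 00 00 08 00"))
sense = decode_fixed_sense(
    parse_hex("70 00 01 00 00 00 00 0a 00 00 00 00 5d 02 00 00 00 0"))
print("READ(10) at LBA 0x%08x, %d blocks" % (lba, blocks))
print("sense key %d (%s), ASC 0x%02x, ASCQ 0x%02x"
      % (sense["key"], sense["key_name"], sense["asc"], sense["ascq"]))
```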
>>
>>
>> Below is the system information
>>
>>
>> BIOS Information
>>        Manufacturer            Dell Inc.
>>        Version         1.3.7
>>        Release Date            03/26/2007
>>
>>
>> ID      0
>> Name    PERC 5/i Integrated
>> State   Ready
>> Firmware Version        5.2.1-0067
>> Driver Version  00.00.03.15-RH1
>>
>> Operating System
>>        Name            CentOS
>>        Version         release 5.2 (Final) Kernel 2.6.18-92.1.6.el5PAE (i686)
>>        System Time             Fri Aug 15 09:26:34 2008
>>        System Bootup Time              Thu Aug 14 21:32:41 2008
>>
>>
>> Thanks,
>>
>> Paul
>>
>>
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> Linux-PowerEdge at dell.com
>> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq
>>
>


