dset for CentOS 3.9? / troubleshooting hardware issues on an old server?

PJF sec-alert at socalweb.com
Tue Oct 11 19:05:51 CDT 2011


I think it's a lost cause. What's disturbing is that all the LEDs are green
and everything looks okay.
megarc shows all disks and the array as healthy, but dmesg shows:

--snip--
megaraid: critical hardware error!
megaraid: reset-34311 cmd=2a <c=0 t=0 l=0>
megaraid: hw error, cannot reset
Error (-5) on journal on device 08:02
Aborting journal on device sd(8,2).
I/O error: dev 08:02, sector 2883672
I/O error: dev 08:02, sector 2887696
EXT3-fs error (device sd(65,17)): ext3_readdir: directory #2 contains a hole at offset 0
Remounting filesystem read-only
--snip--
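
For anyone reading this in the archive, here is a rough sketch of how the
device numbers in that output can be mapped to the usual sd names. It assumes
the "08:02" form is hexadecimal major:minor (as I understand 2.4 kernels print
it) and the static SCSI disk major allocation from the kernel's device list
(8, 65-71, 128-135, 16 minors per disk). The function names are made up for
illustration, and the actual device nodes on a given box may differ.

```python
# Sketch: map the device numbers from the dmesg excerpt to conventional sd names.
# Assumptions: static Linux SCSI-disk majors (8, 65-71, 128-135), 16 minors per
# disk, and hex "major:minor" strings. All names here are illustrative only.

SD_MAJORS = [8] + list(range(65, 72)) + list(range(128, 136))
MINORS_PER_DISK = 16
DISKS_PER_MAJOR = 256 // MINORS_PER_DISK


def disk_suffix(index):
    """Return the letter suffix for a disk index: 0 -> 'a', 25 -> 'z', 26 -> 'aa'."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def sd_name(major, minor):
    """Translate a (major, minor) pair into a name such as 'sda2'."""
    if major not in SD_MAJORS:
        raise ValueError("not a SCSI disk major: %d" % major)
    disk = SD_MAJORS.index(major) * DISKS_PER_MAJOR + minor // MINORS_PER_DISK
    part = minor % MINORS_PER_DISK
    return "sd" + disk_suffix(disk) + (str(part) if part else "")


def parse_hex_dev(text):
    """Parse a hex 'MM:mm' string such as '08:02' into (major, minor)."""
    major, minor = text.split(":")
    return int(major, 16), int(minor, 16)


if __name__ == "__main__":
    print(sd_name(*parse_hex_dev("08:02")))  # sda2
    print(sd_name(65, 17))                   # sdr1
```

Under those assumptions, the journal and I/O errors on 08:02 refer to the
second partition of the first sd device.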

It's toast. At this point I think it's safer to rebuild from backups on a
new box; it smells like the controller itself is dead.

Thanks for everyone's help though.
 
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-
> bounces at dell.com] On Behalf Of J. Epperson
> Sent: Tuesday, October 11, 2011 3:12 PM
> To: linux-poweredge at dell.com
> Subject: Re: dset for CentOS 3.9? / trouble shooting hardware issues on
> an old server?
> 
> I don't think there will be /dev/sdx devices for the drives behind the
> PERC, just for the arrays themselves.
> 
> dellmgr will work with the PERC 4, I think, and will let you see error
> counts on the drives.  Google it.  You need TERM=linux and may have to
> muck around to get the megadev0 device properly created if it doesn't
> exist, although udev may take care of it.
> 
> Did you e2fsck the problem filesystem?
> 
> On Tue, October 11, 2011 16:38, Sabuj Pattanayek wrote:
> > Tried running smartctl on the individual drives? You should be able to
> > get to them through /dev/sg? or perhaps smartctl -d megaraid,0
> > /dev/sda ? Tried running memtest?
> >
> > On Tue, Oct 11, 2011 at 3:29 PM, PJF <sec-alert at socalweb.com> wrote:
> >> A client's legacy DB server crashed this morning and I found the
> >> following scrolling by on the console, not good:
> >>
> >> ext3-fs error journal aborted
> >> ext3-fs error journal aborted
> >> ext3-fs error journal aborted
> >>
> >> I rebooted, it ran for a few hours, then it happened again. Not good.
> >>
> >> I ran /usr/local/bin/megarc -ldInfo -a0 -Lall and all the RAID arrays
> >> are showing OPTIMAL, so I'm not sure what the issue is.
> >> I was expecting to see some failed disks...
> >>
> >> Smells hardware related to me.
> >>
> >> Anyone know where I can find a legacy version of dset to check the
> >> hardware on this server, or any other recommendations?
> >> None of the current versions will run, which I expected.
> >>
> >> CentOS 3.9. I believe this is a 2850 running a PERC 4e/Di.
> >>
> >> Thanks in advance!
> >>
> >> _______________________________________________
> >> Linux-PowerEdge mailing list
> >> Linux-PowerEdge at dell.com
> >> https://lists.us.dell.com/mailman/listinfo/linux-poweredge


