Possible disk corruption - help and advice appreciated
Arno van der Veen
arno.vd.veen at technologist.si
Sun Sep 19 09:07:46 CDT 2010
If it is possible to stop the VE, back up the VE as files (e.g. VMware
stores its virtual machines by default in /var/vmware/Virtual Machines/).
I would copy that, together with the rest of the data, to an external drive.
Then at least that is one worry less.. :-)
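A minimal sketch of that "back up the VE as files" step, assuming the VE
is already stopped. backup_dir is a hypothetical helper (not a Virtuozzo
or VMware tool), and both directories passed to it are placeholders for
wherever the VE files and the external drive really live:

```shell
#!/bin/sh
# backup_dir SRC DEST_DIR: archive SRC into DEST_DIR, then read the archive back.
# Hypothetical helper; SRC and DEST_DIR are placeholders chosen by the caller.
backup_dir() {
    src=$1
    dest_dir=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "no such directory: $dest_dir" >&2; return 1; }
    archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d).tar.gz"
    # -C keeps the paths inside the archive relative to the parent of SRC
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive catches a truncated or unreadable copy early
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

Called as backup_dir /path/to/the/ve /path/to/external/drive, it prints
the name of the archive it wrote. If tar itself reports read errors on
the source, that is a hint the backup failure and the filesystem errors
are related.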
What I would do then is go to single-user mode.
If possible, unmount /dev/sda3 (if it is not part of LVM; otherwise maybe
someone else who knows more about LVM can jump in),
then run (as root) fsck.ext3 -p -f /dev/sda3
This should repair at least the errors it can fix automatically..
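The unmount-then-fsck steps above can be sketched as a small script. This
is only an illustration under assumptions: the helper names (is_mounted,
describe_fsck_status, check_fs) are made up, the mount check compares the
device name literally against /proc/mounts, and the status bits are the
ones documented in the fsck(8) man page. Note that with -p, e2fsck stops
and sets the "errors left uncorrected" bit (4) when it finds something it
will not fix without an administrator, so a manual run may still be needed:

```shell
#!/bin/sh
# is_mounted DEV [MOUNTS_FILE]: succeed if DEV is listed as a mounted device.
# Literal name comparison only; MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# describe_fsck_status N: decode the exit status bits listed in fsck(8).
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    msg=""
    if [ $((status & 1)) -ne 0 ]; then msg="$msg errors-corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then msg="$msg reboot-needed"; fi
    if [ $((status & 4)) -ne 0 ]; then msg="$msg errors-left-uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then msg="$msg operational-error"; fi
    if [ $((status & 16)) -ne 0 ]; then msg="$msg usage-error"; fi
    if [ $((status & 32)) -ne 0 ]; then msg="$msg cancelled"; fi
    if [ $((status & 128)) -ne 0 ]; then msg="$msg shared-library-error"; fi
    echo "${msg# }"
}

# check_fs DEV: run the forced check only when DEV is not mounted. Needs root.
check_fs() {
    if is_mounted "$1"; then
        echo "$1 is still mounted; unmount it first" >&2
        return 8
    fi
    rc=0
    fsck.ext3 -p -f "$1" || rc=$?
    echo "fsck.ext3 exit status $rc: $(describe_fsck_status "$rc")"
    return "$rc"
}
```

From single-user mode this would be run as check_fs /dev/sda3. A status
that includes errors-left-uncorrected means rerunning fsck.ext3 without
-p and answering its questions by hand.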
Later I would check the disk cache settings of the RAID controller and
the disks. I would set them to write-through.
On Sun, 19 Sep 2010 14:41:37 +0100, "Faris Raouf" <asterisk at raouf.net> wrote:
> Dear all,
> One of our R200s seems to be having some disk problems but I don't
> understand what's happening. Any info and advice would be appreciated.
> Repairing or fault-finding Linux filesystems is all new stuff to me -
> in 10 years or so I've never had to worry about it until now - so
> please be gentle with me.
> The system in question has two 500GB SATA drives connected to a SAS
> 6/iR hardware raid controller as a RAID-1 mirrored pair.
> OMSA reports no errors but I'm seeing rather a lot of this kind of
> thing in my logs:
> EXT3-fs error (device sda3): ext3_lookup: unlinked inode 35753000 in
> dir #35752458
> The same thing happened a few weeks ago, and on rebooting I was
> horrified to find fsck reporting "Duplicate of bad block in use" then
> finding myself in a recovery console (I think?) and quickly having to
> learn a few things about fsck and to get it to repair things. It was
> reporting things like "multiply-linked blocks in inode" but after a
> lot of pressing "y" I was able to reboot. There was no apparent data
> loss, all seemed to be fine and there were no more of those
> errors...until a week or so ago.
> That's when I started getting these "unlinked inode" errors again,
> and I expect I'm going to have to go through a reboot and fsck hell
> again shortly.
> The system runs CentOS 5.5 but with a Virtuozzo (same as OpenVZ)
> 2.6.18-028stab070.2 kernel.
> I honestly don't know where to begin on this one. If there are bad
> blocks on a disk, surely OMSA would report a problem?
> What *useful* and ideally not heart stopping things should I be
> looking to do at the next reboot to try to get to the bottom of this?
> I can't begin to describe the horrors I went through the first time
> -- it was at 2am on a Saturday and all I had for help was Google and
> my co-lo company's duty engineer who did his absolute best to help
> but wasn't a Linux expert -- and I'd like to try to avoid that
> situation this time.
> The worrying thing is that I'm currently unable to back up one
> particularly vital Container (VE) on the server in question. The
> backup fails but doesn't give me any indication as to why. I would
> not be surprised if the two things were related. But it puts me in a
> chicken and egg situation which doesn't help my stress levels.
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> Please read the FAQ at http://lists.us.dell.com/faq