Disconcerting journal commit I/O error RHEL+2.6.9-34.0.2

Nicky Peeters nicky.peeters at pandora.be
Fri Aug 4 03:52:18 CDT 2006


Well, I've just had one server go bonkers with a similar issue.

It's a PE 2850 with 6 disks in RAID10, running RHEL4 x86_64.
It is also the only machine whose kernel I upgraded to
2.6.9-34.0.2.ELsmp (22 days ago).

The FS seems to be read-only now, but since I can't get a root shell
running (bus errors), I need to schedule a datacenter trip to find out more.
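
For anyone who can still get a shell on an affected box, a read-only
remount can be confirmed from /proc/mounts. The following is only a
minimal sketch: the function name read_only_mounts is made up for
illustration, and it assumes the kernel reports the "ro" option in
/proc/mounts once ext3 has remounted the filesystem read-only.

```python
def read_only_mounts(mounts_text, fstypes=("ext3",)):
    """Return (device, mountpoint) pairs whose mount options include 'ro'.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields device, mountpoint, fstype, options, ...
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "ro" in options.split(","):
            found.append((device, mountpoint))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        for device, mountpoint in read_only_mounts(fh.read()):
            print(f"{device} on {mountpoint} is mounted read-only")
```

Run from cron or a monitoring agent on a healthy host, something like
this could at least flag the remount from a machine that is still able
to report it.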

Dmesg output:

EXT3-fs error (device dm-1) in start_transaction: Journal has aborted
EXT3-fs error (device dm-1) in start_transaction: Journal has aborted
EXT3-fs error (device dm-1) in start_transaction: Journal has aborted
EXT3-fs error (device dm-1) in start_transaction: Journal has aborted
scsi0 (0:0): rejecting I/O to offline device
EXT3-fs error (device dm-1): ext3_find_entry: reading directory #15450113 offset 0

scsi0 (0:0): rejecting I/O to offline device
EXT3-fs error (device dm-1): ext3_find_entry: reading directory #15450118 offset 0
...
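
To compare output like this across several machines, the three message
types above can be tallied with a short script. This is a rough sketch,
not a diagnostic tool: the regular expressions are written against the
lines quoted here only, and the labels (journal_aborted, offline_device,
dir_read_error) are names I made up.

```python
import re
import sys
from collections import Counter

# Patterns modeled on the dmesg lines quoted above; other kernels may
# word these messages differently.
PATTERNS = {
    "journal_aborted": re.compile(
        r"EXT3-fs error \(device ([^)\s]+)\) in \w+: Journal has aborted"),
    "offline_device": re.compile(
        r"(scsi\d+ \(\d+:\d+\)): rejecting I/O to offline device"),
    "dir_read_error": re.compile(
        r"EXT3-fs error \(device ([^)\s]+)\): ext3_find_entry: reading directory"),
}


def tally(log_text):
    """Count matching lines per (label, device) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(label, match.group(1))] += 1
                break  # a line is counted under one label only
    return counts


if __name__ == "__main__":
    for (label, device), n in sorted(tally(sys.stdin.read()).items()):
        print(f"{label:16} {device:14} {n}")
```

Usage would be along the lines of "dmesg | python tally.py" (the file
name is arbitrary). Seeing offline_device entries next to the ext3
errors suggests the block device went away underneath the filesystem,
rather than the filesystem failing on its own.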

Let me know if you know more; I'm beginning to suspect our kernels
are the culprit!

On 28 Jul 2006, at 15:12, Jason Young wrote:

> Hi all,
>
> Two weeks ago, right after the Red Hat Kernel update to
> 2.6.9-34.0.2.ELsmp (RHEL4) I started getting a journal commit I/O
> error on two of my servers with the srvadmin-all rpm's installed
> (version 5).
>
> One was a 2800 running WS the other a 2850 running AS - both with the
> OEM PERC controllers that came with the servers.     All firmware/
> bios updates are up to the latest release versions available.
>
> The error came after a moderate amount of writes (either installing
> ruby on the freshly reinstalled 2850 or processing some webstats with
> awstats on the 2800) - and when the journal commit error occurs -
> every mounted volume goes read only - which obviously wreaks havoc on
> the running operating system.   The problem occurred twice on the
> 2800, and once on the 2850.    It was not (yet) occurring on my other
> 2850's and 1850's - running RHELv4, ws and as both - also with the
> version srvadmin-all rpm's (and the srvadmin-rac4 RPM's where
> appropriate).  Those were/are still running 2.6.9-34.0.1
>
> My filesystem is a normal primary ext3 /boot, and the rest of the
> RAID (either all RAID5 or a two disk RAID1 and 3 disk RAID5  on the
> six-drive 2850) is a PV with various sized LVM2 logical volumes for
> slash, /var, /home, etc.
>
> The problem freaked me out more than a little. The two servers it was
> happening on are not yet in production, and obviously the last thing I
> needed was for the problem to spread to production systems. There are
> no logs, obviously, because /var goes read-only like everything else.
>
> Grasping at "what changed" straws - I froze going to kernel
> 2.6.9-34.0.2 everywhere else - and proceeded to pull the Dell
> srvadmin RPM's everywhere (I know that openipmi is a kernel module,
> and didn't want it to be a question mark).
>
> - No problems on the 2.6.9-34.0.2 boxes since I pulled openipmi and
> the other rpm's.
> - Still no problems with the 2.6.9-34.0.1 boxes.   I have a few
> vmware (esx) VM's that have gone to 2.6.9-34.0.2 without problem, but
> no other physical servers.
>
> I'd like to put the Dell software back, because I like it, and I'm
> not sure it's the culprit at all. But I'm a bit gun-shy at the
> moment, and like the fact that the filesystems aren't "locking up" on
> me anymore. I'm also a bit at a loss as to how to troubleshoot the
> problem, since nothing can get logged when it happens. Logs up
> until it happens didn't give me any indication of a pending problem.
>
> Thoughts?  ideas?
>
> Jason
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Jason Young --  Systems Manager, eXtension
>   http://about.extension.org/wiki/Jason_Young
> ______________________________________
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
>


