Ubuntu 7.04 and PE SC1435

Ramiro Alba Queipo raq at cttc.upc.edu
Thu Sep 18 05:23:02 CDT 2008


Hello everybody:

We have an InfiniBand cluster built from PE SC1435 servers running Ubuntu
7.04 and OpenMPI 1.2.5 (not in the official distribution), with 20 Gb/s
Mellanox InfiniBand cards (MT25204 [InfiniHost III Lx HCA]).

Both hardware tests (full DELL diagnostics) and software tests (hpl,
NPG-MI NAS) seem to be OK, but every now and then the / and/or /home
file systems are remounted read-only by the system, with many files
corrupted. The system then has to be reinstalled.
I tried both jfs and ext3 file systems, but the results are similar. In
the case of ext3 I got:

[46509.378381] EXT3-fs error (device sda1): htree_dirblock_to_tree: bad entry in directory #99737: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
[46509.378494] Aborting journal on device sda1.
[46509.378722] Remounting filesystem read-only
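
For anyone who wants to watch for the same symptom across nodes, here is a minimal sketch of how messages like the three above could be picked out of a kernel log. It assumes one message per line (not wrapped by a mail client) in exactly the format quoted; the function name scan_kernel_log and the regular expressions are illustrative only and are not part of any tool we run.

```python
import re
from collections import defaultdict

# Patterns modelled on the three kernel messages quoted above (assumed format).
TIMESTAMP = r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
ERROR_RE = re.compile(TIMESTAMP + r"EXT3-fs error \(device (?P<dev>\w+)\): (?P<msg>.*)")
ABORT_RE = re.compile(TIMESTAMP + r"Aborting journal on device (?P<dev>\w+)\.")
REMOUNT_RE = re.compile(TIMESTAMP + r"Remounting filesystem read-only")


def scan_kernel_log(lines):
    """Group ext3 error / journal-abort / remount events by device.

    Returns {device: [(timestamp, event, detail), ...]} in log order.
    """
    events = defaultdict(list)
    last_dev = None  # the remount message names no device
    for line in lines:
        line = line.strip()
        m = ERROR_RE.match(line)
        if m:
            last_dev = m.group("dev")
            events[last_dev].append((float(m.group("ts")), "error", m.group("msg")))
            continue
        m = ABORT_RE.match(line)
        if m:
            last_dev = m.group("dev")
            events[last_dev].append((float(m.group("ts")), "journal-abort", ""))
            continue
        m = REMOUNT_RE.match(line)
        if m:
            # Attribute the remount to the most recent device seen (a guess).
            dev = last_dev or "unknown"
            events[dev].append((float(m.group("ts")), "remount-ro", ""))
    return dict(events)
```

It could be fed the lines of a saved dmesg dump or of /var/log/kern.log, for example scan_kernel_log(open("/var/log/kern.log")). Attributing the remount message to the last device seen is only a heuristic, since that message does not name a device.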

This node had been reinstalled from scratch on the same day it failed.

I am quite confused: if this is not a hardware failure (the DELL
diagnostics are OK), how can a user process corrupt the / file system?

Any comments or advice would be much appreciated.

Thanks in advance

Regards 





More information about the Linux-PowerEdge mailing list