Ubuntu 7.04 and PE SC1435

Vanush "Misha" Paturyan misha at cs.nuim.ie
Thu Sep 18 09:22:01 CDT 2008


Hi Ramiro,

You need to provide more info on the setup:
1) are / and /home local to each node, or are they shared from a
centralized location somehow?
2) what does the jfs error look like? (you only provided the ext3 error)
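
For question 1, a minimal sketch of one way to check, assuming the nodes expose the usual Linux /proc/mounts table. The function name classify_mounts and the list of network filesystem types are illustrative assumptions, not anything taken from your setup:

```python
# Hypothetical helper: report whether / and /home sit on a local disk or a
# network filesystem, based on text in /proc/mounts format.

# Assumed (incomplete) list of filesystem types that indicate a shared mount.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gfs", "gfs2", "ocfs2"}


def classify_mounts(mounts_text, targets=("/", "/home")):
    """Return {mount_point: (device, fs_type, "local" or "network")}."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if mount_point in targets:
            kind = "network" if fs_type in NETWORK_FS_TYPES else "local"
            # Later entries override earlier ones (e.g. rootfs, then /dev/sda1).
            result[mount_point] = (device, fs_type, kind)
    return result


def report(path="/proc/mounts"):
    """Print one line per target mount point found in the mount table."""
    with open(path) as handle:
        mounts = classify_mounts(handle.read())
    for mount_point, (device, fs_type, kind) in sorted(mounts.items()):
        print(mount_point, device, fs_type, kind)
```

Calling report() on a node prints one line for each of / and /home that has its own mount entry. If /home is missing from the output, it is most likely just a directory on the / file system.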

Cheers,

Misha.


On 18 Sep 2008, at 11:23, Ramiro Alba Queipo wrote:

> Hello everybody:
>
> We have an InfiniBand cluster built from PE SC1435 servers under Ubuntu
> 7.04 and using OpenMPI 1.2.5 (not in the official distribution) with
> Mellanox InfiniBand cards of 20 Gb/s (MT25204 [InfiniHost III Lx HCA]).
>
> Both hardware tests (full DELL diagnostics) and software tests (hpl,
> NPG-MI NAS) seem to be OK, but every now and then the / and/or /home
> file systems are remounted read-only by the system with many files
> corrupted. Then the system must be reinstalled.
> I tried both jfs and ext3 file systems, but the results are similar. In
> the case of ext3 I got:
>
> [46509.378381] EXT3-fs error (device sda1): htree_dirblock_to_tree: bad entry in directory #99737: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
> [46509.378494] Aborting journal on device sda1.
> [46509.378722] Remounting filesystem read-only
>
> This node was reinstalled from scratch the same day it failed.
>
> I am quite confused: if this is not a hardware failure (the DELL
> diagnostics are OK), how can a user process corrupt the / file system?
>
> Any comment/advice would be much appreciated.
>
> Thanks in advance
>
> Regards
>
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq

Vanush "Misha" Paturyan
Senior Technical Officer
Computer Science Department
NUI Maynooth