serious stability issues with Dell C6145 and C410x
Stijn De Weirdt
stijn.deweirdt at ugent.be
Fri Jul 29 11:11:38 CDT 2011
> > we run a recompiled 2.6.32-131.4 and saw that this really mattered a lot
> > wrt compute times. the main changes were to disable no_hz and set the
> > kernel timer frequency (HZ) to 100Hz. we also stripped a lot of
> > unnecessary stuff from the default kernels (these are compute nodes
> > after all).
> > (the bios settings are performance, so no power saving features enabled)
> Well, I was hoping to avoid making these somewhat
> Frankenstein-ish by running custom compiled kernels or installing
> non-package managed software (as much as possible).
we have rpms of these kernels ;)
> But it might
> be unavoidable unfortunately. I guess I can start with disabling
> dynamic ticks via the kernel command line though since it should
> be rather painless (assuming nohz=off works to disable it).
it should. (the default timer frequency stays at 1kHz though)
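
a minimal sketch (python) for checking on a node whether the running kernel was actually booted with nohz=off. the function names are made up for illustration; the only things assumed about the system are the standard /proc/cmdline file and the documented nohz= boot parameter.

```python
def nohz_disabled(cmdline: str) -> bool:
    """Return True if the given kernel command line contains nohz=off."""
    # Boot parameters are whitespace-separated tokens.
    return "nohz=off" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    state = "disabled" if nohz_disabled(read_cmdline()) else "not disabled"
    print("dynamic ticks are " + state + " on the kernel command line")
```

this only tells you what was passed at boot. it does not change the timer frequency, which is a compile-time setting.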
> > we also had performance issues with the raid0 of the SAS2008 cards we
> > have. new firmware fixed that, but it was not standard (we got help from
> > dell support though)
> We're just doing RAID-5 here. I did notice that the RHEL
> 6.1 formatting of our ext4 root file system took significantly
> longer than the Debian testing installer's format. I couldn't
> tell if RHEL simply wasn't using sparse_super or if it was an
> actual problem with the megaraid_sas driver in the RHEL kernel.
> I assume you're running a direct from LSI firmware now?
yes, but ours are simple JBOD/Raid0/Raid1 cards, no real raid
controllers. performance of raid0 with two 15k rpm sas disks was 50MB/s
(dd, direct write flag). after the update it's 300+MB/s
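
a rough sketch (python) of how such a dd number can be reproduced and compared between firmware versions. the block size, count and target path are assumptions, not the exact values we used; oflag=direct is the GNU dd flag for direct writes. be careful: pointing dd at a block device overwrites whatever is on it.

```python
import re

# Matches the GNU dd summary line, e.g.
# "1073741824 bytes (1.1 GB) copied, 3.5 s, 307 MB/s"
_DD_SUMMARY = re.compile(r"(\d+) bytes.*copied, ([0-9.eE+-]+) s")


def dd_command(target, block_size="1M", count=1024):
    """Build a dd invocation that writes zeros with direct I/O.

    target, block_size and count are hypothetical example values.
    """
    return [
        "dd",
        "if=/dev/zero",
        "of=" + target,
        "bs=" + block_size,
        "count=" + str(count),
        "oflag=direct",
    ]


def dd_throughput_mb_s(summary):
    """Compute MB/s (10^6 bytes per second) from a dd summary line."""
    match = _DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError("not a dd summary line: %r" % summary)
    nbytes = int(match.group(1))
    seconds = float(match.group(2))
    return nbytes / seconds / 1e6
```

dd prints its summary on stderr, so when running the command through subprocess, capture stderr and pass its last line to dd_throughput_mb_s.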
> > for now things are starting to look good, my only remaining issue with
> > the boxes is that i can't get the pcie max payload higher than 128 bytes
> > on our IB cards (something also important for your setup i assume).
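
a small sketch (python) for reading the payload sizes out of `lspci -vv` output for one device, assuming the usual layout where DevCap reports the size the device supports and DevCtl the size currently configured. the function name and the returned keys are made up for illustration.

```python
import re

_MAXPAYLOAD = re.compile(r"MaxPayload (\d+) bytes")
_FIELD = re.compile(r"[A-Za-z0-9]+:")


def max_payload(lspci_text):
    """Return the supported and configured MaxPayload of the first device.

    Expects the text of `lspci -vv` for a single PCIe device. Keys are
    only present if the corresponding line was found.
    """
    result = {}
    section = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("DevCap:"):
            section = "supported"
        elif stripped.startswith("DevCtl:"):
            section = "configured"
        elif _FIELD.match(stripped):
            # Any other "Name:" field ends the current section.
            section = None
        found = _MAXPAYLOAD.search(stripped)
        if found and section and section not in result:
            result[section] = int(found.group(1))
    return result
```

if "supported" is larger than "configured", the card itself could handle a bigger payload and the limit comes from somewhere else on the path (bios, bridge or root port).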
> Just getting the GPGPU cards stable in the C410x
> enclosure is the first step. I'm not at all concerned about
> performance right now if trying to use the cards at all means the
> system locks up.