linux freez on 3 Poweredge R310
sdowdy at ucar.edu
Sat Apr 7 13:13:13 CDT 2012
Francesco Andreozzi wrote, On 04/07/2012 09:03 AM:
> i know how rrd works!
> i know i can convert with dump and restore... but our it is a complex
> scenario and it is not easy to convert to 64bit rrd because lot of rrd
> are process on one machine than share to other... everything is 32 bit.
> On each machine i have something like 1TB of rrd ;)
> We need the machine working :D
> Someone use a 32 bit kernel on poweredge R310? i really need to solve
> the problem asap!
> today another freez happen!!
You never did mention how much memory is in your machine. But,
really, if it's > 16GB as another poster mentioned, life's gonna
suck. You also didn't really indicate the full function of these
servers, other than they run RRD. You could certainly build a
32-bit runtime for your RRD code (using -m32 and including all the
ia32* libs). We run all of our systems w/ 64-bit kernels now, unless
the h/w can't support it (mostly a few of our laptops and really
old Precision systems). You could also run a 64-bit host system and
run 32-bit VMs on it for your RRD work.
You can always pass 'mem=4G' or such on the kernel boot command line to see if your
memory size is the problem. (if the system doesn't crash, that's at
Otherwise, try increasing the value of 'vm.min_free_kbytes' in sysctl.
I've seen NFS servers deadlock under pressure because the kernel
gets to a point where it has to allocate memory to free memory, and
it can't. This tuning knob is basically a slop reserve buffering
space to keep the system from getting its head too far underwater.
As i understand it, this value is split amongst the various memory
zones, including the LOWMEM zone which is going to be terribly
stressed if you have > 16GB of RAM on a PAE bigmem kernel. I'll bet
that the current value is ~8MB. I'd try upping that to 16MB, though
on a bigmem kernel i think setting that too high can also make
things worse. (i just checked a 32-bit Debian Squeeze VM and it's
got about 4MB set for min_free_kbytes, so maybe try just doubling
# sysctl vm.min_free_kbytes
vm.min_free_kbytes = 11496
# sysctl vm.min_free_kbytes=24000
vm.min_free_kbytes = 24000
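
The doubling step above can be sketched in Python roughly as follows. This is only an illustration, not a tuning recommendation: the helper names and the 64MB cap are hypothetical choices made for the example (the cap is there only because setting the value too high on a bigmem kernel can make things worse). It prints the sysctl command rather than applying it.

```python
# Sketch: read the current vm.min_free_kbytes and suggest a doubled value.
# The function names and the cap_kb default are assumptions for illustration.

def read_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Return the current reserve in kB, as exposed under /proc/sys."""
    with open(path) as f:
        return parse_min_free_kbytes(f.read())


def parse_min_free_kbytes(text):
    """Parse the single integer the kernel reports, e.g. '11496\\n'."""
    return int(text.strip())


def suggest_doubled(current_kb, cap_kb=65536):
    """Double the reserve, but never exceed cap_kb (an arbitrary guard)."""
    return min(current_kb * 2, cap_kb)


def sysctl_command(new_kb):
    """Build the command to run by hand as root; nothing is applied here."""
    return "sysctl vm.min_free_kbytes=%d" % new_kb
```

On a live system, print(sysctl_command(suggest_doubled(read_min_free_kbytes()))) would show the command to review before running it by hand.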
You should watch your lowmem to see whether that's what's borking you.
Check it every 10 seconds or something (perhaps using RRD ;) and see
whether it is dropping before the hang. Either check via ssh from
another system or fsync() the output or something, to ensure the last
samples get to disk:
32bitbob:~# grep -i low /proc/meminfo
LowTotal: 775512 kB (775MB of 1GB LOWMEM avail to kern)
LowFree: 9780 kB (10MB free now)
In this example, there's only 10MB free in LowMem due to all the
memory map structures needed to handle the highmem zones. LOWMEM is
required for certain memory allocation types (PCI DMA transfers for
some PCI cards, network buffers, ext3 inode cache, etc). IIRC, with
16GB of RAM in your system, 128MB of the 1GB of LOWMEM is consumed
just for page table entries. There are loads of other sections of
that memory reserved for things like video buffers and the like.
(I'm presuming if you're doing RRD you're also potentially doing
heavy networking, which will play into LOWMEM exhaustion.)
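
The LowFree polling suggested above could be sketched in Python along these lines. It is a minimal sketch, not a finished monitoring tool: the function names, the log file name and the log line format are all assumptions made for illustration. It reads the LowTotal and LowFree fields from /proc/meminfo (which only exist on 32-bit highmem kernels) and calls os.fsync() after every sample, so the last readings before a hang should reach the disk.

```python
import os
import time


def parse_low_mem(meminfo_text):
    """Extract LowTotal and LowFree (in kB) from /proc/meminfo text.

    Returns an empty dict if the kernel does not report these fields.
    """
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if sep and key in ("LowTotal", "LowFree"):
            values[key] = int(rest.split()[0])
    return values


def log_low_mem(log_path="lowmem.log", interval=10, samples=None,
                meminfo_path="/proc/meminfo"):
    """Append 'timestamp LowTotal LowFree' lines, fsync()ing each one.

    samples=None polls until interrupted; a number stops after that
    many samples. log_path and the line format are hypothetical.
    """
    count = 0
    with open(log_path, "a") as log:
        while samples is None or count < samples:
            with open(meminfo_path) as f:
                low = parse_low_mem(f.read())
            log.write("%d %s %s\n" % (time.time(),
                                      low.get("LowTotal"),
                                      low.get("LowFree")))
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to write it to disk
            count += 1
            if samples is None or count < samples:
                time.sleep(interval)
```

Calling log_low_mem() with the defaults samples every 10 seconds until interrupted. A steadily shrinking third column in the log just before a freeze would point at LOWMEM exhaustion.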