Memory performance on PE R610 (Adv ECC vs Optimizer)
JACOB_LIBERMAN at Dell.com
Fri May 29 14:50:10 CDT 2009
Hi Stephen,
I don’t have WRF data on this platform yet. We should have some interconnect-focused WRF data on 11g soon.
For some general guidance, take a peek at these:
http://hpcadvisorycouncil.mellanox.com/pdf/2009%20LCI%20International%20-%20WRF%20Model.pdf
http://hpcadvisorycouncil.mellanox.com/pdf/preso_HPC_AC_WRF_Model.pdf
John Michalakes co-authored the paper, so I assume the folks at UCAR know this stuff already.
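You can calculate theoretical peak memory bandwidth as [ channels * DIMM speed in MT/s * 8 bytes per transfer ], since each DDR3 channel is 64 bits wide.
So with 3 channels of 1333 MHz DIMMs you can theoretically see up to 31.99 GB/s per memory controller in a balanced, optimized config.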
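As a rough sketch of that arithmetic, the snippet below takes the factor of 8 as the 8-byte (64-bit) width of a DDR3 channel and compares the theoretical peak against the STREAM Triad rates Stephen reports below for his 1066 MHz, single-socket setup. The function and variable names are just illustrative, and the efficiency figures are only as good as the assumption that peak = channels * MT/s * 8.

```python
def peak_bandwidth_mb_s(channels, transfer_rate_mt_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth of one memory controller in MB/s.

    Assumes each channel moves 8 bytes (64 bits) per transfer.
    """
    return channels * transfer_rate_mt_s * bytes_per_transfer


# Theoretical peaks for the configurations discussed in this thread.
tri_1333 = peak_bandwidth_mb_s(3, 1333)   # balanced config at 1333
tri_1066 = peak_bandwidth_mb_s(3, 1066)   # tri-channel at 1066
dual_1066 = peak_bandwidth_mb_s(2, 1066)  # dual-channel at 1066

# Measured STREAM Triad rates (MB/s) from Stephen's two runs.
triad_dual = 8835.4905
triad_tri = 13603.7800

print(f"peak, 3 ch @ 1333: {tri_1333 / 1000:.2f} GB/s")
print(f"peak, 3 ch @ 1066: {tri_1066 / 1000:.2f} GB/s")
print(f"peak, 2 ch @ 1066: {dual_1066 / 1000:.2f} GB/s")
print(f"dual-channel Triad efficiency: {triad_dual / dual_1066:.0%}")
print(f"tri-channel Triad efficiency:  {triad_tri / tri_1066:.0%}")
print(f"measured tri/dual speedup:     {triad_tri / triad_dual:.2f}x")
```

Going from 2 to 3 channels raises the theoretical peak by 1.5x, so a measured speedup near that value would suggest the extra channel is being used about as well as the first two.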
The important bit for HPC users is to think in 3s rather than 2s. 3 GB/core at 1066 is the sweet spot for many HPC apps. 6 x 4 GB DIMMs work well in several directions: performance, performance/watt, performance/$$.
I do not recommend 1-CPU configs for most HPC apps, though. It's actually more cost effective to run 2 simultaneous jobs within the same 2-CPU node than 2 jobs across 2 1-CPU nodes.
Thanks, Jacob
> -----Original Message-----
> From: Stephen Dowdy [mailto:sdowdy at ucar.edu]
> Sent: Friday, May 29, 2009 2:21 PM
> To: Liberman, Jacob
> Cc: linux-poweredge-Lists
> Subject: Re: Memory performance on PE R610 (Adv ECC vs Optimizer)
>
> Stephen Dowdy wrote, On 05/29/2009 12:49 PM:
>
> > For comparison...
> > I only get about 8GB/s using OpenMP threaded version of STREAM, but
> > with 1 proc module, and 1066MHz DIMMs. Should i be seeing better
> > than this, and if so, why am i not? (from what you show above, i
> > expect to see roughly:
> > 0.5 (one socket) * 35000 (yields ~ 17GB/sec) \
> > * 0.8 (1066/1333 scale memspeed) => ~13GB/sec
> > Should tri-channel (your setup) versus dual-channel (mine)
> > have that much impact?
>
> Dual-Channel (8GB) shows:
> > -------------------------------------------------------------
> > Function Rate (MB/s) Avg time Min time Max time
> > Copy: 7964.6757 0.0645 0.0643 0.0647
> > Scale: 7783.5016 0.0660 0.0658 0.0662
> > Add: 8723.6035 0.0884 0.0880 0.0890
> > Triad: 8835.4905 0.0875 0.0869 0.0881
> > -------------------------------------------------------------
>
> reconfiguring the DIMMs by pulling 1 to go to tri-channel
> and 6GB does indeed scale out as expected!
> -------------------------------------------------------------
> Function Rate (MB/s) Avg time Min time Max time
> Copy: 11869.7313 0.0435 0.0431 0.0440
> Scale: 12100.3400 0.0426 0.0423 0.0430
> Add: 13497.3580 0.0571 0.0569 0.0575
> Triad: 13603.7800 0.0567 0.0565 0.0575
>
> The reason i was want to believe this was that Tom's Hardware
> or anandtech did a *windows* based test of the Nehalem and
> showed very little bandwidth improvement from dual to tri
> channel. (and a significant latency increase)
>
> Perhaps that was an earlier revision of Nehalem-EP's IMC, or
> Windows has some issue?
>
> thanks,
> --stephen
>
> --
> Stephen Dowdy - Systems Administrator - NCAR/RAL
> 303.497.2869 - sdowdy at ucar.edu -
> http://www.ral.ucar.edu/~sdowdy/