Memory performance on PE R610 (Adv ECC vs Optimizer)

JACOB_LIBERMAN at Dell.com
Fri May 29 14:50:10 CDT 2009


Hi Stephen,

I don’t have WRF data on this platform yet. We should have some interconnect-focused WRF data on 11g soon.

For some general guidance, take a peek at these:
http://hpcadvisorycouncil.mellanox.com/pdf/2009%20LCI%20International%20-%20WRF%20Model.pdf
http://hpcadvisorycouncil.mellanox.com/pdf/preso_HPC_AC_WRF_Model.pdf 

John Michalakes co-authored the paper, so I assume the folks at UCAR know this stuff already.

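You can calculate theoretical memory bandwidth as [ channels * DIMM speed (MT/s) * 8 bytes per transfer ]. Each channel is 64 bits wide, so it moves 8 bytes per transfer.

So with 3 channels of 1333 MHz DIMMs (3 * 1333 * 8 = 31,992 MB/s) you can theoretically see up to 31.99 GB/s per memory controller in a balanced, optimized config.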
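As a rough sketch of that arithmetic, the Python below computes the theoretical peak per memory controller and compares it with the STREAM Triad rates from Stephen's message quoted below. It assumes the factor of 8 means 8 bytes per 64-bit transfer. The function names are made up for illustration and are not part of any tool.

```python
# Sketch only: theoretical peak bandwidth per memory controller.
# Assumption: each DDR3 channel is 64 bits wide, i.e. 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_mb_s(channels, dimm_speed_mt_s):
    """Return channels * DIMM speed (MT/s) * 8 bytes, in MB/s."""
    return channels * dimm_speed_mt_s * BYTES_PER_TRANSFER


def efficiency(measured_mb_s, channels, dimm_speed_mt_s):
    """Return the measured rate as a fraction of the theoretical peak."""
    return measured_mb_s / peak_bandwidth_mb_s(channels, dimm_speed_mt_s)


if __name__ == "__main__":
    # Theoretical peaks for the configurations discussed in this thread.
    for channels, speed in [(3, 1333), (3, 1066), (2, 1066)]:
        peak = peak_bandwidth_mb_s(channels, speed)
        print(f"{channels} channels @ {speed}: {peak / 1000:.2f} GB/s peak")

    # STREAM Triad rates (MB/s) from the quoted message, one socket, 1066 DIMMs.
    for label, measured, channels in [
        ("dual-channel", 8835.4905, 2),
        ("tri-channel", 13603.7800, 3),
    ]:
        frac = efficiency(measured, channels, 1066)
        print(f"{label}: {frac:.0%} of theoretical peak")
```

Measured STREAM rates normally land well below the theoretical figure, so the ratio is more useful for comparing configurations than as an absolute target.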

The important bit for HPC users is to think in 3s rather than 2s, because each socket has 3 memory channels. 3 GB/core at 1066 MHz is the sweet spot for many HPC apps. 6 x 4 GB DIMMs works well in several directions: performance, performance/watt, and performance/$$.
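To make the "think in 3s" rule concrete, here is a small Python sketch that checks a DIMM population. It assumes a node with two quad-core sockets and three memory channels per socket. Those core and channel counts are my assumption about the node, not something stated above, and the helper names are hypothetical.

```python
# Sketch only: sanity-check a DIMM population against the "think in 3s" rule.
# Assumptions (not stated in the message): 2 sockets, 4 cores per socket,
# 3 memory channels per socket.


def gb_per_core(dimm_count, dimm_size_gb, sockets=2, cores_per_socket=4):
    """Return the total installed memory divided by the total core count."""
    return dimm_count * dimm_size_gb / (sockets * cores_per_socket)


def fills_channels_evenly(dimm_count, sockets=2, channels_per_socket=3):
    """Return True if the DIMMs can be spread evenly over every channel."""
    return dimm_count % (sockets * channels_per_socket) == 0


if __name__ == "__main__":
    for count, size in [(6, 4), (8, 4), (12, 2)]:
        print(
            f"{count} x {size} GB: {gb_per_core(count, size):.1f} GB/core, "
            f"even across channels: {fills_channels_evenly(count)}"
        )
```

With these assumptions, 6 x 4 GB gives 3 GB/core with one DIMM on every channel, whereas a population in multiples of 2 or 4 cannot fill all six channels evenly.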

I do not recommend 1-CPU configs for most HPC apps, though. It's actually more cost effective to run 2 simultaneous jobs within the same 2-CPU node than 2 jobs across two 1-CPU nodes.

Thanks, Jacob




> -----Original Message-----
> From: Stephen Dowdy [mailto:sdowdy at ucar.edu]
> Sent: Friday, May 29, 2009 2:21 PM
> To: Liberman, Jacob
> Cc: linux-poweredge-Lists
> Subject: Re: Memory performance on PE R610 (Adv ECC vs Optimizer)
> 
> Stephen Dowdy wrote, On 05/29/2009 12:49 PM:
> 
> > For comparison...
> > I only get about 8GB/s using OpenMP threaded version of STREAM, but
> > with 1 proc module, and 1066MHz DIMMs. Should i be seeing better
> > than this, and if so, why am i not? (from what you show above, i
> > expect to see roughly:
> >    0.5 (one socket) * 35000 (yields ~ 17GB/sec) \
> >      * 0.8 (1066/1333 scale memspeed) => ~13GB/sec
> > Should tri-channel (your setup) versus dual-channel (mine)
> > have that much impact?
> 
> Dual-Channel (8GB) shows:
> > -------------------------------------------------------------
> > Function      Rate (MB/s)   Avg time     Min time     Max time
> > Copy:        7964.6757       0.0645       0.0643       0.0647
> > Scale:       7783.5016       0.0660       0.0658       0.0662
> > Add:         8723.6035       0.0884       0.0880       0.0890
> > Triad:       8835.4905       0.0875       0.0869       0.0881
> > -------------------------------------------------------------
> 
> reconfiguring the DIMMs by pulling 1 to go to tri-channel
> and 6GB does indeed scale out as expected!
> -------------------------------------------------------------
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:       11869.7313       0.0435       0.0431       0.0440
> Scale:      12100.3400       0.0426       0.0423       0.0430
> Add:        13497.3580       0.0571       0.0569       0.0575
> Triad:      13603.7800       0.0567       0.0565       0.0575
> 
> The reason i was reluctant to believe this was that Tom's Hardware
> or AnandTech did a *Windows*-based test of the Nehalem and
> showed very little bandwidth improvement from dual to tri
> channel (and a significant latency increase).
> 
> Perhaps that was an earlier revision of Nehalem-EP's IMC, or
> Windows has some issue?
> 
> thanks,
> --stephen
> 
> --
> Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
> 303.497.2869   -  sdowdy at ucar.edu        -
> http://www.ral.ucar.edu/~sdowdy/



