Memory performance on PE R610 (Adv ECC vs Optimizer)
Stephen Dowdy
sdowdy at ucar.edu
Fri May 29 13:49:59 CDT 2009
JACOB_LIBERMAN at dell.com wrote, On 05/29/2009 11:49 AM:
> If it's memory bandwidth you're after, populate 1 DIMM per channel per socket across both sockets. (6 DIMMs total)
>
> On an R610 with 6 1333 MHz UDIMMs you should expect stream bandwidth of ~36 GB/s. (BIOS 1.0.4 and 1.1.4, 8 threads)
>
> Copy 36577
> Scale 36212
> Add 34232
> Triad 35240
> I have heaps of performance data if you need design recommendations for a particular application.
Jacob,
Send it all! ;)
This isn't my projects' system, so i'm not sure what they're doing
with it. My projects would be doing WRF. So, btw, do you have
performance data on the R610 vs R410 for HPCC WRF applications?
Seems there should be no performance penalty if using the same
processors and a single row of DIMMs optimized to the specific
processor module in each scenario, right?
For comparison...
I only get about 8GB/s using the OpenMP threaded version of STREAM, but
with 1 proc module and 1066MHz DIMMs. Should i be seeing better
than this, and if so, why am i not? From what you show above, i
expect to see roughly:
0.5 (one socket) * 35000 (yields ~ 17.5GB/sec) \
* 0.8 (1066/1333 scale memspeed) => ~14GB/sec
Should tri-channel (your setup) versus dual-channel (mine)
have that much impact?
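The back-of-envelope estimate above can be written out as a small Python sketch. The function name and the linear scaling by populated channels are assumptions made here for illustration; real bandwidth need not scale linearly with socket count, DIMM clock, or channel count.

```python
# Rough linear scaling of a reference STREAM figure to a different config.
# The linear model is an assumption, not a measured relationship.

def scaled_estimate_mb_s(ref_mb_s, socket_fraction, dimm_mhz, ref_dimm_mhz,
                         channels=1, ref_channels=1):
    """Scale a reference bandwidth by socket, DIMM clock, and channel ratios."""
    return (ref_mb_s * socket_fraction
            * (dimm_mhz / ref_dimm_mhz)
            * (channels / ref_channels))

# Reference: ~35000 MB/s on two sockets, 1333 MHz DIMMs, 3 channels per socket.
clock_only = scaled_estimate_mb_s(35000, 0.5, 1066, 1333)
with_channels = scaled_estimate_mb_s(35000, 0.5, 1066, 1333,
                                     channels=2, ref_channels=3)

if __name__ == "__main__":
    print(f"socket + clock scaling:      {clock_only:8.0f} MB/s")
    print(f"plus 2-of-3 channel scaling: {with_channels:8.0f} MB/s")
```

Under this model the clock-scaled figure comes out near 14 GB/s, and scaling further by two populated channels out of three brings it to roughly 9.3 GB/s, which is much closer to the measured numbers below.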
This system's Configuration:
R610
single E5530 module
4x 2GB 1066MHz DIMMs
# grep 'model name' /proc/cpuinfo | head -1
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
# /root/ssi.sh | grep ssi_cpu
ssi_cpu_cap_ht=1            * can do hyperthreading *
ssi_cpu_cap_nx=1            * can do NoExecute *
ssi_cpu_cap_pae=1           * can do PAE *
ssi_cpu_cap_vmx=1           * Vanderpool VM *
ssi_cpu_clock=2393.999
ssi_cpu_core_count=4        * total cores *
ssi_cpu_cores_per_chip=4    * cores / socket *
ssi_cpu_count=8             * total "cpus" *
ssi_cpu_is_ht=1             * hyperthread/SMT enabled *
ssi_cpu_is_mc=1             * is multicore *
ssi_cpu_siblings=8          * 8 "processors" *
ssi_cpu_sockets=1           * 1 socket occupied *
# dmidecode -t memory | agrep -d'^Handle' -v 'Size: No Module' | egrep '^[[:space:]]*(Size|Locator|Speed)' | paste - - -
Size: 2048 MB Locator: DIMM_A1 Speed: 1066 MHz (0.9 ns)
Size: 2048 MB Locator: DIMM_A2 Speed: 1066 MHz (0.9 ns)
Size: 2048 MB Locator: DIMM_A4 Speed: 1066 MHz (0.9 ns)
Size: 2048 MB Locator: DIMM_A5 Speed: 1066 MHz (0.9 ns)
(this is Optimizer Mode configuration)
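For context, the theoretical peak of a DDR3 configuration can be sketched in Python as channels times transfer rate times 8 bytes per transfer (a DDR3 channel has a 64-bit data bus). Treating the four DIMMs above as populating two channels follows the "dual-channel" description in this mail; the helper name is made up for illustration, and the nominal 1066 and 1333 figures are used as-is.

```python
# Theoretical peak DDR3 bandwidth: channels * MT/s * 8 bytes per transfer.
# Channel counts are taken from the descriptions in this thread, not probed.

BYTES_PER_TRANSFER = 8  # 64-bit DDR3 data bus per channel

def ddr3_peak_mb_s(channels, mega_transfers_per_s):
    """Peak bandwidth in MB/s (1 MB = 10**6 bytes) for one socket."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER

this_box_peak = ddr3_peak_mb_s(2, 1066)        # 2 channels, 1066 MHz DIMMs
reference_peak = 2 * ddr3_peak_mb_s(3, 1333)   # 2 sockets, 3 channels each

if __name__ == "__main__":
    print(f"this system (1 socket, 2 channels): {this_box_peak} MB/s")
    print(f"reference (2 sockets, 3 channels):  {reference_peak} MB/s")
```

Sustained STREAM rates sit well below these peaks in practice, so they are an upper bound rather than a target.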
Stream test with N=32000000
thrust:stream# OMP_NUM_THREADS=4 ./stream_c_omp.exe
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 32000000, Offset = 0
Total memory required = 732.4 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 45081 microseconds.
(= 45081 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         7964.6757       0.0645       0.0643       0.0647
Scale:        7783.5016       0.0660       0.0658       0.0662
Add:          8723.6035       0.0884       0.0880       0.0890
Triad:        8835.4905       0.0875       0.0869       0.0881
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
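As a sanity check on the table above, the rates can be recomputed from the minimum times. This Python sketch assumes STREAM's usual byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and 1 MB = 10^6 bytes; the dictionary names are made up here. Because the printed times are rounded to four decimals, the recomputed rates only match to within a fraction of a percent.

```python
# Recompute STREAM rates from the reported minimum times.
ARRAY_ELEMENTS = 32_000_000   # N from the run above
BYTES_PER_ELEMENT = 8         # double precision

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
MIN_TIME_S = {"Copy": 0.0643, "Scale": 0.0658, "Add": 0.0880, "Triad": 0.0869}
REPORTED_MB_S = {"Copy": 7964.6757, "Scale": 7783.5016,
                 "Add": 8723.6035, "Triad": 8835.4905}

def stream_rate_mb_s(kernel, min_time_s):
    """Bytes moved by one pass of the kernel divided by its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / min_time_s / 1e6

# Three arrays of N doubles, expressed in MiB as STREAM prints it.
total_memory_mib = 3 * BYTES_PER_ELEMENT * ARRAY_ELEMENTS / 2**20

if __name__ == "__main__":
    print(f"total memory: {total_memory_mib:.1f} MB")
    for kernel, t in MIN_TIME_S.items():
        rate = stream_rate_mb_s(kernel, t)
        print(f"{kernel:6s} recomputed {rate:9.1f}  reported {REPORTED_MB_S[kernel]:9.1f}")
```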
thanks,
--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdowdy at ucar.edu - http://www.ral.ucar.edu/~sdowdy/