Clovertown HPL scores

Robin Humble rjh+dellpe at cita.utoronto.ca
Mon Mar 26 21:30:08 CST 2007


On Mon, Mar 26, 2007 at 07:38:51PM -0700, Kilian CAVALOTTI wrote:
>efficiency out of one single host. They are PowerEdge 1950, with two E5345 
>(Clovertown, quad-core, 2.33GHz) and 16GB memory each. So, if I made no 
>mistake, for a single host, the theoretical performance should be:
>2 (CPUs) x 4 (cores) x 4 (ops/cycle) x 2.33G (cycles/s) = 74.56 Gflop/s

yup

>I compiled xhpl against the GotoBLAS library, and use LAM MPI. I played a 
>little bit with HPL.dat values (see below), and experimentally, 
>the best score I get, running 8 jobs on the same host, is 52 Gflops/s. 
>That's about 70% efficiency, which seems a little low to me. I would have 
>expected something more in the 80-90% range. 

I'd say 70-80% is the best you could expect.
on a single core you can expect close to 90%, but not on all 8.
I see 75.7% on a PE1950 with dual 1.6 GHz quad-core Clovertowns and 4GB of RAM.
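
As a quick check of the arithmetic above, here is a minimal Python sketch of the peak and efficiency calculation. The function names are made up for illustration; the inputs (2 sockets, 4 cores, 4 flops/cycle, 2.33 GHz, 52 Gflop/s measured) are the figures quoted in this thread.

```python
def peak_gflops(sockets, cores_per_socket, flops_per_cycle, ghz):
    """Theoretical peak in Gflop/s: sockets x cores x flops/cycle x GHz."""
    return sockets * cores_per_socket * flops_per_cycle * ghz


def efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved by HPL."""
    return measured_gflops / peak


if __name__ == "__main__":
    # Kilian's host: two E5345 (quad-core, 2.33 GHz), best HPL run 52 Gflop/s.
    peak = peak_gflops(2, 4, 4, 2.33)
    print(f"peak       = {peak:.2f} Gflop/s")
    print(f"efficiency = {efficiency(52.0, peak):.1%}")

    # The 1.6 GHz dual quad-core box: 75.7% of its peak.
    peak_16 = peak_gflops(2, 4, 4, 1.6)
    print(f"1.6 GHz peak = {peak_16:.1f} Gflop/s, "
          f"75.7% of that = {0.757 * peak_16:.1f} Gflop/s")
```

So 52 Gflop/s on the 2.33 GHz host works out to just under 70% of peak, which is at the low end of the 70-80% range.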

these Woodcrest and Clovertown chips seem to lack the memory bandwidth
to feed 4 flops/cycle on every core, and probably also suffer bus
contention due to the fairly pathetic Intel shared memory bus
architecture. large caches only get you so far...

having said that, you might get a better result with the threaded
GotoBLAS - it generally performs better on a single node (but sometimes
not for a large number of nodes). also try an NB of 192 or 256. the rest
of your settings look fine.
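
If NB changes, the problem size N is usually kept a multiple of it. The sketch below is not from this thread: it assumes the common rule of thumb of filling roughly 80% of memory with the N x N double-precision matrix, treats "16GB" as 16 * 2**30 bytes, and uses a hypothetical helper name `suggest_n`.

```python
import math


def suggest_n(mem_bytes, nb, mem_fraction=0.8):
    """Largest N that is a multiple of nb such that an N x N matrix of
    8-byte doubles fits in mem_fraction of mem_bytes.

    mem_fraction=0.8 is an assumed rule of thumb, not a value from the thread.
    """
    usable_doubles = int(mem_fraction * mem_bytes) // 8
    max_n = math.isqrt(usable_doubles)  # floor(sqrt(number of doubles))
    return (max_n // nb) * nb           # round down to a multiple of nb


if __name__ == "__main__":
    mem = 16 * 2**30  # 16GB per host, assumed to mean binary gigabytes
    for nb in (192, 256):
        print(f"NB = {nb}: N = {suggest_n(mem, nb)}")
```

Under those assumptions this gives N = 41280 for NB = 192 and N = 41216 for NB = 256; the fraction may need lowering if the OS and MPI buffers take more room.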

cheers,
robin


