Clovertown HPL scores
Kilian CAVALOTTI
kilian.cavalotti at lip6.fr
Tue Mar 27 15:20:41 CST 2007
On Monday 26 March 2007 08:30:08 pm you wrote:
> I'd say 70-80% is the best you could expect.
> on a single core you can expect close to 90%, but not on all 8.
Well, I tried on a single core, with P = Q = 1 and 'mpirun -np 1', and I
only get 78% efficiency, which is not exactly impressive.
> I see 75.7% on 1.6 GHz pe1950 dual quad-core Clovertowns with 4g ram.
That's almost 10% better than what I can get for 8 jobs. Do you think the
MPI implementation or the compiler is at fault?
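For reference, the efficiency figures in this thread are measured HPL Gflops (Rmax) divided by theoretical peak. Below is a minimal sketch in Python, assuming 4 flops/cycle per core as quoted further down and the 1.6 GHz, 8-core box from the quoted message. The function names are only illustrative and not part of HPL.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in GFLOPS: cores * clock (GHz) * flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(rmax_gflops, cores, clock_ghz, flops_per_cycle=4):
    """HPL efficiency as a fraction: measured Rmax over theoretical peak."""
    return rmax_gflops / peak_gflops(cores, clock_ghz, flops_per_cycle)


if __name__ == "__main__":
    # Assumed machine: 2 sockets x 4 cores at 1.6 GHz, 4 flops/cycle per core.
    peak = peak_gflops(cores=8, clock_ghz=1.6)
    print(f"peak: {peak:.1f} GFLOPS")
    # Rmax implied by the quoted 75.7% figure on that machine.
    print(f"Rmax at 75.7%: {0.757 * peak:.2f} GFLOPS")
```

To check a run, take the Gflops value HPL reports and pass it as `rmax_gflops`, with `cores` set to the `-np` value used.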
> these Woodcrests and Clovertown chips seem to lack memory bandwidth
> when trying to feed 4 flops/cycle, and also probably suffer bus
> contention due to the fairly pathetic Intel shared memory bus
> architecture. large caches only get you so far...
>
> having said that, you might get a better result with the threaded
> GotoBLAS - it generally performs better on a single node (but sometimes
> not for a large number of nodes). also NB of 192 or 256. the rest of
> your settings look fine.
Thanks for the info, I'll continue to try to find better parameters.
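A rough sketch of the candidates worth sweeping, in Python. It lists the P x Q grids for a given process count and picks a problem size N that is a multiple of NB. The 80% memory fraction is a common rule of thumb and an assumption here, not something from this thread; the 4 GB figure and the NB values 192 and 256 are the ones quoted above. The helper names are hypothetical.

```python
import math


def process_grids(nprocs):
    """All (P, Q) pairs with P * Q == nprocs and P <= Q."""
    return [(p, nprocs // p)
            for p in range(1, math.isqrt(nprocs) + 1)
            if nprocs % p == 0]


def problem_size(mem_bytes, nb, fraction=0.8):
    """Largest N, a multiple of nb, whose N x N matrix of 8-byte doubles
    fits in `fraction` of memory (the fraction is an assumed rule of thumb)."""
    n = math.isqrt(int(fraction * mem_bytes) // 8)
    return (n // nb) * nb


if __name__ == "__main__":
    mem = 4 * 2**30  # 4 GB, as on the quoted machine
    for nb in (192, 256):
        for p, q in process_grids(8):
            print(f"N={problem_size(mem, nb)} NB={nb} P={p} Q={q}")
```

Each printed line corresponds to one combination of N, NB, P and Q to try in HPL.dat.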
Cheers,
--
Kilian