Clovertown HPL scores

Kilian CAVALOTTI kilian.cavalotti at lip6.fr
Tue Mar 27 15:20:41 CST 2007


On Monday 26 March 2007 08:30:08 pm you wrote:
> I'd say 70-80% is the best you could expect.
> on a single core you can expect close to 90%, but not on all 8.

Well, I tried on a single core, with P = Q = 1 and 'mpirun -np 1', and I 
only got 78% efficiency, which is not exactly impressive.

> I see 75.7% on 1.6 GHz pe1950 dual quad-core Clovertowns with 4g ram.

That's almost 10% better than what I can get with 8 jobs. Do you think the 
MPI implementation or the compiler is at fault?
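For reference, HPL efficiency is the measured Rmax divided by the theoretical peak Rpeak, and Rpeak is cores x clock x flops per cycle. Below is a minimal Python sketch. It assumes 4 double-precision flops/cycle per core for these chips, and the function names are only illustrative:

```python
# Theoretical peak (Rpeak) and HPL efficiency for a single node.
# Assumption: 4 double-precision flops/cycle per core (Woodcrest/Clovertown).


def rpeak_gflops(cores: int, ghz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x flops per cycle."""
    return cores * ghz * flops_per_cycle


def efficiency(rmax_gflops: float, rpeak: float) -> float:
    """HPL efficiency as a fraction: measured Rmax over theoretical Rpeak."""
    return rmax_gflops / rpeak


if __name__ == "__main__":
    # Dual quad-core 1.6 GHz node from the quoted message (8 cores).
    peak = rpeak_gflops(cores=8, ghz=1.6)
    # Rmax implied by the quoted 75.7% efficiency.
    implied_rmax = 0.757 * peak
    print(f"Rpeak = {peak:.1f} GFLOPS")
    print(f"Rmax at 75.7% = {implied_rmax:.1f} GFLOPS")
```

For the 1.6 GHz dual quad-core box above, this gives an Rpeak of 51.2 GFLOPS, so 75.7% corresponds to roughly 38.8 GFLOPS.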

> these Woodcrests and Clovertown chips seem to lack memory bandwidth
> when trying to feed 4 flops/cycle, and also probably suffer bus
> contention due to the fairly pathetic Intel shared memory bus
> architecture. large caches only get you so far...
>
> having said that, you might get a better result with the threaded
> GotoBLAS - it generally performs better on a single node (but sometimes
> not for a large number of nodes). also NB of 192 or 256. the rest of
> your settings look fine.

Thanks for the info, I'll keep tuning the parameters.
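When trying NB = 192 or 256, N is usually chosen so that the matrix fills most of RAM and is a multiple of NB. P x Q is usually kept close to square, with P <= Q. Below is a rough Python sketch. It assumes 4 GiB of RAM per node and one MPI process per core. It also uses a hypothetical 80% memory fraction, which is a common rule of thumb and not something from this thread:

```python
import math
from typing import Tuple


def hpl_problem_size(mem_bytes: int, nb: int, mem_fraction: float = 0.8) -> int:
    """Largest N, rounded down to a multiple of NB, such that an N x N matrix
    of doubles (8 bytes each) fits in mem_fraction of total memory.
    The 0.8 default is an assumed rule of thumb."""
    n = math.isqrt(int(mem_fraction * mem_bytes) // 8)
    return (n // nb) * nb


def process_grid(nprocs: int) -> Tuple[int, int]:
    """Most nearly square P x Q factorisation of nprocs, with P <= Q."""
    p = math.isqrt(nprocs)
    while nprocs % p != 0:
        p -= 1
    return p, nprocs // p


if __name__ == "__main__":
    mem = 4 * 2**30  # assumed: 4 GiB per node
    for nb in (192, 256):
        print(f"NB={nb}: N={hpl_problem_size(mem, nb)}")
    print("8 processes -> P x Q =", process_grid(8))
```

With 4 GiB this gives N = 20544 for NB = 192 and N = 20480 for NB = 256. For 8 processes it gives a 2 x 4 grid.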

Cheers,
-- 
Kilian



More information about the Linux-PowerEdge mailing list