Clovertown HPL scores

Kilian CAVALOTTI kilian.cavalotti at lip6.fr
Mon Mar 26 20:38:51 CST 2007


Hi all,

I don't know if this is the right place to ask, but I guess some readers 
of this mailing list have a good knowledge of HPC benchmarking.

I'm trying to evaluate our cluster and have started running Linpack (HPL). 
Before scaling up to the whole cluster, I'd like to get the best 
efficiency out of a single host. The nodes are PowerEdge 1950s, each with 
two E5345 CPUs (Clovertown, quad-core, 2.33GHz) and 16GB of memory. So, if 
I've made no mistake, the theoretical peak of a single host should be:
2 (CPUs) x 4 (cores) x 4 (ops/cycle) x 2.33G (cycles/s) = 74.56 Gflop/s

I compiled xhpl against the GotoBLAS library and use LAM MPI. I played a 
little with the HPL.dat values (see below), and the best score I have 
obtained so far, running 8 MPI processes on the same host, is 52 Gflop/s. 
That's about 70% efficiency, which seems a little low to me. I would have 
expected something in the 80-90% range.
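
For reference, here is the same arithmetic as a small Python sketch. The 
function and variable names are mine and purely illustrative, and the 
4 flops/cycle/core figure is the assumption I used above, not a measured 
value.

```python
# Peak and efficiency for one node, using the figures quoted in this post.
SOCKETS = 2             # two E5345 CPUs per host
CORES_PER_SOCKET = 4    # quad-core Clovertown
FLOPS_PER_CYCLE = 4     # assumed double-precision flops per core per cycle
CLOCK_GHZ = 2.33        # core clock in GHz
MEASURED_GFLOPS = 52.0  # best HPL result so far


def peak_gflops(sockets, cores, flops_per_cycle, clock_ghz):
    """Theoretical peak in Gflop/s: every core retiring its maximum each cycle."""
    return sockets * cores * flops_per_cycle * clock_ghz


def efficiency(measured, peak):
    """Fraction of the theoretical peak actually achieved."""
    return measured / peak


peak = peak_gflops(SOCKETS, CORES_PER_SOCKET, FLOPS_PER_CYCLE, CLOCK_GHZ)
eff = efficiency(MEASURED_GFLOPS, peak)
print(f"peak = {peak:.2f} Gflop/s, efficiency = {eff:.1%}")
# prints: peak = 74.56 Gflop/s, efficiency = 69.7%
```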

Is 70% efficiency reasonable for a single-node run, or should I be able to 
get more? If so, what would you advise to improve the score?

-----------------------------------------------------------------------
# HPLinpack benchmark input file
# Innovative Computing Laboratory, University of Tennessee
HPL.out     output file name (if any)
6           device out (6=stdout,7=stderr,file)
1           # of problems sizes (N)
44000       Ns
1           # of NBs
200         NBs
0           PMAP process mapping (0=Row-,1=Column-major)
1           # of process grids (P x Q)
2           Ps
4           Qs
16.0        threshold
1           # of panel fact
2           PFACTs (0=left, 1=Crout, 2=Right)
1           # of recursive stopping criterium
8           NBMINs (>= 1)
1           # of panels in recursion
2           NDIVs
1           # of recursive panel fact.
0           RFACTs (0=left, 1=Crout, 2=Right)
1           # of broadcast
1           BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1           # of lookahead depth
0           DEPTHs (>=0)
0           SWAP (0=bin-exch,1=long,2=mix)
64          swapping threshold
0           L1 in (0=transposed,1=no-transposed) form
0           U in (0=transposed,1=no-transposed) form
1           Equilibration (0=no,1=yes)
8           memory alignment in double (> 0)
-----------------------------------------------------------------------
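
To put the N, NB and P x Q values above in context, this is a quick Python 
check of how they relate to one node's memory and core count. It is only a 
sketch: the names are illustrative, and I am reading "16GB" as 16 GiB and 
counting only the N x N matrix of 8-byte doubles, not HPL's workspace or 
the OS.

```python
# Sanity check of the HPL.dat values against one node's resources.
N = 44000      # Ns: problem size
NB = 200       # NBs: block size
P, Q = 2, 4    # Ps, Qs: process grid
CORES = 8      # 2 sockets x 4 cores
RAM_GIB = 16   # assumption: "16GB" taken as 16 GiB


def matrix_gib(n):
    """Memory needed for an n x n matrix of 8-byte doubles, in GiB."""
    return 8 * n * n / 2**30


footprint = matrix_gib(N)            # size of the coefficient matrix
fraction = footprint / RAM_GIB       # share of the node's memory it uses
processes = P * Q                    # MPI processes implied by the grid
blocks, remainder = divmod(N, NB)    # does NB divide N evenly?

print(f"matrix: {footprint:.1f} GiB ({fraction:.0%} of RAM)")
print(f"grid: {P} x {Q} = {processes} processes for {CORES} cores")
print(f"N / NB = {blocks} blocks, remainder {remainder}")
```

With these values the matrix alone takes roughly 14.4 GiB, about 90% of 
the node's memory; N is an exact multiple of NB; and P x Q matches the 
8 cores.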

Thanks for any hints you can provide,
-- 
Kilian


