Demand based switching with RHEL4 on PE 1850 and 1950 servers

Robin Humble rjh+dellpe at cita.utoronto.ca
Tue Apr 10 05:06:24 CDT 2007


On Tue, Apr 10, 2007 at 05:10:57AM -0400, Kuba Ober wrote:
>On Tuesday 10 April 2007, Billinghurst, David (RTATECH) wrote:
>> As part of our corporate energy efficiency program, I want to
>> reduce the energy consumption of our 1850 and 1950 servers.
>>
>> The machines are compute servers running large mathematical models.
>> They are either flat out for days at a time, or idle.

with regard to the Dell white paper, it's pretty easy to put a 'cpus to
full power' step in a PBS prologue script, and a 'cpus to min speed' or
'cpus to ondemand' step in an epilogue script. it's just an echo into a
/sys file for each core.
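the per-core echo could be sketched like this (a minimal sketch: the
cpufreq sysfs layout is the standard kernel one, but SYSFS_ROOT is a
made-up override hook so the helper can be tried outside /sys, and on a
real node you'd need root to write there):

```shell
#!/bin/sh
# prologue/epilogue helper: switch every core's cpufreq governor.
# SYSFS_ROOT is an assumed override; on a real node it would be
# /sys/devices/system/cpu.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/devices/system/cpu}"

set_governor() {
    gov="$1"
    for f in "$SYSFS_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        # skip cores without cpufreq support or without write permission
        [ -w "$f" ] && echo "$gov" > "$f"
    done
    return 0
}

# prologue would call:  set_governor performance
# epilogue would call:  set_governor ondemand    (or powersave)
```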

>I presume there must be some software running that will allocate jobs to those 
>servers, in the know of how many compute nodes are needed at any given time. 

you'd hope so. are the machines running a queueing system? are they
installed with OSCAR or Rocks or similar?

>It's a trivial enough thing to get those nodes to power up and down as needed 
>from within the node allocation framework, as long as the framework allows 
>nodes to be dynamically added/removed.

yeah, power-off is better than just spinning down disks and cpus.
smallish machines are still maybe 70-100W at idle with everything going
slow, and can be 2 or 3x that at peak power usage.

in a torque/maui setup (e.g. OSCAR), you could pretty easily write a
script that sat watching how many jobs were queued and how many nodes
were idle, and either shut nodes down (in c3 terms, cexec :<nodenum>
shutdown -h now) or powered nodes on via IPMI.
PBS (torque) and maui (the scheduler) are happy to have nodes come and
go, so that part's not a drama.
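the watcher could look something like this sketch. decide() is the
policy; the loop wires it to torque (qstat/pbsnodes), c3 (cexec) and
ipmitool, but those command lines are assumptions for illustration
(node naming, the "-bmc" hostname convention and the c3 node id are all
site-specific):

```shell
#!/bin/sh
# decide: given queued job count and free node count, say whether to
# power a node "on", shut one "off", or "hold".
decide() {
    queued=$1 free=$2
    if [ "$queued" -gt "$free" ]; then
        echo on            # more jobs waiting than free nodes
    elif [ "$queued" -eq 0 ] && [ "$free" -gt 0 ]; then
        echo off           # nothing queued, spare nodes idling
    else
        echo hold
    fi
}

# hypothetical main loop; command details are assumed, not tested
watch_loop() {
    while :; do
        queued=$(qstat 2>/dev/null | grep -c ' Q ')
        free=$(pbsnodes -l free 2>/dev/null | wc -l)
        case $(decide "$queued" "$free") in
            on)  node=$(pbsnodes -l down | awk 'NR==1{print $1}')
                 [ -n "$node" ] && ipmitool -H "$node-bmc" -U root power on ;;
            off) node=$(pbsnodes -l free | awk 'NR==1{print $1}')
                 [ -n "$node" ] && cexec :"$node" shutdown -h now ;;
        esac
        sleep 60
    done
}
```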

I think the main problem would be making sure that newly booted nodes
came up properly, with all daemons alive, all memory and cpus
available, all filesystems mounted etc., before they ran pbs_mom and
made themselves available for jobs. it would be bad if jobs started
on half-alive nodes and then crashed immediately. so a small suite of
sanity-checking scripts would be required. presumably the on/off daemon
or the newly powered-on node itself could run these.
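a boot-time sanity check along these lines might do (a sketch only: the
expected cpu count, memory size and mount list are made-up defaults that
a real site would set per node type):

```shell
#!/bin/sh
# run at boot, before starting pbs_mom: bail out if the node looks
# half-alive. thresholds are illustrative assumptions.
EXPECTED_CPUS=${EXPECTED_CPUS:-1}
EXPECTED_MEM_KB=${EXPECTED_MEM_KB:-100000}
REQUIRED_MOUNTS=${REQUIRED_MOUNTS:-/}

fail() { echo "sanity check failed: $1" >&2; exit 1; }

# all cpus visible?
cpus=$(grep -c ^processor /proc/cpuinfo)
[ "$cpus" -ge "$EXPECTED_CPUS" ] || fail "only $cpus cpus"

# all memory visible?
mem=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
[ "$mem" -ge "$EXPECTED_MEM_KB" ] || fail "only $mem kB of memory"

# required filesystems mounted?
for m in $REQUIRED_MOUNTS; do
    grep -q " $m " /proc/mounts || fail "$m not mounted"
done

echo "node looks sane, ok to start pbs_mom"
```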

a set of heuristics to stop nodes being powered on and off too
frequently would be nice too. presumably that sort of cycling is bad
for the hardware?
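one simple anti-cycling heuristic is a per-node cooldown: remember when
each node was last power-cycled and refuse to touch it again inside some
minimum interval. a sketch (the state directory path and the one-hour
default are assumptions):

```shell
#!/bin/sh
# may_cycle <node>: succeed (and record a timestamp) only if <node> has
# not been power-cycled within the last MIN_CYCLE_SECS seconds.
STATE_DIR=${STATE_DIR:-/tmp/powerctl-state}   # assumed location
MIN_CYCLE_SECS=${MIN_CYCLE_SECS:-3600}        # assumed cooldown

may_cycle() {
    node="$1"
    stamp="$STATE_DIR/$node.last"
    now=$(date +%s)
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
        # still inside the cooldown window: refuse
        [ $((now - last)) -ge "$MIN_CYCLE_SECS" ] || return 1
    fi
    mkdir -p "$STATE_DIR" && echo "$now" > "$stamp"
    return 0
}

# the on/off daemon would wrap its actions:  may_cycle node12 && power_on node12
```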

ultimately the on/off daemon belongs in the queueing software which
could call external scripts for boot, check/online, and shutdown, but
as a first cut something external in python would be fine. it would be
a nice little project to write a daemon like that...

cheers,
robin


