PE1950 fan speed

Tom Rockwell rockwell at pa.msu.edu
Tue Feb 5 11:23:26 CST 2008


Per discussions on this list and my own experience with PE1950s, the fan
speed is controlled by the system firmware and there is no way for the
user to modify the control parameters.  Intel and Dell specified and
engineered the CPU and system cooling, so I don't see any reason to think
that the cooling is not sufficient.  On the contrary, it is easy for me
to believe that a steady CPU temp (even if "high") is better for CPU
longevity than a CPU temp that cycles up and down a lot with load.  It
seems that these systems try to keep the exit air temp (and hence CPU
die temp) fairly constant.

On racks full of 1950s, we use the APC exhaust fan doors.  These capture 
the hot air and send it directly into the air return plenum.  They also 
pull a little bit more air through the rack (especially the gaps around 
nodes).  We have PE1950s with X5355 CPUs and 16GB RAM, which is a
pretty high power combo.  They run 100% CPU load 24/7 and the fans never
go above the base speed.  This seems fine to me.  The only issue that I 
worry about with these is making sure that hot exhaust air isn't 
ingested by other computers in the room...  Increasing system fan speed 
is probably going to increase power consumption and I doubt reliability 
will be increased at all (note that I expect to replace these systems in 
3-4 years because of speed improvements...).

If the environment the systems are used in isn't ideal (too hot) or there
are fan failures, the fan speed will increase to compensate.
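
For anyone who wants to watch this behaviour rather than control it, here
is a minimal Python sketch that parses the "name | reading | status"
layout of the ipmitool output quoted further down.  It is only a sketch:
the function names (read_sdr, parse_sdr, fan_rpms, not_reporting) are my
own, and it assumes ipmitool is installed and prints the same three-column
format on your system.

```python
import subprocess


def read_sdr():
    """Run `ipmitool -I open sdr` and return its output as text.

    Assumes ipmitool is installed and the IPMI kernel drivers are loaded.
    """
    result = subprocess.run(
        ["ipmitool", "-I", "open", "sdr"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_sdr(text):
    """Split each 'name | reading | status' line into a 3-tuple of strings."""
    rows = []
    for line in text.splitlines():
        parts = [field.strip() for field in line.split("|")]
        if len(parts) == 3:  # skip blank or unexpected lines
            rows.append(tuple(parts))
    return rows


def fan_rpms(rows):
    """Return (name, rpm) for each fan sensor that reports an RPM reading."""
    fans = []
    for name, reading, _status in rows:
        if name.startswith("FAN") and reading.endswith("RPM"):
            fans.append((name, int(reading.split()[0])))
    return fans


def not_reporting(rows):
    """Return names of sensors whose status column is 'ns', the status
    shown for the disabled and unreadable sensors in the quoted output."""
    return [name for name, _reading, status in rows if status == "ns"]
```

Logging fan_rpms(parse_sdr(read_sdr())) periodically (from cron, say)
should show whether the fans ever leave their base speed under load or
after a fan failure.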

Cheers,
Tom

Vernon A. Fort wrote:
> michalwd1979 wrote:
>   
>> Hi Vernon,
>> I am not 100% sure, but I think that fan speed is set by the BMC and
>> you cannot do anything about it. A simple test of whether fan speed
>> control is working is to pull out one fan while the server is running.
>> For me (PE2850 running Gentoo) this speeds up the other fans. When I
>> put it back, the fans gradually slowed down to normal speed. Can you
>> check the temperature of the CPUs using lm_sensors, OpenManage, or
>> anything else? On the 2850, Dell sets the CPU warning temp to 120 deg,
>> so maybe it assumes that running hot is normal for the system.
>> Anyway, when I ran a crash test (a few compilations and a few
>> cat /dev/urandom >/dev/null) some time ago, I got about 62 deg max on
>> the CPUs, but I think this is because of a completely changed cooling
>> system.
>>
>> Regards,
>> Michael
>>
>>
>> On 5 February 2008 at 3:46, "Vernon A. Fort" <vfort at provident-solutions.com> wrote:
>>
>>   
>>     
>>> I have a PE1950 running Gentoo.  During a CPU stress test, we expected
>>> the fan speed to go up, but it did not.  We had the CPU load around 25.0,
>>> and when I put my hand at the back of the server, it was very hot to the
>>> touch.  I have looked at lm_sensors and OpenIPMI, but I did not
>>> find anything related to controlling the fan speed, i.e. making it go
>>> faster when under heavy load.
>>>
>>> Could someone point me to some good reading?  Basically, I need to
>>> monitor the server and have something that adjusts accordingly.
>>>     
>>>       
> After further reading, I enabled some additional kernel features and
> installed ipmitool.  Why is the Temp disabled?  I obviously have
> something misconfigured in the BIOS setup.  Any pointers?
>
> Vernon
>
> newcs ~ # ipmitool -I open sdr
> Temp             | disabled          | ns
> Temp             | disabled          | ns
> Temp             | disabled          | ns
> Temp             | disabled          | ns
> Ambient Temp     | 26 degrees C      | ok
> CMOS Battery     | 0x00              | ok
> ROMB Battery     | 0x00              | ok
> VCORE            | 0x01              | ok
> VCORE            | 0x01              | ok
> CPU VTT          | 0x01              | ok
> 1.5V PG          | 0x01              | ok
> 1.8V PG          | 0x01              | ok
> 3.3V PG          | 0x01              | ok
> 5V PG            | 0x01              | ok
> 1.5V PXH PG      | 0x01              | ok
> 5V Riser PG      | 0x01              | ok
> Backplane PG     | 0x01              | ok
> Linear PG        | 0x01              | ok
> 0.9V PG          | 0x01              | ok
> 0.9V Over Volt   | 0x01              | ok
> CPU Power Fault  | 0x01              | ok
> FAN MOD 1A RPM   | 7125 RPM          | ok
> FAN MOD 1B RPM   | 7350 RPM          | ok
> FAN MOD 1C RPM   | 4650 RPM          | ok
> FAN MOD 1D RPM   | 4575 RPM          | ok
> FAN MOD 2A RPM   | 7575 RPM          | ok
> FAN MOD 2B RPM   | 7650 RPM          | ok
> FAN MOD 2C RPM   | 4650 RPM          | ok
> FAN MOD 2D RPM   | 4800 RPM          | ok
> FAN MOD 3A RPM   | 8025 RPM          | ok
> FAN MOD 3B RPM   | 7425 RPM          | ok
> FAN MOD 3C RPM   | 4875 RPM          | ok
> FAN MOD 3D RPM   | 4800 RPM          | ok
> FAN MOD 4A RPM   | 7725 RPM          | ok
> FAN MOD 4B RPM   | 7650 RPM          | ok
> FAN MOD 4C RPM   | 4800 RPM          | ok
> FAN MOD 4D RPM   | 4950 RPM          | ok
> Presence         | 0x01              | ok
> Presence         | 0x01              | ok
> Presence         | 0x01              | ok
> Presence         | 0x01              | ok
> Presence         | 0x01              | ok
> Presence         | 0x01              | ok
> DRAC5 Conn 2 Cbl | Not Readable      | ns
> PFault Fail Safe | Not Readable      | ns
> Status           | 0x80              | ok
> Status           | 0x80              | ok
> Status           | 0x01              | ok
> Status           | 0x0b              | ok
> Status           | 0x01              | ok
> RAC Status       | 0x00              | ok
> OS Watchdog      | 0x00              | ok
> SEL              | Not Readable      | ns
> Intrusion        | 0x00              | ok
> PS Redundancy    | 0x02              | ok
> Fan Redundancy   | 0x01              | ok
> CPU Temp Interf  | Not Readable      | ns
> Drive            | 0x01              | ok
> Cable SAS A      | 0x01              | ok
> ECC Corr Err     | Not Readable      | ns
> ECC Uncorr Err   | Not Readable      | ns
> I/O Channel Chk  | Not Readable      | ns
> PCI Parity Err   | Not Readable      | ns
> PCI System Err   | Not Readable      | ns
> SBE Log Disabled | Not Readable      | ns
> Logging Disabled | Not Readable      | ns
> Unknown          | 0xc0              | ok
> CPU Protocol Err | Not Readable      | ns
> CPU Bus PERR     | Not Readable      | ns
> CPU Init Err     | Not Readable      | ns
> CPU Machine Chk  | Not Readable      | ns
> Memory Spared    | 0x00              | ok
> Memory Mirrored  | 0x01              | ok
> Memory RAID      | 0x01              | ok
> Memory Added     | Not Readable      | ns
> Memory Removed   | Not Readable      | ns
> Memory Cfg Err   | 0x01              | ok
> Mem Redun Gain   | 0x01              | ok
> PCIE Fatal Err   | 0x01              | ok
> Chipset Err      | 0x01              | ok
> Err Reg Pointer  | 0x01              | ok
> Mem ECC Warning  | 0x01              | ok
> Mem CRC Err      | 0x01              | ok
> USB Over-current | 0x01              | ok
> POST Err         | Not Readable      | ns
> Hdwr version err | Not Readable      | ns
> Mem Overtemp     | 0x01              | ok
> Mem Fatal SB CRC | 0x01              | ok
> Mem Fatal NB CRC | 0x01              | ok


