Thermal issues with SC1435 servers??

Cris Rhea crhea at
Mon Apr 19 15:55:34 CDT 2010

On Mon, Apr 19, 2010 at 02:53:21PM -0500, Wayne_Weilnau at wrote:
> Chris,
> I've seen other HPC customers who have had thermal issues, 
> especially with systems in the top of the rack.  If you have any 
> leakage of air from the hot aisle into the cold aisle, it would be 
> possible the inlet (ambient) temperature for a system could be higher 
> than you realize.  I don't have a 1435 and don't have the specs in 
> front of me, but I would think 71F is within the operating range of the 
> system.  If not, it is barely outside the operating range.  I believe 
> the 1435 has a Baseboard Management Controller (BMC) that records 
> hardware events into the System Event Log (SEL).  You should be able 
> to view the SEL during POST by pressing CTRL-E.  You can also view the 
> SEL through ipmitool or OMSA.  I would check the SEL for any 
> events, especially for thermal sensors.
> Wayne Weilnau
> Systems Management Technologist
> Dell | OpenManage Software Development 

The affected systems are spread from bottom to top of the rack. Yes, our
hot/cold aisle separation is a bit sloppy (we don't have under-floor cold
air), but if that were the cause I'd expect to see a pattern like the one
you suggest (e.g., failures concentrated at the top of the rack). The
temperature reading comes from a low-tech thermometer on the front door of
the rack at eye level.

The only place I see these errors is in the SEL.  After powering the 
machines back up, I press CTRL-E during POST and look at the SEL. I get 
simple messages like "CPUx thermal tripped asserted". 
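
For anyone who would rather pull these entries from a running system than
catch them at POST, here is a rough sketch in Python. It assumes ipmitool is
installed and can reach the local BMC, and that "ipmitool sel elist" prints
one pipe-separated entry per line; the exact field layout varies by BMC, so
the parsing is deliberately loose. The function names are made up for
illustration.

```python
import subprocess


def thermal_events(sel_text):
    """Return SEL entries (as lists of fields) that mention a thermal event.

    Assumes one entry per line with fields separated by '|'. The first
    field is treated as the record ID and is not searched.
    """
    hits = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue  # skip blank or unrecognised lines
        text = " ".join(fields[1:]).lower()
        if "thermal" in text or "temperature" in text:
            hits.append(fields)
    return hits


def read_sel():
    """Read the SEL from the local BMC (assumes ipmitool is on the PATH)."""
    result = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling thermal_events(read_sel()) on each node and comparing the results
against rack position would show whether the trips really are scattered or
whether they cluster somewhere.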

I've taken one system apart and reapplied the thermal compound between the
CPU and heatsink. That didn't help. I then replaced the motherboard (MB),
and that system has behaved since.

Perhaps, if this isn't a common problem, I really do just have 8 more
systems that have bad thermal sensors on the MB. 

-- Cris

 Cristopher J. Rhea                     
 Mayo Clinic - Research Computing Facility
 200 First St SW, Rochester, MN 55905
 crhea at Mayo.EDU
 (507) 284-0587
