[Linux-PowerEdge] 13th gen server stuck in cpu reset loop with new chips (PSU issue?)

vincent at cojot.name vincent at cojot.name
Fri Jul 20 15:54:56 CDT 2018


Quick follow-up to this developing story.

I purchased a used pair of E5-2650-V4 chips from someone who had upgraded 
his R730, and things went like this:

1) When I put the CPUs in the server and tried to turn it on, it would not power on.

2) I disconnected the power cable and waited about 15 seconds, but it still 
would not power up. The iDRAC was up but the server wouldn't turn on.

3) Becoming desperate at this point, I unplugged the server for two 
minutes and when I plugged it back in, it turned itself on and booted..!

I'm guessing some power distribution component needed to be fully drained 
of residual power before it would accept the new configuration. Fun stuff, 
what a scare (even on eBay, those V4 chips weren't exactly cheap).

As an aside, I ran Linpack on the machine and the power consumption 
stabilized around 350W (as seen from the iDRAC and from 'ipmitool sdr list 
full') so I guess my 750W PSU will still be fine..
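
For anyone who wants to script the same check, here is a minimal sketch that 
pulls the Watts readings out of 'ipmitool sdr list full' style output and 
compares the draw to the PSU rating. The "name | reading | status" row layout, 
the "Pwr Consumption" sensor name and the sample text are assumptions for 
illustration (the sample is not captured from my machine), and the function 
names are made up.

```python
import re

# One row of `ipmitool sdr list full`-style output with a reading in Watts.
# The "name | reading | status" layout is assumed here.
_WATTS_ROW = re.compile(
    r"^\s*(?P<name>[^|]+?)\s*\|\s*(?P<watts>\d+(?:\.\d+)?)\s+Watts\s*\|"
)


def parse_watt_sensors(sdr_text):
    """Return {sensor name: watts} for every row that reports a Watts reading."""
    readings = {}
    for line in sdr_text.splitlines():
        match = _WATTS_ROW.match(line)
        if match:
            readings[match.group("name")] = float(match.group("watts"))
    return readings


def psu_load_fraction(draw_watts, psu_rated_watts):
    """Fraction of the PSU's rated output that the measured draw represents."""
    if psu_rated_watts <= 0:
        raise ValueError("psu_rated_watts must be positive")
    return draw_watts / psu_rated_watts


# Illustrative input only; sensor names and values are hypothetical.
SAMPLE = """\
Fan1             | 4800 RPM          | ok
Pwr Consumption  | 350 Watts         | ok
Inlet Temp       | 24 degrees C      | ok
"""

if __name__ == "__main__":
    watts = parse_watt_sensors(SAMPLE)["Pwr Consumption"]
    load = psu_load_fraction(watts, 750)
    print(f"{watts:.0f} W is {load:.0%} of a 750 W PSU")
```

On a live system the text would come from running ipmitool itself (for 
example through subprocess) rather than from SAMPLE. A reading taken under 
Linpack says nothing about the other cards in the chassis at full load, so 
treat the result as a rough headroom estimate only.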

The 2650 V4s weren't my first choice. I would have preferred a pair of 2658 
V4s, but I was worried that I'd be more likely to run into trouble (the 
2650s were listed for the T430, the 2658s were not).

Vincent

On Mon, 25 Jun 2018, vincent at cojot.name wrote:

>
> Hi everyone,
>
> I have a PowerEdge T430 that has been running fine for the past two years.
>
> I've run into an issue upgrading the CPUs and although I have submitted an SR 
> to Dell, I would like to ask for a second opinion.
>
> The current configuration is as follows:
> - 2*E5-2620 V3 (85W)
> - Qlogic dual 8Gbps FC HBA
> - PERC H730P
> - 4*1Gbps Intel i350-T4
> - Small NVIDIA GeForce (50W)
> - 750W Platinum PSU
>
> I got a good deal on two E5-2697V3 chips (145W) locally and that's when my 
> problems started:
>
> - With 2*E5-2697 V3, the iDRAC works fine, the system powers on (nothing on 
> the screen), sets the fans to max speed and the LC log shows that the system 
> is stuck in a reset loop caused by cpu0:
> CPU0000: Internal error has occurred check for additional logs.
> SYS1003: System CPU Resetting.
> RAC0703: Requested system hardreset.
>
> - With 1*E5-2697 V3 (the cpu formerly placed in the cpu0 socket), the system 
> boots fine and I can run Linpack on it without issues.
>
> - When I put the 2*E5-2620V3 back into the system, the system worked fine 
> too.
>
> I am wondering if I may have either a dead cpu (one of the E5-2697's) or a 
> PSU that's not providing sufficient power or something else..
>
> I'd like some additional guidance as I would like to minimize physical 
> operations inside the chassis. Has anyone seen anything like this before?
>
> Do I need to upgrade my PSU to 1100W?
>
> Thanks,
>
> Vincent