PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly

Patrick_Boyd at Dell.com Patrick_Boyd at Dell.com
Mon Oct 1 10:41:36 CDT 2007


If you have SAS drives, you have to use the SAS 5/iR. The motherboard
will control SATA drives independently of the SAS 5/iR.

Can you tell me how you have the drives configured? Output from fdisk -l
would be ideal.

Thanks,
Patrick Boyd

 

From: Jobe Bittman [mailto:jbittman at chewcorp.com] 
Sent: Monday, October 01, 2007 10:39 AM
To: Boyd, Patrick
Cc: linux-poweredge-Lists
Subject: Re: PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly

 

4.00.00.01 mptlinux from Dell. I also got a similar crash with the
CentOS 5 supplied mptlinux. I think there is something wrong with this
controller. If I enable SAS in BIOS, do I need to open up the server to
configure it as 2 separate disks and use Linux software RAID?

On 10/1/07, Patrick_Boyd at dell.com <Patrick_Boyd at dell.com> wrote: 

1) There is no caching on the SAS 5/iR controllers. Therefore it
will always be write-through.

2) What version of the driver are you using?

 

From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Jobe Bittman
Sent: Friday, September 28, 2007 11:30 PM
To: linux-poweredge-Lists
Subject: PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly

 

I am having issues with the PowerEdge 860 SAS5/iR controller. I am
running CentOS 5 64-bit with the latest update kernel,
2.6.18-8.1.14.el5. I have two 72G drives striped. I started out using
the Linux-supplied driver, but dmesg always showed that write-through
caching was being used. After installing OMSA 5.2 from the Dell hw/sw
repos, I discovered the Linux RAID driver was hanging and crashing when
attempting to connect to the OMSA web interface. I reloaded the machine
and tried installing the mptlinux driver from the Dell repo. It seemed
to work great for the day. I even saw that write-back caching was
working. But now I'm running into issues while running bonnie++ to
benchmark my I/O. The errors in /var/log/messages are below. I didn't
capture the error with the Linux driver, but it was very similar.

Has anyone run into this?


Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: attempting task
abort! (sc=ffff810051291e40)
Sep 28 21:07:26 san1-test1 kernel: sd 0:1:0:0: 
Sep 28 21:07:26 san1-test1 kernel:         command: Write(10): 2a 00 01
d4 1a 0a 00 01 40 00 
Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler
for type=1: IOC Not operational (0x40001600)!
Sep 28 21:07:26 san1-test1 kernel:  Issuing HardReset!!
Sep 28 21:07:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery 
Sep 28 21:07:26 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in
FAULT state!!!
Sep 28 21:07:26 san1-test1 kernel:            FAULT code = 1600h
Sep 28 21:07:28 san1-test1 kernel: mptbase: ioc0: Recovered from IOC
FAULT 
Sep 28 21:07:42 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED
(sc=ffff810051291e40)
Sep 28 21:07:43 san1-test1 kernel: mptscsih: ioc0: attempting target
reset! (sc=ffff810051291e40)
Sep 28 21:07:43 san1-test1 kernel: sd 0:1:0:0: 
Sep 28 21:07:43 san1-test1 kernel:         command: Write(10): 2a 00 01
d4 1a 0a 00 01 40 00
Sep 28 21:07:45 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS
(sc=ffff810051291e40)
Sep 28 21:09:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery 
Sep 28 21:09:36 san1-test1 kernel: BUG: soft lockup detected on CPU#0!
Sep 28 21:09:36 san1-test1 kernel: 
Sep 28 21:09:36 san1-test1 kernel: Call Trace:
Sep 28 21:09:36 san1-test1 kernel:  <IRQ>  [<ffffffff800b2c30>]
softlockup_tick+0xdb/0xed 
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff800933ec>]
update_process_times+0x42/0x68
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff80073d61>]
smp_local_timer_interrupt+0x23/0x47
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff80074423>]
smp_apic_timer_interrupt+0x41/0x47 
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff8005bcc2>]
apic_timer_interrupt+0x66/0x6c
Sep 28 21:09:36 san1-test1 kernel:  <EOI>  [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff880c2e4d>]
:mptbase:WaitForDoorbellInt+0x5b/0x86 
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff880c3023>]
:mptbase:mpt_handshake_req_reply_wait+0x138/0x296
Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:11:00 san1-test1 kernel:  [<ffffffff880c39df>]
:mptbase:SendIocInit+0x229/0x310 
Sep 28 21:11:01 san1-test1 shutdown[12201]: shutting down for system
reboot
Sep 28 21:11:17 san1-test1 kernel:  [<ffffffff880c33a7>]
:mptbase:GetIocFacts+0x7e/0x2d6
Sep 28 21:12:07 san1-test1 init: Switching to runlevel: 6 
Sep 28 21:12:35 san1-test1 kernel:  [<ffffffff880c459f>]
:mptbase:MakeIocReady+0x635/0xa29
Sep 28 21:12:37 san1-test1 kernel:  [<ffffffff880c71f6>]
:mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
Sep 28 21:12:38 san1-test1 kernel:  [<ffffffff80072a51>]
smp_send_reschedule+0x4e/0x53 
Sep 28 21:12:38 san1-test1 kernel:  [<ffffffff8013b1b2>]
__next_cpu+0x19/0x28
Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff800857cf>]
find_busiest_group+0x20d/0x621
Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff8006290e>]
__kprobes_text_start+0xfe/0x230 
Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff800627d1>]
__reacquire_kernel_lock+0x2c/0x45
Sep 28 21:12:39 san1-test1 shutdown[12243]: shutting down for system
reboot
Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff80060b5f>]
thread_return+0xb7/0xea 
Sep 28 21:12:40 san1-test1 kernel:  [<ffffffff880c72e7>]
:mptbase:mpt_HardResetHandler+0xb1/0x109
Sep 28 21:12:40 san1-test1 kernel:  [<ffffffff88220df1>]
:mptctl:mptctl_timeout_expired+0x1b4/0x1dc
Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff800613bf>]
schedule_timeout+0x92/0xad 
Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff80092e02>]
process_timeout+0x0/0x5
Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff882225ce>]
:mptctl:mptctl_do_mpt_command+0x7b6/0x998
Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8009b681>]
autoremove_wake_function+0x0/0x2e 
Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff882290cb>]
:mptctl:compat_mpctl_ioctl+0x230/0x31f
Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8822903b>]
:mptctl:compat_mpctl_ioctl+0x1a0/0x31f
Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff800e8cb8>]
compat_sys_ioctl+0xc5/0x2b1 
Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8005f013>]
sysenter_do_call+0x1b/0x67
Sep 28 21:12:58 san1-test1 kernel: 
Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: attempting task
abort! (sc=ffff8101005429c0) 
Sep 28 21:12:59 san1-test1 kernel: sd 0:1:0:0: 
Sep 28 21:12:59 san1-test1 kernel:         command: Write(10): 2a 00 02
2a df 8a 00 01 40 00
Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler
for type=1: IOC Not operational (0x40001600)! 
Sep 28 21:12:59 san1-test1 kernel:  Issuing HardReset!!
Sep 28 21:12:59 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:12:59 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in
FAULT state!!!
Sep 28 21:13:00 san1-test1 kernel:            FAULT code = 1600h
Sep 28 21:13:00 san1-test1 kernel: mptbase: ioc0: Recovered from IOC
FAULT
Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED
(sc=ffff8101005429c0) 
Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: attempting target
reset! (sc=ffff8101005429c0)
Sep 28 21:13:00 san1-test1 kernel: sd 0:1:0:0: 
Sep 28 21:13:01 san1-test1 kernel:         command: Write(10): 2a 00 02
2a df 8a 00 01 40 00 
Sep 28 21:13:01 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS
(sc=ffff8101005429c0)
Sep 28 21:13:01 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:13:01 san1-test1 kernel: BUG: soft lockup detected on CPU#0! 
Sep 28 21:13:01 san1-test1 kernel: 
Sep 28 21:13:01 san1-test1 kernel: Call Trace:
Sep 28 21:13:01 san1-test1 kernel:  <IRQ>  [<ffffffff800b2c30>]
softlockup_tick+0xdb/0xed
Sep 28 21:13:01 san1-test1 kernel:  [<ffffffff800933ec>]
update_process_times+0x42/0x68 
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff80073d61>]
smp_local_timer_interrupt+0x23/0x47
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff80074423>]
smp_apic_timer_interrupt+0x41/0x47
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff8005bcc2>]
apic_timer_interrupt+0x66/0x6c 
Sep 28 21:13:02 san1-test1 kernel:  <EOI>  [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff880c2e4d>]
:mptbase:WaitForDoorbellInt+0x5b/0x86
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff880c3023>]
:mptbase:mpt_handshake_req_reply_wait+0x138/0x296 
Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c39df>]
:mptbase:SendIocInit+0x229/0x310
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c33a7>]
:mptbase:GetIocFacts+0x7e/0x2d6 
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c459f>]
:mptbase:MakeIocReady+0x635/0xa29
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c71f6>]
:mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff80072a51>]
smp_send_reschedule+0x4e/0x53 
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff8011735a>]
avc_has_perm+0x43/0x55
Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff80117a1b>]
ipc_has_perm+0x59/0x67
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff8006290e>]
__kprobes_text_start+0xfe/0x230 
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800862e7>]
dequeue_task+0x18/0x37
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800627d1>]
__reacquire_kernel_lock+0x2c/0x45
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff80060b5f>]
thread_return+0xb7/0xea 
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff880c72e7>]
:mptbase:mpt_HardResetHandler+0xb1/0x109
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff88220df1>]
:mptctl:mptctl_timeout_expired+0x1b4/0x1dc
Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800613bf>]
schedule_timeout+0x92/0xad 
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff80092e02>]
process_timeout+0x0/0x5
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff882225ce>]
:mptctl:mptctl_do_mpt_command+0x7b6/0x998
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8009b681>]
autoremove_wake_function+0x0/0x2e 
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8002dd9c>]
__wake_up+0x38/0x4f
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff882290cb>]
:mptctl:compat_mpctl_ioctl+0x230/0x31f
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8822903b>]
:mptctl:compat_mpctl_ioctl+0x1a0/0x31f 
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff800e8cb8>]
compat_sys_ioctl+0xc5/0x2b1
Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8005f013>]
sysenter_do_call+0x1b/0x67
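
As a reading aid for the log above, here is a minimal Python sketch that
decodes the two hex values that keep repeating: the doorbell value
0x40001600 and the Write(10) command bytes. The field layouts are
assumptions that do not come from this thread: the state and fault-code
masks follow the Fusion-MPT doorbell layout as I understand it, and the
byte offsets follow the standard SCSI WRITE(10) CDB. The function names
are made up for illustration.

```python
# Sketch only: decode the repeating hex values from the mptscsih/mptbase log.
# ASSUMPTION: doorbell layout (state in the top nibble, fault code in the
# low 16 bits) and the WRITE(10) CDB byte offsets are not stated in the
# thread; they are taken from the MPT and SCSI conventions.

IOC_STATE_MASK = 0xF0000000       # top nibble of the doorbell: IOC state
IOC_FAULT_CODE_MASK = 0x0000FFFF  # low 16 bits: fault code when in FAULT
IOC_STATES = {
    0x00000000: "RESET",
    0x10000000: "READY",
    0x20000000: "OPERATIONAL",
    0x40000000: "FAULT",
}


def decode_doorbell(value):
    """Return (state name, low 16 bits) for a doorbell register value."""
    state = IOC_STATES.get(value & IOC_STATE_MASK, "UNKNOWN")
    return state, value & IOC_FAULT_CODE_MASK


def decode_write10(cdb_hex):
    """Return (starting LBA, block count) for a WRITE(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    state, code = decode_doorbell(0x40001600)
    print(f"IOC state {state}, fault code {code:04x}h")
    for cdb in ("2a 00 01 d4 1a 0a 00 01 40 00",
                "2a 00 02 2a df 8a 00 01 40 00"):
        lba, blocks = decode_write10(cdb)
        print(f"WRITE(10) lba={lba} blocks={blocks}")
```

Under those assumptions, 0x40001600 decodes to the FAULT state with code
1600h, which agrees with the "FAULT code = 1600h" lines in the log, and
both aborted commands are 320-block writes (160 KiB if the volume uses
512-byte blocks).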

-- 
Jobe Bittman




-- 
Jobe Bittman
Chief Network Architect
Stage6 
