PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly
Jobe Bittman
jbittman at chewcorp.com
Fri Sep 28 23:29:57 CDT 2007
I am having issues with the SAS5/iR controller in a PowerEdge 860. I am running
CentOS 5 64-bit with the latest update kernel, 2.6.18-8.1.14.el5. I have two
72G drives striped. I started out using the Linux-supplied driver, but
dmesg always showed that write-through caching was being used. After
installing OMSA 5.2 from the Dell hw/sw repos, I discovered that the Linux RAID
driver was hanging and crashing when I attempted to connect to the OMSA web
interface. I reloaded the machine and tried installing the mptlinux driver
from the Dell repo. It seemed to work great for the day. I even saw that
write-back caching was working. But now I'm running into issues while
running bonnie++ to benchmark my I/O. The errors in /var/log/messages are
below. I didn't capture the error with the Linux driver, but it was very
similar.
Has anyone run into this?
Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff810051291e40)
Sep 28 21:07:26 san1-test1 kernel: sd 0:1:0:0:
Sep 28 21:07:26 san1-test1 kernel: command: Write(10): 2a 00 01 d4
1a 0a 00 01 40 00
Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler for
type=1: IOC Not operational (0x40001600)!
Sep 28 21:07:26 san1-test1 kernel: Issuing HardReset!!
Sep 28 21:07:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:07:26 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT
state!!!
Sep 28 21:07:26 san1-test1 kernel: FAULT code = 1600h
Sep 28 21:07:28 san1-test1 kernel: mptbase: ioc0: Recovered from IOC FAULT
Sep 28 21:07:42 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED
(sc=ffff810051291e40)
Sep 28 21:07:43 san1-test1 kernel: mptscsih: ioc0: attempting target reset!
(sc=ffff810051291e40)
Sep 28 21:07:43 san1-test1 kernel: sd 0:1:0:0:
Sep 28 21:07:43 san1-test1 kernel: command: Write(10): 2a 00 01 d4
1a 0a 00 01 40 00
Sep 28 21:07:45 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS
(sc=ffff810051291e40)
Sep 28 21:09:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:09:36 san1-test1 kernel: BUG: soft lockup detected on CPU#0!
Sep 28 21:09:36 san1-test1 kernel:
Sep 28 21:09:36 san1-test1 kernel: Call Trace:
Sep 28 21:09:36 san1-test1 kernel: <IRQ> [<ffffffff800b2c30>]
softlockup_tick+0xdb/0xed
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff800933ec>]
update_process_times+0x42/0x68
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff80073d61>]
smp_local_timer_interrupt+0x23/0x47
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff80074423>]
smp_apic_timer_interrupt+0x41/0x47
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff8005bcc2>]
apic_timer_interrupt+0x66/0x6c
Sep 28 21:09:36 san1-test1 kernel: <EOI> [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff880c2e4d>]
:mptbase:WaitForDoorbellInt+0x5b/0x86
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff880c3023>]
:mptbase:mpt_handshake_req_reply_wait+0x138/0x296
Sep 28 21:09:36 san1-test1 kernel: [<ffffffff8000c4d2>] __delay+0x8/0x10
Sep 28 21:11:00 san1-test1 kernel: [<ffffffff880c39df>]
:mptbase:SendIocInit+0x229/0x310
Sep 28 21:11:01 san1-test1 shutdown[12201]: shutting down for system reboot
Sep 28 21:11:17 san1-test1 kernel: [<ffffffff880c33a7>]
:mptbase:GetIocFacts+0x7e/0x2d6
Sep 28 21:12:07 san1-test1 init: Switching to runlevel: 6
Sep 28 21:12:35 san1-test1 kernel: [<ffffffff880c459f>]
:mptbase:MakeIocReady+0x635/0xa29
Sep 28 21:12:37 san1-test1 kernel: [<ffffffff880c71f6>]
:mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
Sep 28 21:12:38 san1-test1 kernel: [<ffffffff80072a51>]
smp_send_reschedule+0x4e/0x53
Sep 28 21:12:38 san1-test1 kernel: [<ffffffff8013b1b2>]
__next_cpu+0x19/0x28
Sep 28 21:12:39 san1-test1 kernel: [<ffffffff800857cf>]
find_busiest_group+0x20d/0x621
Sep 28 21:12:39 san1-test1 kernel: [<ffffffff8006290e>]
__kprobes_text_start+0xfe/0x230
Sep 28 21:12:39 san1-test1 kernel: [<ffffffff800627d1>]
__reacquire_kernel_lock+0x2c/0x45
Sep 28 21:12:39 san1-test1 shutdown[12243]: shutting down for system reboot
Sep 28 21:12:39 san1-test1 kernel: [<ffffffff80060b5f>]
thread_return+0xb7/0xea
Sep 28 21:12:40 san1-test1 kernel: [<ffffffff880c72e7>]
:mptbase:mpt_HardResetHandler+0xb1/0x109
Sep 28 21:12:40 san1-test1 kernel: [<ffffffff88220df1>]
:mptctl:mptctl_timeout_expired+0x1b4/0x1dc
Sep 28 21:12:41 san1-test1 kernel: [<ffffffff800613bf>]
schedule_timeout+0x92/0xad
Sep 28 21:12:41 san1-test1 kernel: [<ffffffff80092e02>]
process_timeout+0x0/0x5
Sep 28 21:12:41 san1-test1 kernel: [<ffffffff882225ce>]
:mptctl:mptctl_do_mpt_command+0x7b6/0x998
Sep 28 21:12:42 san1-test1 kernel: [<ffffffff8009b681>]
autoremove_wake_function+0x0/0x2e
Sep 28 21:12:42 san1-test1 kernel: [<ffffffff882290cb>]
:mptctl:compat_mpctl_ioctl+0x230/0x31f
Sep 28 21:12:42 san1-test1 kernel: [<ffffffff8822903b>]
:mptctl:compat_mpctl_ioctl+0x1a0/0x31f
Sep 28 21:12:42 san1-test1 kernel: [<ffffffff800e8cb8>]
compat_sys_ioctl+0xc5/0x2b1
Sep 28 21:12:42 san1-test1 kernel: [<ffffffff8005f013>]
sysenter_do_call+0x1b/0x67
Sep 28 21:12:58 san1-test1 kernel:
Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: attempting task abort!
(sc=ffff8101005429c0)
Sep 28 21:12:59 san1-test1 kernel: sd 0:1:0:0:
Sep 28 21:12:59 san1-test1 kernel: command: Write(10): 2a 00 02 2a
df 8a 00 01 40 00
Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler for
type=1: IOC Not operational (0x40001600)!
Sep 28 21:12:59 san1-test1 kernel: Issuing HardReset!!
Sep 28 21:12:59 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:12:59 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT
state!!!
Sep 28 21:13:00 san1-test1 kernel: FAULT code = 1600h
Sep 28 21:13:00 san1-test1 kernel: mptbase: ioc0: Recovered from IOC FAULT
Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED
(sc=ffff8101005429c0)
Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: attempting target reset!
(sc=ffff8101005429c0)
Sep 28 21:13:00 san1-test1 kernel: sd 0:1:0:0:
Sep 28 21:13:01 san1-test1 kernel: command: Write(10): 2a 00 02 2a
df 8a 00 01 40 00
Sep 28 21:13:01 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS
(sc=ffff8101005429c0)
Sep 28 21:13:01 san1-test1 kernel: mptbase: Initiating ioc0 recovery
Sep 28 21:13:01 san1-test1 kernel: BUG: soft lockup detected on CPU#0!
Sep 28 21:13:01 san1-test1 kernel:
Sep 28 21:13:01 san1-test1 kernel: Call Trace:
Sep 28 21:13:01 san1-test1 kernel: <IRQ> [<ffffffff800b2c30>]
softlockup_tick+0xdb/0xed
Sep 28 21:13:01 san1-test1 kernel: [<ffffffff800933ec>]
update_process_times+0x42/0x68
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff80073d61>]
smp_local_timer_interrupt+0x23/0x47
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff80074423>]
smp_apic_timer_interrupt+0x41/0x47
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff8005bcc2>]
apic_timer_interrupt+0x66/0x6c
Sep 28 21:13:02 san1-test1 kernel: <EOI> [<ffffffff8000c4d2>]
__delay+0x8/0x10
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff880c2e4d>]
:mptbase:WaitForDoorbellInt+0x5b/0x86
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff880c3023>]
:mptbase:mpt_handshake_req_reply_wait+0x138/0x296
Sep 28 21:13:02 san1-test1 kernel: [<ffffffff8000c4d2>] __delay+0x8/0x10
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff880c39df>]
:mptbase:SendIocInit+0x229/0x310
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff880c33a7>]
:mptbase:GetIocFacts+0x7e/0x2d6
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff880c459f>]
:mptbase:MakeIocReady+0x635/0xa29
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff880c71f6>]
:mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff80072a51>]
smp_send_reschedule+0x4e/0x53
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff8011735a>]
avc_has_perm+0x43/0x55
Sep 28 21:13:03 san1-test1 kernel: [<ffffffff80117a1b>]
ipc_has_perm+0x59/0x67
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff8006290e>]
__kprobes_text_start+0xfe/0x230
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff800862e7>]
dequeue_task+0x18/0x37
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff800627d1>]
__reacquire_kernel_lock+0x2c/0x45
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff80060b5f>]
thread_return+0xb7/0xea
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff880c72e7>]
:mptbase:mpt_HardResetHandler+0xb1/0x109
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff88220df1>]
:mptctl:mptctl_timeout_expired+0x1b4/0x1dc
Sep 28 21:13:04 san1-test1 kernel: [<ffffffff800613bf>]
schedule_timeout+0x92/0xad
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff80092e02>]
process_timeout+0x0/0x5
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff882225ce>]
:mptctl:mptctl_do_mpt_command+0x7b6/0x998
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff8009b681>]
autoremove_wake_function+0x0/0x2e
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff8002dd9c>] __wake_up+0x38/0x4f
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff882290cb>]
:mptctl:compat_mpctl_ioctl+0x230/0x31f
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff8822903b>]
:mptctl:compat_mpctl_ioctl+0x1a0/0x31f
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff800e8cb8>]
compat_sys_ioctl+0xc5/0x2b1
Sep 28 21:13:05 san1-test1 kernel: [<ffffffff8005f013>]
sysenter_do_call+0x1b/0x67
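
For anyone reading the trace above, the two values that recur in it can be unpacked with a short Python sketch. It decodes the Write(10) command bytes using the standard SCSI layout (opcode 0x2a, 4-byte big-endian LBA, 2-byte transfer length) and splits the IOC state word 0x40001600 into a state nibble and a fault code. The function names are made up for illustration, and the two masks in split_ioc_state are an assumption chosen to match the log itself, which prints 0x40001600 next to "FAULT code = 1600h"; they are not taken from the driver source.

```python
def decode_write10(cdb_hex):
    """Return (lba, blocks) for a WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks
    return lba, blocks


def split_ioc_state(word):
    """Split an IOC state word into (state bits, low 16 bits).

    Assumption: the top nibble carries the state and the low 16 bits
    carry the fault code, consistent with the log lines above.
    """
    return word & 0xF0000000, word & 0x0000FFFF


if __name__ == "__main__":
    # The two commands that were aborted in the log.
    for cdb in ("2a 00 01 d4 1a 0a 00 01 40 00",
                "2a 00 02 2a df 8a 00 01 40 00"):
        lba, blocks = decode_write10(cdb)
        print("LBA %d, %d blocks" % (lba, blocks))

    state, fault = split_ioc_state(0x40001600)
    print("state bits 0x%08x, fault code %04xh" % (state, fault))
```

Both aborted commands are 320-block writes (0x0140), which would be 160 KiB each if the volume uses 512-byte blocks. The log does not state the block size.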
--
Jobe Bittman