PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly

Jobe Bittman jbittman at chewcorp.com
Mon Oct 1 11:22:48 CDT 2007


Sure. The layout is simple for now:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   82  Linux
/dev/sda2              14         743     5863725   82  Linux swap
/dev/sda3             744       19209   148328145   83  Linux

I reinstalled and am using the CentOS 5 driver again. I get the same type
of error messages. I think it is related to SMP, but I'm not positive.

On 10/1/07, Patrick_Boyd at dell.com <Patrick_Boyd at dell.com> wrote:
>
> If you have SAS drives, you have to use the SAS 5/iR. The motherboard will
> control SATA drives independently of the SAS 5/iR.
>
> Can you tell me how you have the drives configured? Output from fdisk -l
> would be ideal.
>
> Thanks,
>
> Patrick Boyd
>
> *From:* Jobe Bittman [mailto:jbittman at chewcorp.com]
> *Sent:* Monday, October 01, 2007 10:39 AM
> *To:* Boyd, Patrick
> *Cc:* linux-poweredge-Lists
> *Subject:* Re: PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly
>
> mptlinux 4.00.00.01 from Dell. I also got a similar crash with the
> CentOS 5-supplied mptlinux. I think there is something wrong with this
> controller. If I enable SAS in the BIOS, do I need to open up the server to
> configure it as 2 separate disks and use Linux software RAID?
>
> On 10/1/07, *Patrick_Boyd at dell.com* <Patrick_Boyd at dell.com> wrote:
>
> 1) There is no caching on the SAS 5/iR controllers. Therefore it will
> always be write-through.
>
> 2) What version of the driver are you using?
>
> *From:* linux-poweredge-bounces at dell.com [mailto:linux-poweredge-bounces at dell.com]
> *On Behalf Of* Jobe Bittman
> *Sent:* Friday, September 28, 2007 11:30 PM
> *To:* linux-poweredge-Lists
> *Subject:* PowerEdge 860 SAS5/iR mptlinux driver crashing repeatedly
>
> I am having issues with the PowerEdge 860 SAS5/iR controller. I am running
> CentOS 5 64-bit with the latest update kernel, 2.6.18-8.1.14.el5. I have two
> 72 GB drives striped. I started out using the Linux-supplied driver, but dmesg
> always showed that write-through caching was being used. After installing
> OMSA 5.2 from the Dell hw/sw repos, I discovered that the Linux RAID driver
> was hanging and crashing whenever I attempted to connect to the OMSA web
> interface. I reloaded the machine and tried installing the mptlinux driver
> from the Dell repo. It seemed to work great for the day; I even saw that
> write-back caching was working. But now I'm running into issues while
> running bonnie++ to benchmark my I/O. The errors from /var/log/messages are
> below. I didn't capture the error with the Linux driver, but it was very
> similar.
>
> Has anyone run into this?
>
> Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810051291e40)
> Sep 28 21:07:26 san1-test1 kernel: sd 0:1:0:0:
> Sep 28 21:07:26 san1-test1 kernel:         command: Write(10): 2a 00 01 d4 1a 0a 00 01 40 00
> Sep 28 21:07:26 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler for type=1: IOC Not operational (0x40001600)!
> Sep 28 21:07:26 san1-test1 kernel:  Issuing HardReset!!
> Sep 28 21:07:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery
> Sep 28 21:07:26 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
> Sep 28 21:07:26 san1-test1 kernel:            FAULT code = 1600h
> Sep 28 21:07:28 san1-test1 kernel: mptbase: ioc0: Recovered from IOC FAULT
> Sep 28 21:07:42 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff810051291e40)
> Sep 28 21:07:43 san1-test1 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff810051291e40)
> Sep 28 21:07:43 san1-test1 kernel: sd 0:1:0:0:
> Sep 28 21:07:43 san1-test1 kernel:         command: Write(10): 2a 00 01 d4 1a 0a 00 01 40 00
> Sep 28 21:07:45 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff810051291e40)
> Sep 28 21:09:26 san1-test1 kernel: mptbase: Initiating ioc0 recovery
> Sep 28 21:09:36 san1-test1 kernel: BUG: soft lockup detected on CPU#0!
> Sep 28 21:09:36 san1-test1 kernel:
> Sep 28 21:09:36 san1-test1 kernel: Call Trace:
> Sep 28 21:09:36 san1-test1 kernel:  <IRQ>  [<ffffffff800b2c30>] softlockup_tick+0xdb/0xed
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff800933ec>] update_process_times+0x42/0x68
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff80073d61>] smp_local_timer_interrupt+0x23/0x47
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff80074423>] smp_apic_timer_interrupt+0x41/0x47
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
> Sep 28 21:09:36 san1-test1 kernel:  <EOI>  [<ffffffff8000c4d2>] __delay+0x8/0x10
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff880c2e4d>] :mptbase:WaitForDoorbellInt+0x5b/0x86
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff880c3023>] :mptbase:mpt_handshake_req_reply_wait+0x138/0x296
> Sep 28 21:09:36 san1-test1 kernel:  [<ffffffff8000c4d2>] __delay+0x8/0x10
> Sep 28 21:11:00 san1-test1 kernel:  [<ffffffff880c39df>] :mptbase:SendIocInit+0x229/0x310
> Sep 28 21:11:01 san1-test1 shutdown[12201]: shutting down for system reboot
> Sep 28 21:11:17 san1-test1 kernel:  [<ffffffff880c33a7>] :mptbase:GetIocFacts+0x7e/0x2d6
> Sep 28 21:12:07 san1-test1 init: Switching to runlevel: 6
> Sep 28 21:12:35 san1-test1 kernel:  [<ffffffff880c459f>] :mptbase:MakeIocReady+0x635/0xa29
> Sep 28 21:12:37 san1-test1 kernel:  [<ffffffff880c71f6>] :mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
> Sep 28 21:12:38 san1-test1 kernel:  [<ffffffff80072a51>] smp_send_reschedule+0x4e/0x53
> Sep 28 21:12:38 san1-test1 kernel:  [<ffffffff8013b1b2>] __next_cpu+0x19/0x28
> Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff800857cf>] find_busiest_group+0x20d/0x621
> Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff8006290e>] __kprobes_text_start+0xfe/0x230
> Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff800627d1>] __reacquire_kernel_lock+0x2c/0x45
> Sep 28 21:12:39 san1-test1 shutdown[12243]: shutting down for system reboot
> Sep 28 21:12:39 san1-test1 kernel:  [<ffffffff80060b5f>] thread_return+0xb7/0xea
> Sep 28 21:12:40 san1-test1 kernel:  [<ffffffff880c72e7>] :mptbase:mpt_HardResetHandler+0xb1/0x109
> Sep 28 21:12:40 san1-test1 kernel:  [<ffffffff88220df1>] :mptctl:mptctl_timeout_expired+0x1b4/0x1dc
> Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff800613bf>] schedule_timeout+0x92/0xad
> Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff80092e02>] process_timeout+0x0/0x5
> Sep 28 21:12:41 san1-test1 kernel:  [<ffffffff882225ce>] :mptctl:mptctl_do_mpt_command+0x7b6/0x998
> Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8009b681>] autoremove_wake_function+0x0/0x2e
> Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff882290cb>] :mptctl:compat_mpctl_ioctl+0x230/0x31f
> Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8822903b>] :mptctl:compat_mpctl_ioctl+0x1a0/0x31f
> Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff800e8cb8>] compat_sys_ioctl+0xc5/0x2b1
> Sep 28 21:12:42 san1-test1 kernel:  [<ffffffff8005f013>] sysenter_do_call+0x1b/0x67
> Sep 28 21:12:58 san1-test1 kernel:
> Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8101005429c0)
> Sep 28 21:12:59 san1-test1 kernel: sd 0:1:0:0:
> Sep 28 21:12:59 san1-test1 kernel:         command: Write(10): 2a 00 02 2a df 8a 00 01 40 00
> Sep 28 21:12:59 san1-test1 kernel: mptscsih: ioc0: WARNING - TM Handler for type=1: IOC Not operational (0x40001600)!
> Sep 28 21:12:59 san1-test1 kernel:  Issuing HardReset!!
> Sep 28 21:12:59 san1-test1 kernel: mptbase: Initiating ioc0 recovery
> Sep 28 21:12:59 san1-test1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
> Sep 28 21:13:00 san1-test1 kernel:            FAULT code = 1600h
> Sep 28 21:13:00 san1-test1 kernel: mptbase: ioc0: Recovered from IOC FAULT
> Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff8101005429c0)
> Sep 28 21:13:00 san1-test1 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8101005429c0)
> Sep 28 21:13:00 san1-test1 kernel: sd 0:1:0:0:
> Sep 28 21:13:01 san1-test1 kernel:         command: Write(10): 2a 00 02 2a df 8a 00 01 40 00
> Sep 28 21:13:01 san1-test1 kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8101005429c0)
> Sep 28 21:13:01 san1-test1 kernel: mptbase: Initiating ioc0 recovery
> Sep 28 21:13:01 san1-test1 kernel: BUG: soft lockup detected on CPU#0!
> Sep 28 21:13:01 san1-test1 kernel:
> Sep 28 21:13:01 san1-test1 kernel: Call Trace:
> Sep 28 21:13:01 san1-test1 kernel:  <IRQ>  [<ffffffff800b2c30>] softlockup_tick+0xdb/0xed
> Sep 28 21:13:01 san1-test1 kernel:  [<ffffffff800933ec>] update_process_times+0x42/0x68
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff80073d61>] smp_local_timer_interrupt+0x23/0x47
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff80074423>] smp_apic_timer_interrupt+0x41/0x47
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
> Sep 28 21:13:02 san1-test1 kernel:  <EOI>  [<ffffffff8000c4d2>] __delay+0x8/0x10
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff880c2e4d>] :mptbase:WaitForDoorbellInt+0x5b/0x86
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff880c3023>] :mptbase:mpt_handshake_req_reply_wait+0x138/0x296
> Sep 28 21:13:02 san1-test1 kernel:  [<ffffffff8000c4d2>] __delay+0x8/0x10
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c39df>] :mptbase:SendIocInit+0x229/0x310
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c33a7>] :mptbase:GetIocFacts+0x7e/0x2d6
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c459f>] :mptbase:MakeIocReady+0x635/0xa29
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff880c71f6>] :mptbase:mpt_do_ioc_recovery+0xf0d/0xf4d
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff80072a51>] smp_send_reschedule+0x4e/0x53
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff8011735a>] avc_has_perm+0x43/0x55
> Sep 28 21:13:03 san1-test1 kernel:  [<ffffffff80117a1b>] ipc_has_perm+0x59/0x67
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff8006290e>] __kprobes_text_start+0xfe/0x230
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800862e7>] dequeue_task+0x18/0x37
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800627d1>] __reacquire_kernel_lock+0x2c/0x45
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff80060b5f>] thread_return+0xb7/0xea
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff880c72e7>] :mptbase:mpt_HardResetHandler+0xb1/0x109
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff88220df1>] :mptctl:mptctl_timeout_expired+0x1b4/0x1dc
> Sep 28 21:13:04 san1-test1 kernel:  [<ffffffff800613bf>] schedule_timeout+0x92/0xad
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff80092e02>] process_timeout+0x0/0x5
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff882225ce>] :mptctl:mptctl_do_mpt_command+0x7b6/0x998
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8009b681>] autoremove_wake_function+0x0/0x2e
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8002dd9c>] __wake_up+0x38/0x4f
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff882290cb>] :mptctl:compat_mpctl_ioctl+0x230/0x31f
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8822903b>] :mptctl:compat_mpctl_ioctl+0x1a0/0x31f
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff800e8cb8>] compat_sys_ioctl+0xc5/0x2b1
> Sep 28 21:13:05 san1-test1 kernel:  [<ffffffff8005f013>] sysenter_do_call+0x1b/0x67
>
> --
> Jobe Bittman

-- 
Jobe Bittman
Chief Network Architect
Stage6

More information about the Linux-PowerEdge mailing list