strange I/O errors with SAN storage

Robert von Bismarck robert.vonbismarck at vtx-telecom.ch
Mon Aug 4 04:35:17 CDT 2008


Sijis,

Yes, we have to reinstall PP after every kernel update, but that's part of the update cycle now: we know that the PP-equipped boxes need an hour of work for a kernel update instead of 5 minutes :)

MPIO from the Device-Mapper seems like the way to go, but I still need to find some time to test it with our production environment.

Cheers,

RvB


> -----Original Message-----
> From: linux-poweredge-bounces at dell.com 
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Sijis Aviles
> Sent: Thursday, July 31, 2008 21:08
> To: linux-poweredge at dell.com
> Subject: RE: strange I/O errors with SAN storage
> 
> Robert,
> 
> Did you have to reinstall PowerPath after a kernel 
> upgrade? We are using PP 4.5.1 and every time we upgrade the 
> kernel, I have to reinstall the software.
> 
> Chris,
> 
> I've seen similar errors on my RHEL4 servers when the path 
> to the SAN dies (HBA, cable, switch port). From looking at 
> your errors, it seems like it could be a connectivity issue. 
> Try changing the FC cable, as it might be bad, or check 
> whether there are any errors on the switch.
> Does dmesg or /var/log/messages show anything useful?
>  
> We are using PE2950's and we use local disk for OS and SAN 
> for storage. 
> 
> As a note, in our documentation, this is our config for the 
> HBA to connect to our EMC Symmetrix:
> a. Host Adapter BIOS = Enabled
> b. Connection Options = 1 (Point to point only)
> c. Data Rate = 1 (2GB/S)
> Everything else is left at defaults.
> 
> If I think of anything else, I'll pass it along.
> 
> Sijis
> --------------
> Sijis Aviles | Systems Administrator | Empire Today, LLC 
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of 
> linux-poweredge-request at dell.com
> Sent: Thursday, July 31, 2008 12:00 PM
> To: linux-poweredge at dell.com
> Subject: Linux-PowerEdge Digest, Vol 47, Issue 55
> 
> Today's Topics:
> 
>    1. RE: Help with md3000i (Nick_Parrott at Dell.com)
>    2. RE: Help with md3000i (Harald_Jensas at Dell.com)
>    3. OMSA 5.4 Centos4.6: WARNING: srvadmin-storage configuration
>       not performed; '/etc/omreg.cfg' is missing or damaged
>       (Robert Hart)
>    4. RE: strange I/O errors with SAN storage (Robert von Bismarck)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 31 Jul 2008 10:47:34 +0100
> From: <Nick_Parrott at Dell.com>
> Subject: RE: Help with md3000i
> To: <Sidney.Young at globalx.com.au>, <matthew at acfr.usyd.edu.au>
> Cc: linux-poweredge at lists.us.dell.com
> Message-ID:
> 	<2AB12D8DC5E4564DB294A4C66F0C68F1DD677D at DUBX3M11.dub.emea.dell.com>
> Content-Type: text/plain;	charset="us-ascii"
> 
> Hi Guys,
> 
> Close, but not close enough. Some MD3000s were shipped with 
> the serial cable, some were not. I'm not sure of the reasons 
> why; I just know that it's 50/50 as to whether a customer 
> calling in has one.
> 
> You can do a full reset (this needs a Dell tech to provide 
> the syntax/credentials for the reset command) or you can 
> simply look at the IP's of the management ports with a 
> similar command to ifconfig, then use the Java software to 
> fire in on that IP.
> 
> I'm checking out the availability of this cable, as far as 
> I'm aware it's not "for sale" but can be provided with a 
> service call if there is another reason for us to be out to the unit.
> 
> Regards,
> 
> Nick 
> 
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Sid Young
> Sent: 31 July 2008 04:51
> To: Matthew Geier
> Cc: linux-poweredge-Lists
> Subject: RE: Help with md3000i
> 
> 
> Yes I suspect that is the case, the serial cable activates 
> something, however I am told there is a password that needs 
> to be applied and instructions from a level 2/3 tech to 
> resolve access.
> 
> Could be something really simple like "wipe config all<enter>" ;)
> 
> Sid
> 
> -----Original Message-----
> From: Matthew Geier [mailto:matthew at acfr.usyd.edu.au]
> Sent: Thursday, July 31, 2008 12:15 PM
> To: Scott R. Ehrlich
> Cc: Sid Young; Linux-PowerEdge at dell.com
> Subject: Re: Help with md3000i
> 
> Scott R. Ehrlich wrote:
> > Can you put it on an isolated LAN or use a crossover cable between it 
> > and a PC, and use a network sniffer to monitor IP addresses?  If so, 
> > you should be able to get the IP from it through the sniffer.
> >
> > As for the unlock kit, I'd like to learn more about that, too - 
> > exactly what it does, what it costs, etc.
> >
> >   
>  What's the bet the 'unlock kit' is the serial console cable :-)
> 
>  My MD3000i came with a serial cable which when plugged in 
> gives you access to a service console. You can do all sorts 
> of scary stuff from the service console - including 
> determining the IP address of the controller :-)
> 
> 
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 31 Jul 2008 11:52:41 +0200
> From: <Harald_Jensas at Dell.com>
> Subject: RE: Help with md3000i
> To: <matthew at acfr.usyd.edu.au>, <johnh at comp.leeds.ac.uk>
> Cc: linux-poweredge at lists.us.dell.com
> Message-ID:
> 	<87C820D35C176D428A1A8A3B34F6FB86BACE06 at uppx3m1.upp.emea.dell.com>
> Content-Type: text/plain;	charset="US-ASCII"
> 
> > -----Original Message-----
> > From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge- 
> > bounces at dell.com] On Behalf Of Matthew Geier
> > Sent: 31 July 2008 04:08
> > To: John Hodrien
> > Cc: linux-poweredge-Lists
> > Subject: Re: Help with md3000i
> > 
> > John Hodrien wrote:
> > > On Tue, 29 Jul 2008, Matthew Geier wrote:
> > >
> > >
> > >> There is a Windows app that somehow finds what IP address the 
> > >> MD3000i has given itself. If you don't have control of your DHCP 
> > >> server and thus the IP address it gets, this might be the only easy
> > >> way to find out what IP it got.
> > >>
> > >
> > > Does it not respond to a broadcast ping?
> > >
> > 
> >  It does, but so does a lot of other stuff. It really doesn't help a great
> > deal unless your network is small, and if it's small you probably can see what
> > your DHCP server is up to, if you have one.
> 
> The MD3000i Configuration Utility has an option to 
> Automatically detect MD3000i arrays in the subnet. It will 
> take quite some time in a large network, but it should work. 
> The utility is available on version 1.4 or later of the 
> MD3000i resource CD.
> 
> 
> 1. Start the MD3000i Configuration Utility. (Run 'MDconfig.sh' if 
> you are on Linux.)
> 2. Select "Configure MD3000i" and click Next.
> 3. To discover available storage arrays, choose "Discover New 
> Arrays" and click Next.
> 4. To perform an automatic discovery of storage arrays within 
> the local subnet, choose "Automatic" and click Next.
> 5. Follow the wizard through to configure your MD3000i.
> 
> 
> 
> --
> Harald
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 31 Jul 2008 10:02:52 -0400
> From: "Robert Hart" <meteobobdell at gmail.com>
> Subject: OMSA 5.4 Centos4.6: WARNING: srvadmin-storage configuration
> 	not performed; '/etc/omreg.cfg' is missing or damaged
> To: linux-poweredge at dell.com
> Message-ID:
> 	<c6551ca10807310702m399be5ccue5b0472ecea26787 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> I have searched through every prior post and via Google, and 
> cannot find an answer to this.  I apologize if it has been 
> answered and I missed it.
> 
> I have successfully run OMSA on seven different PowerEdge machines.  They
> have worked wonderfully.  However, yesterday I got two more: a PE2900 
> and a PET300.
> Unfortunately, I cannot get OMSA to work on these.  They all 
> run CentOS 4.6, including the machines on which OMSA works.
> 
> The only difference with the machines that don't work is that 
> they are new hardware (PE2900 and PET300 for first time) and 
> I believe the yum install is from OMSA 5.4 directly.  I 
> believe prior installs were 5.3 which then upgraded to 5.4 with yum.
> 
> The specific error I am seeing on both machines is:
> 
> WARNING: srvadmin-storage configuration not performed; '/etc/omreg.cfg' is missing or damaged
> 
> [Yet of course the file in etc is there with no obvious 
> errors when comparing to "good" installs].
> 
> After srvadmin-services start, I cannot even get 
> https:...1311 to bring up the web page.
> 
> Is the problem that I started with 5.4 directly?  If so, is there 
> an easy way to have the yum repo start with 5.3?
> 
> Thanks again for your help.
> 
> Bob
> 
> ------------------------------
> 
> Message: 4
> Date: Thu, 31 Jul 2008 16:56:16 +0200
> From: "Robert von Bismarck" <robert.vonbismarck at vtx-telecom.ch>
> Subject: RE: strange I/O errors with SAN storage
> To: <christian.peper at kpn.com>, <linux-poweredge at dell.com>
> Message-ID:
> 	<07C3015B9E21F949AB59B28F2D5B567F01EBC0C9 at exch-pul-01.interne.smart-telecom.ch>
> Content-Type: text/plain;	charset="iso-8859-1"
> 
> Hello,
> 
> Have you tried booting with only one HBA?
> 
> We have seen the same kind of errors with dual-ported SAN 
> connections: the Linux kernel tried to access the SAN 
> volumes because PowerPath (the EMC failover software) did not 
> load correctly after a kernel update.
> We disabled one path so that we could perform the necessary 
> maintenance, which was to get the latest release of PowerPath 
> and install it on the host. After a reboot and reconnecting the 
> fiber, the system was back to being its happy self again.
> NB: we do not boot from the SAN as you do; we have a local OS 
> installation and data storage on the SAN.
> 
> This was in a PE2850 with CentOS 4.5 and QLogic 2340 PCI-X 
> adapters connected to a Clariion array.
> 
> Kind regards,
> 
> Robert von Bismarck
> 
> 
> 
> 
> > -----Original Message-----
> > From: linux-poweredge-bounces at dell.com 
> > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of 
> > christian.peper at kpn.com
> > Sent: Thursday, July 31, 2008 11:36
> > To: linux-poweredge at dell.com
> > Subject: strange I/O errors with SAN storage
> > 
> > Hi everyone,
> > I hope someone can make a few suggestions as to where (what) to look 
> > (for). Because we're baffled and apart from creating new disks on the 
> > SAN, we have run out of things to check.
> > Except booting from the same LUNs on a different server: no more 
> > hardware :( ...
> > 
> > We've found some really strange I/O errors (PE2950, OEL/RHEL AS 4u5, 
> > 2x Qlogic qle2460, firmware 1.24) using LUNs on our
> > DMX-3 SAN. One HBA was faulty so we replaced it. However upon 
> > restoring the OS and reinstalling it, more problems appeared.
> > The new HBA would not boot at all using the existing disks. 
> > So we disabled it in the BIOS and booted from the other
> > (original) HBA. Both HBAs have the same firmware, same settings.
> > 
> > Upon booting, anything involving the disks (we boot from SAN and have 
> > data disks there as well) is extremely sluggish.
> > Letting the server do its thing, I got a ton of I/O errors first 
> > during disk discovery, then again during mounting of file systems.
> > 
> > ERROR: ddf1: reading /dev/sdb[Input/output error]
> > ERROR: hpt37x: reading /dev/sdb[Input/output error]
> > ERROR: pdc: reading /dev/sdb[Input/output error]
> > ERROR: pdc: reading /dev/sdb[Input/output error]
> > ERROR: pdc: reading /dev/sdb[Input/output error]
> > ERROR: pdc: reading /dev/sdb[Input/output error]
> > ERROR: pdc: reading /dev/sdb[Input/output error]
> > ERROR: sil: reading /dev/sdb[Input/output error]
> > ERROR: ddf1: reading /dev/sdc[Input/output error]
> > ERROR: hpt37x: reading /dev/sdc[Input/output error]
> > ERROR: pdc: reading /dev/sdc[Input/output error]
> > ERROR: pdc: reading /dev/sdc[Input/output error]
> > ERROR: pdc: reading /dev/sdc[Input/output error]
> > ERROR: pdc: reading /dev/sdc[Input/output error]
> > ERROR: pdc: reading /dev/sdc[Input/output error]
> > ERROR: sil: reading /dev/sdc[Input/output error]
> > ERROR: ddf1: reading /dev/sdd[Input/output error]
> > ERROR: hpt37x: reading /dev/sdd[Input/output error]
> > ERROR: pdc: reading /dev/sdd[Input/output error]
> > ERROR: pdc: reading /dev/sdd[Input/output error]
> > ERROR: pdc: reading /dev/sdd[Input/output error]
> > ERROR: pdc: reading /dev/sdd[Input/output error]
> > ERROR: pdc: reading /dev/sdd[Input/output error]
> > ERROR: sil: reading /dev/sdd[Input/output error] ...
> > and so on for all disks (LUNs) attached.
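
To see at a glance whether every attached LUN is affected or only some, error lines of the form quoted above can be tallied per device. The following is a minimal Python sketch under stated assumptions: the line format is taken from the excerpt above, while the function name `tally_errors` and the sample input are illustrative only and not part of the original report. The word before "reading" (ddf1, hpt37x, pdc, sil) is treated as an opaque label.

```python
import re
from collections import Counter

# Matches lines such as: ERROR: pdc: reading /dev/sdb[Input/output error]
# (format assumed from the log excerpt in this thread)
ERROR_RE = re.compile(r"ERROR: (\w+): reading (/dev/\w+)\[Input/output error\]")


def tally_errors(lines):
    """Count I/O error lines per device and per (device, label) pair."""
    per_device = Counter()
    per_pair = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not an I/O error line in the assumed format
        label, device = match.groups()
        per_device[device] += 1
        per_pair[(device, label)] += 1
    return per_device, per_pair


if __name__ == "__main__":
    # Hypothetical sample; in practice the lines would come from a saved
    # boot log or console capture.
    sample = [
        "ERROR: ddf1: reading /dev/sdb[Input/output error]",
        "ERROR: pdc: reading /dev/sdb[Input/output error]",
        "ERROR: sil: reading /dev/sdc[Input/output error]",
    ]
    per_device, per_pair = tally_errors(sample)
    for device, count in sorted(per_device.items()):
        print(device, count)
```

If the counts are identical for every device, the fault is more likely on something shared (HBA, cable, switch port) than on an individual LUN; that reading is a rule of thumb, not a diagnosis.
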
> > 
> > Searching the web gave me a few hits but no solutions (see 1|2|3).
> > However, all errors were related to local RAID setups using ATA/SATA 
> > disks. I am not using local RAID. We have Dell PowerEdge 2950 servers 
> > with 2 qle2460 HBAs. The internal PERC5/i is enabled as it provides 
> > the swap disk space, but it doesn't do anything. Furthermore, sdb, sdc
> > and so on are SAN disks. So why do I get RAID errors from them? Could 
> > this point to motherboard errors? PCI bus errors? Broken FC cables?
> > Bad FC switch configuration or simply damaged LUNs from the SAN?
> > 
> > I'm keeping a blog of this updated with anything new I run into...
> > http://breakablelinux.blogspot.com/2008/07/strange-io-errors-with-san.html
> > 
> > thanks in advance,
> > Chris.
> > 
> 
> 
> 
> ------------------------------
> 
> End of Linux-PowerEdge Digest, Vol 47, Issue 55
> ***********************************************
> 
> 


