strange I/O errors with SAN storage
Sijis Aviles
saviles at empire-today.com
Thu Jul 31 14:08:19 CDT 2008
Robert,
Did you have to reinstall PowerPath after a kernel upgrade? We are
using PP 4.5.1 and every time we upgrade the kernel, I have to reinstall
the software.
Chris,
I've seen similar errors on my RHEL4 servers when the path to the SAN
dies (HBA, cable, switch port). From looking at your errors, it seems
like it could be a connectivity issue. Try changing the FC cable, as it
might be bad, or check whether there are any errors on the switch.
Does dmesg or /var/log/messages show anything useful?
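To see at a glance which devices are affected, the error lines can be tallied per device. This is only a minimal Python sketch, assuming the lines look like the ones in the quoted message below ("ERROR: <tag>: reading <device>[Input/output error]"). The function name summarize_errors is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as: ERROR: pdc: reading /dev/sdb[Input/output error]
# An optional leading "> " (mail quoting) is tolerated.
ERROR_LINE = re.compile(
    r"^(?:>\s*)?ERROR:\s*(?P<tag>\w+):\s*reading\s+(?P<dev>/dev/\w+)"
    r"\[Input/output error\]"
)


def summarize_errors(lines):
    """Count matching I/O error lines per device and per tag (ddf1, pdc, ...)."""
    per_device = Counter()
    per_tag = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            per_device[match.group("dev")] += 1
            per_tag[match.group("tag")] += 1
    return per_device, per_tag


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py < boot-errors.txt
    devices, tags = summarize_errors(sys.stdin)
    for dev, count in sorted(devices.items()):
        print(dev, count)
```

If every attached LUN shows the same count, that would point more towards the path (HBA, cable, switch) than towards a single damaged LUN, but that is an interpretation, not something the script can prove.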
We are using PE2950s, with local disk for the OS and SAN for storage.
As a note, in our documentation, this is our config for the HBA to
connect to our EMC Symmetrix:
a. Host Adapter BIOS = Enabled
b. Connection Options = 1 (Point to point only)
c. Data Rate = 1 (2 Gb/s)
Everything else is left at the defaults.
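If you want to compare a replacement HBA against these values, something like the following could help. It is a rough Python sketch: the dictionary keys are just labels copied from the list above, the values are assumed to be typed in by hand from the HBA BIOS screen, and diff_settings is a made-up helper name.

```python
# Expected values taken from the list above. The key names are only labels
# chosen for this sketch, not identifiers used by any vendor tool.
EXPECTED_HBA_SETTINGS = {
    "Host Adapter BIOS": "Enabled",
    "Connection Options": "1",  # point to point only
    "Data Rate": "1",           # 2 Gb/s
}


def diff_settings(actual, expected=EXPECTED_HBA_SETTINGS):
    """Return {name: (expected, actual)} for every setting that differs.

    A setting missing from `actual` is reported with None as its value.
    """
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    # Hypothetical values noted down from a replacement HBA.
    replacement_hba = {
        "Host Adapter BIOS": "Enabled",
        "Connection Options": "2",
        "Data Rate": "1",
    }
    for name, (want, got) in diff_settings(replacement_hba).items():
        print(f"{name}: expected {want}, found {got}")
```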
If I think of anything else, I'll pass it along.
Sijis
--------------
Sijis Aviles | Systems Administrator | Empire Today, LLC
-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of
linux-poweredge-request at dell.com
Sent: Thursday, July 31, 2008 12:00 PM
To: linux-poweredge at dell.com
Subject: Linux-PowerEdge Digest, Vol 47, Issue 55
Today's Topics:
1. RE: Help with md3000i (Nick_Parrott at Dell.com)
2. RE: Help with md3000i (Harald_Jensas at Dell.com)
3. OMSA 5.4 Centos4.6: WARNING: srvadmin-storage configuration
not performed; > '/etc/omreg.cfg' is missing or damaged
(Robert Hart)
4. RE: strange I/O errors with SAN storage (Robert von Bismarck)
----------------------------------------------------------------------
Message: 1
Date: Thu, 31 Jul 2008 10:47:34 +0100
From: <Nick_Parrott at Dell.com>
Subject: RE: Help with md3000i
To: <Sidney.Young at globalx.com.au>, <matthew at acfr.usyd.edu.au>
Cc: linux-poweredge at lists.us.dell.com
Message-ID:
<2AB12D8DC5E4564DB294A4C66F0C68F1DD677D at DUBX3M11.dub.emea.dell.com>
Content-Type: text/plain; charset="us-ascii"
Hi Guys,
Close, but not close enough. Some MD3000s were shipped with the serial
cables, some were not. I'm not sure of the reasons why; I just know that
it's 50/50 as to whether a customer calling in has one.
You can do a full reset (this needs a Dell tech to provide the
syntax/credentials for the reset command) or you can simply look at the
IPs of the management ports with a command similar to ifconfig, then
use the Java software to connect to that IP.
I'm checking out the availability of this cable, as far as I'm aware
it's not "for sale" but can be provided with a service call if there is
another reason for us to be out to the unit.
Regards,
Nick
-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Sid Young
Sent: 31 July 2008 04:51
To: Matthew Geier
Cc: linux-poweredge-Lists
Subject: RE: Help with md3000i
Yes I suspect that is the case, the serial cable activates something,
however I am told there is a password that needs to be applied and
instructions from a level 2/3 tech to resolve access.
Could be something really simple like "wipe config all<enter>" ;)
Sid
-----Original Message-----
From: Matthew Geier [mailto:matthew at acfr.usyd.edu.au]
Sent: Thursday, July 31, 2008 12:15 PM
To: Scott R. Ehrlich
Cc: Sid Young; Linux-PowerEdge at dell.com
Subject: Re: Help with md3000i
Scott R. Ehrlich wrote:
> Can you put it on an isolated LAN or use a crossover cable between it
> and a PC, and use a network sniffer to monitor IP addresses? If so,
> you should be able to get the IP from it through the sniffer.
>
> As for the unlock kit, I'd like to learn more about that, too -
> exactly what it does, what it costs, etc.
>
>
What's the bet the 'unlock kit' is the serial console cable :-)
My MD3000i came with a serial cable which when plugged in gives you
access to a service console. You can do all sorts of scary stuff from
the service console - including determining the IP address of the
controller :-)
_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
------------------------------
Message: 2
Date: Thu, 31 Jul 2008 11:52:41 +0200
From: <Harald_Jensas at Dell.com>
Subject: RE: Help with md3000i
To: <matthew at acfr.usyd.edu.au>, <johnh at comp.leeds.ac.uk>
Cc: linux-poweredge at lists.us.dell.com
Message-ID:
<87C820D35C176D428A1A8A3B34F6FB86BACE06 at uppx3m1.upp.emea.dell.com>
Content-Type: text/plain; charset="US-ASCII"
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-
> bounces at dell.com] On Behalf Of Matthew Geier
> Sent: 31 July 2008 04:08
> To: John Hodrien
> Cc: linux-poweredge-Lists
> Subject: Re: Help with md3000i
>
> John Hodrien wrote:
> > On Tue, 29 Jul 2008, Matthew Geier wrote:
> >
> >
> >> There is a Windows app that somehow finds what IP address the
> >> MD3000i has given itself. If you don't have control of your DHCP
> >> server and thus the IP address it gets, this might be the only easy
> >> way to find out what IP it got.
> >>
> >
> > Does it not respond to a broadcast ping?
> >
>
> It does, but so does a lot of other stuff. It really doesn't help a
> great deal unless your network is small, and if it's small you probably
> can see what your DHCP server is up to if you have one.
The MD3000i Configuration Utility has an option to Automatically detect
MD3000i arrays in the subnet. It will take quite some time in a large
network, but it should work. The utility is available on version 1.4 or
later of the MD3000i resource CD.
1. Start the MD3000i Configuration Utility. (Run 'MDconfig.sh' if you
are on Linux.)
2. Select "Configure MD3000i" and click Next.
3. To discover available storage arrays, choose "Discover New Arrays" and
click Next.
4. To perform an automatic discovery of storage arrays within the local
subnet, choose "Automatic" and click Next.
5. Follow through the wizard to configure your MD3000i.
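To get a feel for how much address space an automatic discovery (or the broadcast ping mentioned earlier) has to cover, the subnet size and broadcast address can be worked out with Python's standard ipaddress module. This is a small sketch for IPv4 only. The subnet used is an example value and subnet_summary is a made-up name. It does not talk to the MD3000i or the Configuration Utility in any way.

```python
import ipaddress


def subnet_summary(cidr):
    """Return (broadcast address, usable host count) for an IPv4 subnet."""
    # strict=False accepts an address with host bits set, e.g. 10.1.2.3/16.
    network = ipaddress.ip_network(cidr, strict=False)
    if network.prefixlen < 31:
        # Exclude the network and broadcast addresses.
        hosts = network.num_addresses - 2
    else:
        hosts = network.num_addresses
    return str(network.broadcast_address), hosts


if __name__ == "__main__":
    broadcast, hosts = subnet_summary("192.168.10.0/24")  # example subnet
    print(f"broadcast {broadcast}, {hosts} usable addresses")
```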
--
Harald
------------------------------
Message: 3
Date: Thu, 31 Jul 2008 10:02:52 -0400
From: "Robert Hart" <meteobobdell at gmail.com>
Subject: OMSA 5.4 Centos4.6: WARNING: srvadmin-storage configuration
not performed; > '/etc/omreg.cfg' is missing or damaged
To: linux-poweredge at dell.com
Message-ID:
<c6551ca10807310702m399be5ccue5b0472ecea26787 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
I have searched through every prior post and via google, and cannot find
an answer to this. I apologize if it has been and I missed it.
I have successfully run OMSA on seven different PEs. They have worked
wonderfully. However, yesterday, I got two more: a PE2900 and a PET300.
Unfortunately, I cannot get OMSA to work on these. They all run
CentOS 4.6, including the machines on which OMSA works.
The only difference with the machines that don't work is that they are
new hardware (PE2900 and PET300 for first time) and I believe the yum
install is from OMSA 5.4 directly. I believe prior installs were 5.3
which then upgraded to 5.4 with yum.
The specific error I am seeing on both machines is:
WARNING: srvadmin-storage configuration not performed; '/etc/omreg.cfg'
is missing or damaged
[Yet of course the file in etc is there with no obvious errors when
comparing to "good" installs].
After srvadmin-services start, I cannot even get https:...1311 to bring
up the web page.
Is the problem that I started with 5.4 directly? If so, is there an easy
way to have the yum repo install 5.3 first?
Thanks again for your help.
Bob
------------------------------
Message: 4
Date: Thu, 31 Jul 2008 16:56:16 +0200
From: "Robert von Bismarck" <robert.vonbismarck at vtx-telecom.ch>
Subject: RE: strange I/O errors with SAN storage
To: <christian.peper at kpn.com>, <linux-poweredge at dell.com>
Message-ID:
<07C3015B9E21F949AB59B28F2D5B567F01EBC0C9 at exch-pul-01.interne.smart-telecom.ch>
Content-Type: text/plain; charset="iso-8859-1"
Hello,
Have you tried booting with only one HBA?
We have seen the same kind of errors with dual-ported SAN connections
because the Linux kernel tried to access the SAN volumes while
PowerPath (the EMC failover software) had not loaded correctly after a
kernel update.
We disabled one path so that we could perform the necessary maintenance,
which was to get the latest release of PowerPath and install it on the
host. Reboot, reconnect the fiber, and the system was back to being its
happy self again.
NB: we do not boot from the SAN as you do, we have a local OS
installation and data storage on the SAN.
This was in a PE2850 with CentOS 4.5 and QLogic 2340 PCI-X adapters
connected to a CLARiiON array.
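A quick way to check after a kernel update whether the failover driver came up at all is to look for its modules in /proc/modules. Below is a minimal Python sketch. The "emcp" prefix for PowerPath modules is an assumption, so please verify the module names against your own installation. The helper names are made up.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the module name.
            names.add(fields[0])
    return names


def modules_with_prefix(proc_modules_text, prefix):
    """Return the sorted names of loaded modules that start with `prefix`."""
    return sorted(
        name for name in loaded_modules(proc_modules_text)
        if name.startswith(prefix)
    )


if __name__ == "__main__":
    with open("/proc/modules") as handle:
        text = handle.read()
    # "emcp" is an assumed prefix, not confirmed by this thread.
    found = modules_with_prefix(text, "emcp")
    print("matching modules loaded:", ", ".join(found) if found else "none")
```

An empty result after a kernel upgrade would fit the situation described above, where the kernel sees the raw paths without the failover layer on top.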
Kind regards,
Robert von Bismarck
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of
> christian.peper at kpn.com
> Sent: Thursday, 31 July 2008 11:36
> To: linux-poweredge at dell.com
> Subject: strange I/O errors with SAN storage
>
> Hi everyone,
> I hope someone can suggest where to look and what to look for. We're
> baffled and, apart from creating new disks on the SAN, we have run out
> of things to check. We can't try booting from the same LUNs on a
> different server either: no more hardware :( ...
>
> We've found some really strange I/O errors (PE2950, OEL/RHEL AS 4u5,
> 2x QLogic QLE2460, firmware 1.24) using LUNs on our DMX-3 SAN. One HBA
> was faulty so we replaced it. However, upon restoring the OS and
> reinstalling it, more problems appeared.
> The new HBA would not boot at all using the existing disks.
> So we disabled it in the BIOS and booted from the other
> (original) HBA. Both HBAs have the same firmware and the same settings.
>
> Upon booting, anything involving the disks (we boot from SAN and have
> data disks there as well) is extremely sluggish.
> Letting the server do its thing, I got a ton of I/O errors first
> during disk discovery, then again during mounting of file systems.
>
> ERROR: ddf1: reading /dev/sdb[Input/output error]
> ERROR: hpt37x: reading /dev/sdb[Input/output error]
> ERROR: pdc: reading /dev/sdb[Input/output error]
> ERROR: pdc: reading /dev/sdb[Input/output error]
> ERROR: pdc: reading /dev/sdb[Input/output error]
> ERROR: pdc: reading /dev/sdb[Input/output error]
> ERROR: pdc: reading /dev/sdb[Input/output error]
> ERROR: sil: reading /dev/sdb[Input/output error]
> ERROR: ddf1: reading /dev/sdc[Input/output error]
> ERROR: hpt37x: reading /dev/sdc[Input/output error]
> ERROR: pdc: reading /dev/sdc[Input/output error]
> ERROR: pdc: reading /dev/sdc[Input/output error]
> ERROR: pdc: reading /dev/sdc[Input/output error]
> ERROR: pdc: reading /dev/sdc[Input/output error]
> ERROR: pdc: reading /dev/sdc[Input/output error]
> ERROR: sil: reading /dev/sdc[Input/output error]
> ERROR: ddf1: reading /dev/sdd[Input/output error]
> ERROR: hpt37x: reading /dev/sdd[Input/output error]
> ERROR: pdc: reading /dev/sdd[Input/output error]
> ERROR: pdc: reading /dev/sdd[Input/output error]
> ERROR: pdc: reading /dev/sdd[Input/output error]
> ERROR: pdc: reading /dev/sdd[Input/output error]
> ERROR: pdc: reading /dev/sdd[Input/output error]
> ERROR: sil: reading /dev/sdd[Input/output error]
> ... and so on for all disks (LUNs) attached.
>
> Searching the web gave me a few hits but no solutions (see 1|2|3).
> However, all errors were related to local RAID setups using ATA/SATA
> disks. I am not using local RAID. We have Dell PowerEdge 2950 servers
> with 2 QLE2460 HBAs. The internal PERC5/i is enabled as it provides
> the swap disk space, but it doesn't do anything else. Furthermore, sdb,
> sdc and so on are SAN disks. So why do I get RAID errors from them?
> Could this point to motherboard errors? PCI bus errors? Broken FC
> cables? Bad FC switch configuration or simply damaged LUNs from the SAN?
>
> I'm keeping a blog of this updated with anything new I run into...
> http://breakablelinux.blogspot.com/2008/07/strange-io-errors-with-san.html
>
> thanks in advance,
> Chris.
>
------------------------------
End of Linux-PowerEdge Digest, Vol 47, Issue 55
***********************************************