Performance differences across two ports on same MD3000i controller
gene.tang at kiwiplan.co.nz
Thu Sep 3 18:42:49 CDT 2009
I'm wondering if anybody can explain the following to me. We have two MD3000i's set up in independent ESX4 clusters, and both show the following issue across both controllers on each MD3000i, so I suspect it is not a hardware issue. I'm getting strange performance differences depending on which of a controller's two ports an ESX host connects to in order to access a LUN. Each ESX host uses the "Most Recently Used" path policy instead of the alternative Round Robin path policy. My ESX setup is identical to the one shown here: http://www.delltechcenter.com/page/VMware+ESX+4.0+and+PowerVault+MD3000i. That is, each port of a controller on the MD3000i is connected to an independent subnet across its own independent switch.
I tested the performance differences as follows. I have three hosts connected to the MD3000i, and one virtual machine. One of the hosts is using "Port 1, Controller 0" for its I/O traffic, and the other two are using "Port 0, Controller 0" for their I/O traffic; both ports are on the controller that owns the virtual disk. I use HDTune Pro as my benchmarking tool (other tools show similar results). I test by VMotioning the virtual machine onto one host, running HDTune Pro, then VMotioning the VM onto the other host and running HDTune Pro again. The results are below:
"Port 0, Controller 0":
Minimum Transfer Rate: 30.1MB/sec
Maximum Transfer Rate: 988.8MB/sec
Average Transfer Rate: 156.9MB/sec
Access Time: 4.0ms
Burst Rate: 37.5MB/sec
CPU Usage: 3.5%
"Port 1, Controller 0":
Minimum Transfer Rate: 6.0MB/sec
Maximum Transfer Rate: 968.6MB/sec
Average Transfer Rate: 131.2MB/sec
Access Time: 4.3ms
Burst Rate: 4.5MB/sec
CPU Usage: 3.2%
(The image can be viewed here: http://en.community.dell.com/forums/t/19292882.aspx)
The top graph is the VM on the host connected to "Port 0, Controller 0"; the bottom graph is the VM on the host connected to "Port 1, Controller 0". Although it is not obvious (the scales differ, sorry, I couldn't get them to match), transfer rates are about 10-20 MB/s slower on Port 1, and the burst rate is hugely slower (although access times are about even). If I force the host connected to "Port 1, Controller 0" onto "Port 0, Controller 0", the performance improves. However, this isn't ideal, since it leaves one port on the controller unused.
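To put numbers on the gap, the two result sets quoted above can be compared directly. The following is only a rough sketch in Python: the figures are copied from the HDTune Pro output in this post, while the dictionary keys and the `compare` helper are names made up for illustration.

```python
# Figures copied from the HDTune Pro results quoted above
# (transfer and burst rates in MB/sec, access time in ms).
port0 = {"min": 30.1, "max": 988.8, "avg": 156.9, "access_ms": 4.0, "burst": 37.5}
port1 = {"min": 6.0, "max": 968.6, "avg": 131.2, "access_ms": 4.3, "burst": 4.5}


def compare(first, second):
    """Return (difference, ratio) of first vs. second for every metric."""
    return {
        key: (round(first[key] - second[key], 1), round(first[key] / second[key], 2))
        for key in first
    }


summary = compare(port0, port1)
for metric, (diff, ratio) in summary.items():
    print(f"{metric:>9}: Port 0 - Port 1 = {diff:+.1f}   Port 0 / Port 1 = {ratio:.2f}x")
```

On these figures the average transfer rate differs by about 26 MB/sec, while the burst rate differs by roughly a factor of eight, so the burst rate is by far the most affected metric.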
I know the switches can't be at fault either, because machines accessing other LUNs on, say, "Port 1, Controller 1" have good performance, while "Port 0, Controller 1" has poor performance. Note that Port 0 on both controllers is connected to Switch 1, and Port 1 on both controllers is connected to Switch 2. Also, both MD3000i's display the same behaviour, and they are independent of each other.
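The reasoning above can be checked mechanically. The sketch below, in Python, encodes the cabling and the good/poor observations exactly as described in this post, then groups them by switch, controller and port number. The "good"/"poor" labels and the `results_by` helper are illustrative names, not anything reported by the MD3000i or ESX.

```python
from collections import defaultdict

# Cabling and observations as described in this post:
# Port 0 of each controller goes to Switch 1, Port 1 to Switch 2.
paths = [
    {"controller": 0, "port": 0, "switch": 1, "result": "good"},
    {"controller": 0, "port": 1, "switch": 2, "result": "poor"},
    {"controller": 1, "port": 0, "switch": 1, "result": "poor"},
    {"controller": 1, "port": 1, "switch": 2, "result": "good"},
]


def results_by(key):
    """Collect the set of observed results for each value of `key`."""
    grouped = defaultdict(set)
    for path in paths:
        grouped[path[key]].add(path["result"])
    return dict(grouped)


for key in ("switch", "controller", "port"):
    groups = results_by(key)
    # A factor could explain the slowdown only if each of its values
    # is associated with a single kind of result.
    explains = all(len(results) == 1 for results in groups.values())
    print(f"{key}: {groups} -> could explain the slowdown alone: {explains}")
```

Each switch, each controller and each port number carries both a good and a poor path, so none of them on its own accounts for the slow paths.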
This leads me to believe it's something to do with the MD3000i itself, related to how it accesses the LUNs. Does anyone have any idea why it is displaying this behaviour? Note that the MD3000i's are all showing optimal status, and each LUN is appropriately owned by one of the two controllers (i.e. the virtual disks are all on their preferred paths).
If anyone has any clue about this, it would be greatly appreciated.