08-31-2010 04:59 PM
I'll keep this as short as possible; hopefully I explain it properly.
We have 2 DMX4 arrays (5773) connected via SRDF between 2 sites. The group consists of 4 RF ports on each array (3d, 4d, 13d, 14d). For months there were no issues, but about 3 weeks ago a problem surfaced on the 13d connection; it became apparent when I noticed the IO service time for that port going through the roof. To isolate the issue, the first thing we did was swap its cabling with that of 14d: 13d then worked well and 14d had the issues. To me, this proves that the port is OK and that something is wrong on the switch side. We moved the cables back to where they were and, as expected, 13d became a problem again while 14d was OK. We then went one step further and moved 13d to a different port on a different LIM in the switch, and 13d started working properly. We concluded that the problem was in the original switch port, paddle, or LIM.
About 5-6 days later, 13d started getting errors again on the new port. We have not done any more troubleshooting since then, but the client wants to replace the entire director board (13) to rule it out. I have a hard time seeing how the director board can be the problem if moving to another switch port resolves the issue (at least temporarily).
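The swap tests described above amount to a simple elimination rule: any component (director port or switch port) that has been seen at least once running without errors is cleared, and whatever is never seen clean is the suspect. The Python sketch below only illustrates that reasoning; the function `suspects` and the switch port names `sw-A`, `sw-B` and `sw-C` are hypothetical placeholders, not real port identifiers.

```python
# Elimination rule for a cable/port swap test (illustrative sketch only).
# Each observation is (director_port, switch_port, had_errors).
# Switch port names below are hypothetical placeholders.

def suspects(observations):
    """Return components that never ran clean in any observation."""
    seen_bad, cleared = set(), set()
    for director_port, switch_port, had_errors in observations:
        for component in (("director", director_port), ("switch", switch_port)):
            # A component that ever ran without errors is cleared.
            (seen_bad if had_errors else cleared).add(component)
    return sorted(seen_bad - cleared)


observations = [
    ("13d", "sw-A", True),   # original cabling: 13d shows high service time
    ("14d", "sw-B", False),
    ("13d", "sw-B", False),  # cables swapped: the problem moves to 14d
    ("14d", "sw-A", True),
    ("13d", "sw-C", False),  # 13d moved to a port on a different LIM
]

print(suspects(observations))  # → [('switch', 'sw-A')]
```

With the first round of tests, the only component never seen clean is the original switch port, which matches the conclusion above. Note that the rule assumes the fault is persistent: a recurrence like the later errors on the new port does not fit it, because that port had already been cleared, which is exactly why an intermittent fault is so hard to pin down.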
Is there something on the McData switch we should be looking at? Is there some kind of setting that would cause a port to run OK for a while and then fail again?
Please let me know if any of you have ever seen similar behaviour. Replacing an entire director board is very disruptive, especially since there are well over 50 servers zoned to the other 6 ports on that board.
09-01-2010 09:06 AM
I have faced a similar issue with a Hitachi USP-V and Brocade; it was on HUR (Hitachi Universal Replicator). We escalated it to Sun, who escalated it to Brocade on the back end. Neither we nor the Sun engineers could find anything. At last Brocade found some CRC errors and changed the board.
I know this will be a tough task, but if that is the solution, then it has to be done.
It can be done as an online activity if all the servers have multipathing, and since the DMX is enterprise storage, it supports that.
So you have to log a call with your vendor.
Have you checked the ports with the portstatsclear/portstatsshow/porterrshow/portshow commands? Did you analyze the FFDC data in the supportsave?
Did your analysis find any errors on the ports?
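The point of clearing the statistics first and then showing them again is that a counter which keeps growing after a clear points at an active problem, while old accumulated totals do not. The Python sketch below compares two counter snapshots taken that way; it does not talk to a switch, and the snapshot layout, the counter names (`crc_err`, `enc_out`), the port names and the values are all hypothetical examples, not parsed porterrshow output or real measurements.

```python
# Compare two snapshots of per-port error counters (e.g. taken right after
# clearing the statistics and again some time later) and report which
# counters grew. Port names, counter names and values are hypothetical.

def increased_counters(before, after):
    """Return {port: {counter: delta}} for every counter that increased."""
    result = {}
    for port, counters in after.items():
        baseline = before.get(port, {})
        grown = {
            name: value - baseline.get(name, 0)
            for name, value in counters.items()
            if value > baseline.get(name, 0)
        }
        if grown:
            result[port] = grown
    return result


before = {"sw-A": {"crc_err": 0, "enc_out": 0}, "sw-B": {"crc_err": 0, "enc_out": 0}}
after = {"sw-A": {"crc_err": 57, "enc_out": 3}, "sw-B": {"crc_err": 0, "enc_out": 0}}

print(increased_counters(before, after))  # → {'sw-A': {'crc_err': 57, 'enc_out': 3}}
```

Only counters that actually moved between the two snapshots are reported, so a port with high but static historical totals does not show up.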
I have not worked on the McDATA i10K, but if you can provide the supportsave of the director, somebody from the forum will definitely help. But trust me,
you should log a call with the switch vendor at the same time. Let them do the analysis; if they say to change the board, then you have to.