05-28-2010 08:00 AM
Just wondering if anyone might be able to point me in the right direction. We have been investigating and troubleshooting this issue for over two months now and we are no further along than when we started.
First off, this is in regard to backup speed decreasing. In March I was seeing backup speeds of around 120 MB/s from my EMC Electronic Disk Library (EDL) backing up clients, then one day poof, nada, nothing, zilch. We found a bad port and a bad cable and decided to replace all the fibre cables involved. Once the cables were replaced, the speed gradually got slower and slower; now it's down to about 20-60 MB/s. During this problem we have upgraded to the latest code revisions on the HBAs, and the OS code on the switches was upgraded to 6.3...
We also had a Finisar brought in to check the traffic. Basically this is what it showed: data leaves the server through the Brocade 4424s in the Dell M1000e chassis, goes through the fibre patch panels into the Brocade 4900, then back out of the 4900 to the EDL, and from the EDL back to the 4900. Once the data came out of the 4900 it had lost its encapsulation. We re-ran the test multiple times and got the same result each time. I then tried bypassing the fibre patch panels altogether and ran cabling directly from the 4424s to the Finisar to the 4900s to the EDL, with the same result.
I see this in a porterrshow on both switches; see the attached text file for outputs. Mainly what I see when I look at the ports in use for my backup system is Invalid Word, some Lr In, etc. Oh, and we have replaced the HBAs on both ends and are still getting the errors. I guess my real question is: how in the world do you tell if the 4900 is going bad?
05-31-2010 06:57 PM
Hi Jason Girard
I couldn't open your attachments (they seem to be corrupt).
Do you know which ports your EMC library is connected to?
If you do, try running this command
Then monitor those ports to see if you're getting any hard errors.
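One way to do that monitoring is to take two snapshots of the error counters some time apart and look only at what grew in between. The snippet below is a rough sketch of that idea, not a Brocade tool: the counter names, port numbers and sample values are all made up for illustration, and it assumes the counters are abbreviated with k/m/g suffixes the way porterrshow tends to print large values.

```python
def parse_counter(text):
    """Convert an abbreviated counter such as '1.2k' into an integer."""
    # Assumption: k, m and g mean thousand, million and billion.
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def growing_counters(before, after):
    """Return {port: {counter: delta}} for counters that increased."""
    result = {}
    for port, counters in after.items():
        earlier = before.get(port, {})
        deltas = {}
        for name, value in counters.items():
            delta = parse_counter(value) - parse_counter(earlier.get(name, "0"))
            if delta > 0:
                deltas[name] = delta
        if deltas:
            result[port] = deltas
    return result


# Hypothetical snapshots keyed by port number; values are invented.
before = {1: {"enc_out": "1.2k", "crc_err": "0"},
          3: {"enc_out": "15", "crc_err": "0"}}
after = {1: {"enc_out": "1.5k", "crc_err": "0"},
         3: {"enc_out": "15", "crc_err": "2"}}

print(growing_counters(before, after))
```

Counters that stay flat between snapshots are most likely left over from an earlier fault (for example the bad cable), while the ones that keep climbing point at a problem that is still there.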
05-31-2010 08:27 PM
Sorry about that, not sure why it tried to zip my txt files. Let me try attaching the porterrshow text files again. I have the EDL hooked up to ports 1 and 3 for my NetBackup servers and ports 7 and 11 going to my NetWorker server. Also, the stats in the attached files cover a 2-week period, since the last time I cleared the logs on both fabric switches.
06-01-2010 02:03 AM
The enc out values are quite high on both switches, which makes me suspect the patch panel if that's common to both switches.
Another thing you may want to try is fixing the port speeds, depending on the initiator and target speeds.
If you attach a supportshow from both switches and a simple schematic diagram, I will have a look for sure. No promises on whether I can find the root cause, but I will give it a try.
06-01-2010 06:10 AM
Attached you will find the supportshow output from both fabric switches. I also see I left out of my original post that I did try hooking the server up directly to the 4900s, and also the EDL directly to the 4900s, totally bypassing all the fibre patch panels, and I was still receiving errors. I appreciate you taking a look at this. It has me at such a loss; I really think the 4900s are going bad, but I have no way of proving it that I know of. They are about 4 years old at this point.
06-01-2010 08:03 AM
A dumb question, but just to eliminate my doubt: are the
ports on the same controller or on different controllers?
I don't have any major breakthroughs, but I have doubts about these CX3-80 ports. It may be worthwhile to check the Clariion events as well.
06-01-2010 11:25 AM
Well, I'm not sure how a CX3-80 is actually built in that respect, but I do know that they are totally separate GBICs in the back of the Clariion head unit. B0 and A0 have different WWNs. A0 is on the A-side switch and B0 is on the B-side switch. Also, there are no events logged on the Clariion; it is clean too. Did I answer your question, or does that just muddy the water even more?