Fibre Channel (SAN)

Regular Contributor
Posts: 166
Registered: ‎02-05-2014

Re: SCN saturation

According to the first line:

 

==============

We’ve been troubleshooting an issue lately where we have around 170 initiators zoned to the same 8 target ports on an HPE 3PAR array.

==================

 

Which to me means that I have 170 initiators mapped to each of the 8 ports (170 to port A, 170 to port B, etc.).

 

The terminology you mentioned isn't 100% clear to me. You mention 1000 HBAs (which to me means 1000 physical cards). That is massively different from 1000 NPIV initiators, which may well be dispersed over multiple cards and subsequently split and mapped to the 8/12 target ports you mention.

 

It all depends on the stability of the environment with respect to workload, physical issues, protocol issues, etc.

 

Very hard to diagnose.

Kind regards,
Erwin van Londen
Brocade Distinguished Architect
http://www.erwinvanlonden.net The Fibre Channel blog



Q&A -> https://hackhands.com/elo/


-------
Contributor
Posts: 55
Registered: ‎05-12-2013

Re: SCN saturation

After weeks of troubleshooting we got an FC analyzer, and the issue, I'm being told, is that the 3PAR array is not getting a full copy of the name server table. The array requests a copy of the name server and receives it, but it appears to be truncated to significantly less than the full table. This happens on both fabrics.

One major difference: on one fabric, when the arrays start querying the principal switch for the missing entries, the response takes a very long time and returns no initiator information. On the B fabric, the query returns immediately with the initiator information. This delay/timeout/missing information is causing the initiators to think they're missing paths.

Contributor
Posts: 55
Registered: ‎05-12-2013

Re: SCN saturation

I should also mention we have been able to reduce the fan-in ratio to ~45:1.
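
For context, a back-of-the-envelope sketch of what the original ratio implied (this assumes every one of the 170 initiators was zoned to all 8 target ports, as described at the top of the thread):

```python
# Rough fan-in arithmetic for the setup described earlier in the
# thread (assumption: all 170 initiators zoned to all 8 target ports).
initiators = 170
target_ports = 8

fan_in_per_port = initiators                 # 170:1 per target port
total_logins = initiators * target_ports     # login sessions per fabric

print(f"{fan_in_per_port}:1")   # 170:1
print(total_logins)             # 1360
```

Every one of those sessions generates state-change and name-server traffic, which is why bringing the ratio down to ~45:1 helps.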
Contributor
Posts: 36
Registered: ‎01-19-2018

Re: SCN saturation

Do you have Brocade support involved on that process?
What FOS version are you running?
Do you have any fundamental differences between your fabrics A and B?
What number do you see at the end of the "nsallshow" output in your fabrics?
I understand that there are many things that happen for the first time, but again, I've seen much wider SAN deployments and they all worked fine...
Contributor
Posts: 55
Registered: ‎05-12-2013

Re: SCN saturation

Yeah, working with HPE & Brocade. They're still determining a root cause. We're seeing the 3PAR flood the switches with GA_NXT requests, but we're not sure whether that is contributing to the problem. I'll post updates as I get them.
Contributor
Posts: 55
Registered: ‎05-12-2013

Re: SCN saturation

This is a question maybe Erwin could answer..

 

When the array queries the name server to learn which devices it can access, the name server replies with the device addresses. It's my understanding that these addresses are returned in words 12-17 of the frame payload. If there are lots of devices, will the name server keep sending CT frames with all the device addresses the array can communicate with, or is there a limit to the number of devices the name server will advertise? (Also, are these reply frames typically the same size/length?)

 

If the latter, then I'm guessing the array will start sending GA_NXT commands for the unknown addresses?
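
To make the GA_NXT pattern concrete, here is an illustrative sketch (not switch firmware, and not the CT wire format) of how a client walks the name server with it: each query carries the last port ID seen, and the server returns the next registered entry, wrapping past the highest one. One round trip per device is exactly why heavy GA_NXT use loads the name server. All IDs below are made up:

```python
# Toy model of a GA_NXT (Get All Next) walk of the fabric name server.
# ga_nxt() stands in for the switch's name server; walk_name_server()
# stands in for the querying array. Purely illustrative.

def ga_nxt(table, port_id):
    """Return the next registered port ID after port_id (wraps around)."""
    higher = sorted(p for p in table if p > port_id)
    return higher[0] if higher else min(table)

def walk_name_server(table):
    """Discover every entry, one GA_NXT round trip at a time."""
    first = ga_nxt(table, 0x000000)   # start below any valid 24-bit ID
    discovered, current = [], first
    while True:
        discovered.append(current)
        current = ga_nxt(table, current)
        if current == first:          # wrapped around: walk is complete
            break
    return discovered

toy_table = {0x010100, 0x010200, 0x020100}
print([hex(p) for p in walk_name_server(toy_table)])
```

By contrast, a single bulk query such as GID_FT can return the whole list of matching port IDs in one CT exchange, so a device that falls back to walking entry-by-entry multiplies the query load by the size of the fabric.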

 

Contributor
Posts: 55
Registered: ‎05-12-2013

Re: SCN saturation

I wanted to close out this thread since it's always nice to have an ending...

 

In my scenario I have multiple HPE 3PAR arrays. These arrays had the "IOCTL" function running, and the arrays were interrogating the Brocade management server (FFFFFA) heavily. The management server would become overloaded and would take an extended time to respond to the service calls. This incurred latency, and when the latencies were long enough, path recovery was invoked and hosts disabled paths. The array could also choose to implicitly log hosts out if the name server response did not clarify the host status in a timely manner.
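
The latency-to-path-failure chain above can be sketched as follows (purely illustrative; the timeout value is an assumption for the sketch, not a 3PAR or host default):

```python
# Toy model of the failure mode: if a fabric-services query takes
# longer than the host's path timeout, multipathing marks the path
# failed even though the physical link is healthy.
# The 5-second threshold is a made-up value for illustration.

PATH_TIMEOUT_S = 5.0

def path_state(query_latency_s):
    """Path verdict as a function of name/management server latency."""
    return "failed" if query_latency_s > PATH_TIMEOUT_S else "active"

print(path_state(0.2))    # healthy responder -> active
print(path_state(12.0))   # overloaded responder -> failed, recovery kicks in
```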

 

Another behavior that made the issue worse: our IBM servers were not registering the "FC Type" in the name server. The arrays also use GA_NXT calls, which increase load on the name server.

 

After disabling the IOCTL function on the arrays, the environment became a lot more stable.
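
On the FC Type point: a GID_FT query returns only the ports that registered the requested FC-4 type, so a host that never registers type 0x08 (SCSI-FCP) is invisible to that query and has to be discovered some slower way, such as a GA_NXT walk. A toy illustration (port IDs made up, not the FC-GS wire format):

```python
# Why missing FC-4 type registration hurts: GID_FT filters the name
# server table by registered FC-4 type. A port with an empty
# registration set simply never appears in the reply.

registrations = {
    0x010100: {0x08},   # host that registered FCP (type 0x08)
    0x010200: set(),    # host that registered no FC-4 type at all
    0x020100: {0x08},
}

def gid_ft(fc4_type):
    """Return port IDs that registered the given FC-4 type."""
    return sorted(pid for pid, types in registrations.items()
                  if fc4_type in types)

print([hex(p) for p in gid_ft(0x08)])   # 0x010200 is absent from the reply
```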
