03-07-2018 08:25 PM
According to the first line:
We’ve been troubleshooting an issue lately where we have around 170 initiators zoned to the same 8 target ports on an HPE 3PAR array.
Which to me means that I have 170 initiators mapped to all 8 ports (170 to port A, 170 to port B, etc.).
The terminology you used isn't 100% clear to me. You mention 1000 HBAs, which to me means 1000 physical cards. That is a massive difference from 1000 NPIV initiators, which may well be spread over multiple cards and then split and mapped to the 8/12 target ports you mention.
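To put a rough number on that reading, here is a minimal sketch of the arithmetic. It assumes every initiator is zoned to every target port, which is my interpretation of the quoted line; `login_count` is just an illustrative helper name.

```python
def login_count(initiators: int, target_ports: int) -> int:
    """Initiator-target pairs if every initiator is zoned to every target port."""
    return initiators * target_ports


# Figures from the quoted line: 170 initiators, 8 target ports.
total_pairs = login_count(170, 8)
print(total_pairs)  # 1360
```

So each target port sees 170 initiators, and the array as a whole has 1360 initiator-target pairs to keep track of.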
It all depends on the stability of the environment related to workload, physical issues, protocol issues etc.
Very hard to diagnose.
04-07-2018 04:08 PM
After weeks of troubleshooting we got an FC analyzer, and the issue, I'm being told, is that the 3PAR array is not getting a full copy of the name server table. The array requests a copy of the name server table and receives it, but the response appears to be truncated to significantly less than the full table. This happens on both fabrics. One major difference is that on one fabric, when the arrays start querying the principal switch for the missing entries, the response takes a very long time and returns no initiator information. On the B fabric, the query returns immediately with the initiator information. This delay, timeout, or missing information is causing the initiators to think they're missing paths.
04-07-2018 11:29 PM
04-09-2018 08:04 AM
04-09-2018 09:37 AM
This is a question maybe Erwin could answer.
When the array queries the name server to learn which devices it can access, the name server replies with the device addresses. It's my understanding that these addresses are returned in words 12-17 of the frame payload. If there are lots of devices, will the name server keep sending CT frames until it has returned all the device addresses the array can communicate with, or is there a limit to the number of devices the name server will advertise? (Also, are these reply frames typically the same size/length?)
If the latter, then I'm guessing the array will start sending GA_NXT commands for the unknown addresses?
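To illustrate what I mean by the array falling back to GA_NXT, here is a toy sketch, not real switch or array code. It assumes GA_NXT behaves the way I understand it: the requester passes a port ID and gets back the entry for the next higher registered port ID, wrapping around to the lowest one. The port IDs are made up, and a real reply carries a full name server object rather than just a port ID.

```python
from bisect import bisect_right


def ga_nxt(table: list[int], port_id: int) -> int:
    """Toy GA_NXT: next registered port ID above port_id, wrapping to the lowest."""
    idx = bisect_right(table, port_id)
    return table[idx % len(table)]


def walk_name_server(registered: list[int]) -> tuple[list[int], int]:
    """Discover every entry one query at a time; returns (entries, query count)."""
    table = sorted(registered)
    if not table:
        return [], 0
    found: list[int] = []
    queries = 0
    current = 0x000000  # start below any valid port ID
    while True:
        nxt = ga_nxt(table, current)
        queries += 1
        if found and nxt == found[0]:
            break  # wrapped back to the first entry, walk is complete
        found.append(nxt)
        current = nxt
    return found, queries


# Hypothetical port IDs, purely for illustration.
entries, queries = walk_name_server([0x010200, 0x010100, 0x020300])
print([hex(e) for e in entries], queries)
```

The point is that a walk like this costs one round trip per entry plus one more to detect the wrap, so with 170 initiators that would be 171 queries per array each time it walks the table.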
05-09-2018 10:20 AM
I wanted to close out this thread since it's always nice to have an ending...
In my scenario I have multiple HPE 3PAR arrays. These arrays had the "IOCTL" function running and were interrogating the Brocade management server (FFFFFA) heavily. The management server would become overloaded and take an extended time to respond to the service calls. This added latency, and when the latency was long enough, path recovery was invoked and hosts disabled paths. The array could also choose to implicitly log hosts out if the name server response did not clarify the host status in a timely manner.
Another behavior that made the issue worse was that our IBM servers were not registering the "FC Type" in the name server. The arrays also use GA_NXT calls, which increase load on the name server.
After disabling the IOCTL function on the arrays, the environment became a lot more stable.