01-17-2016 08:53 AM
I am throwing this out fairly early in my diagnosis, but here we go:
Last Monday afternoon I upgraded BNA from 12.4.1 to 12.4.4. There were no immediately apparent issues, but the next day I noticed that the daily run of the Configuration Policy Manager was producing different pass/fail results. I had it trimmed down to 30 tests with a routine 28 passes and 2 fails; I was now getting 28 passes and 12 fails. No big deal, just something to look into some day.
In talking this morning with my VMware administrators (who are my only 'customers'), I learned that they have been logging thousands of pathing errors every morning between ~8:05 and ~8:15. They are seeing BFA_AEN_RPORT_DISCONNECT errors during that window; these started on Tuesday morning and have recurred every morning since. That time period falls inside the execution window for the default policy manager run. They are seeing the errors on a number of hosts in the same datacenter, which are connected to several different storage units. There may be some commonality in that the affected hosts are all likely attached to the same pair of Brocade 6520 switches, and possibly use common 16Gb HBAs. As I said, I am posting this earlier than I otherwise might; I am still doing some digging.
This morning I selected the policy and opened it for editing to see if I could spot anything new. I closed out of that and then ran it 'manually'... twice. My manual runs produced results that match what I had before the BNA upgrade, and did not cause any pathing errors. So I don't know yet whether there is a correlation.
For today, I have changed the run schedule to kick it off at 4:00 am rather than 8:00 am. I expect one of three outcomes: the errors go away just as quickly as they appeared; they move to 4:00, implicating the BNA Configuration Policy Manager; or they continue at 8:00, suggesting that the problem lies elsewhere.
What I am wondering is whether the default settings were affected by the upgrade, so that the hosts were somehow being scanned more aggressively than before.
Questions, comments, or war stories would be welcome.
I will post back with tomorrow's results, or lack of same.
01-20-2016 08:58 AM
I can report that the errors stopped occurring after my initial investigation.
Without being able to prove it, I infer that the default policy manager scans that came in with 12.4.4 were somehow a bit too aggressive for my environment, and that simply opening the policy settings restored my previous customizations.
For the future, I will make a note to take similar action (opening and reviewing the policy settings) every time I update BNA, to hopefully prevent any recurrence of the issue.