03-25-2013 01:43 AM
On several, but not all, of our HP blade switches (Brocade 5480), after an upgrade from 6.4.1 to 7.0.2b the switches reboot every other day because
the switch thinks it ran out of memory.
HP support's only suggestion is to upgrade FOS to a newer version, but this 7.0.2a is supposed to be stable and is a target path release.
Does anyone have some troubleshooting tips to find out why the switches run out of memory?
03-25-2013 05:09 AM
How often does the switch experience this behavior? With the command 'memshow' you can check the current state of the memory, and by running it repeatedly you can see how fast the free memory decreases.
           total       used       free  shared   buffers     cached
Mem:   520437760  394846208  125591552       0  33353728  125112320
Swap:          0          0          0
In the errdump log you should be able to see which process caused the OOM situation and forced the reboot. You can also go through the contents of the supportsave file and check the core files created when there is an OOM situation.
On the other hand, a full reboot may help.
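To put a number on how fast the free memory is shrinking, two memshow samples taken some hours apart can be compared off the switch. The following is only a rough sketch: the function names are made up for illustration, it assumes the memshow output has the header row and 'Mem:' row shown above, and it assumes the decrease is roughly linear, which may not hold for a real leak. Note also that the 'free' column does not count buffers or cache, so it can understate what is actually available.

```python
def parse_memshow(text):
    """Return the 'Mem:' row of memshow-style output as a dict of ints.

    Assumes the first non-empty line is the column header
    (total used free shared buffers cached).
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = rows[0]
    for row in rows[1:]:
        if row[0] == "Mem:":
            return dict(zip(header, map(int, row[1:])))
    raise ValueError("no 'Mem:' row found")


def hours_until_exhausted(free_then, free_now, hours_between):
    """Linearly extrapolate how many hours remain until free memory hits zero.

    Returns None when free memory is not shrinking between the two samples.
    """
    drop_per_hour = (free_then - free_now) / hours_between
    if drop_per_hour <= 0:
        return None
    return free_now / drop_per_hour
```

Feeding it the 'free' values from two saved memshow outputs gives a rough estimate of when the next OOM reboot would occur if nothing changes.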
03-25-2013 05:44 AM
Nothing in the errdump points to a process:
2013/03/23-15:01:00, , 910, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM.
2013/03/23-15:01:01, , 911, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.
2013/03/23-15:04:36, , 912, FFDC | CHASSIS, CRITICAL, SW5480, Rebooting the system for recovery - auto-reboot is enabled.
2013/03/23-15:04:36, , 913, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.
2013/03/23-15:05:43, , 914, CHASSIS, INFO, SW5480, Processor rebooted - reboot.
2013/03/23-15:05:51, , 915, CHASSIS, INFO, SW5480, SW/0 Ether/0 IPv4 DHCP 10.103.15.23/24 DHCP On.
2013/03/23-15:05:51, , 916, CHASSIS, INFO, SW5480, CP/0 IPv4 DHCP 10.103.15.254 DHCP On.
The switch restarts every 3rd or 4th day.
Memshow shows only 19 MB left, and it has been running for only 2 days.
Other similar switches have at least 40-60 MB of free memory.
I will look into the supportsave and see if I find something.
2013/03/23-15:06:33, , 917, CHASSIS, INFO, SW5480, Initializing ports...
2013/03/23-15:06:33, , 918, CHASSIS, INFO, SW5480, Port initialization completed.
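To confirm how often the OOM reboots really happen, the errdump output can be saved to a file and scanned for the OOM messages. This is a minimal sketch, not a full errdump parser: it assumes every line has the comma-separated layout of the lines pasted above (timestamp, an empty field, sequence number, flags, severity, switch name, message), and the function name is made up for illustration.

```python
from datetime import datetime


def oom_events(errdump_text):
    """Return (timestamp, sequence number, severity) for each errdump line
    whose message mentions OOM.

    Lines that do not split into the seven expected fields are skipped.
    """
    events = []
    for line in errdump_text.splitlines():
        # Split at most 6 times so commas inside the message are preserved.
        parts = [p.strip() for p in line.split(",", 6)]
        if len(parts) < 7:
            continue
        stamp, _, seq, _, severity, _, message = parts
        if "OOM" in message:
            when = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S")
            events.append((when, int(seq), severity))
    return events
```

Subtracting consecutive timestamps from the returned list gives the interval between OOM reboots, which can be compared across the affected and unaffected switches.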
03-25-2013 10:45 AM
From my days supporting Brocade switches, this can potentially be a long and involved process.
I would recommend calling Brocade Support as this may require support access to the switch.
FYI: Out of memory *might* be indicating out of storage space (Compact Flash)...
03-26-2013 01:21 AM
With the command supportsave -R you'll delete the core files and free up some space. Also, if you have root credentials, you can execute cleanup to delete the unused files on the switch.
03-27-2013 01:21 AM
It turned out to be the snmp daemon that took up all the memory.
I ran the top command, hit shift+f and then selected n to sort the processes with the highest memory usage on top. On all the switches that rebooted, the snmp daemon took up over 100 MB; now the daemon takes 10-13 MB.
It would be good if the switch logged in the errdump which process had the highest memory usage at the time of the OOM reboot, maybe something for the next release?
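For comparing several switches, the same check can be done on captured top output instead of interactively. The sketch below is an assumption-heavy illustration: it assumes the top output was saved as text, that it has a procps-style header row containing PID, %MEM and COMMAND columns, and that the command name has no spaces. The function name is hypothetical and the column layout on a given FOS release may differ.

```python
def top_by_memory(top_text, limit=5):
    """Return up to `limit` (mem_percent, command, pid) tuples from saved
    top output, highest memory share first.

    Assumes a header row starting with 'PID' that includes '%MEM' and
    'COMMAND' columns.
    """
    lines = top_text.splitlines()
    start = next(
        (i for i, ln in enumerate(lines) if ln.split()[:1] == ["PID"]), None
    )
    if start is None:
        raise ValueError("no header row starting with 'PID' found")
    header = lines[start].split()
    mem_col = header.index("%MEM")
    cmd_col = header.index("COMMAND")
    rows = []
    for ln in lines[start + 1:]:
        fields = ln.split()
        if len(fields) <= cmd_col:
            continue  # blank or truncated line
        rows.append((float(fields[mem_col]), fields[cmd_col], int(fields[0])))
    return sorted(rows, reverse=True)[:limit]
```

Running this against captures from a healthy switch and from one that reboots makes a daemon with an unusually large memory share stand out quickly.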
03-27-2013 01:33 AM
Hahaha, if it were that easy it wouldn't be fun!
On the other hand, I seem to remember that with the previous code release, FOS 6.x, a switch did not perform an hareboot when this happened. It performed a full panic, and in that situation the logs did report the daemon that raised the OOM condition. I suppose Brocade considered it better this way.
If the issue reoccurs and you see the SNMP daemon eating up all the memory again, you should check the SNMP applications accessing this switch, since intensive polling could make the daemon consume a high percentage of CPU and memory.