Fibre Channel (SAN)

Occasional Contributor
Posts: 10
Registered: ‎06-15-2011

Out of memory errors after 7.0.2a upgrade

Hello

On several, but not all, of our HP blade switches (Brocade 5480), the switches have been rebooting every other day since an upgrade from 6.4.1 to 7.0.2a, because the switch thinks it ran out of memory:

2013/03/23-11:37:37, , 721, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM

HP support's only suggestion is to upgrade FOS to a newer version, but 7.0.2a is supposed to be stable and is a target path release.

Does anyone have some troubleshooting tips to find out why the switches run out of memory?

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Out of memory errors after 7.0.2a upgrade

Hi,

How often does the switch experience this behavior? With the 'memshow' command you can check the current memory status and see how fast free memory decreases.

SWITCH:admin> memshow
           total       used       free     shared    buffers     cached
Mem:   520437760  394846208  125591552          0   33353728  125112320
Swap:          0          0          0
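To see how fast free memory is dropping, the memshow output can be captured at intervals and compared. The following is only a rough sketch in Python, run on a management host against saved output, not on the switch itself. The function names are made up for illustration, and it assumes the output always has a header row starting with "total" followed by a "Mem:" row, as in the sample above.

```python
def parse_memshow(text):
    """Return the 'Mem:' row of saved memshow output as a dict of ints (bytes)."""
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            header = fields  # column names: total, used, free, ...
        elif fields[0] == "Mem:" and header is not None:
            return dict(zip(header, (int(v) for v in fields[1:])))
    raise ValueError("no 'Mem:' row found")


def free_bytes_per_hour(samples):
    """Average change in free memory per hour between first and last sample.

    samples: list of (hours_since_first_sample, memshow_text), oldest first.
    A negative result means free memory is shrinking.
    """
    (t0, first), (t1, last) = samples[0], samples[-1]
    delta = parse_memshow(last)["free"] - parse_memshow(first)["free"]
    return delta / (t1 - t0)
```

Comparing this rate between a switch that reboots and one that does not should show quickly whether something is leaking.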

In the errdump log you should be able to see which process caused the OOM situation and forced the reboot. You can also examine the contents of the supportsave file and check the core files created when an OOM situation occurs.

On the other hand, a full reboot may help.

Kind regards,

Felipon

Occasional Contributor
Posts: 10
Registered: ‎06-15-2011

Re: Out of memory errors after 7.0.2a upgrade

Nothing in the errdump points to a process:

2013/03/23-15:01:00, , 910, FFDC | CHASSIS, CRITICAL, SW5480, haReboot is automatically triggered for warm recovery from OOM.

2013/03/23-15:01:01, , 911, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.

2013/03/23-15:04:36, , 912, FFDC | CHASSIS, CRITICAL, SW5480, Rebooting the system for recovery - auto-reboot is enabled.

2013/03/23-15:04:36, , 913, CHASSIS, INFO, SW5480, First failure data capture (FFDC) event occurred.

2013/03/23-15:05:43, , 914, CHASSIS, INFO, SW5480, Processor rebooted - reboot.

2013/03/23-15:05:51, , 915, CHASSIS, INFO, SW5480, SW/0 Ether/0 IPv4 DHCP 10.103.15.23/24 DHCP On.

2013/03/23-15:05:51, , 916, CHASSIS, INFO, SW5480, CP/0 IPv4 DHCP 10.103.15.254 DHCP On.

The switch restarts every 3rd or 4th day.

memshow shows only 19 MB free, and the switch has been running for only 2 days.

Other similar switches have at least 40-60 MB of free memory.

I will look into the supportsave and see if I find something.

2013/03/23-15:06:33, , 917, CHASSIS, INFO, SW5480, Initializing ports...

2013/03/23-15:06:33, , 918, CHASSIS, INFO, SW5480, Port initialization completed.
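To get a clearer picture of how often the OOM reboots really happen, the errdump lines can be filtered for the OOM message and the time between events calculated. This is a rough sketch in Python, assuming the comma-separated layout seen in the excerpts in this thread (timestamp, an empty field, sequence number, flags, severity, switch name, message). The function names are made up for illustration.

```python
from datetime import datetime


def parse_errdump_line(line):
    """Split one errdump line into fields; return None if it does not match."""
    parts = [p.strip() for p in line.split(",", 6)]  # message may contain commas
    if len(parts) != 7:
        return None
    try:
        when = datetime.strptime(parts[0], "%Y/%m/%d-%H:%M:%S")
    except ValueError:
        return None
    return {
        "time": when,
        "seq": parts[2],
        "severity": parts[4],
        "switch": parts[5],
        "message": parts[6],
    }


def oom_reboot_times(lines):
    """Timestamps of all entries whose message mentions OOM, in input order."""
    events = (parse_errdump_line(line) for line in lines)
    return [e["time"] for e in events if e and "OOM" in e["message"]]


def intervals(times):
    """Time elapsed between each pair of consecutive events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Feeding it the errdumpall output from each switch makes it easy to compare the reboot intervals across the affected switches.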

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Out of memory errors after 7.0.2a upgrade

Hi,

In the supportsave, in the RAS_POST file, you'll find the output of the errdumpall command, which may provide some additional info.

Contributor
Posts: 26
Registered: ‎09-11-2012

Re: Out of memory errors after 7.0.2a upgrade

From my days supporting Brocade switches, this can potentially be a long and involved process.

I would recommend calling Brocade Support as this may require support access to the switch.

FYI: Out of memory *might* be indicating out of storage space (Compact Flash)...

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Out of memory errors after 7.0.2a upgrade

That's right. To check the free space on the flash storage, you can review the contents of xxx.SSHOW_NET.tar.gz inside the supportsave and look at the output of the df command.

With the command supportsave -R you'll delete the core files and free up some space. Also, if you have root credentials, you can run cleanup to delete unused files on the switch.
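If there are many switches to check, the df output pulled from each supportsave can be scanned for nearly full filesystems. This is a rough sketch in Python, assuming the usual df layout where the last two columns are Use% and the mount point. The function name and the 90 percent threshold are arbitrary choices for illustration.

```python
def full_filesystems(df_text, threshold=90):
    """Return (mount_point, use_percent) for filesystems at or above threshold."""
    hits = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Data rows end with "<n>% <mount point>"; skip anything else.
        if len(fields) < 2 or not fields[-2].endswith("%"):
            continue
        pct = int(fields[-2].rstrip("%"))
        if pct >= threshold:
            hits.append((fields[-1], pct))
    return hits
```

An empty result means no filesystem reached the threshold, which would point back towards RAM rather than Compact Flash.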

rgds

Occasional Contributor
Posts: 10
Registered: ‎06-15-2011

Re: Out of memory errors after 7.0.2a upgrade

It turned out to be the snmp daemon that took up all the memory.

I ran the top command, hit Shift+F and then selected n to sort the processes with the highest memory usage on top. On all the switches that rebooted, the SNMP daemon took up over 100 MB; now the daemon takes 10-13 MB.

It would be good if the switch logged in the errdump which process had the highest memory usage at the time of the OOM reboot. Maybe something for the next release?
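For anyone who wants to do the same check across many switches, a saved top snapshot can be sorted by resident memory offline. This is a rough sketch in Python, assuming a procps-style layout with a header row starting with PID and containing RES and COMMAND columns; the top on a given FOS release may print different columns. The function names are made up for illustration.

```python
# Multipliers to KiB; values without a suffix are assumed to be KiB already.
UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}


def res_to_kib(value):
    """Convert a RES field such as '104m' or '5120' to KiB."""
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(value)


def top_by_memory(text, limit=5):
    """Return up to `limit` (command, res_kib) pairs, largest RES first."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            header = fields  # summary lines before this row are ignored
            continue
        if header is None or len(fields) < len(header):
            continue
        res = res_to_kib(fields[header.index("RES")])
        command = " ".join(fields[header.index("COMMAND"):])
        rows.append((command, res))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

A daemon that sits far above the same daemon on a healthy switch is the first thing to look at.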

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Out of memory errors after 7.0.2a upgrade

Hahaha, if it were that easy it wouldn't be fun!

On the other hand, I seem to remember that with the previous code release, FOS 6.x, a switch did not perform an haReboot when this happened; it performed a full panic, and in that situation the logs did report the daemon that raised the OOM condition. I suppose Brocade considered it better this way.

If the issue reoccurs and you see the SNMP daemon eating up all the memory again, you should check the SNMP applications accessing this switch, since intensive polling can make the daemon consume a high percentage of CPU and memory.

Rgds
