Fibre Channel (SAN)

Occasional Contributor
Posts: 5
Registered: 08-12-2008

Eight 3900 switches perform unscheduled reboot simultaneously

We have had an issue where eight 3900 SAN switches all suddenly rebooted at exactly the same time in two separate fabrics (four in one fabric and four in the other)!

In the error log on each switch we see the message:

Err 01 0x10339820 (tSwitch): Oct  2 04:12:59      INFO SYS-BOOT, 4, Restart reason: Watchdogport Command Group

And:

Reset reason 10l: Watchdog NMI

Our hardware support team has been unable to fully explain this event, saying only that there was some issue deep in the firmware. Does anyone else have any idea what could have caused it?

External Moderator
Posts: 5,034
Registered: 02-23-2004

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Simon,

This was a bug in Fabric OS a long time ago, and the defect has been closed by Brocade.

Which Fabric OS release is loaded on these eight switches? Please tell me the exact release.

TechHelp24
Occasional Contributor
Posts: 5
Registered: 08-12-2008

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Thanks for your prompt reply.

Now don't laugh, but the FOS level on these switches is v3.2.0a!

It is a very old legacy environment.

External Moderator
Posts: 5,034
Registered: 02-23-2004

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Simon,

The 3900 doesn't support Fabric OS 3.x.

The 3900 is supported only by Fabric OS 4.x.x and 5.x (latest 5.3.2c).

I assume you mean 3800 switches? Can you please confirm?

Log in to the switch from the command line and run the "switchshow" command. In the output, look at the line

SwitchType:

9 = 3800

10 = 3900

This reboot behavior was caused by an old 4.x Fabric OS, but I will check for 3.x.

TechHelp24
Occasional Contributor
Posts: 5
Registered: 08-12-2008

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Whoops, sorry yes they are 3800 not 3900.

External Moderator
Posts: 5,034
Registered: 02-23-2004

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Indeed, the 3800s are old-timers.

One thing is certain: a Watchdog NMI error is triggered by a timer, for example when something like SNMP stops responding.

Can you please check whether all switches have the correct current date and time? The command is "date".

What does the "errshow" command say?

I will check whether I can find any other information about this error.

TechHelp24
Occasional Contributor
Posts: 5
Registered: 08-12-2008

Re: Eight 3900 switches perform unscheduled reboot simultaneously

The date and time are in sync on all the affected switches. They all use an NTP server to synchronise their time.

The errshow output only shows the error events after the reboot. The first entry on each switch is:

Error 01
--------
0x103397c0 (tSwitch): Oct  2 04:12:58
    INFO SYS-BOOT, 4, Restart reason: Watchdog
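
For anyone comparing the first restart entry across several switches, the comparison can be scripted. The following is only a minimal sketch in Python, assuming the entry layout quoted in this thread. The switch names, the function names, and the year are hypothetical (the log lines carry no year, so it has to be supplied), and the two sample entries are the ones quoted in this thread.

```python
import re
from datetime import datetime

# Matches the restart entries quoted in this thread, whether the
# INFO part is on the same line or wrapped onto the next one.
ENTRY = re.compile(
    r"0x[0-9a-fA-F]+ \((?P<task>\w+)\): "
    r"(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+) (?P<code>[\w-]+), \d+, Restart reason: (?P<reason>.+)"
)

def first_restart(errshow_text, year):
    """Return (timestamp, reason) for the first restart entry, or None."""
    match = ENTRY.search(errshow_text)
    if match is None:
        return None
    # Collapse the padding in stamps such as "Oct  2" before parsing.
    stamp = " ".join(match.group("stamp").split())
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, match.group("reason").strip()

def restart_spread(logs, year):
    """Seconds between the earliest and latest restart, or None if none parsed.

    `logs` maps a switch name to the text captured from errshow.
    """
    times = []
    for text in logs.values():
        hit = first_restart(text, year)
        if hit is not None:
            times.append(hit[0])
    if not times:
        return None
    return (max(times) - min(times)).total_seconds()

# Hypothetical switch names; entry text as quoted in this thread.
SAMPLE_LOGS = {
    "switch_a": (
        "0x103397c0 (tSwitch): Oct  2 04:12:58\n"
        "    INFO SYS-BOOT, 4, Restart reason: Watchdog"
    ),
    "switch_b": (
        "Err 01 0x10339820 (tSwitch): Oct  2 04:12:59      "
        "INFO SYS-BOOT, 4, Restart reason: Watchdogport Command Group"
    ),
}

if __name__ == "__main__":
    # The year is an assumption; it is not present in the log entries.
    print(restart_spread(SAMPLE_LOGS, year=2008))
```

A spread of a second or two across all switches points to a common external trigger rather than independent failures.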

External Moderator
Posts: 5,034
Registered: 02-23-2004

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Simon,

The only thing I've found is a bug in Fabric OS version 4.4.1a, but not for the 3800.

As mentioned, that error was caused by an SNMP crash.

I can imagine this is the same error, but this version is no longer fully supported, and even if it is a bug I do not think Brocade will correct it with a future update.

I have no other ideas; maybe someone else here knows this error.

TechHelp24
Occasional Contributor
Posts: 5
Registered: 08-12-2008

Re: Eight 3900 switches perform unscheduled reboot simultaneously

Hi there,

Just thought I would give you an update on this issue.

We traced the root cause to some network vulnerability scans that mistakenly targeted our SAN management subnet. These scans repeatedly attempted to log in to our switches, which caused the tHttp daemon to hang and the switches to perform a watchdog reboot. A further complication is that many of the switches are aggregated onto a single network hub, which is why they all rebooted simultaneously when the scan hit the hub.

Thanks for your assistance.
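
A pre-scan check along these lines could catch this kind of mistake before a scan is launched. It is only a minimal sketch in Python using the standard `ipaddress` module; the subnets and the `overlapping_targets` name are hypothetical and purely for illustration.

```python
import ipaddress

def overlapping_targets(scan_targets, protected_subnets):
    """Return (target, subnet) pairs where a scan target overlaps a protected subnet."""
    hits = []
    for target in scan_targets:
        # strict=False accepts host addresses as well as network prefixes.
        target_net = ipaddress.ip_network(target, strict=False)
        for subnet in protected_subnets:
            subnet_net = ipaddress.ip_network(subnet, strict=False)
            # overlaps() only works between networks of the same IP version.
            if target_net.version == subnet_net.version and target_net.overlaps(subnet_net):
                hits.append((target, subnet))
    return hits

# Hypothetical addressing, not taken from the environment described here.
SAN_MANAGEMENT = ["10.20.30.0/24"]
PLANNED_SCAN = ["10.20.0.0/16", "192.168.1.0/24", "10.20.30.17"]

if __name__ == "__main__":
    for target, subnet in overlapping_targets(PLANNED_SCAN, SAN_MANAGEMENT):
        print(f"scan target {target} overlaps protected subnet {subnet}")
```

Any pair that is reported means the scan would reach the management interfaces and should be excluded or rescoped first.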
