Fibre Channel (SAN)

Occasional Contributor
Posts: 7
Registered: ‎04-19-2006

Why is my switch "running slowly" ?

Hi all,

There seems to be a fair bit of commentary on the Web, but perhaps not quite deep enough to help, so here goes.

I have 16 blade centre SAN Switch Modules of varying vintages ( FOS 5.3.1 through 6.2.2 ). Most run fine, even if the FOS could do with upgrading ( another story, believe me ).

However, of late, one or two are responding very slowly, both to Telnet logins and to any FOS commands issued afterwards.

I must emphasise that accessing the switches is not the problem, as I have admin, factory and root passwords and can log in successfully.

Commands can take up to a minute, maybe more, to return data, though they always work in the end.

By logging on to a well behaved module as root and running vmstat 1, I get output like this:

bc2san3:root> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
2  0      0   4648      0  30048    0    0     5     0    0    2  5  8 77 10
0  0      0   4648      0  30048    0    0     0     0    9  245 20  0 80  0
1  0      0   4656      0  30048    0    0     0     0   48  326  0  0 100  0
0  0      0   4656      0  30048    0    0     0     0    2  286  1  1 98  0
0  0      0   4656      0  30048    0    0     0     0    2  221  0  0 100  0
0  0      0   4656      0  30048    0    0     0     0    2  225  0  0 100  0

As you can see, after the first line (which vmstat reports as an average since boot), CPU idle time is consistently high, wait time is 0 and there are no blocked processes ( b column ).

However, if I do the same thing on a misbehaving switch, I get this:

bc6san3:root> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  1      0   2704      0   8032    0    0     2     0    1    4  5  5 79 11
0  1      0   3244      0   7468    0    0   444     0 1021  282  0  1  0 99
0  1      0   4016      0   6724    0    0   156     0  314  310  0  1  0 99
0  1      0   4076      0   6588    0    0   708     0 1418  260  0  2  0 98
2  2      0   4556      0   6168    0    0   588     0 1183  304  0  5  0 95
0  1      0   3776      0   6904    0    0  1844     0 3733  392  7 28  0 65
0  1      0   4796      0   5892    0    0   268     0  538  324  0  1  0 99
0  1      0   5396      0   5284    0    0   140     0  282  288  0  4  0 96
5  4      0   5576      0   5156    0    0   500     0  747  286  1  6  0 93
0  1      0   4024      0   6736    0    0  2208     0 4685  309  1 29  0 70

In this case, ignoring the first line (the since-boot average), CPU idle time is always zero, I/O wait time is always high and there are always blocked processes ( b column ). From a UNIX standpoint, not a healthy state of affairs.
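For anyone comparing several modules, the eyeball comparison of the two vmstat captures above could be automated roughly as follows. This is only a sketch: it assumes the vmstat output has been captured to text off-switch (for example from a Telnet session log), and the function names `parse_vmstat` and `summarise` are made up for illustration. The sample rows are the first four from the misbehaving switch.

```python
# Sketch: summarise captured `vmstat 1` output, skipping the first sample,
# which vmstat reports as an average since boot rather than a live reading.

SAMPLE = """\
bc6san3:root> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  1      0   2704      0   8032    0    0     2     0    1    4  5  5 79 11
0  1      0   3244      0   7468    0    0   444     0 1021  282  0  1  0 99
0  1      0   4016      0   6724    0    0   156     0  314  310  0  1  0 99
0  1      0   4076      0   6588    0    0   708     0 1418  260  0  2  0 98
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the vmstat column names."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line
            header = fields
            continue
        # Skip the prompt line, the dashed banner line and any other noise.
        if header is None or len(fields) != len(header):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(header, map(int, fields))))
    return rows


def summarise(rows):
    """Average idle %, I/O wait %, blocked processes and blocks read in."""
    samples = rows[1:]  # drop the since-boot average
    if not samples:
        raise ValueError("need at least two vmstat samples")
    n = len(samples)
    return {col: sum(r[col] for r in samples) / n
            for col in ("id", "wa", "b", "bi")}


if __name__ == "__main__":
    print(summarise(parse_vmstat(SAMPLE)))
```

Running the same summary over captures from each module would make it easier to see which ones show near-zero idle combined with high wait and a steadily non-zero bi column.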

However, if I run ps -ef on both switches, the list of processes is pretty much identical - no obvious runaway processes on the misbehaving switch.

bc6san3:root> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0  2011 ?        00:41:55 init
root         2     1  0  2011 ?        00:03:32
root         3     1  0  2011 ?        00:10:56
root         4     1  0  2011 ?        00:00:00
root         5     1  0  2011 ?        00:00:00
root        18     5  0  2011 ?        00:00:02
root        49     5  0  2011 ?        00:00:50
root        50     5  0  2011 ?        00:00:44
root        52     5  0  2011 ?        00:00:00
root        53     5  0  2011 ?        00:00:00
root        54     5  0  2011 ?        00:00:00
root        55     5  0  2011 ?        00:39:29
root        51     1  0  2011 ?        11:39:05
root       689     5  0  2011 ?        00:00:03
root       707     1  0  2011 ?        00:02:32 /sbin/wdtd -p 30 -n 5
root       766     5  0  2011 ?        00:00:03
bin        904     1  0  2011 ?        00:00:00 /sbin/portmap
root       930     1  0  2011 ?        00:00:00 /usr/sbin/inetd
root       954     1  0  2011 ?        00:18:34 /usr/sbin/syslogd -m 0
root       955     1  0  2011 ?        00:00:16 /usr/sbin/klogd -x
root       956     1  0  2011 ?        00:16:44 /usr/sbin/crond
root       969     1  0  2011 ?        00:06:27
root      1122     1  0  2011 ?        00:00:00
root      1197     1  0  2011 ?        00:37:18 /fabos/libexec/raslogd
root      1204     1  0  2011 ?        03:09:03 /fabos/libexec/ipadmd
root      1207     1  0  2011 ?        00:00:00 /fabos/libexec/telnetmond
root      1209     1  0  2011 ?        00:00:01 superd SWBD22 XLO
root      1217  1209  0  2011 ?        00:06:47 superd SWBD22 XLO
root      1265     1  0  2011 ttyS0    00:00:00 /sbin/getty -h ttyS0 console
root      1266     1  0  2011 ?        00:00:00 /sbin/getty -h ttyS1
root      1273     1  0  2011 ?        00:00:00 /usr/sbin/sshd
root      1285     1  0  2011 ?        00:15:08 /fabos/libexec/traced -p 22
root      1287  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1290  1209  0  2011 ?        00:00:21 superd SWBD22 XLO
root      1291  1209  0  2011 ?        00:01:25 superd SWBD22 XLO
root      1292  1209  0  2011 ?        00:00:00 porttestd -S chassis -s 0
root      1293  1209  0  2011 ?        08:27:05 superd SWBD22 XLO
root      1294  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1289     1  0  2011 ?        00:00:49
root      1298     1  0  2011 ?        02:36:14
root      1300  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1304  1209  0  2011 ?        00:00:27 superd SWBD22 XLO
root      1305  1209  0  2011 ?        04:52:08 superd SWBD22 XLO
root      1306  1209  0  2011 ?        01:01:24 superd SWBD22 XLO
root      1310  1209  0  2011 ?        01:49:33 superd SWBD22 XLO
root      1311  1209  0  2011 ?        02:57:42 superd SWBD22 XLO
root      1312  1209  0  2011 ?        00:01:36 superd SWBD22 XLO
root      1325  1209  0  2011 ?        04:14:22 superd SWBD22 XLO
root      1326  1209  0  2011 ?        02:29:38 superd SWBD22 XLO
root      1327  1209  0  2011 ?        03:31:01 superd SWBD22 XLO
root      1328  1209  0  2011 ?        00:00:24 superd SWBD22 XLO
root      1337  1209  0  2011 ?        02:46:25 superd SWBD22 XLO
root      1338  1209  0  2011 ?        01:26:55 superd SWBD22 XLO
root      1339  1209  0  2011 ?        00:00:16 superd SWBD22 XLO
root      1351  1209  0  2011 ?        00:32:31 superd SWBD22 XLO
root      1352  1209  0  2011 ?        00:00:32 superd SWBD22 XLO
root      1363  1209  0  2011 ?        00:00:18 superd SWBD22 XLO
root      1364  1209  0  2011 ?        00:00:22 superd SWBD22 XLO
root      1375  1209  0  2011 ?        02:45:42 superd SWBD22 XLO
root      1376  1209  0  2011 ?        22:36:22 snmpd -S fcsw -s 0
root      1377  1209  0  2011 ?        16:39:41 superd SWBD22 XLO
root      1385  1209  0  2011 ?        00:50:30 superd SWBD22 XLO
root     32287  1209  0  2011 ?        02:35:10 superd SWBD22 XLO
root     32599     1  0 Feb16 ?        00:02:35 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
root      7657  1209  1 Feb26 ?        01:08:25 superd SWBD22 XLO
nobody    8737 32599  0 11:19 ?        00:00:00 /usr/apache/bin/fcgi-pm -f /fabos/webtools/bin/httpd
root      8740  8737 10 11:19 ?        00:04:54 /fabos/webtools/htdocs/0.weblinker.fcg
root      9012   930  0 11:31 ?        00:00:07 in.telnetd: 10.21.5.230
root      9016  9012  0 11:31 ?        00:00:01 login -- root0
root      9020  9016  0 11:31 pts/0    00:00:03 -sh
nobody    9422 32599  0 11:55 ?        00:00:05 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
nobody    9528 32599  0 12:00 ?        00:00:02 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
nobody    9553 32599  1 12:02 ?        00:00:02 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
root      9603  9020 23 12:05 pts/0    00:00:01 ps -ef
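Because the listing above is long, one way to compare two switches is to rank processes by the TIME column. The following is a rough sketch, not a definitive tool: it assumes the ps -ef output has been saved as text, that TIME is always in HH:MM:SS form as it is in this listing, and the names `cpu_seconds` and `top_cpu` are hypothetical. The sample lines are copied from the listing above.

```python
# Sketch: rank processes in captured `ps -ef` output by cumulative CPU time.

SAMPLE = """\
bc6san3:root> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root        51     1  0  2011 ?        11:39:05
root      1204     1  0  2011 ?        03:09:03 /fabos/libexec/ipadmd
root      1376  1209  0  2011 ?        22:36:22 snmpd -S fcsw -s 0
root      1377  1209  0  2011 ?        16:39:41 superd SWBD22 XLO
root      8740  8737 10 11:19 ?        00:04:54 /fabos/webtools/htdocs/0.weblinker.fcg
"""


def cpu_seconds(time_field):
    """Convert a ps TIME value in HH:MM:SS form to seconds."""
    hours, minutes, seconds = (int(part) for part in time_field.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def top_cpu(ps_text, n=5):
    """Return the n largest (cpu_seconds, pid, command) tuples."""
    procs = []
    for line in ps_text.splitlines():
        # UID PID PPID C STIME TTY TIME CMD; CMD may contain spaces or be empty.
        fields = line.split(None, 7)
        if len(fields) < 7 or not fields[1].isdigit():
            continue  # prompt line, header line or noise
        cmd = fields[7] if len(fields) == 8 else "(no command shown)"
        procs.append((cpu_seconds(fields[6]), int(fields[1]), cmd))
    return sorted(procs, reverse=True)[:n]


if __name__ == "__main__":
    for secs, pid, cmd in top_cpu(SAMPLE, 3):
        print(pid, secs, cmd)
```

Note that TIME is cumulative since the process started, so this ranking favours daemons that have been up since 2011. A recently started process with a non-zero C column, such as the weblinker entry (PID 8740, 00:04:54 of CPU since 11:19), may deserve a closer look even though it ranks low here.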

It's worth noting that from a Fibre Channel perspective, the switch is functioning normally and all blades connected to it are running problem-free.

I have already run supportsave -R but it hasn't helped.

As a reboot isn't practical, can anyone with deeper knowledge than me suggest where to go next?

Regular Contributor
Posts: 226
Registered: ‎01-08-2011

Re: Why is my switch "running slowly" ?

Hi, have you tried a 'hareboot' (it only restarts the services)?

I have seen this several times, and each time it was the weblinker / web service that was using up all the CPU.

Does 'top' provide any clues?

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Why is my switch "running slowly" ?

hi,

Take a look at the errdump and errdumpall outputs, since you may get some clue about why the switch is misbehaving. It looks to me like a firmware issue, so I suggest you upgrade the FOS code. An hareboot or a reboot may alleviate the situation, at least temporarily.

rgds
