Fibre Channel (SAN)

Occasional Contributor
Posts: 7
Registered: ‎04-19-2006

Why is my switch "running slowly" ?

Hi all,

There seems to be a fair bit of commentary on the Web, but perhaps not quite deep enough to help, so here goes.

I have 16 BladeCenter SAN Switch Modules of varying vintages (FOS 5.3.1 through 6.2.2). Most run fine, even if the FOS could do with upgrading (another story, believe me).

However, of late, one or two are responding very slowly to Telnet logins and to the FOS commands issued afterwards.

I must emphasise that accessing the switches is not the problem, as I have admin, factory and root passwords and can log in successfully.

Commands can take up to a minute, maybe more, to return data, though they always work in the end.

By logging on to a well-behaved module as root and running vmstat 1, I get output like this:

bc2san3:root> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
2  0      0   4648      0  30048    0    0     5     0    0    2  5  8 77 10
0  0      0   4648      0  30048    0    0     0     0    9  245 20  0 80  0
1  0      0   4656      0  30048    0    0     0     0   48  326  0  0 100  0
0  0      0   4656      0  30048    0    0     0     0    2  286  1  1 98  0
0  0      0   4656      0  30048    0    0     0     0    2  221  0  0 100  0
0  0      0   4656      0  30048    0    0     0     0    2  225  0  0 100  0

As you can see, apart from the first line (which vmstat reports as averages since boot), CPU idle time is always high, wait time is always 0 and there are no blocked threads (b column).

However, if I do the same thing on a misbehaving switch, I get this:

bc6san3:root> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  1      0   2704      0   8032    0    0     2     0    1    4  5  5 79 11
0  1      0   3244      0   7468    0    0   444     0 1021  282  0  1  0 99
0  1      0   4016      0   6724    0    0   156     0  314  310  0  1  0 99
0  1      0   4076      0   6588    0    0   708     0 1418  260  0  2  0 98
2  2      0   4556      0   6168    0    0   588     0 1183  304  0  5  0 95
0  1      0   3776      0   6904    0    0  1844     0 3733  392  7 28  0 65
0  1      0   4796      0   5892    0    0   268     0  538  324  0  1  0 99
0  1      0   5396      0   5284    0    0   140     0  282  288  0  4  0 96
5  4      0   5576      0   5156    0    0   500     0  747  286  1  6  0 93
0  1      0   4024      0   6736    0    0  2208     0 4685  309  1 29  0 70

In this case, ignoring the first line (averages since boot), CPU idle time is always zero, wait time is always high and there are blocked threads (b column). From a UNIX standpoint, not a healthy state of affairs.

However, if I run ps -ef on both switches, the list of processes is pretty much identical, with no obvious runaway processes on the misbehaving switch.
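For comparing several switches, the vmstat readings above can be summarised offline rather than read by eye. The following is a minimal Python sketch, assuming the output has been captured to text on a workstation (it is not something FOS provides, and the function names parse_vmstat and summarise are made up for illustration). It skips the first sample, which vmstat reports as averages since boot, and averages the rest.

```python
def parse_vmstat(text):
    """Parse captured `vmstat 1` output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line: r b swpd free ...
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def summarise(rows):
    """Average the samples, skipping the first (since-boot) line when possible."""
    live = rows[1:] if len(rows) > 1 else rows
    n = len(live)
    return {
        "avg_idle": sum(r["id"] for r in live) / n,
        "avg_wait": sum(r["wa"] for r in live) / n,
        "avg_blocks_in": sum(r["bi"] for r in live) / n,
        "max_blocked": max(r["b"] for r in live),
    }


# First four samples from the misbehaving switch, copied from the output above.
sample = """\
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  1      0   2704      0   8032    0    0     2     0    1    4  5  5 79 11
0  1      0   3244      0   7468    0    0   444     0 1021  282  0  1  0 99
0  1      0   4016      0   6724    0    0   156     0  314  310  0  1  0 99
0  1      0   4076      0   6588    0    0   708     0 1418  260  0  2  0 98
"""
summary = summarise(parse_vmstat(sample))
print(summary)
```

Run against both captures, this also makes it easy to see that the bi (blocks in) column is non-zero on the misbehaving switch while it stays at 0 on the healthy one after the first line.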

bc6san3:root> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0  2011 ?        00:41:55 init
root         2     1  0  2011 ?        00:03:32
root         3     1  0  2011 ?        00:10:56
root         4     1  0  2011 ?        00:00:00
root         5     1  0  2011 ?        00:00:00
root        18     5  0  2011 ?        00:00:02
root        49     5  0  2011 ?        00:00:50
root        50     5  0  2011 ?        00:00:44
root        52     5  0  2011 ?        00:00:00
root        53     5  0  2011 ?        00:00:00
root        54     5  0  2011 ?        00:00:00
root        55     5  0  2011 ?        00:39:29
root        51     1  0  2011 ?        11:39:05
root       689     5  0  2011 ?        00:00:03
root       707     1  0  2011 ?        00:02:32 /sbin/wdtd -p 30 -n 5
root       766     5  0  2011 ?        00:00:03
bin        904     1  0  2011 ?        00:00:00 /sbin/portmap
root       930     1  0  2011 ?        00:00:00 /usr/sbin/inetd
root       954     1  0  2011 ?        00:18:34 /usr/sbin/syslogd -m 0
root       955     1  0  2011 ?        00:00:16 /usr/sbin/klogd -x
root       956     1  0  2011 ?        00:16:44 /usr/sbin/crond
root       969     1  0  2011 ?        00:06:27
root      1122     1  0  2011 ?        00:00:00
root      1197     1  0  2011 ?        00:37:18 /fabos/libexec/raslogd
root      1204     1  0  2011 ?        03:09:03 /fabos/libexec/ipadmd
root      1207     1  0  2011 ?        00:00:00 /fabos/libexec/telnetmond
root      1209     1  0  2011 ?        00:00:01 superd SWBD22 XLO
root      1217  1209  0  2011 ?        00:06:47 superd SWBD22 XLO
root      1265     1  0  2011 ttyS0    00:00:00 /sbin/getty -h ttyS0 console
root      1266     1  0  2011 ?        00:00:00 /sbin/getty -h ttyS1
root      1273     1  0  2011 ?        00:00:00 /usr/sbin/sshd
root      1285     1  0  2011 ?        00:15:08 /fabos/libexec/traced -p 22
root      1287  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1290  1209  0  2011 ?        00:00:21 superd SWBD22 XLO
root      1291  1209  0  2011 ?        00:01:25 superd SWBD22 XLO
root      1292  1209  0  2011 ?        00:00:00 porttestd -S chassis -s 0
root      1293  1209  0  2011 ?        08:27:05 superd SWBD22 XLO
root      1294  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1289     1  0  2011 ?        00:00:49
root      1298     1  0  2011 ?        02:36:14
root      1300  1209  0  2011 ?        00:00:00 superd SWBD22 XLO
root      1304  1209  0  2011 ?        00:00:27 superd SWBD22 XLO
root      1305  1209  0  2011 ?        04:52:08 superd SWBD22 XLO
root      1306  1209  0  2011 ?        01:01:24 superd SWBD22 XLO
root      1310  1209  0  2011 ?        01:49:33 superd SWBD22 XLO
root      1311  1209  0  2011 ?        02:57:42 superd SWBD22 XLO
root      1312  1209  0  2011 ?        00:01:36 superd SWBD22 XLO
root      1325  1209  0  2011 ?        04:14:22 superd SWBD22 XLO
root      1326  1209  0  2011 ?        02:29:38 superd SWBD22 XLO
root      1327  1209  0  2011 ?        03:31:01 superd SWBD22 XLO
root      1328  1209  0  2011 ?        00:00:24 superd SWBD22 XLO
root      1337  1209  0  2011 ?        02:46:25 superd SWBD22 XLO
root      1338  1209  0  2011 ?        01:26:55 superd SWBD22 XLO
root      1339  1209  0  2011 ?        00:00:16 superd SWBD22 XLO
root      1351  1209  0  2011 ?        00:32:31 superd SWBD22 XLO
root      1352  1209  0  2011 ?        00:00:32 superd SWBD22 XLO
root      1363  1209  0  2011 ?        00:00:18 superd SWBD22 XLO
root      1364  1209  0  2011 ?        00:00:22 superd SWBD22 XLO
root      1375  1209  0  2011 ?        02:45:42 superd SWBD22 XLO
root      1376  1209  0  2011 ?        22:36:22 snmpd -S fcsw -s 0
root      1377  1209  0  2011 ?        16:39:41 superd SWBD22 XLO
root      1385  1209  0  2011 ?        00:50:30 superd SWBD22 XLO
root     32287  1209  0  2011 ?        02:35:10 superd SWBD22 XLO
root     32599     1  0 Feb16 ?        00:02:35 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
root      7657  1209  1 Feb26 ?        01:08:25 superd SWBD22 XLO
nobody    8737 32599  0 11:19 ?        00:00:00 /usr/apache/bin/fcgi-pm -f /fabos/webtools/bin/httpd
root      8740  8737 10 11:19 ?        00:04:54 /fabos/webtools/htdocs/0.weblinker.fcg
root      9012   930  0 11:31 ?        00:00:07 in.telnetd: 10.21.5.230
root      9016  9012  0 11:31 ?        00:00:01 login -- root0
root      9020  9016  0 11:31 pts/0    00:00:03 -sh
nobody    9422 32599  0 11:55 ?        00:00:05 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
nobody    9528 32599  0 12:00 ?        00:00:02 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
nobody    9553 32599  1 12:02 ?        00:00:02 /usr/apache/bin/httpd.0 -f /fabos/webtools/bin/httpd
root      9603  9020 23 12:05 pts/0    00:00:01 ps -ef
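With a listing this long it can help to rank the processes rather than scan them. The sketch below, again in Python and run on a workstation against captured text, sorts ps -ef output by the cumulative TIME column. The helper names cpu_seconds and top_by_cpu are hypothetical, and the parsing assumes the eight-column layout shown above. Bear in mind that TIME is accumulated since each process started, so daemons running since 2011 will rank high regardless; the C column is the better hint for recent CPU use.

```python
def cpu_seconds(t):
    """Convert a ps TIME field ([DD-]HH:MM:SS) to seconds."""
    days, _, clock = t.rpartition("-")
    h, m, s = (int(x) for x in clock.split(":"))
    return (int(days) if days else 0) * 86400 + h * 3600 + m * 60 + s


def top_by_cpu(ps_text, n=3):
    """Return the n largest (cpu_seconds, pid, command) tuples from `ps -ef` text."""
    procs = []
    for line in ps_text.splitlines():
        f = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(f) < 7 or not f[1].isdigit():
            continue  # skips the shell prompt, the header and blank lines
        cmd = f[7] if len(f) > 7 else "(no name shown)"
        procs.append((cpu_seconds(f[6]), int(f[1]), cmd))
    return sorted(procs, reverse=True)[:n]


# Four lines copied from the listing above.
sample_ps = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root        51     1  0  2011 ?        11:39:05
root      1376  1209  0  2011 ?        22:36:22 snmpd -S fcsw -s 0
root      1377  1209  0  2011 ?        16:39:41 superd SWBD22 XLO
root      8740  8737 10 11:19 ?        00:04:54 /fabos/webtools/htdocs/0.weblinker.fcg
"""
top = top_by_cpu(sample_ps)
for seconds, pid, cmd in top:
    print(f"{seconds:>8} s  pid {pid:<6} {cmd}")
```

In the full listing, snmpd has the largest cumulative TIME, while the weblinker process started at 11:19 shows the highest C value of any long-running process.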

It's worth noting that from a Fibre Channel perspective, the switch is functioning normally and all blades connected to it are running problem-free.

I have already run supportsave -R but it hasn't helped.

As a reboot isn't practical, can anyone with deeper knowledge than me suggest where to go next?

Regular Contributor
Posts: 226
Registered: ‎01-08-2011

Re: Why is my switch "running slowly" ?

Hi, have you tried an 'hareboot' (which only restarts services)?

I have seen this several times, and each time it was the weblinker / web service that used up all the CPU.

Does 'top' provide any clues?

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: Why is my switch "running slowly" ?

hi,

Take a look at the errdump and errdumpall outputs, since you may get some clue about why the switch is misbehaving. It looks to me like a firmware issue, so I suggest you upgrade the FOS code. An hareboot or a reboot may alleviate the situation, at least temporarily.

rgds
