Contributor
Posts: 28
Registered: ‎11-11-2010

DCX CPU usage 100%

Hello All,

I have two fabrics (A and B), each consisting of one DCX as the core and two 12K switches at the edge. The switches are registered in DCFM. The DCX runs FOS v6.3.2d.

Yesterday I ran sysmonitor --show cpu on the Fabric A DCX to check the current CPU utilization (this was the first time I had run this command). The result was far beyond what I expected: CPU usage was 100%. This is the output I captured:

admin> sysmonitor --show cpu

Showing Cpu Usage:

    Cpu Usage            : 100%

    Cpu Usage limit      : 75%

    Number of Retries    : 3

    Polling Interval     : 120 seconds

    Actions              : none
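
For anyone who wants to check this output in a script rather than by eye, here is a minimal sketch that parses the text above and compares the usage against the configured limit. The function names (parse_sysmonitor_cpu, cpu_over_limit) are hypothetical helpers, not part of FOS, and the sample is simply the output pasted above with the blank lines removed.

```python
# Sample copied from the sysmonitor --show cpu output in this post.
SAMPLE = """\
Showing Cpu Usage:
    Cpu Usage            : 100%
    Cpu Usage limit      : 75%
    Number of Retries    : 3
    Polling Interval     : 120 seconds
    Actions              : none
"""


def parse_sysmonitor_cpu(text):
    """Collect the 'key : value' lines of the output into a dict.

    Lines without a value after the colon (such as the
    'Showing Cpu Usage:' banner) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def cpu_over_limit(fields):
    """Return True when the reported usage exceeds the configured limit."""
    usage = int(fields["Cpu Usage"].rstrip("%"))
    limit = int(fields["Cpu Usage limit"].rstrip("%"))
    return usage > limit


if __name__ == "__main__":
    parsed = parse_sysmonitor_cpu(SAMPLE)
    print(parsed["Cpu Usage"], "over limit:", cpu_over_limit(parsed))
```

This only assumes the layout shown in this post; other FOS releases may format the output differently.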

On the DCX in the other fabric, CPU utilization is just 6%.

There are no error messages in errdump, so I am worried about the status of my DCX. Is this just a FOS defect, or is it the real CPU usage?

DCX spec:

20 disk ports, 9 host ports, 67 available ports (96 ports in total)

Note: this fabric is located at the DRC site, so the I/O load is not very high.

I also captured the output of the top command as root:

root> top

top - 15:18:17 up 112 days, 23:38,  1 user,  load average: 2.82, 2.51, 2.30

Tasks: 110 total,   3 running, 107 sleeping,   0 stopped,   0 zombie

Cpu(s): 42.0%us, 57.3%sy,  0.3%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st

Mem:   1863188k total,  1140520k used,   722668k free,    33344k buffers

Swap:        0k total,        0k used,        0k free,   786368k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

3113 root      25   0 28108 4016 3376 R 94.8  0.2  11225:22 tracestore

4364 root      16   0  139m  84m  80m S  2.3  4.7   2891:40 iswitchd

2859 root      16   0 72688 4492 3300 S  1.7  0.2   2595:11 emd

4358 root      33  18 64732 7532 3572 S  1.0  0.4   1165:59 fwd

    1 root      16   0  1696  592  524 S  0.0  0.0   0:31.96 init

    2 root      34  19     0    0    0 S  0.0  0.0   0:03.51 ksoftirqd/0

    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.03 events/0

    4 root      19  -5     0    0    0 S  0.0  0.0   0:00.02 khelper

    5 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kthread

   29 root      10  -5     0    0    0 S  0.0  0.0   0:00.17 kblockd/0

   62 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush

   63 root      15   0     0    0    0 S  0.0  0.0   0:00.51 pdflush

   65 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0

   64 root      25   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0

  756 root      15   0     0    0    0 S  0.0  0.0   0:07.14 kjournald

  774 root      RT   0  1676  400  336 S  0.0  0.0   0:00.02 wdtd

  835 root      15   0     0    0    0 S  0.0  0.0   0:02.36 kjournald

  991 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 eth2/0

1002 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 eth1/0

1023 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 eth0/0

1025 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 eth3/0

1038 bin       16   0  1688  428  336 S  0.0  0.0   0:00.33 portmap

1058 root      16   0  2116  652  508 S  0.0  0.0   0:00.02 inetd

1063 root      15   0     0    0    0 S  0.0  0.0   0:00.00 nfsd

1064 root      15   0     0    0    0 S  0.0  0.0   0:00.00 nfsd

1065 root      15   0     0    0    0 S  0.0  0.0   0:00.00 nfsd

1066 root      23   0     0    0    0 S  0.0  0.0   0:00.00 lockd

1067 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 rpciod/0

1068 root      15   0     0    0    0 S  0.0  0.0   0:00.01 nfsd

1070 root      16   0  2336  556  420 S  0.0  0.0   0:00.01 rpc.mountd

1084 root      25   0  2552 1088  916 S  0.0  0.1   0:55.60 kmsghandler

1098 root      16   0  1700  376  304 S  0.0  0.0   0:11.76 klogd

1099 root      15   0  1808  620  528 S  0.0  0.0   0:04.04 crond

1106 root      16   0  1944  680  532 S  0.0  0.0   0:07.81 syslogd

1128 root      15   0     0    0    0 S  0.0  0.0   0:00.11 RASLOGK_TH

1926 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kwt_nb_thread

2200 root      19   0     0    0    0 S  0.0  0.0   0:00.00 module-182-th

2208 root      15   0     0    0    0 S  0.0  0.0  10:53.57 module-99-th

2230 root      19   0     0    0    0 S  0.0  0.0   0:00.01 module-107-th
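
The listing above already shows which process is responsible, but a small sketch like the following can pull the busiest process out of a saved top capture. The function names (parse_top_rows, busiest) are hypothetical, and the column positions are assumed from the header line shown in this post; the sample contains only the first few rows.

```python
# First rows of the top capture from this post (blank lines removed).
SAMPLE_TOP = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
3113 root      25   0 28108 4016 3376 R 94.8  0.2  11225:22 tracestore
4364 root      16   0  139m  84m  80m S  2.3  4.7   2891:40 iswitchd
2859 root      16   0 72688 4492 3300 S  1.7  0.2   2595:11 emd
4358 root      33  18 64732 7532 3572 S  1.0  0.4   1165:59 fwd
    1 root      16   0  1696  592  524 S  0.0  0.0   0:31.96 init
"""


def parse_top_rows(text):
    """Parse process rows from top output into a list of dicts.

    Assumes the 12-column layout shown in the header above. Any line
    that does not start with a numeric PID (header, summary, blank
    lines) is ignored.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 11)
        if len(parts) != 12 or not parts[0].isdigit():
            continue
        rows.append({
            "pid": int(parts[0]),
            "user": parts[1],
            "cpu": float(parts[8]),
            "mem": float(parts[9]),
            "time": parts[10],
            "command": parts[11],
        })
    return rows


def busiest(rows):
    """Return the row with the highest %CPU value."""
    return max(rows, key=lambda row: row["cpu"])


if __name__ == "__main__":
    top_row = busiest(parse_top_rows(SAMPLE_TOP))
    print(top_row["pid"], top_row["command"], top_row["cpu"])
```

On the sample rows this picks out PID 3113 (tracestore), which matches what can be read directly from the listing.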

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: DCX CPU usage 100%

Hi there,

That looks like DEFECT000363516 to me. It seems to be fixed in FOS 7.0.1

Rgds

Contributor
Posts: 28
Registered: ‎11-11-2010

Re: DCX CPU usage 100%

Oh, I see. But is there any option other than upgrading to FOS v7.0.1? The DCX is connected to 12K SAN switches, which are not allowed to connect directly to a DCX running FOS v7.0.

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: DCX CPU usage 100%

You could try failing over to the standby CP.

Rgds

External Moderator
Posts: 4,974
Registered: ‎02-23-2004

Re: DCX CPU usage 100%

azakiyy,

->but is there any different action beside upgrading to FOS v7.0.1 because the DCX is connected to 12K SAN Switch which prohibited to direct connection with DCX FOS v7.0

This is correct, but if you want to continue working with the 12K, you can implement FCR on the DCX through Integrated Routing and upgrade to the latest FOS 7.1.x release.

Keep in mind that Integrated Routing (IR) requires an optional license.

TechHelp24
Contributor
Posts: 28
Registered: ‎11-11-2010

Re: DCX CPU usage 100%

I need to get permission from my customer first. I'll report back later.

Contributor
Posts: 28
Registered: ‎11-11-2010

Re: DCX CPU usage 100%

I'm afraid I can't upgrade FOS on the DCX because I have to stick with this core-edge topology.

Is it safe to kill the process consuming the most CPU (PID 3113, tracestore)?

Is it possible to use the command "kill -9 (PID)"?

Does anyone know what tracestore stands for?

Valued Contributor
Posts: 931
Registered: ‎12-30-2009

Re: DCX CPU usage 100%

If your active CP is at 100%, try the HA failover as suggested by felipon.

If your now-passive CP (assuming the failover was successful) is still at 100% CPU, you can also reboot that CP.

Contributor
Posts: 28
Registered: ‎11-11-2010

Re: DCX CPU usage 100%

Thanks, all, for your replies.

I have escalated the problem to support, and they replied with the same recommendations:

1. Run hafailover on the active CP, then check the CPU utilization.

2. If the DCX is still at 100% CPU, run hafailover again (my problem was fixed at this step).

3. Use kill -9 3113 (3113 is the PID of tracestore).

Note: hafailover is disruptive, so do it during a period of low I/O.
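
The escalation order above can be sketched as a small decision helper. This is only an illustration of the sequence described in this post, not a support procedure: the function name next_step is hypothetical, the 75% threshold is taken from the "Cpu Usage limit" shown in the sysmonitor output, and the code runs nothing on the switch.

```python
# Steps in the order support recommended them in this thread.
STEPS = [
    "hafailover (first attempt), then re-check CPU",
    "hafailover (second attempt), then re-check CPU",
    "kill -9 <PID of tracestore>",
]


def next_step(cpu_readings, limit=75):
    """Suggest the next step from the CPU readings taken so far.

    cpu_readings holds CPU usage percentages in order: the initial
    reading first, then one reading after each step already taken.
    Returns None once the latest reading is at or below the limit,
    and a fallback message when all listed steps are exhausted.
    """
    if not cpu_readings:
        raise ValueError("at least the initial CPU reading is required")
    if cpu_readings[-1] <= limit:
        return None  # CPU is back under the limit, nothing more to do
    steps_taken = len(cpu_readings) - 1
    if steps_taken < len(STEPS):
        return STEPS[steps_taken]
    return "all listed steps tried, go back to support"


if __name__ == "__main__":
    # What happened in this thread: 100% at first, still 100% after the
    # first failover, then (assumed) a normal reading after the second.
    print(next_step([100]))
    print(next_step([100, 100]))
    print(next_step([100, 100, 6]))
```

The 6% in the last call is an assumed "normal" value borrowed from the other fabric's reading; the actual value after the second failover was not posted.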



Valued Contributor
Posts: 931
Registered: ‎12-30-2009

Re: DCX CPU usage 100%

Great that support is telling you almost the same thing.

However, I don't believe hafailover is disruptive to I/O traffic.
