Fibre Channel (SAN)

Contributor
Posts: 21
Registered: ‎04-23-2008

What can I do to solve (or work around) the "slow drain devices" problem?

Hi All,

Lately I've been dealing with some slow drain devices in my SAN. Basically, the problem comes from an AIX host running SAS (business analytics software), and I would like to hear any comments or suggestions on how to avoid or work around it, because it is impacting other hosts.

I have thought of a few things we could do from the SAN side (switch commands) to keep this bad behavior from impacting other hosts in this SAN:

- change the port speed to 1 Gbps (it is currently 2 Gbps);

- implement some kind of QoS (Ingress Rate Limiting). Is that possible on a Brocade 48K?

Below is an overview of our current environment:

- Brocade 48K (Core-Edge)

- Fabric OS: v6.3.0b

Some outputs:

SW_xpto:admin> portstatsshow 10/7
stat_wtx                924033164  4-byte words transmitted
stat_wrx                4279649732  4-byte words received
stat_ftx                291058435  Frames transmitted
stat_frx                445573076  Frames received
stat_c2_frx            0          Class 2 frames received
stat_c3_frx            445573076  Class 3 frames received
stat_lc_rx              0          Link control frames received
stat_mc_rx              0          Multicast frames received
stat_mc_to              0          Multicast timeouts
stat_mc_tx              0          Multicast frames transmitted
tim_rdy_pri            0          Time R_RDY high priority
tim_txcrd_z            9234748    Time BB credit zero (2.5Us ticks)
er_enc_in              0          Encoding errors inside of frames
er_crc                  0          Frames with CRC errors
er_trunc                0          Frames shorter than minimum
er_toolong              0          Frames longer than maximum
er_bad_eof              0          Frames with bad end-of-frame
er_enc_out              0          Encoding error outside of frames
er_bad_os              0          Invalid ordered set
er_rx_c3_timeout        0          Class 3 receive frames discarded due to timeout
er_c3_dest_unreach      0          Class 3 frames discarded due to destination unreachable
er_other_discard        0          Other discards
er_zone_discard        0          Class 3 frames discarded due to zone mismatch
er_crc_good_eof        0          Crc error with good eof
er_inv_arb              0          Invalid ARB
open                    0          loop_open
transfer                0          loop_transfer
opened                  0          FL_Port opened
starve_stop            0          tenancies stopped due to starvation
fl_tenancy              0          number of times FL has the tenancy
nl_tenancy              0          number of times NL has the tenancy
zero_tenancy            0          zero tenancy
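For a sense of scale, the tim_txcrd_z value above can be turned into an approximate wall-clock time, treating each tick as 2.5 µs as the counter's own description suggests. This is only a rough sketch: the function name is made up for illustration, and because the counters were not cleared first, the period over which this time accumulated is unknown.

```python
# Tick length taken from the portstatsshow description "(2.5Us ticks)".
TICK_SECONDS = 2.5e-6


def zero_credit_seconds(ticks):
    """Convert a tim_txcrd_z tick count into approximate seconds at zero BB credit."""
    return ticks * TICK_SECONDS


if __name__ == "__main__":
    # 9234748 is the tim_txcrd_z value from the output above.
    seconds = zero_credit_seconds(9234748)
    print(f"{seconds:.2f} s at zero BB credit")
```

That works out to roughly 23 seconds in total, which only becomes meaningful once it is compared against a known measurement interval.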

SW_xpto:admin> bottleneckmon --show -interval 5 -span 300 10/7
=============================================================
Tue Apr 27 12:22:14 Localtime 2010
=============================================================
                                  Percentage of
From             To               affected secs
=============================================================
Apr 27 12:22:09  Apr 27 12:22:14  100.00%
Apr 27 12:22:04  Apr 27 12:22:09  80.00%
Apr 27 12:21:59  Apr 27 12:22:04  100.00%
Apr 27 12:21:54  Apr 27 12:21:59  60.00%
Apr 27 12:21:49  Apr 27 12:21:54  80.00%
Apr 27 12:21:44  Apr 27 12:21:49  40.00%
Apr 27 12:21:39  Apr 27 12:21:44  60.00%
Apr 27 12:21:34  Apr 27 12:21:39  80.00%
Apr 27 12:21:29  Apr 27 12:21:34  60.00%
Apr 27 12:21:24  Apr 27 12:21:29  100.00%
Apr 27 12:21:19  Apr 27 12:21:24  40.00%
Apr 27 12:21:14  Apr 27 12:21:19  100.00%
Apr 27 12:21:09  Apr 27 12:21:14  60.00%
Apr 27 12:21:04  Apr 27 12:21:09  60.00%
Apr 27 12:20:59  Apr 27 12:21:04  20.00%
Apr 27 12:20:54  Apr 27 12:20:59  40.00%
Apr 27 12:20:49  Apr 27 12:20:54  40.00%
Apr 27 12:20:44  Apr 27 12:20:49  100.00%
Apr 27 12:20:39  Apr 27 12:20:44  0.00%
Apr 27 12:20:34  Apr 27 12:20:39  60.00%
Apr 27 12:20:29  Apr 27 12:20:34  80.00%
Apr 27 12:20:24  Apr 27 12:20:29  100.00%
Apr 27 12:20:19  Apr 27 12:20:24  40.00%
Apr 27 12:20:14  Apr 27 12:20:19  80.00%
Apr 27 12:20:09  Apr 27 12:20:14  75.00% (no data for 1 seconds)
Apr 27 12:20:04  Apr 27 12:20:09  60.00%
Apr 27 12:19:59  Apr 27 12:20:04  80.00%
Apr 27 12:19:54  Apr 27 12:19:59  40.00%
Apr 27 12:19:49  Apr 27 12:19:54  80.00%
Apr 27 12:19:44  Apr 27 12:19:49  80.00%
Apr 27 12:19:39  Apr 27 12:19:44  60.00%
Apr 27 12:19:34  Apr 27 12:19:39  60.00%
Apr 27 12:19:29  Apr 27 12:19:34  40.00%
Apr 27 12:19:24  Apr 27 12:19:29  60.00%
Apr 27 12:19:19  Apr 27 12:19:24  40.00%
Apr 27 12:19:14  Apr 27 12:19:19  40.00%
Apr 27 12:19:09  Apr 27 12:19:14  20.00%
Apr 27 12:19:04  Apr 27 12:19:09  80.00%
Apr 27 12:18:59  Apr 27 12:19:04  20.00%
Apr 27 12:18:54  Apr 27 12:18:59  0.00%
Apr 27 12:18:49  Apr 27 12:18:54  40.00%
Apr 27 12:18:44  Apr 27 12:18:49  100.00%
Apr 27 12:18:39  Apr 27 12:18:44  60.00%
Apr 27 12:18:34  Apr 27 12:18:39  60.00%
Apr 27 12:18:29  Apr 27 12:18:34  60.00%
Apr 27 12:18:24  Apr 27 12:18:29  100.00%
Apr 27 12:18:19  Apr 27 12:18:24  20.00%
Apr 27 12:18:14  Apr 27 12:18:19  40.00%
Apr 27 12:18:09  Apr 27 12:18:14  80.00%
Apr 27 12:18:04  Apr 27 12:18:09  100.00%
Apr 27 12:17:59  Apr 27 12:18:04  100.00%
Apr 27 12:17:54  Apr 27 12:17:59  100.00%
Apr 27 12:17:49  Apr 27 12:17:54  80.00%
Apr 27 12:17:44  Apr 27 12:17:49  100.00%
Apr 27 12:17:39  Apr 27 12:17:44  60.00%
Apr 27 12:17:34  Apr 27 12:17:39  100.00%
Apr 27 12:17:29  Apr 27 12:17:34  100.00%
Apr 27 12:17:24  Apr 27 12:17:29  100.00%
Apr 27 12:17:19  Apr 27 12:17:24  60.00%
Apr 27 12:17:14  Apr 27 12:17:19  80.00%
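To condense a long listing like this into a couple of numbers, the pasted text can be post-processed offline. The following Python sketch is an assumption-laden illustration, not a Brocade tool: the helper names and the 80% threshold are hypothetical, and it simply picks up the first "NN.NN%" figure on each row.

```python
import re

# Matches the "NN.NN%" figure on each interval row of the pasted output.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def affected_percentages(output):
    """Return the 'affected secs' percentages found in pasted bottleneckmon text."""
    values = []
    for line in output.splitlines():
        match = PERCENT.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(output, threshold=80.0):
    """Summarize the intervals; threshold is an arbitrary illustrative cut-off."""
    values = affected_percentages(output)
    if not values:
        return None
    return {
        "intervals": len(values),
        "mean_percent": sum(values) / len(values),
        "at_or_above_threshold": sum(1 for v in values if v >= threshold),
    }


# First four rows of the output above, used as sample input.
SAMPLE = """\
Apr 27 12:22:09  Apr 27 12:22:14  100.00%
Apr 27 12:22:04  Apr 27 12:22:09  80.00%
Apr 27 12:21:59  Apr 27 12:22:04  100.00%
Apr 27 12:21:54  Apr 27 12:21:59  60.00%
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Header and separator lines contain no percent sign, so they are skipped without special handling.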

Any help will be really appreciated.

Regards,

Daniel Volochen

Contributor
Posts: 44
Registered: ‎05-22-2009

Re: What can I do to solve (or work around) the "slow drain devices" problem?

Ahhhh, the slow drain devices...always the most difficult.

Your ideas are on the right track; sometimes you can hardcode the port to a lower speed.

You could potentially use ingress rate limiting, but it is a licensed feature under the Advanced Performance Monitoring license.

From my own experience, try this:

Do a portstatsclear and slotstatsclear.

Then run portstatsshow (portnumber) and record the tim_txcrd_z counter (time at zero BB credit) and the Class 3 frame discards.

Both should be at zero if the statsclear commands worked.

Now give it 5 minutes, run portstatsshow (portnumber) again, and see what the deltas are.

If you see it incrementing in the millions or billions, you definitely have a problem.
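The before/after comparison described above can also be done offline on two pasted portstatsshow captures. Here is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the sample counter values are invented purely for illustration, the 2.5 µs tick length is taken from the counter description in the original output, and counter wrap-around is not handled.

```python
# Tick length taken from the portstatsshow description "(2.5Us ticks)".
TICK_SECONDS = 2.5e-6


def parse_counters(output):
    """Map counter name -> integer value from pasted portstatsshow text."""
    counters = {}
    for line in output.splitlines():
        fields = line.split()
        # Counter rows look like: <name> <integer> <description...>
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def counter_deltas(before, after):
    """Per-counter increase between two snapshots (no wrap-around handling)."""
    return {name: after[name] - before.get(name, 0) for name in after}


def zero_credit_fraction(delta_ticks, interval_seconds):
    """Approximate share of the interval spent at zero BB credit."""
    return delta_ticks * TICK_SECONDS / interval_seconds


# Invented sample values, not real measurements.
BEFORE = """\
tim_txcrd_z            0          Time BB credit zero (2.5Us ticks)
er_rx_c3_timeout        0          Class 3 receive frames discarded due to timeout
"""
AFTER = """\
tim_txcrd_z            1200000    Time BB credit zero (2.5Us ticks)
er_rx_c3_timeout        15         Class 3 receive frames discarded due to timeout
"""

if __name__ == "__main__":
    deltas = counter_deltas(parse_counters(BEFORE), parse_counters(AFTER))
    for name, delta in deltas.items():
        print(f"{name}: +{delta}")
    # 300 seconds corresponds to the 5-minute wait suggested above.
    share = zero_credit_fraction(deltas["tim_txcrd_z"], 300)
    print(f"zero-credit share of interval: {share:.2%}")
```

Expressing the tim_txcrd_z delta as a share of the interval makes captures of different lengths comparable, which a raw tick count does not.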

Also, are your devices traversing an ISL?

Some common slow-drain causes are:

Misbehaving device drivers

Incorrectly configured or misbehaving application software

Faulty hardware

If you would be willing to post your portstatsshow output after running the statsclear commands, it might provide a bit more insight.
