Fibre Channel (SAN)

Contributor
Posts: 21
Registered: 04-23-2008

What can I do to solve (or work around) the "slow drain devices" problem?

Hi All,

Lately I have been dealing with some slow-drain devices in my SAN. Basically, the problem comes from an AIX host running SAS (business analytics software), and I would like to hear any comments or suggestions on how to avoid or work around it, because it is impacting other hosts.

I have thought of a few things we could do from the SAN side (switch commands) to keep this bad behavior from impacting other hosts in the SAN:

- change the port speed to 1 Gbps (it is currently 2 Gbps);

- implement some kind of QoS (Ingress Rate Limiting); is that possible on a Brocade 48K?

Below is an overview of our current environment:

- Brocade 48K (Core-Edge)

- Fabric OS: v6.3.0b

Some outputs:

SW_xpto:admin> portstatsshow 10/7
stat_wtx                924033164  4-byte words transmitted
stat_wrx                4279649732  4-byte words received
stat_ftx                291058435  Frames transmitted
stat_frx                445573076  Frames received
stat_c2_frx            0          Class 2 frames received
stat_c3_frx            445573076  Class 3 frames received
stat_lc_rx              0          Link control frames received
stat_mc_rx              0          Multicast frames received
stat_mc_to              0          Multicast timeouts
stat_mc_tx              0          Multicast frames transmitted
tim_rdy_pri            0          Time R_RDY high priority
tim_txcrd_z            9234748    Time BB credit zero (2.5Us ticks)
er_enc_in              0          Encoding errors inside of frames
er_crc                  0          Frames with CRC errors
er_trunc                0          Frames shorter than minimum
er_toolong              0          Frames longer than maximum
er_bad_eof              0          Frames with bad end-of-frame
er_enc_out              0          Encoding error outside of frames
er_bad_os              0          Invalid ordered set
er_rx_c3_timeout        0          Class 3 receive frames discarded due to timeout
er_c3_dest_unreach      0          Class 3 frames discarded due to destination unreachable
er_other_discard        0          Other discards
er_zone_discard        0          Class 3 frames discarded due to zone mismatch
er_crc_good_eof        0          Crc error with good eof
er_inv_arb              0          Invalid ARB
open                    0          loop_open
transfer                0          loop_transfer
opened                  0          FL_Port opened
starve_stop            0          tenancies stopped due to starvation
fl_tenancy              0          number of times FL has the tenancy
nl_tenancy              0          number of times NL has the tenancy
zero_tenancy            0          zero tenancy

SW_xpto:admin> bottleneckmon --show -interval 5 -span 300 10/7
=============================================================
Tue Apr 27 12:22:14 Localtime 2010
=============================================================
                                  Percentage of
From             To               affected secs
=============================================================
Apr 27 12:22:09  Apr 27 12:22:14  100.00%
Apr 27 12:22:04  Apr 27 12:22:09  80.00%
Apr 27 12:21:59  Apr 27 12:22:04  100.00%
Apr 27 12:21:54  Apr 27 12:21:59  60.00%
Apr 27 12:21:49  Apr 27 12:21:54  80.00%
Apr 27 12:21:44  Apr 27 12:21:49  40.00%
Apr 27 12:21:39  Apr 27 12:21:44  60.00%
Apr 27 12:21:34  Apr 27 12:21:39  80.00%
Apr 27 12:21:29  Apr 27 12:21:34  60.00%
Apr 27 12:21:24  Apr 27 12:21:29  100.00%
Apr 27 12:21:19  Apr 27 12:21:24  40.00%
Apr 27 12:21:14  Apr 27 12:21:19  100.00%
Apr 27 12:21:09  Apr 27 12:21:14  60.00%
Apr 27 12:21:04  Apr 27 12:21:09  60.00%
Apr 27 12:20:59  Apr 27 12:21:04  20.00%
Apr 27 12:20:54  Apr 27 12:20:59  40.00%
Apr 27 12:20:49  Apr 27 12:20:54  40.00%
Apr 27 12:20:44  Apr 27 12:20:49  100.00%
Apr 27 12:20:39  Apr 27 12:20:44  0.00%
Apr 27 12:20:34  Apr 27 12:20:39  60.00%
Apr 27 12:20:29  Apr 27 12:20:34  80.00%
Apr 27 12:20:24  Apr 27 12:20:29  100.00%
Apr 27 12:20:19  Apr 27 12:20:24  40.00%
Apr 27 12:20:14  Apr 27 12:20:19  80.00%
Apr 27 12:20:09  Apr 27 12:20:14  75.00% (no data for 1 seconds)
Apr 27 12:20:04  Apr 27 12:20:09  60.00%
Apr 27 12:19:59  Apr 27 12:20:04  80.00%
Apr 27 12:19:54  Apr 27 12:19:59  40.00%
Apr 27 12:19:49  Apr 27 12:19:54  80.00%
Apr 27 12:19:44  Apr 27 12:19:49  80.00%
Apr 27 12:19:39  Apr 27 12:19:44  60.00%
Apr 27 12:19:34  Apr 27 12:19:39  60.00%
Apr 27 12:19:29  Apr 27 12:19:34  40.00%
Apr 27 12:19:24  Apr 27 12:19:29  60.00%
Apr 27 12:19:19  Apr 27 12:19:24  40.00%
Apr 27 12:19:14  Apr 27 12:19:19  40.00%
Apr 27 12:19:09  Apr 27 12:19:14  20.00%
Apr 27 12:19:04  Apr 27 12:19:09  80.00%
Apr 27 12:18:59  Apr 27 12:19:04  20.00%
Apr 27 12:18:54  Apr 27 12:18:59  0.00%
Apr 27 12:18:49  Apr 27 12:18:54  40.00%
Apr 27 12:18:44  Apr 27 12:18:49  100.00%
Apr 27 12:18:39  Apr 27 12:18:44  60.00%
Apr 27 12:18:34  Apr 27 12:18:39  60.00%
Apr 27 12:18:29  Apr 27 12:18:34  60.00%
Apr 27 12:18:24  Apr 27 12:18:29  100.00%
Apr 27 12:18:19  Apr 27 12:18:24  20.00%
Apr 27 12:18:14  Apr 27 12:18:19  40.00%
Apr 27 12:18:09  Apr 27 12:18:14  80.00%
Apr 27 12:18:04  Apr 27 12:18:09  100.00%
Apr 27 12:17:59  Apr 27 12:18:04  100.00%
Apr 27 12:17:54  Apr 27 12:17:59  100.00%
Apr 27 12:17:49  Apr 27 12:17:54  80.00%
Apr 27 12:17:44  Apr 27 12:17:49  100.00%
Apr 27 12:17:39  Apr 27 12:17:44  60.00%
Apr 27 12:17:34  Apr 27 12:17:39  100.00%
Apr 27 12:17:29  Apr 27 12:17:34  100.00%
Apr 27 12:17:24  Apr 27 12:17:29  100.00%
Apr 27 12:17:19  Apr 27 12:17:24  60.00%
Apr 27 12:17:14  Apr 27 12:17:19  80.00%
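For anyone who wants to condense a history like the one above into a few numbers, here is a minimal Python sketch. It assumes the bottleneckmon output has been saved as plain text (for example from an SSH session capture); the function name `summarize_history` and the 80% threshold are my own illustrative choices, not anything defined by Fabric OS.

```python
import re

# One history row looks like: "Apr 27 12:22:09  Apr 27 12:22:14  100.00%",
# optionally followed by a note such as "(no data for 1 seconds)".
ROW = re.compile(
    r"^\s*\w{3}\s+\d{1,2}\s+\d\d:\d\d:\d\d"   # "From" timestamp
    r"\s+\w{3}\s+\d{1,2}\s+\d\d:\d\d:\d\d"    # "To" timestamp
    r"\s+(\d+(?:\.\d+)?)%"                    # percentage of affected seconds
)


def summarize_history(text, threshold=80.0):
    """Summarize the 'affected secs' column of saved bottleneckmon output.

    The threshold is an arbitrary illustrative cut-off, not a Brocade default.
    """
    percentages = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # banner, header, and separator lines do not match
            percentages.append(float(match.group(1)))
    if not percentages:
        raise ValueError("no bottleneckmon history rows found")
    return {
        "intervals": len(percentages),
        "mean_pct": sum(percentages) / len(percentages),
        "max_pct": max(percentages),
        "intervals_at_or_above_threshold": sum(
            p >= threshold for p in percentages
        ),
    }
```

Running it over captures taken before and after a change (a lower port speed, for instance) gives a quick way to see whether the share of affected seconds actually dropped.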

Any help would be greatly appreciated.

Regards,

Daniel Volochen

Contributor
Posts: 44
Registered: 05-22-2009

Re: What can I do to solve (or work around) the "slow drain devices" problem?

Ahhhh, the slow drain devices...always the most difficult.

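Your ideas are on the right track; sometimes hardcoding the port to a lower speed helps.

You could potentially use ingress rate limiting, but it is a licensed feature. As far as I know it falls under the Adaptive Networking license rather than Advanced Performance Monitoring, and it only works on 8 Gbps-capable ports, so check licenseshow and your blade types first.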

From my own experience, try this:

Do a portstatsclear and slotstatsclear.

Then run portstatsshow (portnumber) and record tim_txcrd_z (time at zero BB credit) and the Class 3 frame discards (er_rx_c3_timeout).

Both should be at zero if you did the statsclear.

Now give it 5 minutes, run portstatsshow (portnumber) again,

and see what the deltas are.

If you see it incrementing in the millions or billions, you definitely have a problem.
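If you want to script the before/after comparison, here is a rough Python sketch. It assumes you have saved two portstatsshow captures as plain text; the helper names `parse_portstats` and `credit_zero_report` are hypothetical, and the 2.5 microsecond tick length is taken from the "(2.5Us ticks)" label in the output posted above.

```python
import re

# Tick length for tim_txcrd_z, taken from the "(2.5Us ticks)" label in the
# portstatsshow output posted earlier in this thread.
TICK_SECONDS = 2.5e-6

# A counter line looks like: "<name>   <integer>   <description>".
COUNTER_LINE = re.compile(r"^\s*(\w+)\s+(\d+)(?:\s|$)")


def parse_portstats(text):
    """Parse raw portstatsshow output into a {counter_name: value} dict."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:  # the CLI prompt line and blank lines do not match
            stats[match.group(1)] = int(match.group(2))
    return stats


def credit_zero_report(before, after, interval_seconds):
    """Compare two parsed snapshots taken interval_seconds apart.

    Counter wrap-around is not handled in this sketch.
    """
    ticks = after["tim_txcrd_z"] - before["tim_txcrd_z"]
    seconds_at_zero = ticks * TICK_SECONDS
    return {
        "ticks": ticks,
        "seconds_at_zero": seconds_at_zero,
        "fraction_of_interval": seconds_at_zero / interval_seconds,
        "c3_discards": after["er_rx_c3_timeout"] - before["er_rx_c3_timeout"],
    }
```

As a sanity check on the units, the 9234748 ticks in the original post work out to 9234748 x 2.5 microseconds, or roughly 23.1 seconds at zero credit, accumulated since the counters were last cleared. That is why the delta over a known interval tells you more than the raw counter value.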

Also, are your devices traversing an ISL?

Some common slow-drain causes are:

Misbehaving device drivers

Incorrectly configured or misbehaving application software

Faulty hardware

If you would be willing to post your portstatsshow output after running the statsclear commands, it might provide a bit more insight.
