Fibre Channel (SAN)

Contributor
Posts: 21
Registered: 04-23-2008

How to identify and monitor "Slow Drain Devices"?

Hi All,

Recently I have been dealing with some slow drain devices in our SAN which are impacting a number of AIX hosts. On the host side we received a lot of temporary "SC_DISK_ERR4" errors, and we can see that response times increase as well.

To identify this kind of situation, we enabled "bottleneckmon" on all switches (our environment is already running FOS v6.3.0b), but this only lets us act passively, because it just logs to our RASLOG. I'm looking for a way to send these messages to BMC Patrol or EMC Control Center.

Example:

2010/03/30-13:19:43, , 8539, SLOT 5 | FID 128, WARNING, SW_xpto, Slot 10, port 7 is a latency bottleneck due to the device attached to it. 91.666667 percent of last 300 seconds were affected by this co

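Since the goal is to get these messages into BMC Patrol or EMC Control Center, any forwarding script first has to pull the useful fields out of a line like the one above. Below is a minimal Python sketch; the regex is an assumption fitted to this single FOS v6.3.0b example (other releases or message types may be worded differently, and the sample is truncated, so only fields before the cut are used), and `parse_latency_alert` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: fitted to the single bottleneckmon latency message quoted above.
# Other FOS releases or message types may word this differently.
LATENCY_RE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}),"
    r".*?,\s*(?P<severity>[A-Z]+),\s*(?P<switch>[^,]+),\s*"
    r"Slot (?P<slot>\d+), port (?P<port>\d+) is a latency bottleneck"
    r".*?(?P<percent>\d+(?:\.\d+)?) percent of last (?P<window>\d+) seconds"
)


def parse_latency_alert(line: str) -> Optional[dict]:
    """Return the interesting fields of one latency-bottleneck RASLOG line, or None."""
    m = LATENCY_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m.group("timestamp"),
        "severity": m.group("severity"),
        "switch": m.group("switch").strip(),
        "slot": int(m.group("slot")),
        "port": int(m.group("port")),
        "percent_affected": float(m.group("percent")),
        "window_seconds": int(m.group("window")),
    }


if __name__ == "__main__":
    sample = ("2010/03/30-13:19:43, , 8539, SLOT 5 | FID 128, WARNING, SW_xpto, "
              "Slot 10, port 7 is a latency bottleneck due to the device attached "
              "to it. 91.666667 percent of last 300 seconds were affected by this co")
    print(parse_latency_alert(sample))
```

The resulting dictionary (switch, slot/port, percentage, window) is the kind of structured payload a monitoring framework can key alerts on.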

If anyone could share any experience with this, such as commands or anything else that could help, I'd really appreciate it.

PS: Some "bottleneckmon" tips:

- How to enable "bottleneckmon" on all ports with default thresholds and send alerts to RASLOG:

bottleneckmon --enable -alert *

- How to display what "bottleneckmon" is monitoring (status):

bottleneckmon --status

- How to display bottleneck statistics on a specific port:

SW_xpto:admin> bottleneckmon --show -interval 5 -span 180 2/2
=============================================================
        Tue Mar 30 13:29:19 Localtime 2010
=============================================================
                                                Percentage of
From                    To                      affected secs
=============================================================
Mar 30 13:29:14         Mar 30 13:29:19         0.00%
Mar 30 13:29:09         Mar 30 13:29:14         0.00%
Mar 30 13:29:04         Mar 30 13:29:09         0.00%
Mar 30 13:28:59         Mar 30 13:29:04         0.00%
Mar 30 13:28:54         Mar 30 13:28:59         0.00%
Mar 30 13:28:49         Mar 30 13:28:54         0.00%
Mar 30 13:28:44         Mar 30 13:28:49         0.00%
Mar 30 13:28:39         Mar 30 13:28:44         0.00%
Mar 30 13:28:34         Mar 30 13:28:39         0.00%
Mar 30 13:28:29         Mar 30 13:28:34         0.00%
Mar 30 13:28:24         Mar 30 13:28:29         0.00%
Mar 30 13:28:19         Mar 30 13:28:24         0.00%
Mar 30 13:28:14         Mar 30 13:28:19         0.00%
Mar 30 13:28:09         Mar 30 13:28:14         0.00%
Mar 30 13:28:04         Mar 30 13:28:09         0.00%
Mar 30 13:27:59         Mar 30 13:28:04         0.00%
Mar 30 13:27:54         Mar 30 13:27:59         0.00%
Mar 30 13:27:49         Mar 30 13:27:54         0.00%
Mar 30 13:27:44         Mar 30 13:27:49         0.00%
Mar 30 13:27:39         Mar 30 13:27:44         0.00%
Mar 30 13:27:34         Mar 30 13:27:39         0.00%
Mar 30 13:27:29         Mar 30 13:27:34         0.00%
Mar 30 13:27:24         Mar 30 13:27:29         0.00%
Mar 30 13:27:19         Mar 30 13:27:24         0.00%
Mar 30 13:27:14         Mar 30 13:27:19         0.00%
Mar 30 13:27:09         Mar 30 13:27:14         0.00%
Mar 30 13:27:04         Mar 30 13:27:09         0.00%
Mar 30 13:26:59         Mar 30 13:27:04         0.00%
Mar 30 13:26:54         Mar 30 13:26:59         0.00%
Mar 30 13:26:49         Mar 30 13:26:54         0.00%
Mar 30 13:26:44         Mar 30 13:26:49         0.00%
Mar 30 13:26:39         Mar 30 13:26:44         0.00%
Mar 30 13:26:34         Mar 30 13:26:39         0.00%
Mar 30 13:26:29         Mar 30 13:26:34         0.00%
Mar 30 13:26:24         Mar 30 13:26:29         0.00%
Mar 30 13:26:19         Mar 30 13:26:24         0.00%
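If you poll `bottleneckmon --show` from a script instead of reading it by eye, the table reduces to (from, to, percent) rows. A minimal sketch, assuming the row layout shown above; the 10% threshold is an arbitrary illustrative value (not a bottleneckmon default), and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# One data row of "bottleneckmon --show" output, e.g.
# "Mar 30 13:29:14         Mar 30 13:29:19         0.00%"
ROW_RE = re.compile(
    r"^(?P<start>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<end>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<pct>\d+(?:\.\d+)?)%\s*$"
)


def parse_show(output: str) -> List[Tuple[str, str, float]]:
    """Return (from, to, percent_affected) for each data row; banners and headers are skipped."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows.append((m.group("start"), m.group("end"), float(m.group("pct"))))
    return rows


def summarize(rows: List[Tuple[str, str, float]], threshold: float = 10.0) -> dict:
    """Peak and mean affected percentage, plus the rows at or above `threshold`."""
    if not rows:
        return {"intervals": 0, "peak": 0.0, "mean": 0.0, "hot": []}
    pcts = [pct for _, _, pct in rows]
    return {
        "intervals": len(rows),
        "peak": max(pcts),
        "mean": sum(pcts) / len(pcts),
        "hot": [row for row in rows if row[2] >= threshold],
    }
```

Applied to the 36 rows above, every interval would come back at 0.00%, so nothing would be flagged for port 2/2.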

Regards,

Daniel Volochen

Super Contributor
Posts: 425
Registered: 03-03-2010

Hi,

You can identify a slow drain device like this:

Run portstatsshow:

HDFC_BLR03:user> portstatsshow 1/2
stat_wtx                7215        4-byte words transmitted
stat_wrx                2003        4-byte words received
stat_ftx                590         Frames transmitted
stat_frx                29          Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             0           Class 3 frames received
stat_lc_rx              15          Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             0           Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc  0- 3:  0           0           0           0
tim_txcrd_z_vc  4- 7:  0           0           0           0
tim_txcrd_z_vc  8-11:  0           0           0           0
tim_txcrd_z_vc 12-15:  0           0           0           0
er_enc_in               0           Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              310817      Encoding error outside of frames
er_bad_os               2941109     Invalid ordered set
er_c3_timeout           0           Class 3 frames discarded due to timeout
er_c3_dest_unreach      0           Class 3 frames discarded due to destination unreachable
er_other_discard        0           Other discards
er_type1_miss           0           frames with FTB type 1 miss
er_type2_miss           0           frames with FTB type 2 miss
er_type6_miss           0           frames with FTB type 6 miss
er_zone_miss            0           frames with hard zoning miss
er_lun_zone_miss        0           frames with LUN zoning miss
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
open                    0           loop_open
transfer                0           loop_transfer
opened                  0           FL_Port opened
starve_stop             0           tenancies stopped due to starvation
fl_tenancy              0           number of times FL has the tenancy
nl_tenancy              0           number of times NL has the tenancy
zero_tenancy            0           zero tenancy
HDFC_BLR03:user>

-------------------------------------------------------------------------------------

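Run portstatsclear, then run portstatsshow again, several times.

During heavy data flow, if the value of tim_txcrd_z keeps increasing, check it again and again: a steadily rising value means the port repeatedly has to wait for buffer credits from the attached device, which points to a slow drain device (or congestion). If you get SCSI command timeout errors on the server, check this parameter as well.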
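The clear-and-resample routine above can be scripted once you have the raw `portstatsshow` text (how you collect it, for example over SSH, is left open here). This Python sketch parses the `name value description` lines into a dictionary and computes how much `tim_txcrd_z` grew between two samples. The function names are hypothetical, and treating a decrease as "counters cleared or wrapped" is an assumption.

```python
from typing import Dict


def parse_portstats(output: str) -> Dict[str, int]:
    """Map counter name -> value for lines like 'tim_txcrd_z   80314820   Time ...'.

    Lines whose second field is not a plain integer (the per-VC
    'tim_txcrd_z_vc  0- 3:' rows, the CLI prompt, blank lines) are skipped.
    """
    counters: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def counter_delta(before: Dict[str, int], after: Dict[str, int],
                  name: str = "tim_txcrd_z") -> int:
    """Growth of one counter between two snapshots of the same port."""
    delta = after[name] - before[name]
    if delta < 0:
        # Assumption: a drop means portstatsclear ran (or the counter wrapped) in between.
        raise ValueError(f"{name} decreased; take a new baseline sample")
    return delta
```

Take one snapshot, wait a known interval during heavy traffic, take another, and repeat; a delta that keeps growing from sample to sample is the pattern described above.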

help portstatsshow:

tim_txcrd_z         The number of times that the port was unable
                    to transmit frames because the transmit BB
                    credit was zero. The purpose of this statistic
                    is to detect congestion or a slow drain
                    device. This parameter is sampled at intervals
                    of 2.5Us (microseconds), and the counter is
                    incremented if the condition is true. Each
                    sample represents 2.5Us of time with zero Tx
                    BB Credit. An increment of this counter means
                    that the frames could not be sent to the
                    attached device for 2.5Us, indicating degraded
                    performance.

tim_txcrd_z_vc      The number of times that the port was unable
                    to transmit frames because the transmit BB
                    credit was zero for each of the port's 16
                    Virtual Channels (VC 0-15). The purpose of
                    this statistic is to detect congestion or a
                    slow drain device. This parameter is sampled
                    at intervals of 2.5Us (microseconds), and the
                    counter is incremented if the condition is
                    true. Each sample represents 2.5Us of time
                    with zero Tx BB Credit. An increment of this
                    counter means that the frames could not be
                    sent to the attached device for 2.5Us,
                    indicating degraded performance
                    (platform- and port-specific).
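Because each increment stands for one 2.5 µs sample taken at zero transmit credit, a counter delta converts directly into time, and dividing by the sampling interval gives the share of that interval in which the port could not transmit. A small sketch of that arithmetic (function names are hypothetical):

```python
# From the portstatsshow help text: each tim_txcrd_z increment is one 2.5 us
# sample taken while the port had zero transmit BB credit.
TICK_SECONDS = 2.5e-6


def zero_credit_seconds(ticks: int) -> float:
    """Time spent at zero Tx BB credit represented by a number of ticks."""
    return ticks * TICK_SECONDS


def zero_credit_fraction(delta_ticks: int, interval_seconds: float) -> float:
    """Share of a sampling interval spent at zero Tx BB credit (port-level counter)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return zero_credit_seconds(delta_ticks) / interval_seconds
```

For example, a delta of 400,000 ticks over a 10-second sample is about 1 s at zero credit, roughly 10% of the interval.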

New Contributor
Posts: 2
Registered: 01-27-2004

Is "bottleneckmon" a FOS command? If so, in which FOS version was it introduced?

Regards - Suman

External Moderator
Posts: 4,973
Registered: 02-23-2004

--->>>...from which FOS version did this get introduced ?

In FOS 6.3.0

TechHelp24
Super Contributor
Posts: 425
Registered: 03-03-2010

Yes, the command is available from FOS 6.3.0 onwards.

Hi Daniel,

Did you check the parameters? Tape drives can also be slow drain devices and bad citizens of the SAN.

Contributor
Posts: 21
Registered: 04-23-2008

Hi Hemant.kumar!

I did. As you said, the example below is a port used by our Disk Library (Virtual Tape Library), which Brocade detected as a slow drain device.

SW_xpto:admin> portstatsshow 2/8
stat_wtx                1827118128  4-byte words transmitted
stat_wrx                2698502572  4-byte words received
stat_ftx                3967102662  Frames transmitted
stat_frx                2482038397  Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             2482038397  Class 3 frames received
stat_lc_rx              0           Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             80314820    Time BB credit zero (2.5Us ticks)
er_enc_in               0           Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              0           Encoding error outside of frames
er_bad_os               0           Invalid ordered set
er_rx_c3_timeout        0           Class 3 receive frames discarded due to timeout
er_c3_dest_unreach      0           Class 3 frames discarded due to destination unreachable
er_other_discard        0           Other discards
er_zone_discard         0           Class 3 frames discarded due to zone mismatch
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
open                    3600908140  loop_open
transfer                0           loop_transfer
opened                  610462648   FL_Port opened
starve_stop             0           tenancies stopped due to starvation
fl_tenancy              1538815453  number of times FL has the tenancy
nl_tenancy              1562731517  number of times NL has the tenancy
zero_tenancy            78531643    zero tenancy
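Applying the 2.5 µs tick size from the portstatsshow help text to the tim_txcrd_z value above gives the cumulative time this port spent unable to transmit. This assumes the counter has not wrapped; the output does not show when the counters were last cleared, so the total alone says little, and its growth between two samples is the more useful number.

```latex
80\,314\,820 \times 2.5\,\mu\text{s} = 200\,787\,050\,\mu\text{s} \approx 200.8\,\text{s}
```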

Many Tks,

Daniel Volochen

Super Contributor
Posts: 425
Registered: 03-03-2010

Could you please mark it as correct?

Contributor
Posts: 21
Registered: 04-23-2008

Hi Hemant.kumar,

It's already marked as correct.

Tks,

Daniel Volochen

Contributor
Posts: 21
Registered: 04-23-2008

Guys,

Has anyone done this kind of configuration, i.e. sent those "Bottleneck Mon" alerts to another framework such as BMC Patrol, SCOM, etc.?

Regards

Daniel Volochen

Super Contributor
Posts: 635
Registered: 04-12-2010

Have you configured the syslog function on the switches?

Add a syslog server that is under the control of BMC, using the syslogdipadd command on the SAN switch. All RASLOG messages will then be sent to this server, and BMC can pick up the messages from there.

I do it that way with Tivoli TEC, but I use a Perl script to filter out all the useless messages, add some information to each remaining message, and only then send it to the TEC. I hope this answers the question.
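The same pattern (syslog server, filter, enrich, forward) can be sketched in Python for anyone not using Perl. This is a minimal sketch, not the poster's script: it listens for UDP syslog datagrams, strips the standard `<PRI>` prefix, keeps only messages containing "bottleneck", and prefixes the sending switch's IP plus the decoded facility and severity. The `forward` callback is a hypothetical placeholder for whatever BMC Patrol, SCOM or TEC integration you use; no product API is assumed. Binding to the standard syslog port 514 normally requires root privileges.

```python
import re
import socket
import sys
from typing import Callable, Optional, Tuple

# Standard syslog priority prefix "<N>", where N = facility * 8 + severity.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def split_pri(datagram: bytes) -> Tuple[Optional[int], str]:
    """Return (PRI, message text); PRI is None when the datagram has no <N> prefix."""
    text = datagram.decode("utf-8", errors="replace").strip()
    match = PRI_RE.match(text)
    if match is None:
        return None, text
    return int(match.group(1)), text[match.end():]


def is_bottleneck_alert(text: str) -> bool:
    """Keep bottleneckmon alerts only; everything else is treated as noise."""
    return "bottleneck" in text.lower()


def enrich(text: str, switch_ip: str, pri: Optional[int]) -> str:
    """Prefix the message with the sending switch and the decoded PRI fields."""
    if pri is None:
        return f"switch={switch_ip} {text}"
    return f"switch={switch_ip} facility={pri // 8} severity={pri % 8} {text}"


def serve(port: int, forward: Callable[[str], None]) -> None:
    """Receive syslog datagrams forever and pass filtered, enriched alerts to `forward`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            data, (switch_ip, _src_port) = sock.recvfrom(8192)
            pri, text = split_pri(data)
            if is_bottleneck_alert(text):
                forward(enrich(text, switch_ip, pri))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 bottleneck_relay.py 514  (print stands in for the real forwarder)
    serve(int(sys.argv[1]), print)
```

Filtering before forwarding keeps the monitoring framework from being flooded with every RASLOG entry, which is the same reason for the Perl filter described above.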
