Contributor
Posts: 27
Registered: ‎05-27-2013

portbuffershow to diagnose zero buffer credits on F-port

Hello Community!

 

In our environment we have an IBM DS8870 storage box connected to a DCX 8510-8 on each fabric with 3x 8G short-distance links. The DS8870 reports a high number of zero-send-buffer-credit events (so rx from the switch's point of view) on one port per fabric (switch port 3/36, index 292). According to IBM's Tivoli Productivity Center, the DS8870 has no send credits available 5% of the time.

 

Does it make any sense at all to use portbuffershow for troubleshooting here? I've browsed the forum and it seems people use this command to check issues with ISLs rather than regular F-ports.

 

 

> portbuffershow 3/36
User     Port     Lx     Max/Resv  Avg Buffer Usage & FrameSize    Buffer Needed     Link     Remaining
Port     Type    Mode    Buffers        Tx         Rx              Usage  Buffers   Distance  Buffers
----     ----    ----    -------   ----------------------------    ------ -------   --------- ----------
  34       F       -         8       - (  64)       1(1684)           8       -          -
  38       F       -        45        4( 888)       4( 776)          45       -          -
 168       F       -         8        1(1340)      - ( 344)           8       -          -
 169       F       -         8       - (  - )      - (  - )           8       -          -
 173       F       -         8       - ( 964)       2(2184)           8       -          -
 290       F       -         8       - ( 112)       2(2892)           8       -          -
 292       F       -         8        1(1868)       2(2208)           8       -          -       4730



> portloginshow 3/36
Type  PID     World Wide Name        credit df_sz cos
=====================================================
  fe  6db6c0  50:05:07:63:08:10:84:5d    40  2048   c  scr=0x3
  ff  6db6c0  50:05:07:63:08:10:84:5d    12  2048   c  d_id=FFFFFA
  ff  6db6c0  50:05:07:63:08:10:84:5d    12  2048   c  d_id=FFFFFC


> portstatsshow 3/36
stat_wtx                2030258415  4-byte words transmitted
stat_wrx                924545779   4-byte words received
stat_ftx                82743311    Frames transmitted
stat_frx                693948823   Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             693948823   Class 3 frames received
stat_lc_rx              0           Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             0           Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc  0- 3:  0           0           0           0
tim_txcrd_z_vc  4- 7:  0           0           0           0
tim_txcrd_z_vc  8-11:  0           0           0           0
tim_txcrd_z_vc 12-15:  0           0           0           0
er_enc_in               0           Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              0           Encoding error outside of frames
er_bad_os               0           Invalid ordered set
er_pcs_blk              0           PCS block errors
er_rx_c3_timeout        0           Class 3 receive frames discarded due to timeout
er_tx_c3_timeout        0           Class 3 transmit frames discarded due to timeout
er_unroutable           0           Frames that are unroutable
er_unreachable          0           Frames with unreachable destination
er_other_discard        0           Other discards
er_type1_miss           0           frames with FTB type 1 miss
er_type2_miss           0           frames with FTB type 2 miss
er_type6_miss           0           frames with FTB type 6 miss
er_zone_miss            0           frames with hard zoning miss
er_lun_zone_miss        0           frames with LUN zoning miss
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
er_single_credit_loss   0           Single vcrdy/frame loss on link
er_multi_credit_loss    0           Multiple vcrdy/frame loss on link
phy_stats_clear_ts      02-11-2016 CET Thu 14:01:30     Timestamp of phy_port stats clear
lgc_stats_clear_ts      02-11-2016 CET Thu 14:01:30     Timestamp of lgc_port stats clear




> portcfgshow 3/36
Area Number:              182
Octet Speed Combo:        1(16G|8G|4G|2G)
Speed Level:              AUTO(SW)
AL_PA Offset 13:          OFF
Trunk Port                ON
Long Distance             OFF
VC Link Init              OFF
Locked L_Port             OFF
Locked G_Port             OFF
Disabled E_Port           OFF
Locked E_Port             OFF
ISL R_RDY Mode            OFF
RSCN Suppressed           OFF
Persistent Disable        OFF
LOS TOV enable            OFF
NPIV capability           ON
QOS Port                  AE
Port Auto Disable:        OFF
Rate Limit                OFF
EX Port                   OFF
Mirror Port               OFF
SIM Port                  OFF
Credit Recovery           ON
F_Port Buffers            OFF
E_Port Credits            OFF
Fault Delay:              0(R_A_TOV)
NPIV PP Limit:            126
CSCTL mode:               OFF
D-Port mode:              OFF
D-Port over DWDM:         OFF
Compression:              OFF
Encryption:               OFF
FEC:                      ON
Non-DFE:                  OFF


> portshow 3/36
portIndex: 292
portName: IBM
portHealth: HEALTHY

Authentication: None
portDisableReason: None
portCFlags: 0x1
portFlags: 0x1024b03     PRESENT ACTIVE F_PORT G_PORT U_PORT LOGICAL_ONLINE LOGIN NOELP LED ACCEPT FLOGI
LocalSwcFlags: 0x0
portType:  24.0
portState: 1    Online
Protocol: FC
portPhys:  6    In_Sync         portScn:   32   F_Port
port generation number:    106
state transition count:    2

portId:    6db6c0
portIfId:    4332002e
portWwn:   2e:24:00:27:f8:56:82:3f
portWwn of device(s) connected:
        50:05:07:63:08:10:84:5d
Distance:  normal
portSpeed: N8Gbps

FEC: Inactive
Credit Recovery: Inactive
Aoq: Inactive
FAA: Inactive
F_Trunk: Inactive
LE domain: 0
FC Fastwrite: OFF
Interrupts:        0          Link_failure: 0          Frjt:         0
Unknown:           0          Loss_of_sync: 0          Fbsy:         0
Lli:               0          Loss_of_sig:  0
Proc_rqrd:         0          Protocol_err: 0
Timed_out:         0          Invalid_word: 0
Rx_flushed:        0          Invalid_crc:  0
Tx_unavail:        0          Delim_err:    0
Free_buffer:       0          Address_err:  0
Overrun:           0          Lr_in:        0
Suspended:         0          Lr_out:       0
Parity_err:        0          Ols_in:       0
2_parity_err:      0          Ols_out:      0
CMI_bus_err:       0
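
For anyone who wants to check a dump like the one above without reading every line: a minimal sketch that pulls the single-value counters out of captured portstatsshow text and lists the error and timing counters that are non-zero. The sample text is copied from the output above; the function names and the regular expression are illustrative assumptions, not part of any Brocade tool, and the line layout may differ between Fabric OS versions.

```python
import re

# A few lines copied from the portstatsshow output above. A real script
# would read the full text captured from the switch CLI session.
SAMPLE = """\
stat_ftx                82743311    Frames transmitted
stat_frx                693948823   Frames received
tim_txcrd_z             0           Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc  0- 3:  0           0           0           0
er_crc                  0           Frames with CRC errors
er_single_credit_loss   0           Single vcrdy/frame loss on link
phy_stats_clear_ts      02-11-2016 CET Thu 14:01:30     Timestamp of phy_port stats clear
"""

# name, whitespace, one integer, at least two spaces, then a description.
# Per-VC rows and timestamp rows do not fit this shape and are skipped.
LINE = re.compile(r"^(\w+)\s+(\d+)\s{2,}\S")


def parse_counters(text):
    """Return {counter_name: value} for every single-value counter line."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def nonzero(counters, prefixes=("er_", "tim_")):
    """Keep only non-zero counters whose names start with one of prefixes."""
    return {k: v for k, v in counters.items() if v and k.startswith(prefixes)}


if __name__ == "__main__":
    found = nonzero(parse_counters(SAMPLE))
    print(found if found else "no non-zero error or credit-zero counters")
```

With the counters posted here, nothing is flagged, which matches the impression that the switch side of the link looks clean.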

 

Contributor
Posts: 36
Registered: ‎07-19-2007

Re: portbuffershow to diagnose zero buffer credits on F-port

It seems as though what you are asking is how to confirm that Tivoli isn't lying to you.

 

How about assuming, just for the sake of argument, that it isn't.

 

Then you might ask:

 

Is the utilization between the six storage paths balanced?

 

If not, then is that a zoning issue (host(s) are only zoned to that one path per CPC)?

 

Or an MPIO issue (host(s) aren't properly configured to round robin, or otherwise employ all available paths)?

 

Or you might want to know if you can assign more buffer credits to those ports...you can.

 

Or you might ask if the condition is really impacting latency...maybe it isn't.  You can see a whole lot of buffer-credit-zeroes and still not have a performance problem that needs to be addressed.  That counter is incremented every 2.5 microseconds, so it takes 400 ticks to add up to a single millisecond of delay for one I/O.
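
The tick arithmetic above can be sketched in a few lines. This is only an illustration: the 2.5 microsecond tick is taken from the tim_txcrd_z description in portstatsshow, while the 60-second sampling interval and the function names are assumptions made for the example. Note also that tim_txcrd_z measures the switch's transmit side, whereas the TPC figure describes the DS8870's send side, so the two are not the same counter.

```python
# Tick length of tim_txcrd_z as printed by portstatsshow ("2.5Us ticks").
TICK_SECONDS = 2.5e-6


def zero_credit_seconds(ticks):
    """Total time spent at zero credit for a given tick count."""
    return ticks * TICK_SECONDS


def zero_credit_percent(ticks, interval_seconds):
    """Share of a sampling interval spent at zero credit, in percent."""
    return 100.0 * zero_credit_seconds(ticks) / interval_seconds


def ticks_for_percent(percent, interval_seconds):
    """How many ticks correspond to a given zero-credit percentage."""
    return round(percent / 100.0 * interval_seconds / TICK_SECONDS)


if __name__ == "__main__":
    # 400 ticks is one millisecond, as stated in the post.
    print(zero_credit_seconds(400))
    # Hypothetical 60-second interval: ticks that would equal 5% of it.
    print(ticks_for_percent(5, 60))
```

The point of the conversion is scale: a counter in the thousands can still amount to only a few milliseconds spread over a long interval.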

Contributor
Posts: 27
Registered: ‎05-27-2013

Re: portbuffershow to diagnose zero buffer credits on F-port

Hi,

 

thank you very much for your suggestions.

 

 

It is really not about me not trusting Tivoli. Promise :-)

 

Checking the zoning is actually what I started with, because not so long ago, with another DS8870, I ran into a case like the one you describe: there was a big discrepancy in which hosts the particular DS8870 ports were zoned to, so the ports were serving very different workloads.

 

In the current case, however, the workload seems to be quite evenly distributed among the DS8870 ports, but only one of the ports on each fabric exhibits a lack of credits (zero send credits 4% to 8% of the time according to TPC). If only I were able to attach a screenshot here, I could even prove it...

 

I was thinking that maybe, for some odd reason, one of the ports on a fabric did not receive a couple of Receiver Ready (R_RDY) messages from the switch and has since then been running without all the credits it could use.

I suppose this is something that cannot easily be seen on the switch, and only disabling/enabling the switch port would show whether this theory is right?
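
To make the lost-credit theory concrete, here is a toy model of a buffer-to-buffer credited sender. It is a simplified sketch, not a description of how the DS8870 or the switch is implemented: frames are sent one at a time, the peer answers each with one R_RDY, and some R_RDYs are dropped. The class and function names are hypothetical, and the credit value of 8 is used only because it matches the buffers column that portbuffershow reports for this port.

```python
class CreditedSender:
    """Toy model of one sender on a buffer-to-buffer credited link."""

    def __init__(self, bb_credit):
        self.bb_credit = bb_credit   # credit granted at login
        self.available = bb_credit   # credits currently usable
        self.stalled = 0             # send attempts blocked at zero credit

    def try_send(self):
        """Consume one credit per frame; refuse to send at zero credit."""
        if self.available == 0:
            self.stalled += 1
            return False
        self.available -= 1
        return True

    def receive_r_rdy(self):
        """Each R_RDY from the peer returns one credit."""
        self.available = min(self.available + 1, self.bb_credit)

    def link_reset(self):
        """Re-initialising the link restores the full credit count."""
        self.available = self.bb_credit


def run(sender, frames, lose_every=0):
    """Attempt `frames` sends; drop the R_RDY for every lose_every-th frame."""
    for n in range(1, frames + 1):
        if sender.try_send():
            if lose_every and n % lose_every == 0:
                continue  # this R_RDY never arrives, so the credit is gone
            sender.receive_r_rdy()


if __name__ == "__main__":
    port = CreditedSender(bb_credit=8)
    run(port, frames=100, lose_every=10)
    print("credits left:", port.available, "blocked sends:", port.stalled)
    port.link_reset()
    print("after reset:", port.available)
```

In this model a lost R_RDY is never made up for by normal traffic, so the sender keeps running with fewer credits until the link is reset. That is why toggling the port is a reasonable test of the theory.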

Contributor
Posts: 36
Registered: ‎07-19-2007

Re: portbuffershow to diagnose zero buffer credits on F-port

When you mention that it is one port per fabric, that has me thinking that it isn't a physical connection issue; the switch port stats seem to back that up.

 

You are suggesting that you might have 'lost' credits, and that is plausible.  As you suggest, you could toggle one of the offending ports to force the storage to log back in and re-negotiate the connection.  You could try that on one port/fabric to see if it makes a difference.

 

Is there a symmetry between the problem ports in the two fabrics?  Are they both first, last, or whatever?

 

In the 8510, are the ports from the 8870 distributed among several blades, or are they bunched together?  I was wondering if the odd ports might be in a different port group/blade and perhaps it was the hops between target and initiator that were different for that one (those two...).  You could manually assign more buffer credits to the port; and I have seen storage units that did recommend doing that; but that would still leave the question of why that pair and not the others.

 

You do mention three ports per fabric...Is it that there are two ports to one CPC and one port to the other CPC in each fabric and that the single port is just showing a higher traffic density?

 

...

Contributor
Posts: 27
Registered: ‎05-27-2013

Re: portbuffershow to diagnose zero buffer credits on F-port

Hi bphammon
thanks for further thoughts!


As for the symmetry, there actually is some on the SAN side: each problem port is connected to port 3/36 of a different switch. The three DS8870 ports per fabric each connect to a different blade, and the servers that are zoned to them... that's something I still have to check. Are you thinking about the possibility of a credit loss on the so-called back-end ports of the switch?

On the DS8870 the ports are called I0202 and I0602. As far as I know, these are not on the same Host Adapter card, but I don't know enough about the inner workings of the DS8870 to say whether they can share some internal buffer, link, path or ASIC, or how they are assigned to a Central Processor Complex (that's what you meant by CPC, right?). I will have to force the guy who is taking care of the DS8870 to think about it. The more I think about it, the more suspicious it seems that both troubled ports connect to 3/36 switch ports.

Right now I am putting my efforts into convincing the server side to accept an experiment with a port disable/enable. They have a bad history of some Oracle servers panicking when they see a path come back alive (strange but true), so it is not as easy as it should be.

Frequent Contributor
Posts: 107
Registered: ‎04-05-2011

Re: portbuffershow to diagnose zero buffer credits on F-port

Hello Mika,

 

I have seen this before. My environment is exactly like yours: DS8870, Brocade SAN, TPC alerting on zero send buffer credits.

 

I found an old server with a very old HBA firmware version. The server itself had a very low workload on the SAN. In my case, fortunately, the server was decommissioned and the TPC messages stopped.

 

It was an old Unisys Wintel server running a very old Emulex HBA:

Emulex LP9002 FV3.93A0 DV8.2.0.29

 

I would start by checking whether there is a similar situation in your environment.

 

Wish you luck!

Contributor
Posts: 27
Registered: ‎05-27-2013

Re: portbuffershow to diagnose zero buffer credits on F-port

Hi anovelli,

 

So your culprit server might have been acting as the dreaded 'slow drain device'.

Admittedly, a 2G HBA can be considered somewhat slow in today's SAN.

 

That's an interesting clue. I don't think we have any links working below 8G in the environment now, but I can easily double-check. Unfortunately it could also be just some misbehaving obsolete firmware/driver on the server side, in which case it will be extremely difficult to find out from a storage admin's perspective.

 

Anyway, thank you very much!

External Moderator
Posts: 4,973
Registered: ‎02-23-2004

Re: portbuffershow to diagnose zero buffer credits on F-port

Mika,

 

->Unfortunately it could also be just some misbehaving obsolete firmware/driver on the server side, in which case it will be extremely difficult to find out from a storage admin's perspective.

 

SAN Health can help you quickly identify the most common HBA drivers and firmware.

TechHelp24
Contributor
Posts: 27
Registered: ‎05-27-2013

Re: portbuffershow to diagnose zero buffer credits on F-port

Hi Antonio,

 

But SAN Health would not show anything more than a nodefind would, right? It would "only" take much less effort...

 
