12-20-2017 12:06 AM
12-20-2017 12:45 AM - edited 12-20-2017 12:46 AM
Yes, it looks like the storage port is not able to handle the frames coming from the servers.
What are the port speeds of the servers and the storage?
This might be a queue depth issue, a problem on the storage itself, or a wrong optimization setting on it.
12-20-2017 04:17 AM
12-20-2017 07:06 AM
Troubleshooting class 3 discards and device latency issues is a large topic and can involve many aspects of a SAN. A slow-draining device can cause this; a poorly performing HBA or a fan-in ratio issue can contribute; and ISL oversubscription can also affect latency. Please note that port 18 in the report is the port that is BEING affected, and is not likely the port that is causing the problem.
First things to do: run the commands statsclear; slotstatsclear on each switch in the fabric. After 24-48 hours, run porterrshow on each switch in the fabric. Review those outputs for errors, and report what you find. Also provide the output of firmwareshow and fabricshow, so that we know what kind of equipment we are looking at.
I would advise you to run the SAN Health report and gather some information on your connections and throughput. Without a complete picture of the fabric, including all F_port and E_port connections, it will be impossible to diagnose. There are also records in the log file which may be useful in determining which ports are causing latency within the fabric.
You will get a report emailed to you showing the switch connections and many attributes of the fabric. Once that report is checked, we can proceed with some options. As a first guess, try to find legacy devices which may be running at 4Gb and are traversing the fabric via ISLs (E_ports). This is a common cause of class 3 discards, but it is only one possibility. There are many other things which affect throughput and congestion.
08-07-2018 03:30 AM
The errors are still popping up, as follows:
F-Port 18, Condition=ALL_PORTS(DEV_LATENCY_IMPACT==IO_FRAME_LOSS), Current Value:[ DEV_LATENCY_IMPACT,IO_FRAME_LOSS, (1408 C3TX Timeouts) ], RuleName=defALL_PORTS_IO_FRAME_LOSS, Dashboard Category=Fabric Performance Impact.
porterrshow shows disc c3 and c3-timeout tx values of 154.5k.
admin> portstatsshow 18
stat_wtx 10212000744426 4-byte words transmitted
stat_wrx 63486130803777 4-byte words received
stat_ftx 1681883768 Frames transmitted
stat_frx 1658633119 Frames received
stat_c2_frx 0 Class 2 frames received
stat_c3_frx 1658728827 Class 3 frames received
stat_lc_rx 0 Link control frames received
stat_mc_rx 0 Multicast frames received
stat_mc_to 0 Multicast timeouts
stat_mc_tx 0 Multicast frames transmitted
tim_rdy_pri 0 Time R_RDY high priority
tim_txcrd_z 146519310 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 0
tim_txcrd_z_vc 4- 7: 146519310 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
tim_latency_vc 0- 3: 1 1 1 1
tim_latency_vc 4- 7: 1 1 1 1
tim_latency_vc 8-11: 1 1 1 1
tim_latency_vc 12-15: 1 1 1 1
fec_cor_detected 0 Count of blocks that were corrected by FEC
fec_uncor_detected 0 Count of blocks that were left uncorrected by FEC
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 0 Frames longer than maximum
er_bad_eof 0 Frames with bad end-of-frame
er_enc_out 0 Encoding error outside of frames
er_bad_os 0 Invalid ordered set
er_pcs_blk 0 PCS block errors
er_rx_c3_timeout 0 Class 3 receive frames discarded due to timeout
er_tx_c3_timeout 154586 Class 3 transmit frames discarded due to timeout
er_unroutable 0 Frames that are unroutable
er_unreachable 0 Frames with unreachable destination
er_other_discard 0 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 0 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 0 Crc error with good eof
er_inv_arb 0 Invalid ARB
er_single_credit_loss 0 Single vcrdy/frame loss on link
er_multi_credit_loss 0 Multiple vcrdy/frame loss on link
phy_stats_clear_ts 07-06-2018 IST Fri 16:39:55 Timestamp of phy_port stats clear
lgc_stats_clear_ts 07-06-2018 IST Fri 16:39:55 Timestamp of lgc_port stats clear
Could it be an FC cable fault?
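For anyone reading the counters above, here is a minimal Python sketch that pulls out the figures most relevant to the question: the time spent at zero transmit credit, the share of transmitted frames discarded, and the link-level error counters. The function names (`parse_counters`, `summarize`) and the choice of which counters to treat as link-level are assumptions made for illustration; the sample text is an excerpt of the output quoted in this post, and the 2.5 microsecond tick length is taken from the label on the tim_txcrd_z line.

```python
import re

# Excerpt of the `portstatsshow 18` output quoted in this thread.
SAMPLE = """\
stat_ftx 1681883768 Frames transmitted
tim_txcrd_z 146519310 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 4- 7: 146519310 0 0 0
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_enc_out 0 Encoding error outside of frames
er_tx_c3_timeout 154586 Class 3 transmit frames discarded due to timeout
"""

TICK_SECONDS = 2.5e-6  # tick length as labelled on the tim_txcrd_z line

# Assumption: counters chosen here as indicators of link-level trouble.
LINK_COUNTERS = ("er_enc_in", "er_crc", "er_enc_out")


def parse_counters(text):
    """Return {name: int} for lines shaped like '<name> <integer> <description>'.

    Per-VC lines such as 'tim_txcrd_z_vc 4- 7: ...' and timestamp lines
    do not fit that shape and are skipped.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+)\s+(\d+)\s+\S", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def summarize(counters):
    """Derive a few headline figures from the parsed counters."""
    frames_tx = counters["stat_ftx"]
    discards = counters["er_tx_c3_timeout"]
    return {
        "credit_zero_seconds": counters["tim_txcrd_z"] * TICK_SECONDS,
        "tx_discard_ratio": discards / frames_tx if frames_tx else 0.0,
        "link_errors": {name: counters.get(name, 0) for name in LINK_COUNTERS},
    }


if __name__ == "__main__":
    summary = summarize(parse_counters(SAMPLE))
    print(f"Time at zero TX credit: {summary['credit_zero_seconds']:.1f} s")
    print(f"TX C3 discard ratio:    {summary['tx_discard_ratio']:.6%}")
    print(f"Link-level errors:      {summary['link_errors']}")
```

On the figures quoted, this works out to roughly 366 seconds at zero transmit credit since the counters were last cleared and about 0.009% of transmitted frames discarded, with the encoding and CRC counters all at zero.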
08-07-2018 04:36 AM - edited 08-07-2018 04:37 AM
A faulty cable would show up as enc_out errors, and your output shows er_enc_out at 0.
If you are seeing 750 MB/s on the storage port, it looks like the FE port is overloaded (I have never seen a value higher than 750 MB/s on any 8Gb FC port). Try remapping some of the heaviest servers to another pair of storage ports to reduce the pressure on this one.
I had a similar issue with a cluster where a huge database used two pairs of FE ports (both at 750 MB/s), so the timeouts occurred.
We solved it with another pair of HBAs and different, less utilized FE ports.
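To put the 750 MB/s figure in context, here is a small Python sketch that compares an observed throughput with the nominal one-direction data rate of the link. The rate table uses the commonly quoted nominal figures for 4, 8 and 16 Gb Fibre Channel, and the `utilization` helper is a name made up for this example; real achievable throughput depends on frame sizes and the devices involved.

```python
# Commonly quoted nominal one-direction data rates, in MB/s, by FC link speed (Gb).
NOMINAL_MB_PER_S = {4: 400, 8: 800, 16: 1600}


def utilization(observed_mb_s, link_gb):
    """Return observed throughput as a fraction of the nominal link rate."""
    return observed_mb_s / NOMINAL_MB_PER_S[link_gb]


if __name__ == "__main__":
    # 750 MB/s observed on an 8Gb port, as described in this post.
    print(f"8Gb port utilization: {utilization(750, 8):.0%}")
```

At 750 MB/s an 8Gb port is running at about 94% of its nominal rate, which leaves very little headroom for bursts.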