Fibre Channel (SAN)

New Member
Posts: 1
Registered: ‎09-26-2017

tim_txcrd_z count rising coupled with PORTSC3TXTO__20_C 1000's Timeouts

Hi, all 4 ports from one of our AIX VIO servers are showing rising tim_txcrd_z counters.

We have swapped the fibre out. The SFP looks OK, and the HDS port is serving other servers with no issues.

 

I have a feeling the HBAs on the server aren't coping with the workload at particular times, rather than this being a component failure.

 

Anyone have any thoughts?

 

thanks Rob

 

 

 

some diag below: running FOS 7.4.1.b / DCX8510-4

 

One port in particular is suffering - small excerpt below:

 

Category(Rule Count)|RepeatCount|Rule Name                  |Execution Time   |Object      |Triggered Value(Units)|
-------------------------------------------------------------------------------------------------------------------
Port Health(683)    |20         |defALL_OTHER_F_PORTSC3TXTO_|09/25/17 02:19:52|F-Port 2/25 |6472 Timeouts         |
                    |           |20_C                       |                 |            |                      |
                    |           |                           |                 |F-Port 2/25 |4122 Timeouts         |
                    |           |                           |                 |F-Port 2/25 |4814 Timeouts         |
                    |           |                           |                 |F-Port 2/25 |8941 Timeouts         |
                    |           |                           |                 |F-Port 2/25 |6752 Timeouts         |
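For anyone who wants to total these up rather than eyeball them, here is a minimal Python sketch that pulls the per-port timeout values out of pasted MAPS rows. The regular expression and the function name timeouts_by_port are my own illustration, not part of any Brocade tool, and the sketch only assumes rows shaped like the excerpt above.

```python
import re
from collections import defaultdict

# Rows copied from the excerpt above (one port, five rule triggers).
MAPS_EXCERPT = """\
Port Health(683) |20 |defALL_OTHER_F_PORTSC3TXTO_|09/25/17 02:19:52| F-Port 2/25 |6472 Timeouts |
| |20_C | | | |
| | | | F-Port 2/25 |4122 Timeouts |
| | | | F-Port 2/25 |4814 Timeouts |
| | | | F-Port 2/25 |8941 Timeouts |
| | | | F-Port 2/25 |6752 Timeouts |
"""

# Hypothetical pattern: "F-Port <slot>/<port> |<count> Timeouts"
ROW = re.compile(r"F-Port\s+(\d+/\d+)\s*\|\s*(\d+)\s+Timeouts")


def timeouts_by_port(text):
    """Return {port: [triggered values]} for every F-Port row in the text."""
    result = defaultdict(list)
    for port, value in ROW.findall(text):
        result[port].append(int(value))
    return dict(result)


if __name__ == "__main__":
    for port, values in timeouts_by_port(MAPS_EXCERPT).items():
        print(port, "triggers:", len(values),
              "total:", sum(values), "max:", max(values))
```

For the five rows shown, this gives a total of 31101 timeouts on port 2/25, with a largest single trigger of 8941.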

 

C3TXTO(Timeouts)    -           2/25(1245171) -           -           -           2/25(5267)  2/25(56501)

 

tim_txcrd_z 1110 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 1110
tim_txcrd_z_vc 4- 7: 0 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
tim_latency_vc 0- 3: 1 1 1 3
tim_latency_vc 4- 7: 1 1 1 1
tim_latency_vc 8-11: 1 1 1 1
tim_latency_vc 12-15: 1 1 1 1
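To put the counter in perspective, the sketch below converts tim_txcrd_z ticks to time and shows which virtual channel holds the count. It is only a rough Python illustration: the 2.5 microsecond tick length is taken from the label in the output above, and the parsing pattern and function names are assumptions of mine, not a Brocade API.

```python
import re

TICK_US = 2.5  # tick length as labelled in the output: "2.5Us ticks"

# Lines copied from the excerpt above.
STATS_EXCERPT = """\
tim_txcrd_z 1110 Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc 0- 3: 0 0 0 1110
tim_txcrd_z_vc 4- 7: 0 0 0 0
tim_txcrd_z_vc 8-11: 0 0 0 0
tim_txcrd_z_vc 12-15: 0 0 0 0
"""

# Hypothetical pattern for the per-VC lines, e.g. "tim_txcrd_z_vc 0- 3: 0 0 0 1110"
VC_LINE = re.compile(r"tim_txcrd_z_vc\s+(\d+)-\s*\d+:\s+(.*)")


def credit_zero_by_vc(text):
    """Return {vc_number: ticks} for every VC with a non-zero counter."""
    per_vc = {}
    for line in text.splitlines():
        match = VC_LINE.match(line.strip())
        if not match:
            continue
        first_vc = int(match.group(1))
        for offset, ticks in enumerate(match.group(2).split()):
            if int(ticks):
                per_vc[first_vc + offset] = int(ticks)
    return per_vc


def ticks_to_ms(ticks, tick_us=TICK_US):
    """Convert a tick count to milliseconds spent at zero TX credit."""
    return ticks * tick_us / 1000.0


if __name__ == "__main__":
    for vc, ticks in credit_zero_by_vc(STATS_EXCERPT).items():
        print(f"VC {vc}: {ticks} ticks = {ticks_to_ms(ticks)} ms at zero credit")
```

In this snapshot all 1110 ticks sit on VC 3, which works out to 2.775 ms. A single snapshot says little on its own; comparing two samples taken a known interval apart shows how fast the counter is rising.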

 

doc
Broadcom
Posts: 70
Registered: ‎03-29-2010

Re: tim_txcrd_z count rising coupled with PORTSC3TXTO__20_C 1000's Timeouts

Hi Rob,

 

Yes, there is an issue. However, this is one small snapshot of one port on one switch in a fabric. The symptom on this port reflects how traffic flows across the whole fabric, so it would be very difficult, if not impossible, to diagnose where the problem lies without a great deal more data than is presented here. All I can say is that you do have a data traffic problem, and it is serious.

 

Quite often in cases like this, the device(s) on the port discarding class 3 frames are not the cause; the discards are the effect. Typically, slower traffic transiting the fabric is holding up the faster traffic coming from your AIX host port(s), much as a group of cement trucks (2 or 4 Gbps traffic) holds up the sedans and coupes that could go faster (8 or 16 Gbps traffic).

 

Brocade switches play no favorites with respect to speed in the default configuration. It is up to the admin to manage this by segregating the slower traffic, providing more ISLs, using traffic shaping tools (QoS), or upgrading legacy devices to faster adapters. It is also a function of where and how the ISLs are placed on your 8510.

 

For a simple example: a legacy 2 Gbps HBA device ingresses on edge director slot/port 2/0. The only ISLs in use are on slot 10, ports 28-31. ISL ingress is on core switch slot 4, ports 0-3, and the ultimate destination target is on slot 10, port 5. Each 2 Gbps exchange from host to target 'touches' (transits) many ASICs in each director: from the edge slot 2 ASIC to the edge core blade slot (chassis dependent), from there to the edge ISL slot/port, and out of the edge switch; then into the core on slot 4, on to the core blade slot of the core switch, and finally to the egress on slot 10 of the core switch and on to the target. That's a lot of ASICs to transit!

 

All that time, the TX credit zero counter is ticking, waiting for free buffers down the line. If the timer runs out, we have no choice but to discard the class 3 frame(s), and start over. Brocade has remarkable cut-through-routing speed, which is why it's the best in class. However, like a high performance Ferrari, it requires regular maintenance, tuning, and good technique.
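The credit-starvation behaviour described above can be sketched with a toy model. This Python snippet is purely illustrative and makes no claim about how the ASICs actually work: the tick granularity, the hold time, the arrival and drain rates, and the function name simulate are all hypothetical. It models one egress port that may only transmit while it holds a credit, counts the ticks spent at zero credit with frames waiting, and discards any frame that waits longer than the hold time.

```python
from collections import deque


def simulate(ticks, credits, arrive_every, drain_every, hold_ticks):
    """Toy egress port. Returns (sent, discarded, credit_zero_ticks).

    All parameters are in abstract ticks and are illustrative only.
    """
    queue = deque()   # arrival tick of each frame waiting to transmit
    in_flight = 0     # frames sent whose credit has not yet been returned
    sent = discarded = credit_zero_ticks = 0
    for t in range(ticks):
        # A new frame arrives for this port every arrive_every ticks.
        if t % arrive_every == 0:
            queue.append(t)
        # The receiver frees one buffer (returns one credit) every drain_every ticks.
        if in_flight and t % drain_every == 0:
            in_flight -= 1
        # Frames that have waited longer than the hold time are discarded.
        while queue and t - queue[0] > hold_ticks:
            queue.popleft()
            discarded += 1
        if queue:
            if in_flight < credits:
                queue.popleft()
                in_flight += 1
                sent += 1
            else:
                # Frame waiting but no credit: the "credit zero" timer ticks.
                credit_zero_ticks += 1
    return sent, discarded, credit_zero_ticks


if __name__ == "__main__":
    # Receiver keeps up with the offered load.
    print("healthy:", simulate(1000, 4, arrive_every=2, drain_every=1, hold_ticks=50))
    # Receiver returns credits four times slower than frames arrive.
    print("slow   :", simulate(1000, 4, arrive_every=1, drain_every=4, hold_ticks=50))
```

In the first run the receiver returns credits faster than frames arrive, so nothing waits and nothing is dropped. In the second run the credit-zero count climbs and frames start timing out, which is the same pairing of rising tim_txcrd_z and C3TXTO discards seen on the port in question.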

 

From here, I suggest you read up on SAN Health reporting. Download the utility, and run it to capture each switch in the fabric. http://www.brocade.com/en/support/support-tools/support-download-san-health-diagnostics-capture.html

 

SAN Health is the best and easiest tool to use for traffic and latency issues. You can also gather quite a bit more information from the MAPS Fabric Performance Impact output, which is a more detail-oriented report but is not as easy to view and decode: http://www.brocade.com/content/html/en/deployment-guide/brocade-san-resiliency-admin-dp/GUID-9EF972C8-F33B-43B8-9CFC-BCC678071B2D.html

 

Designing quality fabrics becomes more complex as port count and device diversity increase. There are many tools and reports that will help with this, and if your vendor has someone who has taken the Brocade Fabric Design course, they can help you change some features for better throughput.

 

 

doc

Any and all information provided by me is for entertainment value and should not be relied upon as a guaranteed solution or warranty of merchantability. All systems and all networks are different and unique. If you have a concern about data loss, or network disconnection, please open a TAC service request for service through Brocade, or through your OEM equipment provider. If this provided you with a solution to this issue, please mark it with the button at the bottom "Accept as solution".

Broadcom
Posts: 445
Registered: ‎03-29-2011

Re: tim_txcrd_z count rising coupled with PORTSC3TXTO__20_C 1000's Timeouts

Hi Steph,

 

I assume that you see no errors logged in portstatsshow / portstats64show for any of the four ports?

Do you have any errors logged on the HBA, or is it clean too?

If the physical layer is OK, then it is either an HBA/driver issue or the server is running out of CPU/memory.

The number of discards is large for an F-port over a single day, so it really looks like a server/HBA issue.

 

Can you share the portlogshow output so we can see how many credits the HBA has? It may be time to upgrade the HBA to something with more memory.

 

 






Any and all information provided by me is not reviewed, approved or endorsed by Brocade and is provided solely as a convenience for Brocade customers. All systems and all networks are different and unique. If you have a service affecting network problem, please open a TAC service request for service through Brocade, or through your OEM equipment provider. If this provided you with a solution to this issue, please mark it with the button at the bottom "Accept as solution"
