Fibre Channel (SAN)

Occasional Contributor
Posts: 13
Registered: ‎06-16-2014

tim_txcrd_z rapid increase on AIX TSM server HBAs for tape traffic

Greetings,

I've got a bit of a headscratcher here:

I have the following phenomenon on an AIX TSM server:
A TSM admin is complaining about poor backup storage pool performance.

The performance isn't terrible (+/- 150 MB/s), but the TSM admin is expecting something more along the lines of 400 MB/s.

I was asked to have a look on the SAN side to identify any possible underlying SAN issues that could cause this.

Data is being copied from library A to library B via 2 HBAs on TSM server X.
Tape drives are LTO6.
The data traverses over ISLs and DWDMs.

I have already checked for congestion on the ISLs and DWDMs; there is none, and the links are not saturated at all.
The TSM server has two 8G HBAs dedicated to this traffic (tape traffic only); there is also no congestion/saturation on the two fixed-speed 8G SAN ports that connect to these HBAs.

However, when I check the portstats for the 2 SAN ports that connect to these 2 TSM server HBAs, I see a constant, rapid increase in the "tim_txcrd_z" counter value.

When I look at the portstats of the ports connecting to the tape drives themselves, I see no issues whatsoever, counters are all very clean.
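For anyone wanting to quantify that growth, a quick way is to diff two saved portstatsshow captures. Below is a minimal sketch; the field layout of the output, the switch address, port number, and sample values are all assumptions, so adjust the awk pattern for your FOS release.

```shell
# Minimal sketch: pull tim_txcrd_z out of saved `portstatsshow` captures and
# report the growth between two samples. Values below are hypothetical.

# parse_txcrd FILE -> prints the tim_txcrd_z value found in FILE
parse_txcrd() {
    awk '/tim_txcrd_z/ {print $2; exit}' "$1"
}

# Two hypothetical captures taken ~60 s apart, e.g. with:
#   ssh admin@san-switch "portstatsshow 12" > sample1.txt
printf 'tim_txcrd_z         1048576\n' > sample1.txt
printf 'tim_txcrd_z         1310720\n' > sample2.txt

a=$(parse_txcrd sample1.txt)
b=$(parse_txcrd sample2.txt)
echo "tim_txcrd_z grew by $((b - a)) in ~60 s"
```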

 

To completely rule out the possibility of buffer credit starvation on the ISL links, during a maintenance window, I have doubled the amount of buffer credits:
- With the portcfgeportcredits command for normal ISLs
- With the portcfglongdistance command for LS-ports (by doubling the distance)
This did not change anything at all; the situation is still exactly the same.

I don't feel that there is much more that I could investigate from a SAN perspective.
I feel like the problem is probably situated at server level (AIX OS level or TSM application level).
However, the only observation I can make on the SAN side is that the tim_txcrd_z counter constantly increases by large amounts on the F_Ports that connect to the two server HBAs handling the tape traffic, and I can't really explain why.
Could this be caused by a bottleneck on the TSM (AIX) server itself?

Any ideas are welcome :)

New Contributor
Posts: 2
Registered: ‎08-31-2009

Re: tim_txcrd_z rapid increase on AIX TSM server HBAs for tape traffic

I have encountered the same type of problem within our data center, with no long-distance ISLs involved. The problem turned out to be one of the tape drives, that wonderful slow-draining device. In my case it eventually resulted in discarded Class 3 frames, which helped to isolate the problem. It would be something worth digging into.

Frequent Contributor
Posts: 80
Registered: ‎01-28-2010

Re: tim_txcrd_z rapid increase on AIX TSM server HBAs for tape traffic

Hi,

 

You might want to have a look at the additional options you can set on that port.

 

I had to add the OSTP option (Open Systems Tape Pipelining) and also set the FastWrite option (for my 360k SAN-over-IP connection).

 

http://www.brocade.com/downloads/documents/technical_briefs/brocade-fastwrite-ostp-tb.pdf

 

Here you can find additional info.

 

I need to go now, but I will check back tomorrow to see if my answer was what you were looking for, or if additional questions arise.

 

Occasional Contributor
Posts: 13
Registered: ‎06-16-2014

Re: tim_txcrd_z rapid increase on AIX TSM server HBAs for tape traffic

Thanks a lot for the suggestions.

I've checked the information about OSTP and FastWrite, but it seems these are only to be used when extending the fabric with FCIP?

We're not using FCIP, though; we are using DWDM lines, so I don't think this can help in our case.

Valued Contributor
Posts: 761
Registered: ‎06-11-2010

Re: tim_txcrd_z rapid increase on AIX TSM server HBAs for tape traffic

Hi,

 

A constant increase of the tim_txcrd_z counter on the ports connected to the TSM HBAs indicates that there are periods of time when the switch cannot send frames to the HBA because no BB credits are available. If this behavior worsens, it can lead to Tx discards on the port.
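As a rough back-of-envelope, the counter growth can be translated into actual zero-credit time. This assumes tim_txcrd_z ticks once per 2.5 microsecond poll while Tx BB credit is zero, which is commonly cited for Brocade ASICs but should be confirmed for the FOS level in use; the counter delta below is hypothetical.

```shell
# Rough sketch: translate tim_txcrd_z growth into time spent at zero Tx credit.
# Assumption: the counter ticks once per 2.5 us poll while Tx BB credit is zero.
delta=12000000                       # hypothetical counter growth over a 60 s window
starved_us=$((delta * 25 / 10))      # 2.5 us per tick -> microseconds starved
window_us=60000000
echo "zero-credit time: ${starved_us} us of ${window_us} us (~$((starved_us * 100 / window_us))%)"
```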

 

You can ask the TSM admin to perform an independent backup to each of the tape drives to measure the throughput achieved during write operations, and the reverse to measure it during reads. If he gets better performance that way, I would point at the TSM server itself as the bottleneck, since it has to read from the tape in library A and write to the tape in library B.
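One way to run such an isolated single-drive test outside TSM is a raw dd write/read against one drive. This is only a sketch: the device name /dev/rmt0.1, the block size, and the elapsed time are all assumptions, and it must be run against a scratch tape.

```shell
# Sketch of a single-drive raw throughput test on AIX, bypassing TSM entirely.
# Illustrative commands (do NOT run against a tape holding data):
#
#   time dd if=/dev/zero of=/dev/rmt0.1 bs=256k count=32768   # write 8 GiB
#   tctl -f /dev/rmt0.1 rewind
#   time dd if=/dev/rmt0.1 of=/dev/null bs=256k count=32768   # read it back
#
# Then convert the elapsed seconds reported by `time` into MB/s:
bytes=$((262144 * 32768))            # 256 KiB blocks * 32768 = 8 GiB payload
elapsed=55                           # hypothetical elapsed seconds
echo "throughput: $((bytes / elapsed / 1048576)) MB/s"
```

Comparing the write-only and read-only numbers against the ~150 MB/s seen during the pool copy shows whether a single drive, or the server having to read and write simultaneously, is the limit.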

 

Rgds,

Felipon

 
