03-29-2010 09:50 AM
HDFC_KM_3:user> portstatsshow 1
stat_wtx 1270537296 4-byte words transmitted
stat_wrx 3215450649 4-byte words received
stat_ftx 3924474765 Frames transmitted
stat_frx 3341506114 Frames received
stat_c2_frx 0 Class 2 frames received
stat_c3_frx 3341031668 Class 3 frames received
stat_lc_rx 266088 Link control frames received
stat_mc_rx 0 Multicast frames received
stat_mc_to 0 Multicast timeouts
stat_mc_tx 0 Multicast frames transmitted
tim_rdy_pri 204 Time R_RDY high priority
tim_txcrd_z 3319608200 Time TX Credit Zero (2.5Us ticks)
time_txcrd_z_vc 0- 3: 17054567 0 3303103985 0
time_txcrd_z_vc 4- 7: 0 0 0 0
time_txcrd_z_vc 8-11: 0 0 0 0
time_txcrd_z_vc 12-15: 0 0 0 0
er_enc_in 0 Encoding errors inside of frames
er_crc 0 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 0 Frames longer than maximum
er_bad_eof 0 Frames with bad end-of-frame
er_enc_out 7782 Encoding error outside of frames
er_bad_os 8691969 Invalid ordered set
er_c3_timeout 0 Class 3 frames discarded due to timeout
er_c3_dest_unreach 0 Class 3 frames discarded due to destination
er_other_discard 0 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 0 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 0 Crc error with good eof
er_inv_arb 0 Invalid ARB
open 0 loop_open
transfer 0 loop_transfer
opened 0 FL_Port opened
starve_stop 0 tenancies stopped due to starvation
fl_tenancy 0 number of times FL has the tenancy
nl_tenancy 0 number of times NL has the tenancy
zero_tenancy 0 zero tenancy
What is the meaning of the counters highlighted above in bold (the tim_txcrd_z / time_txcrd_z_vc lines)? In a document I have read that if this value is increasing rapidly, SCSI command timeout errors may be observed on the host side (server). Is this related to congestion?
Also, I am observing the same data flow on the HBA port (IBM P6 595) and the USP-1100 TagmaStore storage port, i.e. if data is flowing at 2 Gbps on the switch port connected to the HBA, the same 2 Gbps is flowing on the storage-side switch port. Is this good or bad? On the host side there is I/O wait.
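For context on the counter in question: tim_txcrd_z counts 2.5-microsecond ticks during which the port had zero transmit buffer credits, so a single absolute value means little; what matters is how fast it grows. A rough sketch of how one might compare two samples of portstatsshow output taken some interval apart (the second sample string below is hypothetical, not from this thread):

```python
import re

def parse_txcrd_z(portstatsshow_output: str) -> int:
    """Extract the tim_txcrd_z counter (2.5 us ticks at zero TX credit)."""
    m = re.search(r"tim_txcrd_z\s+(\d+)", portstatsshow_output)
    if m is None:
        raise ValueError("tim_txcrd_z not found in output")
    return int(m.group(1))

def pct_time_at_zero_credit(ticks_before: int, ticks_after: int,
                            interval_seconds: float) -> float:
    """Each tick is 2.5 microseconds; return the percentage of the
    sample interval the port spent with zero transmit credits."""
    zero_credit_seconds = (ticks_after - ticks_before) * 2.5e-6
    return 100.0 * zero_credit_seconds / interval_seconds

# First sample uses the value posted in this thread; the second sample,
# nominally taken 60 s later, is invented for illustration.
sample1 = "tim_txcrd_z    3319608200    Time TX Credit Zero (2.5Us ticks)"
sample2 = "tim_txcrd_z    3331608200    Time TX Credit Zero (2.5Us ticks)"
pct = pct_time_at_zero_credit(parse_txcrd_z(sample1),
                              parse_txcrd_z(sample2), 60.0)
print(f"port spent {pct:.1f}% of the interval at zero TX credit")
```

A port that sits at zero credit for a large fraction of each interval cannot transmit, which is exactly the back-pressure condition that shows up as SCSI timeouts and I/O wait on the host.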
03-30-2010 12:20 AM
An increase in the VC0 counter is definitely going to cause TEMP disk errors on your AIX host. If you want to eliminate this error quickly, resort to the best possible localization in your fabric for the host: bring it as close as possible to the storage ports. A P595 is a really large host, capable of driving very high I/O.
I had a three-tier core-edge fabric, and the way I resolved the same issue was by moving the host to the same blade/switch as the storage port. We had directors on the edge as well.
I assume the port stats are for the port connected to the USP. In that case, look at the fan-out ratio of all the storage ports connected to the same switch ASIC. Also look at the CHA utilization of the storage port in question; this is more relevant than the bandwidth you are looking at.
03-30-2010 05:06 AM
The server and the storage ports are localized in the core 48K switch, on the same blade. It is a 4/48 blade. I have put the servers on ports 7/34 and 7/35 and the storage ports on 7/42 and 7/43; though these are not in the same port group, they are on the same ASIC. The USP-1100 storage also has one processor and 2 ports dedicated to 2 HBAs in each fabric, so in total 4 HBAs and 4 dedicated USP ports served by 2 processors. No other HBAs are zoned with these storage ports, so there is no question of fan-out. The only thing is that we have seen the CHA port utilization stay busy for 4-5 hours, which is our concern. This is a banking application with OLTP, and during the batch run we also get I/O wait. So we have decided to provide dedicated CHA ports with no port sharing on the storage side. But I was asking about this parameter, which keeps increasing; although I know there is some issue, I needed a confirmation on this.
03-30-2010 11:29 AM
I can give you confirmation from my past experience working on a similar case.
time_txcrd_z_vc 0- 3: 17054567 0 3303103985 0
An increasing credit-zero count on a VC like this is a sign of congestion. And now you have the cause: CHA utilization above 70% will cause numerous TEMP DISK errors in errpt.
Also look at the CWP (cache write pending) on the HDS array. Just moving the ports may not help; high CHA utilization could be due to other factors as well.
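The per-VC breakdown quoted above shows which virtual channel is actually accumulating zero-credit time. A hedged sketch of parsing those time_txcrd_z_vc rows (values copied from the output posted earlier in this thread) to see which VC position holds the largest count:

```python
import re

def parse_vc_ticks(output: str) -> dict:
    """Map each virtual-channel number to its credit-zero tick count,
    taken from the time_txcrd_z_vc rows of portstatsshow output."""
    vc_ticks = {}
    for lo, hi, vals in re.findall(
            r"time_txcrd_z_vc\s+(\d+)-\s*(\d+):\s+([\d\s]+)", output):
        counts = [int(v) for v in vals.split()]
        for vc, count in zip(range(int(lo), int(hi) + 1), counts):
            vc_ticks[vc] = count
    return vc_ticks

# The four rows exactly as posted in this thread
stats = """\
time_txcrd_z_vc  0- 3:   17054567 0 3303103985 0
time_txcrd_z_vc  4- 7:   0 0 0 0
time_txcrd_z_vc  8-11:   0 0 0 0
time_txcrd_z_vc 12-15:   0 0 0 0
"""
vc = parse_vc_ticks(stats)
worst = max(vc, key=vc.get)
share = 100 * vc[worst] / sum(vc.values())
print(f"VC position {worst} holds {share:.1f}% of the zero-credit time")
```

Trending each VC's count separately over time (rather than only the aggregate tim_txcrd_z) makes it easier to tell whether one traffic class is being starved while others flow normally.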
03-31-2010 09:04 AM
There is no CWP issue, and there are no such errors in errpt related to this. L3 support of Sun has recommended providing one extra FED port, which will not be shared with anything else. Let me see; if everything turns out fine, I will write here. Anyway, I knew about this parameter, but I needed confirmation from someone with experience. Thanks.