05-06-2009 06:00 AM
On our side, we use two SANs made up of three Brocade 200E, one 5300 and one 4020 SAN switches,
all with 4 Gbit GBICs and QLogic HBAs, and all switches running Fabric OS 6.1.x.
I started monitoring our SAN ports for performance, with one sample every 5 minutes.
Queried OIDs:
swFCPortTxWords 1.3.6.1.4.1.1588.2.1.1.1.6.2.1.11 (4-byte words transmitted)
swFCPortRxWords 1.3.6.1.4.1.1588.2.1.1.1.6.2.1.12 (4-byte words received)
To get the data in bytes, I simply multiply the gathered values by 4.
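A minimal sketch of that conversion, assuming "MB" means 2^20 bytes (as in the arithmetic later in this post); the function name is hypothetical:

```python
WORD_BYTES = 4          # swFCPortTxWords/RxWords count 4-byte words
MIB = 1024 * 1024       # "MB" taken as 2**20 bytes, as in the post's arithmetic

def words_to_mb_per_s(delta_words: int, interval_s: float) -> float:
    """Convert a word-counter delta measured over interval_s seconds into MB/s."""
    return delta_words * WORD_BYTES / MIB / interval_s

# 7,864,320,000 words counted over a 300 s sample corresponds to 100 MB/s.
print(words_to_mb_per_s(7_864_320_000, 300))
```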
Up to about 20 to 25 MB/s of Rx or Tx port load, MRTG/RRDtool graph the switch port load correctly.
When the port load is higher than that, MRTG/RRDtool start reporting abnormal throughput
compared with what the storage array and/or the host report:
Example: while running dd on a raw disk, with the storage array showing an I/O rate of 180 MB/s,
MRTG graphs 12, 16, 30, 20 MB/s.
I see the same behaviour when monitoring the ISL links between the SAN switches,
so the problem is not related to the infrastructure connected to the port.
After investigating a lot, I found that the problem is caused by the switch counters being reset (wrapping around to zero).
The swFCPortTxWords and swFCPortRxWords values are stored as Counter32, i.e. 32-bit counters that roll over after 2^32 increments.
Assuming (average speed in bytes/s * 300 s sampling) / 4 = delta Tx or Rx words per sample,
and a Counter32 holding values up to 4294967295:
At a rate of 100 MB/s, the 5-minute Rx or Tx delta is
(100*1024*1024*300/4) = 7864320000 4-byte words on average,
meaning the counters wrap nearly twice per 5 minutes.
At a rate of 50 MB/s, the 5-minute Rx or Tx delta is
(50*1024*1024*300/4) = 3932160000 4-byte words on average,
meaning the counters wrap nearly once per 5 minutes.
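The same arithmetic as a small sketch (hypothetical helper names), using integer math and "MB" = 2^20 bytes as above, which also shows how many times a Counter32 rolls over per polling interval:

```python
COUNTER32_MODULUS = 2**32   # Counter32 runs 0 .. 4294967295, then wraps to 0
MIB = 1024 * 1024
WORD_BYTES = 4

def words_per_interval(rate_mb_s: int, interval_s: int) -> int:
    """Expected number of 4-byte words counted during one polling interval."""
    return rate_mb_s * MIB * interval_s // WORD_BYTES

def wraps_per_interval(rate_mb_s: int, interval_s: int) -> float:
    """How many times a 32-bit word counter rolls over during one interval."""
    return words_per_interval(rate_mb_s, interval_s) / COUNTER32_MODULUS

for rate in (50, 100):
    print(rate, words_per_interval(rate, 300), round(wraps_per_interval(rate, 300), 2))
```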
As a result, MRTG either sees an apparent drop in throughput, or treats the decreased
counter value as an exception and records it as 0, which produces wrong values on the performance graphs.
The portstatsshow command reports the same information, also stored in 32-bit counters. Sample:
SANSWITCH:admin> portstatsshow 11
stat_wtx 4289468676 4-byte words transmitted
stat_wrx 818304572 4-byte words received
... wait 10 seconds ...
SANSWITCH:admin> portstatsshow 11
stat_wtx 25652200 4-byte words transmitted
stat_wrx 818329772 4-byte words received
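To see why a plain subtraction breaks, here is a sketch that parses the two samples above and compares the naive delta with a modulo-2^32 delta. The parser is hypothetical and assumes each counter line is "name value description", as in the sample; the modulo correction assumes at most one wrap between the two readings.

```python
import re

# Hypothetical parser: assumes lines like "stat_wtx 4289468676 4-byte words transmitted".
COUNTER_LINE = re.compile(r"^\s*(stat\w*)\s+(\d+)\b")

def parse_portstatsshow(text):
    """Return {counter_name: value} for every counter line in the text."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

before = parse_portstatsshow("""stat_wtx 4289468676 4-byte words transmitted
stat_wrx 818304572 4-byte words received""")
after = parse_portstatsshow("""stat_wtx 25652200 4-byte words transmitted
stat_wrx 818329772 4-byte words received""")

for name in ("stat_wtx", "stat_wrx"):
    naive = after[name] - before[name]          # goes negative after a wrap
    corrected = naive % 2**32                   # valid if it wrapped at most once
    print(name, naive, corrected)
```

For stat_wtx the naive delta is negative, while the corrected one is 31150820 words, roughly 12 MB/s over the 10 seconds.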
Any idea how to monitor port performance with reliable data?
Thanks in advance,
Kind regards,
Backup/Storage & System Management
05-11-2009 07:48 AM
You're right: 32-bit counters wrap around (reset to zero) each time they reach their maximum value.
So you'd better use "portstats64show", which reports 64-bit counters:
> portstatsshow 1/10
stat_wtx 2486236280 4-byte words transmitted
stat_wrx 373861032 4-byte words received
stat_ftx 1964791586 Frames transmitted
stat_frx 3432227480 Frames received
> portstats64show 1/10
stat64_wtx 5540 top_int : 4-byte words transmitted
2492715400 bottom_int : 4-byte words transmitted
stat64_wrx 13021 top_int : 4-byte words received
373942308 bottom_int : 4-byte words received
stat64_ftx 14 top_int : Frames transmitted
1964806620 bottom_int : Frames transmitted
stat64_frx 53 top_int : Frames received
3432231841 bottom_int : Frames received
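Each 64-bit value is split into an upper 32-bit half (top_int) and a lower half (bottom_int), so the full counter is top_int * 2^32 + bottom_int. A minimal parsing sketch, assuming the output layout is exactly as in the sample above (the parser itself is hypothetical):

```python
import re

# Hypothetical parser. Assumes each 64-bit counter is printed as a "top_int"
# line (upper 32 bits) followed by a "bottom_int" line (lower 32 bits).
LINE = re.compile(r"^\s*(?:(stat64_\w+)\s+)?(\d+)\s+(top_int|bottom_int)\b")

def parse_portstats64show(text):
    """Return {counter_name: 64-bit value} from portstats64show output."""
    values, name, top = {}, None, 0
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        if m.group(1):                      # line that starts a new counter
            name = m.group(1)
        if m.group(3) == "top_int":
            top = int(m.group(2))
        elif name is not None:              # bottom_int completes the counter
            values[name] = (top << 32) | int(m.group(2))
    return values

SAMPLE = """\
stat64_wtx 5540 top_int : 4-byte words transmitted
2492715400 bottom_int : 4-byte words transmitted
stat64_wrx 13021 top_int : 4-byte words received
373942308 bottom_int : 4-byte words received
"""

counters = parse_portstats64show(SAMPLE)
print(counters["stat64_wtx"], counters["stat64_wrx"])
```

Because the combined value only wraps after 2^64 words, the difference between two polls can be taken directly.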
Hope this helps.
05-12-2009 02:36 AM
Thanks for your reply. In the meantime, I have fixed the problem:
Assuming a 32-bit TX/RX counter takes 32 s to wrap at 4 Gbit/s,
(2^32 words of 4 bytes * 8 bits) / (4 Gbit/s port speed) = 32 s, in other
words a theoretical port I/O rate of 257 MB/s, I can safely poll every minute
and compare each value with the previously gathered one.
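The wrap-time figure can be checked in a couple of lines. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes 4 Gbit means 4 * 2^30 bit/s (which reproduces the 32 s above) and ignores FC encoding and framing overhead. With a decimal 4 * 10^9 bit/s the result is about 34 s.

```python
COUNTER32_MODULUS = 2**32
WORD_BITS = 4 * 8                 # one counted word = 4 bytes = 32 bits

def seconds_to_wrap(line_rate_bps: float) -> float:
    """Time for a 32-bit word counter to roll over at a sustained bit rate."""
    return COUNTER32_MODULUS * WORD_BITS / line_rate_bps

print(seconds_to_wrap(4 * 2**30))   # 4 Gbit taken as 4 * 2**30 bit/s
print(seconds_to_wrap(4e9))         # 4 Gbit taken as 4 * 10**9 bit/s
```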
If new >= old, everything is fine and I compute delta = new - old.
If new < old, the counter has wrapped, so I compute delta = (2^32 - old) + new.
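These two rules translate directly into a small function. A minimal sketch (the name counter32_delta is hypothetical) that, like the rules, assumes the counter wrapped at most once between two polls:

```python
COUNTER32_MODULUS = 2**32

def counter32_delta(old: int, new: int) -> int:
    """Delta between two Counter32 samples.

    Assumes at most one wrap between the two polls; more than one wrap
    cannot be detected from two 32-bit samples.
    """
    if new >= old:
        return new - old
    return (COUNTER32_MODULUS - old) + new

print(counter32_delta(4289468676, 25652200))   # stat_wtx values from the first post
```

In Python the same result can be written as (new - old) % 2**32, since the modulo of a negative difference wraps it back into the 0 .. 2^32 - 1 range.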
I store the last 5 computed deltas, covering the last 5-minute I/O interval.
Then I can derive the average speed as (sum of the 5 saved 1-minute deltas * 4 bytes) / 300 s.
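A sketch of the rolling 5-sample average using a hypothetical PortRateWindow class. It assumes evenly spaced polls, applies the single-wrap delta rule, converts words to bytes with the factor of 4, and divides by the window length (5 * 60 s = 300 s):

```python
from collections import deque
from typing import Optional

COUNTER32_MODULUS = 2**32
WORD_BYTES = 4

class PortRateWindow:
    """Keep the last `window` per-poll word deltas and report average bytes/s."""

    def __init__(self, poll_interval_s: int = 60, window: int = 5):
        self.poll_interval_s = poll_interval_s
        self.deltas = deque(maxlen=window)   # oldest delta drops out automatically
        self.last = None

    def add_sample(self, counter: int) -> None:
        if self.last is not None:
            # Single-wrap delta, same as (2**32 - old) + new when new < old.
            self.deltas.append((counter - self.last) % COUNTER32_MODULUS)
        self.last = counter

    def avg_bytes_per_s(self) -> Optional[float]:
        if not self.deltas:
            return None
        span_s = len(self.deltas) * self.poll_interval_s
        return sum(self.deltas) * WORD_BYTES / span_s

w = PortRateWindow()
counter = 4_000_000_000
for _ in range(6):                       # 6 polls give 5 one-minute deltas
    w.add_sample(counter)
    counter = (counter + 1_000_000_000) % COUNTER32_MODULUS
print(w.avg_bytes_per_s())
```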
As long as my port load stays below 125 MB/s, the counter will never wrap
within a minute, so there is no problem. If throughput goes above 125 MB/s,
I just query the 32-bit SNMP counters every 30 s instead of every minute, and
keep the last 10 collected values instead of 5. And it's in the bag. :-)
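The adaptive polling rule can be sketched as below. The helper names are hypothetical, "MB" is taken as 2^20 bytes and each counted word as 4 bytes. Under those assumptions a 60 s poll only risks more than one wrap per interval above roughly 273 MB/s, so the 125 MB/s switch-over point leaves a wide margin.

```python
from typing import Tuple

COUNTER32_MODULUS = 2**32
WORD_BYTES = 4
MIB = 1024 * 1024
AVERAGING_WINDOW_S = 300          # 5-minute reporting interval

def max_rate_without_double_wrap(poll_interval_s: int) -> float:
    """Rate (MB/s) below which the counter advances < 2**32 words per poll."""
    return COUNTER32_MODULUS * WORD_BYTES / MIB / poll_interval_s

def choose_poll_interval(recent_mb_s: float, threshold_mb_s: float = 125.0) -> Tuple[int, int]:
    """Return (poll interval in s, number of samples to keep for a 5-minute average).

    Mirrors the rule of thumb above: poll every 60 s below the threshold,
    every 30 s above it.
    """
    interval = 30 if recent_mb_s > threshold_mb_s else 60
    return interval, AVERAGING_WINDOW_S // interval

print(choose_poll_interval(100), choose_poll_interval(180))
print(round(max_rate_without_double_wrap(60), 2))
```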
Thanks again for your answer.