03-29-2010 12:55 AM
We have four DCX directors in a two-fabric topology spread across two datacenters.
Between the datacenters the directors are connected with longwave ISLs. Recently, "C3 Frames Discarded" errors have started to appear on some ISL ports.
By now the error counters on most ISL ports show 4-8 errors.
Can someone tell me whether these errors are bad, or can we ignore them?
What can cause such errors?
No other errors appear.
03-29-2010 03:58 AM
--->>>.....the directors are connected with longwave ISLs.
"C3 Frames Discarded" is the number of Class 3 frames discarded.
Class 3 frames can be discarded due to timeouts or invalid/unreachable destinations.
04-01-2010 03:53 AM
You have to keep an eye on this. It may come back if a server reboots, an HBA is replaced, or a port is simply disabled and re-enabled (portdisable/portenable). So whenever you see it, run portstatsclear and watch the counters again for some time; also run porterrshow.
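To make the "clear it and watch again" step a bit more concrete, here is a minimal Python sketch that compares two snapshots of the discard counter and reports which ports grew. It assumes you have already copied the per-port C3 discard values out of porterrshow by hand into dictionaries; the function name `discard_deltas`, the port labels, and the numbers are all hypothetical, and nothing here parses real switch output.

```python
def discard_deltas(before, after):
    """Return {port: increase} for ports whose discard counter grew.

    before/after map a port label to the C3 discard count noted at two
    different times (values entered by hand, not parsed from the switch).
    """
    deltas = {}
    for port, new_count in after.items():
        old_count = before.get(port, 0)
        if new_count >= old_count:
            increase = new_count - old_count
        else:
            # Counter is lower than before, so it was cleared in between
            # (for example with portstatsclear): the new value is the increase.
            increase = new_count
        if increase > 0:
            deltas[port] = increase
    return deltas


# Hypothetical snapshots taken a few days apart.
before = {"1/0": 4, "1/1": 8, "2/0": 0}
after = {"1/0": 4, "1/1": 11, "2/0": 2}
print(discard_deltas(before, after))
```

Ports that keep showing up in the result over several intervals are the ones worth a closer look.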
04-05-2010 10:37 PM
This weekend the error counter for "Class 3 frames discarded due to timeout" increased again.
What does "Class 3 frames discarded due to timeout" mean, and what could cause the error counter to increase?
Could it be a server problem?
04-08-2010 09:34 AM
The E_D_TOV is the basic error timeout used for all Fibre Channel error detection.
What happens is that if a frame is older than the E_D_TOV (2 seconds by default), the switch throws the frame out (discards it). Of course the host/storage are not notified (in Class 3) when a frame is discarded and thus must rely on upper-level protocols to handle it, so you will probably see I/O retries and the like.
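As a rough illustration of that aging rule, the Python sketch below splits a list of queued frames into those still young enough to forward and those older than the timeout. It is only a toy model under my own assumptions: the function `split_expired`, the frame IDs, and the arrival times are made up, and it says nothing about how the switch hardware actually tracks frame age.

```python
E_D_TOV_SECONDS = 2.0  # default value mentioned above


def split_expired(frames, now, timeout=E_D_TOV_SECONDS):
    """Split frames into (forwardable, discarded) by age.

    frames is a list of (frame_id, arrival_time) tuples; times are in seconds.
    """
    forwardable, discarded = [], []
    for frame_id, arrived in frames:
        age = now - arrived
        if age > timeout:
            # Older than the timeout: dropped silently, since Class 3
            # gives the sender no notification.
            discarded.append(frame_id)
        else:
            forwardable.append(frame_id)
    return forwardable, discarded


# Hypothetical queue contents, evaluated at t = 3.0 s.
queued = [("a", 0.0), ("b", 1.5), ("c", 2.9)]
kept, dropped = split_expired(queued, now=3.0)
print(kept, dropped)
```

Frame "a" has been waiting 3 seconds and is dropped; the other two are still within the timeout.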
So you have to figure out why a frame is in flight in your fabric for that long. It can be a symptom of port congestion, buffer credit starvation, or a misbehaving port. I recently saw an issue where an ESX host's (Emulex) HBA port was not returning R_RDYs to free up buffer credits in a timely manner. That caused backpressure on the ISLs, and frames aged to the point of expiration. This only happened intermittently, so it was very hard to track down.
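If it helps to picture the credit starvation case, here is a small toy simulation in Python. One frame arrives per tick, each frame sent uses one buffer credit, and the receiving device hands a credit back (R_RDY) only every few ticks. Everything in it is an assumption chosen for illustration: the function `simulate`, the tick-based timing, and the parameter values are hypothetical and do not correspond to real switch settings.

```python
from collections import deque


def simulate(ticks, credits, rrdy_every, timeout_ticks):
    """Return (sent, discarded) for a single credit-limited link."""
    queue = deque()  # arrival tick of each frame waiting for a credit
    in_flight = 0    # frames sent but not yet acknowledged by R_RDY
    sent = 0
    discarded = 0
    for tick in range(ticks):
        # The receiver returns one credit every rrdy_every ticks.
        if tick % rrdy_every == 0 and in_flight > 0:
            in_flight -= 1
            credits += 1
        queue.append(tick)  # one new frame arrives per tick
        # Frames that waited longer than the timeout are thrown away.
        while queue and tick - queue[0] > timeout_ticks:
            queue.popleft()
            discarded += 1
        # Send as many waiting frames as there are credits.
        while queue and credits > 0:
            queue.popleft()
            credits -= 1
            in_flight += 1
            sent += 1
    return sent, discarded


healthy = simulate(ticks=100, credits=4, rrdy_every=1, timeout_ticks=20)
slow = simulate(ticks=100, credits=4, rrdy_every=5, timeout_ticks=20)
print("healthy:", healthy)
print("slow   :", slow)
```

With a prompt receiver nothing is discarded; with a slow one the queue backs up until frames start timing out, which is the same pattern as the slow-draining HBA described above.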
FOS 6.3 has the new bottleneck detection feature, which makes finding the culprit much easier. But I'm not going to lie, it does help to have a very good understanding of the Fibre Channel protocol to figure this out. A protocol analyzer (Xgig) can also help tremendously.