Occasional Contributor
Posts: 5
Registered: ‎03-09-2011

does enc out counter means bad?

Hello,

We have two switches in a fabric, and on the ISL port we are seeing enc out errors. I have read that enc out and crc err growing together point to an SFP or cabling problem.


s152f003-sw01:admin> porterrshow
         frames     enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy
          tx     rx  in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig
     =========================================================================================================
  4:  223.0m 222.1m  17      2      0      0      0      2     29      0      0      0      0      0      0

s152f004-sw03:admin> porterrshow
         frames     enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy
          tx     rx  in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig
     =========================================================================================================
  4:  221.8m 222.6m   4      1      1      0      0      0     48      0      0      0      0      0      0

We then replaced the SFPs and the cable, but the counters did not improve. We still see nearly 100 enc out errors a day. Is that normal? How many enc out errors per day are acceptable, or should we see zero?

We also have a problem on the storage side. Two ports used for the cluster interconnect of the NetApp controllers (which runs through the ISL) are flapping. When the cluster interconnect ports flap, the ISLs do not flap. We have three suspects, and one of them is these counters. Do you think enc out errors can cause this kind of problem?

Gokhan

Frequent Contributor
Posts: 76
Registered: ‎04-17-2010

Re: does enc out counter means bad?

enc_out means corrupted ordered sets, i.e. non-data bit patterns outside of frames. Most of the ordered sets on the wire are IDLEs (or ARB(ff), depending on your fill word). If you lose some of those it will mostly go unnoticed and not cause issues, because they are not processed. But if several ordered sets in a row are corrupt, you may see loss of sync / disparity errors. What's more critical is losing R_RDYs, because each lost R_RDY costs the sender a buffer credit. That impacts performance over time and may eventually lead to a link reset (LR is invoked when you're at BB_Credit=0 for longer than E_D_TOV, which is 2 sec).
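To make the R_RDY point concrete, here is a minimal Python sketch of buffer-to-buffer credit accounting. It is a toy model, not switch firmware: the function name, the starting credit count, and the loss pattern are all assumptions chosen for illustration. Each frame consumes one credit, each R_RDY that arrives returns one, and a corrupted R_RDY never returns its credit.

```python
def frames_until_stall(initial_credits, frames, lost_rrdy_every):
    """Toy BB_Credit model (hypothetical helper, not a real API).

    Every frame sent consumes one credit. The matching R_RDY returns it,
    except that every `lost_rrdy_every`-th R_RDY is treated as corrupted
    on the wire and its credit is never returned.

    Returns how many frames were sent when credits reached zero, or None
    if credits never ran out within `frames` frames.
    """
    credits = initial_credits
    for n in range(1, frames + 1):
        credits -= 1                      # frame leaves, credit consumed
        if n % lost_rrdy_every != 0:
            credits += 1                  # R_RDY arrived, credit returned
        if credits == 0:
            return n                      # sender is stuck at BB_Credit=0
    return None


# Assumed numbers: 8 credits, one R_RDY lost per 1000 frames.
print(frames_until_stall(8, 10_000, 1000))
```

In this model nothing is visibly wrong until the last credit is gone, which matches the "impact over time" behaviour described above. A real link would then sit at BB_Credit=0 until E_D_TOV expires and a link reset restores the credits.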

It is said that statistically enc_out alone implies cable problems, but this counter also grows when ports reset etc.

It is also possible that a bad ASIC or port on the ASIC is causing it (SerDes problem), but that's less likely.

Anyways, do some trial and error, that is, try different ports etc. The other problems you're seeing may not be caused by this, but possibly the link resets associated with those events may be causing the enc_out to grow.
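The rules of thumb mentioned so far in this thread can be written down as a small triage helper. This is only a sketch of the heuristics quoted above, not vendor guidance: the function name and the returned wording are made up for illustration, and the input is assumed to be counter growth since the last clear, keyed by porterrshow column names.

```python
def triage_enc_out(counters):
    """Map counter growth to the rule of thumb from this thread.

    `counters` is a dict such as {"enc_out": 29, "crc_err": 2}; missing
    keys are treated as zero. The order of the checks is an assumption.
    """
    get = lambda name: counters.get(name, 0)
    if get("enc_out") == 0:
        return "no enc_out growth"
    if get("link_fail") or get("loss_sync") or get("loss_sig"):
        # enc_out also grows when ports reset, so rule that out first.
        return "link events present: explain the port resets first"
    if get("crc_err"):
        return "enc_out with crc_err: suspect SFP or cabling"
    return "enc_out alone: suspect cabling, then try a different port"


print(triage_enc_out({"enc_out": 29, "crc_err": 2}))
print(triage_enc_out({"enc_out": 104, "loss_sig": 1}))
```

The link-event check comes first because a port reset explains enc_out growth without any physical-layer fault, so it is the cheapest thing to rule out.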

Super Contributor
Posts: 635
Registered: ‎04-12-2010

Re: does enc out counter means bad?

Hello Gokhan,

Are you talking here about a NetApp Metrocluster and are these switches the backend switches of a Metrocluster?

Anyway, I would say 100 enc_out per day is too much and should be fixed.

What are the error counters from the flapping ports?

Have you checked the error counters on your NetApp controllers?

Andreas

Occasional Contributor
Posts: 5
Registered: ‎03-09-2011

Re: does enc out counter means bad?

Hi Andreas,

Yes, this is a NetApp MetroCluster.

Actually there is another known issue with the QLogic FC VI adapters. Out-of-order FC frames are detected as errors by the FC VI adapters (the cluster interconnect cards). This is related to the exchange-based load-balancing algorithm of the switch. It is a known issue, and NetApp suggests replacing this algorithm with the port-based one.

SW01 and SW03 are connected via an ISL on port 4, and the FC VI adapters are connected to port 0 on both switches. As I mentioned, there is a known issue related to the FC VI adapters. But I also see enc out errors on the ISL. The enc out values here are not that high, but currently we get nearly 100 errors a day. Do you think errors on port 0 can affect the error rate on port 4, or the other way round?

s152f003-sw01:admin> porterrshow
         frames     enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy
          tx     rx  in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig
     =========================================================================================================
  0:  304.4m 300.6m   0      0      0      0      0      0    104      8      1      1      1      0      0  
  1:  409.6m 175.8m   0      0      0      0      0      0      0      8      0      0      0      0      0  
  2:  473.4m 147.1m   0      0      0      0      0      0      0      8      0      0      0      0      0  
  3:   96.4m 463.5m   0      0      0      0      0      0      0      0      0      0      0      0      0  
  4:  483.6m 473.2m   4      1      0      0      0      1      6     31      0      0      0      0      0

s152f004-sw03:admin> porterrshow
         frames     enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy
          tx     rx  in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig
     =========================================================================================================
  0:  301.8m 305.3m   0      0      0      0      0      0      0      0      0      0      0      0      0  
  1:  543.5m 167.6m   0      0      0      0      0      0      0      0      0      0      0      0      0  
  2:  524.2m 186.8m   0      0      0      0      0      0      0      0      0      0      0      0      0  
  3:  458.6m   1.7g   0      0      0      0      0      0      0      0      0      0      0      0      0  
  4:  473.8m 484.4m   0      0      0      0      0      0     32      0      0      0      0      0      0

Gokhan

Super Contributor
Posts: 635
Registered: ‎04-12-2010

Re: does enc out counter means bad?

Hi Gokhan,

On sw01 port 0 you have loss of signal. This can cause enc_out errors. Check whether you have a loss of signal event each day and whether enc_out increases at the same time. If that is the case, I would not worry about enc_out; I would worry about the "regular" loss of signal instead.

On port 4 you have some discards. Discards impact performance when the counter is high. 31 discards is not much, but it is better for this counter to stay at zero.

Check whether the discards appear at the same time as the link loss. If they do, you have to solve the loss of signal event.
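One way to run the check suggested above is to save porterrshow output once a day and compare the rows. The Python below is a minimal sketch under stated assumptions: it only understands data rows laid out like the ones in this thread (port number, then 15 values), the "k" suffix is assumed by analogy with the "m" and "g" seen here, and all function names are hypothetical.

```python
# Column order as printed by porterrshow in this thread (15 values per port).
FIELDS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
]
# "m" and "g" appear in the thread; "k" is an assumption.
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}
LINK_EVENTS = ("link_fail", "loss_sync", "loss_sig")


def parse_value(text):
    """Turn '104' into 104 and '304.4m' into 304400000."""
    text = text.lower()
    if text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def parse_row(line):
    """Parse one data row such as '  0:  304.4m 300.6m   0 ...'."""
    port, rest = line.split(":", 1)
    values = [parse_value(v) for v in rest.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return int(port), dict(zip(FIELDS, values))


def growth(before, after):
    """Per-counter increase between two snapshots of the same port."""
    return {name: after[name] - before[name] for name in FIELDS}


def enc_out_tracks_link_events(before, after):
    """True if enc_out and at least one link event counter both grew."""
    g = growth(before, after)
    return g["enc_out"] > 0 and any(g[name] > 0 for name in LINK_EVENTS)


# day1 is sw01 port 0 as posted in this thread. day2 is a HYPOTHETICAL
# snapshot invented only to exercise the code; it is not real data.
day1 = "  0:  304.4m 300.6m   0      0      0      0      0      0    104      8      1      1      1      0      0"
day2 = "  0:  310.1m 306.2m   0      0      0      0      0      0    131      8      2      2      2      0      0"

_, snap1 = parse_row(day1)
_, snap2 = parse_row(day2)
print(growth(snap1, snap2)["enc_out"], enc_out_tracks_link_events(snap1, snap2))
```

If enc_out grows only on days when loss_sig, loss_sync, or link_fail also grow, the signal loss is the thing to chase. If enc_out grows on days with no link events, the physical path is the more likely suspect.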

Ask NetApp for a more detailed investigation. I hope this helps.

Regards,

Andreas

Occasional Contributor
Posts: 5
Registered: ‎03-09-2011

Re: does enc out counter means bad?

Hi Andreas,

I don't have access to the system at the moment. When I get access I will follow your recommendations. Thanks for your help, Andreas.

Regards,

Gokhan

Frequent Contributor
Posts: 76
Registered: ‎04-17-2010

Re: does enc out counter means bad?

Quick note on changing the routing policy (i.e. 'dlsReset ; aptPolicy 1 ; iodSet'): this does not need to be set on Fabric MetroCluster backend switches. It is in fact unsupported to do so.


If you're seeing "state = 0x3 code = 0x6" FCVI errors, that means the receiver has received an out of order Sequence, but that doesn't mean out of order delivery is actually occurring. On the contrary, in a Fabric MetroCluster backend there cannot be any out of order delivery, because there's only one path between any two points. Even when implementing two ISLs per FMC backend fabric, you require a special TI zoning configuration to ensure the ISL used for FCVI is exclusive to FCVI traffic.

IOW: out of order delivery is not possible in a Fabric MetroCluster backend unless the switches are misconfigured, this is in fact a V-Series with an open fabric, or you've got frame drop (which you do, that's the disc_c3 counter). When a frame gets dropped, it looks exactly the same as out of order delivery to the receiving node and will be handled accordingly (in FCVI firmware; a regular HBA would handle it slightly differently). So the soft reset / unsynchronized log will occur etc.
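A small Python sketch of why a dropped frame and a reordered frame look identical to the receiver at the moment of detection. This is a simplified model that assumes the receiver only checks that each frame carries the next expected sequence count; the function name and the sample sequences are made up for illustration and do not describe actual FCVI firmware.

```python
def first_sequence_error(seq_counts):
    """Return (index, expected, got) for the first frame whose count is
    not the next expected one, or None if all frames arrive in order."""
    if not seq_counts:
        return None
    expected = seq_counts[0]
    for index, got in enumerate(seq_counts):
        if got != expected:
            return index, expected, got
        expected = got + 1
    return None


in_order = [0, 1, 2, 3, 4, 5]
dropped = [0, 1, 2, 4, 5]        # frame 3 discarded in the fabric (disc_c3)
reordered = [0, 1, 2, 4, 3, 5]   # frame 3 overtaken by frame 4

print(first_sequence_error(in_order))
print(first_sequence_error(dropped))
print(first_sequence_error(reordered))
```

Both faulty cases trip on the same frame with the same expected and received counts, so a receiver that reacts at that point cannot tell a discard from true out of order delivery.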

Hope this helps.


Contributor
Posts: 34
Registered: ‎09-04-2007

Re: does enc out counter means bad?

While implementing our new DCX backbones we found along the way that a few dozen ports had very high 'enc out' counters. Some of the ports had counts in the millions overnight, after clearing counters the night before. These ports also had 'link fail', 'loss sync', and 'loss sig' counters in the single digits.

After checking the hosts associated with the ports, we found them all to be servers that are rebooted nightly and that have older 2Gbps QLogic adapters (QLA234x). Our switch vendor suggested we hard set the switch and host ports (with 8Gbps SFPs) to 2Gbps instead of allowing the adapter to negotiate speed. I changed the switch port speed to 2 for one of the hosts and the 'enc out' count dropped significantly. I'm hoping that hard setting the adapter will bring those counters to zero.

Occasional Visitor
Posts: 1
Registered: ‎11-11-2016

Re: does enc out counter means bad?

If you are getting a lot of link and loss errors on 8Gbps HBAs, and the link works at 2Gbps but not at 8, I would pull the fiber and recertify it end to end. Many times when I have had those conditions, they were caused by dirty or unpolished fiber.

If they are 2Gbps HBAs, they should be hard set, but don't connect them to NPIV storage.
