Occasional Contributor
Posts: 6
Registered: 10-08-2008

Timeout errors on AIX and Solaris servers due to marginal ISL ports in a 48K SAN fabric

All,

I am seeking some help from you all regarding this issue.

Our SAN fabric, built on 48K directors running FOS v6.4.2a, is reporting marginal ports, as shown below:

2013/03/08-10:38:37, , 43749, SLOT 6 | FID 128, WARNING, ML_PRV5M48036, Switch status changed from HEALTHY to DOWN.

2013/03/08-10:38:37, , 43750, SLOT 6 | FID 128, WARNING, ML_PRV5M48036, Switch status change contributing factor Marginal ports: 4 marginal ports. (Port(s) 16(0x10),17(0x11),18(0x12),19(0x13))

2013/03/08-10:39:14, , 43751, SLOT 6 | FID 128, INFO, ML_PRV5M48036, Switch status changed from DOWN to HEALTHY.
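To line the switch events up against the AIX and Solaris timeouts, it helps to pull the DOWN windows and the list of marginal ports out of the RASlog. Below is a minimal Python sketch, assuming the comma-separated layout shown above (timestamp in the first field, message text in the last); the function names are hypothetical and not part of any Brocade tool.

```python
import re
from datetime import datetime

# The three RASlog lines from the post, verbatim.
LOG = """\
2013/03/08-10:38:37, , 43749, SLOT 6 | FID 128, WARNING, ML_PRV5M48036, Switch status changed from HEALTHY to DOWN.
2013/03/08-10:38:37, , 43750, SLOT 6 | FID 128, WARNING, ML_PRV5M48036, Switch status change contributing factor Marginal ports: 4 marginal ports. (Port(s) 16(0x10),17(0x11),18(0x12),19(0x13))
2013/03/08-10:39:14, , 43751, SLOT 6 | FID 128, INFO, ML_PRV5M48036, Switch status changed from DOWN to HEALTHY.
"""

TS_FMT = "%Y/%m/%d-%H:%M:%S"
STATUS_RE = re.compile(r"Switch status changed from (\w+) to (\w+)")
PORT_RE = re.compile(r"(\d+)\(0x[0-9a-fA-F]+\)")  # "16(0x10)" -> "16"


def parse(text):
    """Return (timestamp, message) pairs; assumes 7 fields separated by ', '."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(", ", 6)
        events.append((datetime.strptime(fields[0], TS_FMT), fields[6]))
    return events


def down_intervals(events):
    """Pair each 'to DOWN' transition with the next 'from DOWN' transition."""
    intervals, went_down = [], None
    for ts, msg in events:
        m = STATUS_RE.search(msg)
        if not m:
            continue
        old, new = m.groups()
        if new == "DOWN":
            went_down = ts
        elif old == "DOWN" and went_down is not None:
            intervals.append((went_down, ts))
            went_down = None
    return intervals


def marginal_ports(events):
    """Collect the decimal port numbers listed in 'Marginal ports' messages."""
    ports = []
    for _, msg in events:
        if "Marginal ports" in msg:
            ports.extend(int(p) for p in PORT_RE.findall(msg))
    return ports


if __name__ == "__main__":
    events = parse(LOG)
    for start, end in down_intervals(events):
        seconds = (end - start).total_seconds()
        print(f"DOWN {start:%H:%M:%S} -> {end:%H:%M:%S} ({seconds:.0f} s)")
    print("marginal ports:", marginal_ports(events))
```

Run against the three lines above, it reports one DOWN window of 37 seconds (10:38:37 to 10:39:14) involving ports 16 to 19, which gives a concrete window to search for in the AIX errpt and Solaris messages.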

ML_PRV5M48036:in08815f> trunkshow

  1: 17-> 17 10:00:00:05:1e:36:37:f4  76 deskew 38 MASTER
     19-> 19 10:00:00:05:1e:36:37:f4  76 deskew 36
     18-> 18 10:00:00:05:1e:36:37:f4  76 deskew 36
     16-> 16 10:00:00:05:1e:36:37:f4  76 deskew 15
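The trunkshow output is easier to compare across switches once parsed. A small Python sketch, assuming each member line has the layout shown above: local port, remote port, neighbour WWN, a number taken here to be the neighbour domain ID (an assumption), the deskew value, and an optional MASTER flag. Since deskew values reflect relative link delay within the trunk, a large spread can hint at mismatched cable lengths; the sketch only reports the raw spread and does not judge whether it is acceptable, which should be checked against the Fabric OS documentation.

```python
import re

# Trunk member lines from the post, verbatim.
TRUNKSHOW = """\
  1: 17-> 17 10:00:00:05:1e:36:37:f4  76 deskew 38 MASTER
     19-> 19 10:00:00:05:1e:36:37:f4  76 deskew 36
     18-> 18 10:00:00:05:1e:36:37:f4  76 deskew 36
     16-> 16 10:00:00:05:1e:36:37:f4  76 deskew 15
"""

# local->remote port, neighbour WWN, neighbour domain (ASSUMED), deskew, optional MASTER
MEMBER_RE = re.compile(
    r"(\d+)->\s*(\d+)\s+([0-9a-fA-F:]+)\s+(\d+)\s+deskew\s+(\d+)(\s+MASTER)?"
)


def parse_trunk(text):
    """Return one dict per trunk member found in trunkshow-style text."""
    members = []
    for m in MEMBER_RE.finditer(text):
        local, remote, wwn, domain, deskew, master = m.groups()
        members.append({
            "local": int(local),
            "remote": int(remote),
            "wwn": wwn,
            "domain": int(domain),
            "deskew": int(deskew),
            "master": master is not None,
        })
    return members


if __name__ == "__main__":
    members = parse_trunk(TRUNKSHOW)
    master = next(m["local"] for m in members if m["master"])
    deskews = [m["deskew"] for m in members]
    print("master port:", master)
    print("deskew spread:", max(deskews) - min(deskews))
```

For the trunk above this gives master port 17 and a deskew spread of 23 (38 on port 17 versus 15 on port 16).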

16:   18.8m  19.2m   0      0      0      0      0      0      0     17      0      0      0      0      0
17:  264.8m 401.6m   0      0      0      0      0      0      0     17      0      0      0      0      0
18:   32.4m  43.0m   0      0      0      0      0      0      0     17      0      0      0      0      0
19:    1.0g   1.6g   0      0      0      0      0      0      0     17      0      0      0      0      0

These are transmit (Tx) discards on the ports. These ports are running in long-distance mode (LE mode) over a 10 km link.
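On the buffer-credit question, a back-of-envelope estimate shows how many credits a 10 km link needs to stay fully utilised: enough to keep frames flowing for one full round trip of the fiber. The sketch assumes about 5 µs/km propagation delay, full-size 2148-byte frames and 8b/10b encoding, and ignores inter-frame idles; these are textbook approximations, not figures from this thread. The result can be compared with what the switch actually allocated to the port (for example in the portbuffershow output).

```python
import math

US_PER_KM = 5.0  # approx. one-way propagation delay of light in fiber, in microseconds
FC_BAUD = {1: 1.0625e9, 2: 2.125e9, 4: 4.25e9, 8: 8.5e9}  # 8b/10b line rates (baud)
FULL_FRAME_BYTES = 2148  # 2112-byte payload + 24 header + 4 CRC + 4 SOF + 4 EOF


def credits_needed(distance_km, speed_gbps, frame_bytes=FULL_FRAME_BYTES):
    """Credits needed to keep sending back to back for one round trip."""
    round_trip_us = 2 * distance_km * US_PER_KM
    # 8b/10b encoding: each byte occupies 10 bits on the wire.
    frame_us = frame_bytes * 10 / FC_BAUD[speed_gbps] * 1e6
    return math.ceil(round_trip_us / frame_us)


if __name__ == "__main__":
    for speed in (2, 4, 8):
        print(f"10 km at {speed} Gbit/s: ~{credits_needed(10, speed)} credits")
```

With these assumptions a 10 km link needs roughly 20 credits at 4 Gbit/s and 40 at 8 Gbit/s. Because the count scales inversely with frame size, traffic dominated by small frames needs proportionally more credits than the full-frame figure suggests.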

Around the same time, I see many hdisk path errors on the AIX servers and SCSI timeout errors on the Solaris servers.

Actions tried:

1. Cleared the statistics and monitored them: a lot of discards appear on the ISLs, and a few on some other ports.

2. I am having these ports and cables checked and replaced.

3. Increased ISL capacity by adding new trunks on this switch.
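Because the counters were cleared first, any non-zero discard count in a later snapshot accrued during the monitoring window. The sketch below tallies that from per-port rows in the format shown earlier in the post. The column positions are assumptions: the post identifies the non-zero column as Tx discards but does not include the header line, and the first column is taken to be transmitted frames. The function names are hypothetical.

```python
# Per-port rows from the post (header line not included there).
ROWS = """\
16:   18.8m  19.2m   0      0      0      0      0      0      0     17      0      0      0      0      0
17:  264.8m 401.6m   0      0      0      0      0      0      0     17      0      0      0      0      0
18:   32.4m  43.0m   0      0      0      0      0      0      0     17      0      0      0      0      0
19:    1.0g   1.6g   0      0      0      0      0      0      0     17      0      0      0      0      0
"""

SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}
TX_COL = 0    # ASSUMPTION: first counter is transmitted frames
DISC_COL = 9  # ASSUMPTION: 0-based index of the discard counter (the "17" column)


def to_number(field):
    """Convert counts such as '18.8m', '1.0g' or '17' to floats."""
    if field[-1] in SUFFIX:
        return float(field[:-1]) * SUFFIX[field[-1]]
    return float(field)


def parse_rows(text):
    """Map port number -> {'tx': frames, 'disc': discards}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        port, rest = line.split(":", 1)
        fields = rest.split()
        stats[int(port)] = {
            "tx": to_number(fields[TX_COL]),
            "disc": int(to_number(fields[DISC_COL])),
        }
    return stats


def ports_with_discards(stats, minimum=1):
    """Ports whose discard count since the last clear is at least `minimum`."""
    return sorted(p for p, s in stats.items() if s["disc"] >= minimum)


if __name__ == "__main__":
    stats = parse_rows(ROWS)
    for port in ports_with_discards(stats):
        s = stats[port]
        per_million = s["disc"] / s["tx"] * 1e6
        print(f"port {port}: {s['disc']} discards, {per_million:.3f} per million Tx frames")
```

Normalising by transmitted frames makes it easier to compare a lightly used trunk member (port 16) with a busy one (port 19) when both show the same raw count.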

I am running out of ideas. Could this be due to:

1. Buffer credit starvation on the ISL ports?

2. A bottleneck somewhere in the fabric?

3. Some other incompatibility issue?

Please shed some light on this.

Thanks in advance

External Moderator
Posts: 4,974
Registered: 02-23-2004


1424 and 1436 simply indicate that the thresholds in the switch status policy currently in use have been reached or exceeded.

The command "switchstatuspolicyshow" displays the current policy, and "switchstatuspolicyset" lets you define it.

The details are described in the Fabric OS Command Reference Manual.

TechHelp24
Occasional Contributor
Posts: 6
Registered: 10-08-2008


TechHelp24,

The current overall switch status policy parameters:

                  Down    Marginal
----------------------------------
     PowerSupplies    3           0
      Temperatures    2           1
              Fans    2           1
               WWN    0           1
                CP    0           1
             Blade    0           1
             Flash    0           1
     MarginalPorts    2           1
       FaultyPorts    2           1
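The thresholds above can be read as a simple rule: a factor whose count reaches its Down value marks the switch DOWN, one that reaches its Marginal value marks it MARGINAL, and the overall status is the worst of all factors. The Python sketch below is a simplified model of that rule, not Brocade's implementation; it assumes a threshold of 0 means the level is not used, and the Command Reference remains the authority on the exact semantics.

```python
from enum import IntEnum


class Status(IntEnum):
    HEALTHY = 0
    MARGINAL = 1
    DOWN = 2


# Thresholds copied from the switchstatuspolicyshow output above: (down, marginal).
POLICY = {
    "PowerSupplies": (3, 0),
    "Temperatures": (2, 1),
    "Fans": (2, 1),
    "WWN": (0, 1),
    "CP": (0, 1),
    "Blade": (0, 1),
    "Flash": (0, 1),
    "MarginalPorts": (2, 1),
    "FaultyPorts": (2, 1),
}


def factor_status(count, down, marginal):
    # Simplified model (ASSUMPTION): a threshold of 0 disables that level.
    if down and count >= down:
        return Status.DOWN
    if marginal and count >= marginal:
        return Status.MARGINAL
    return Status.HEALTHY


def switch_status(counts, policy=POLICY):
    """Overall status is the worst status of any contributing factor."""
    return max(
        (factor_status(counts.get(name, 0), *limits) for name, limits in policy.items()),
        default=Status.HEALTHY,
    )


if __name__ == "__main__":
    # The event in the log: 4 marginal ports against a Down threshold of 2.
    print(switch_status({"MarginalPorts": 4}).name)  # DOWN
    relaxed = dict(POLICY, MarginalPorts=(5, 1))
    print(switch_status({"MarginalPorts": 4}, relaxed).name)  # MARGINAL
```

In this model, raising the MarginalPorts Down threshold to 5 would have reported the 4-port event as MARGINAL instead of DOWN; the underlying port problem would still be there, only reported at a lower severity.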

The above values are very close to the defaults. I agree that changing them would limit how often the switch status goes down, but don't you think it would also prevent me from seeing what was happening on the switches while the timeout errors appeared on the servers? In fact, it was with these values that I found out something was happening to the ISLs at the time of the issues. Do you still think changing the policy values will help identify why the ISL ports go marginal, or why the servers see timeout errors?

Valued Contributor
Posts: 761
Registered: 06-11-2010


Hi,

Check the remote switch as well, since the bottleneck may be there. Do you see any errors or discards on it?

Occasional Contributor
Posts: 6
Registered: 10-08-2008


We found that a server accessing its storage at the other site was going through 2 hops. We reduced this to 1 hop by moving the server closer to the storage. This appears to have reduced the load on the ISLs, and the timeout errors have stopped.
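Hop count here is the number of ISLs a frame crosses between the server's switch and the storage's switch, so every hop removed is one less ISL carrying that traffic. A minimal breadth-first-search sketch in Python; the two-site topology and the switch names are hypothetical, chosen only to mirror the 2-hop to 1-hop change described above.

```python
from collections import deque


def hop_count(isls, src, dst):
    """Breadth-first search: number of ISLs on the shortest path src -> dst."""
    neighbours = {}
    for a, b in isls:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        switch, hops = queue.popleft()
        if switch == dst:
            return hops
        for nxt in neighbours.get(switch, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # the two switches are not connected


# HYPOTHETICAL fabric: storage attached to core_b, server originally on edge_a.
ISLS = [("edge_a", "core_a"), ("core_a", "core_b")]

if __name__ == "__main__":
    print("before:", hop_count(ISLS, "edge_a", "core_b"))  # 2 hops
    print("after:", hop_count(ISLS, "core_a", "core_b"))   # 1 hop
```

Moving the server from edge_a to core_a takes the edge_a to core_a ISL out of that server's I/O path entirely, which is one way a change like this can relieve a congested ISL.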

Thanks, all.
