
Occasional Contributor
Posts: 6
Registered: 04-07-2009

Hosts lost access to targets, with no related messages at the time of occurrence.

At 12:33 on 16 August, four hosts logged messages showing disks going offline:

Aug 16 12:33:09 gcxoemsl01 kernel: lpfc 0000:06:01.0: 0:0749 SCSI layer issued abort device Data: x0 x0 x0 x176ee4d
Aug 16 12:33:15 gcxoemsl01 kernel: lpfc 0000:06:01.0: 0:0713 SCSI layer issued LUN reset (0, 0) Data: x0 x0 x0
Aug 16 12:33:15 gcxoemsl01 kernel: lpfc 0000:06:01.0: 0:0713 SCSI layer issued LUN reset (0, 1) Data: x0 x0 x0
Aug 16 12:33:15 gcxoemsl01 kernel: lpfc 0000:06:01.0: 0:0714 SCSI layer issued Bus Reset Data: x2003
Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 1
Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
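To line up events like these across the four hosts, it can help to parse the kernel messages instead of reading them by eye. Below is a minimal Python sketch. It assumes the syslog prefix shown in this post (month, day, time, hostname, "kernel:"), and its regular expressions are written against these lines only, so other kernel or lpfc versions may need adjustments. Collapsing repeats shows that this host offlined two distinct LUNs (0 and 1). The last line above repeats lun 0.

```python
import re

# Assumed syslog layout, as in the post: "Aug 16 12:33:15 <hostname> kernel: <message>"
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s+(.*)$")

# SCSI mid-layer message emitted when error recovery gives up on a device
OFFLINE_RE = re.compile(
    r"Device offlined - not ready after error recovery: "
    r"host (\d+) channel (\d+) id (\d+) lun (\d+)"
)


def offlined_devices(lines):
    """Map (hostname, (host, channel, id, lun)) to the first time it was offlined.

    Duplicate offline messages for the same device are collapsed; the dict
    keeps first-seen order (Python 3.7+).
    """
    first_seen = {}
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # not a kernel syslog line in the assumed format
        stamp, hostname, message = m.groups()
        off = OFFLINE_RE.search(message)
        if off:
            hctl = tuple(int(x) for x in off.groups())
            first_seen.setdefault((hostname, hctl), stamp)
    return first_seen


if __name__ == "__main__":
    log = [
        "Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0",
        "Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 1",
        "Aug 16 12:33:15 gcxoemsl01 kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0",
    ]
    for (hostname, hctl), stamp in offlined_devices(log).items():
        print(stamp, hostname, "h:c:t:l =", ":".join(map(str, hctl)))
```

Feeding it the syslog from all four hosts gives one first-seen timestamp per offlined device, which makes it easier to see which host lost its paths first.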

On another host:

Aug 16 12:32:50 gcxnassl02 kernel: lpfc 0000:02:01.0: 0:0203 Nodev timeout on NPort x21600 Data: x2010808 x7 x4
Aug 16 12:32:50 gcxnassl02 kernel: lpfc 0000:02:01.0: 0:0260 Remove Target scsi id x0
Aug 16 12:32:50 gcxnassl02 kernel: lpfc 0000:02:01.0: 0:0203 Nodev timeout on NPort x20200 Data: x2010808 x7 x5
Aug 16 12:32:50 gcxnassl02 kernel: lpfc 0000:02:01.0: 0:0260 Remove Target scsi id x1
Aug 16 12:32:50 gcxnassl02 kernel: SCSI error : <0 0 0 1> return code = 0x10000
Aug 16 12:32:50 gcxnassl02 kernel: end_request: I/O error, dev sdb, sector 259128287
Aug 16 12:32:50 gcxnassl02 kernel: Buffer I/O error on device dm-19, logical block 148292
Aug 16 12:32:50 gcxnassl02 kernel: lost page write due to I/O error on dm-19
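The two "Nodev timeout" messages identify the lost remote ports by their 24-bit Fibre Channel address (N_Port ID). That address is made up of a domain byte, an area byte and a port byte. The following Python sketch splits them out. It assumes lpfc prints the ID in hex with leading zeros dropped, so x21600 means 0x021600. Both IDs decode to domain 2, which suggests the two targets were reached through the same switch. Whether the area byte maps directly to a physical port index depends on the switch's addressing mode. Treat that mapping as an assumption and confirm it on the switch itself, for example against the switchshow port listing.

```python
def decode_nport_id(text):
    """Split a 24-bit Fibre Channel N_Port ID into (domain, area, port) bytes.

    Accepts the lpfc style "x21600" as well as "0x021600" or "021600".
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    elif s.startswith("x"):
        s = s[1:]
    value = int(s, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit N_Port ID: {text!r}")
    # Domain is the high byte, area the middle byte, port (AL_PA) the low byte
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


if __name__ == "__main__":
    for nport in ("x21600", "x20200"):
        domain, area, port = decode_nport_id(nport)
        print(f"{nport}: domain {domain}, area {area} (0x{area:02x}), port 0x{port:02x}")
    # x21600: domain 2, area 22 (0x16), port 0x00
    # x20200: domain 2, area 2 (0x02), port 0x00
```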

The errshow output reports no errors at that time; the only error is from 13 August:

2010/08/13-14:57:28, , 12472, FFDC, WARNING, SilkWorm48000, kSWD: Detected unexpected termination of: ''rpcd:0'RfP=24514,RgP=24514,DfP=0,died=1,rt=3901369736,dt=40447,to=50000,aJc=-393649060,aJp=-393665661,abiJc=-677885080,abiJp=-677901680,aSeq=235065,kSeq=0,kJc=0,kJp=0
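One way to make this check systematic is to parse errshow entries and keep only those close to the incident time. Below is a minimal Python sketch. The field names (message ID, sequence number, flags, severity, switch name) are guesses inferred from the single entry quoted above, not a documented format. The ±10-minute window and the year 2010 for the host logs (which carry no year) are also assumptions. Run against the quoted entry, the sketch confirms that the 13 August message falls well outside any reasonable window around 16 August 12:33.

```python
from datetime import datetime, timedelta


def parse_errshow(line):
    """Split one errshow entry into its leading fields and the free-text message.

    Field names are assumptions inferred from the entry quoted in this thread;
    maxsplit=6 keeps commas inside the message text intact.
    """
    parts = [p.strip() for p in line.split(",", 6)]
    if len(parts) < 7:
        raise ValueError("unexpected errshow format")
    stamp, msg_id, seq, flags, severity, switch, message = parts
    return {
        "time": datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S"),
        "msg_id": msg_id,
        "seq": int(seq),
        "flags": flags,
        "severity": severity,
        "switch": switch,
        "message": message,
    }


def within_window(event_time, incident, minutes=10):
    """True if event_time lies within +/- `minutes` of the incident time (assumed window)."""
    return abs(event_time - incident) <= timedelta(minutes=minutes)


if __name__ == "__main__":
    line = ("2010/08/13-14:57:28, , 12472, FFDC, WARNING, SilkWorm48000, "
            "kSWD: Detected unexpected termination of: ''rpcd:0'RfP=24514,RgP=24514")
    entry = parse_errshow(line)
    incident = datetime(2010, 8, 16, 12, 33)  # assumed year for the host logs
    print(entry["switch"], entry["time"], within_window(entry["time"], incident))
    # SilkWorm48000 2010-08-13 14:57:28 False
```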

I would like to understand what caused this disruption.

Regards

Paulo Serra

Super Contributor
Posts: 260
Registered: 04-09-2008

Re: Hosts lost access to targets, with no related messages at the time of occurrence.

1. Run the fabriclog -s command on every switch and check for any ports that went offline during this period. Check all of the switches, both those connected to the storage and those connected to the servers. This might give you some clues.

2. It could also be human error, such as a zoning change or the removal of storage mappings. If all the hosts lost connectivity to the LUN, I would suspect the storage more than the fabric.
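The reasoning in point 2 amounts to isolating the fault by the component that every failing path shares. The toy Python sketch below illustrates that idea. The switch and array port names are hypothetical placeholders. In practice, the path inventory would come from the zoning and the hosts' multipath configuration.

```python
def shared_components(failing_paths):
    """Return the components that appear on every failing I/O path.

    Each path is an iterable of component names (host, switch, storage port, ...).
    A component common to all failing paths is the first suspect; an empty
    result means no single component is shared by all of them.
    """
    paths = [set(p) for p in failing_paths]
    if not paths:
        return set()
    return set.intersection(*paths)


if __name__ == "__main__":
    # Hypothetical topology: two hosts behind different switches, same array port.
    paths = [
        ["gcxoemsl01", "switch_A", "array_port_1"],
        ["gcxnassl02", "switch_B", "array_port_1"],
    ]
    print(shared_components(paths))  # {'array_port_1'}
```

If the only component shared by all affected hosts is on the storage side, that supports looking at the array before the fabric.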

Super Contributor
Posts: 425
Registered: 03-03-2010

Re: Hosts lost access to targets, with no related messages at the time of occurrence.

If it is an AIX server, the multipathing software can sometimes be the culprit, and you may have to upgrade it.
