01-03-2011 08:16 PM
Happy New Year!
The customer will schedule a time to fix the issues.
If anything else comes up, I will update you.
Thank you very much.
01-10-2011 08:12 PM
I don't know the entire topology but this seems like a known FOS defect.
Mon Dec 13 01:29:57 2010
01:29:57.612 FSSK msg 0 1 00020001,0000d7f6,00000000, ACK
01:29:57.618 PORT Rx3 18 116 22fffffe,16000000,00faffff,04000000 <<<<<<<<<<<<<< Flogi
01:29:57.619 FSSK msg 0 0 00020001,00300006,00000000,UPDA
01:29:57.619 PORT debug 18 00000074,20120005,1e366afe,dca00044
01:29:57.619 PORT Tx3 18 116 230a1200,00fffffe,00fa3d0a,02000000 <<<<<<<<<<<<<< ELS reply Accept
01:29:57.740 FCPH write 18 116 000a1200,00fffc0a,00000000,00000000,00000000
01:29:57.740 FCPH seq 18 74 00210000,00000000,000003a5,000101c2,00000000
01:29:57.740 PORT Tx3 18 116 220a1200,00fffc0a,3d15ffff,03000000 <<<<<<<<<<<<< Plogi (trace OXID 3d15)
01:29:58.180 FCPH write 18 116 000a1200,00fffc0a,00000000,00000000,00000000
01:29:58.180 FCPH seq 18 74 00210000,00000000,000003a5,000101c2,00000000
01:29:58.180 PORT Tx3 18 116 220a1200,00fffc0a,459bffff,03000000 <<<<<<<<<<<<<< Another PLOGI (trace OXID 459b)
01:29:58.620 PORT scn 18 5 00000000,00000000,00000001
01:30:00.200 FCPH write 18 116 000a1200,00fffc0a,00000000,00000000,00000000
01:30:00.200 FCPH seq 18 74 00210000,00000000,000003a5,000101c2,00000000
01:30:00.200 PORT Tx3 18 116 220a1200,00fffc0a,4257ffff,03000000 <<<<<<<<<<<<<< Another PLOGI (trace OXID 4257)
01:30:00.650 FCPH write 18 116 000a1200,00fffc0a,00000000,00000000,00000000
01:30:00.650 FCPH seq 18 74 00210000,00000000,000003a5,000101c2,00000000
01:30:00.650 PORT Tx3 18 116 220a1200,00fffc0a,4140ffff,03000000 <<<<<<<<<<<<<< Another PLOGI (trace OXID 4140)
01:30:02.108 INTR pstate 18 LR2 <<<<<<<<<<<<<< LR2 is a received link reset (PSM status message). This is roughly 4.4 seconds after the first PLOGI. What is your E_D_TOV value set to?
01:30:02.108 INTR pstate 18 AC
01:30:02.108 PORT scn 18 11 00000000,00000000,00000002
01:30:02.108 FSSK msg 0 0 00020001,000d0001,00000001,UPDA
01:30:02.109 FSSK msg 0 1 00020001,0000d7f8,00000000, ACK
01:30:02.112 FSSK msg 0 0 0002000a,0000002f,00000000,UPDA * 4
01:30:02.115 PORT Rx3 18 116 22fffffe,16000000,00fbffff,04000000 <<<<<<<<<<<<<<<<< New round of logins proceed.
PLOGI somehow ends up in LaLa land due to this (or these) defect(s). There is never an Accept for OXID 3d15 (or others) and thus the link gets reset.
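To make the "no Accept for OXID 3d15" observation easier to check on a larger dump, here is a minimal Python sketch. It assumes the column layout seen in the trace above (time, task, event, port, size, comma-separated words) and matches requests to replies on the OX_ID alone, which is a simplification. The function name and the trimmed sample are only for illustration.

```python
from datetime import datetime

# Trimmed copy of the port 18 trace above (annotations removed).
LOG = """\
01:29:57.618 PORT Rx3 18 116 22fffffe,16000000,00faffff,04000000
01:29:57.619 PORT Tx3 18 116 230a1200,00fffffe,00fa3d0a,02000000
01:29:57.740 PORT Tx3 18 116 220a1200,00fffc0a,3d15ffff,03000000
01:29:58.180 PORT Tx3 18 116 220a1200,00fffc0a,459bffff,03000000
01:30:00.200 PORT Tx3 18 116 220a1200,00fffc0a,4257ffff,03000000
01:30:00.650 PORT Tx3 18 116 220a1200,00fffc0a,4140ffff,03000000
01:30:02.108 INTR pstate 18 LR2
"""

def unanswered_requests(log):
    """Return ({ox_id: request_time}, link_reset_time) up to the first LR2."""
    pending = {}
    link_reset = None
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        stamp = datetime.strptime(fields[0], "%H:%M:%S.%f")
        if fields[1] == "PORT" and fields[2] in ("Tx3", "Rx3") and len(fields) >= 6:
            words = fields[5].split(",")
            r_ctl = words[0][:2]   # first byte of word 0
            ox_id = words[2][:4]   # upper half of word 4
            if r_ctl == "22":      # ELS request (FLOGI, PLOGI, ...)
                pending[ox_id] = stamp
            elif r_ctl == "23":    # ELS reply closes the exchange
                pending.pop(ox_id, None)
        elif fields[1] == "INTR" and fields[2] == "pstate" and fields[4] == "LR2":
            link_reset = stamp
            break
    return pending, link_reset

pending, link_reset = unanswered_requests(LOG)
for ox_id, sent in pending.items():
    waited = (link_reset - sent).total_seconds()
    print(f"OXID {ox_id}: no reply, {waited:.3f}s before LR2")
```

On the sample, the FLOGI exchange (OXID 00fa) is closed by its Accept, while the four PLOGIs stay open until the link reset.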
One other reason might be that incorrect FLOGI parameters are passed to the F-Port controller. When you look at the FLOGI, the second word starts with 16. This is the second word of the FC frame header, and its first byte is the CS_CTL field. The meaning of this field in a FLOGI is, however, determined by bit 17 of the F_CTL field, which is not captured in the portlogdump.
A normal FLOGI/PLOGI sequence, however, would have this field set to 00, for example:
14:27:07.908 PORT Rx3 14 116 22fffffe,00000000,0026ffff,04000000 << FLOGI (first byte of the 2nd word is set to 00)
14:27:07.935 PORT Tx3 14 116 238f0e00,00fffffe,00260e3e,02000000 << Normal accept
14:27:07.936 PORT Rx3 14 116 22fffffc,008f0e00,0058ffff,03000000 << Normal PLOGI
14:27:07.936 PORT Tx3 14 116 238f0e00,00fffffc,00580e4d,02000000 << Normal accept
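If you want to compare that byte across many ports, a small Python sketch like the following could pull it out of every received FLOGI. It assumes the same column layout as the traces above and treats a frame as a FLOGI when word 0 starts with 22 and the payload word starts with 04. The function name is made up for this example, and a non-zero value is only a hint, since F_CTL is not in the dump.

```python
def flogi_first_bytes(lines):
    """Return (time, port, first byte of word 1) for each received FLOGI."""
    found = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[1] != "PORT" or fields[2] != "Rx3":
            continue
        words = fields[5].split(",")
        if len(words) < 4:
            continue
        # 22 = ELS request in word 0, 04 = FLOGI command code in the payload word
        if words[0][:2] == "22" and words[3][:2] == "04":
            found.append((fields[0], fields[3], words[1][:2]))
    return found

sample = [
    "01:29:57.618 PORT Rx3 18 116 22fffffe,16000000,00faffff,04000000",
    "14:27:07.908 PORT Rx3 14 116 22fffffe,00000000,0026ffff,04000000",
    "14:27:07.936 PORT Rx3 14 116 22fffffc,008f0e00,0058ffff,03000000",
]
for time, port, byte in flogi_first_bytes(sample):
    note = "normal" if byte == "00" else "worth a closer look"
    print(f"{time} port {port}: first byte of word 1 = {byte} ({note})")
```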
My guess is that some setting on the HBA/storage port is incorrect. That, combined with the FOS defects that might play a part, would end up with this port bouncing all over the place.
Hope this helps.
01-11-2011 06:11 PM
The customer will now take a maintenance window to fix the issues.
Could you please send me a manual for decoding the portlogdump output?
01-11-2011 10:58 PM
I wish it were that simple. The portlogdump follows the Fibre Channel standards (in some sense :-)), and those are something you should study before you can make sense of the PLD. Also be aware that the switches only capture device-to-switch and switch-to-switch frames, NOT device-to-device frames. A PLOGI from an HBA to a storage port is therefore not captured (with some exceptions :-)). Also, the PLD only captures words 0, 1 and 4 of the frame header and the first word of the payload (this depends a bit on the event code).
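As a rough illustration of how those four captured words map onto the standard FC frame header, here is a small Python sketch. It assumes a Tx3/Rx3 entry with exactly four words (header words 0, 1, 4 and the first payload word) and only knows a handful of ELS command codes. The function and table names are invented for this example and are not part of any Brocade tool.

```python
# A few ELS command codes (first byte of the first payload word).
ELS_COMMANDS = {0x01: "LS_RJT", 0x02: "LS_ACC", 0x03: "PLOGI", 0x04: "FLOGI"}

def decode_pld_words(arg_field):
    """Split 'w0,w1,w4,payload' from a PORT Tx3/Rx3 entry into header fields.

    Raises ValueError if the entry does not have exactly four words.
    """
    w0, w1, w4, payload = (int(w, 16) for w in arg_field.split(","))
    return {
        "r_ctl": w0 >> 24,                # word 0, byte 0
        "d_id": f"{w0 & 0xFFFFFF:06x}",   # word 0, lower 24 bits
        "cs_ctl": w1 >> 24,               # word 1, byte 0
        "s_id": f"{w1 & 0xFFFFFF:06x}",   # word 1, lower 24 bits
        "ox_id": w4 >> 16,                # word 4, upper 16 bits
        "rx_id": w4 & 0xFFFF,             # word 4, lower 16 bits
        # Only meaningful when r_ctl is 0x22 (ELS request) or 0x23 (ELS reply).
        "els": ELS_COMMANDS.get(payload >> 24, "unknown"),
    }

frame = decode_pld_words("22fffffe,16000000,00faffff,04000000")
print(frame)
```

For the FLOGI from port 18 this gives D_ID fffffe (the F-Port controller), S_ID 000000, OX_ID 00fa and the 16 in the CS_CTL position.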
Still on the subject: I noticed that Brocade uses a proprietary coding mechanism for CS_CTL in class 2 and 3. I mentioned the CS_CTL field in the FLOGI being 16. Per the official FC standard it should be 00 (unless specific bits in the F_CTL field are used; this has to do with delivery preference and/or priority and preemption), but Brocade uses it for specific type coding. In this case 16 stands for:
IU_BAD_S_ID, which means invalid S_ID. IU stands for Information Unit, which is FCP terminology. I can't determine what effect it has on the login process in this case, but it seems to invalidate something. A very interesting case. I'm curious whether this was an HBA or a storage port. I haven't seen this on an HBA, so it might be a first.
To fully understand why this is happening, you would need to put a Fibre Channel analyzer between the HBA and the switch to capture the entire frame header plus some words from the payload.
Hope this helps.
01-19-2011 06:33 AM
Hello Andreas & Erwin,
Sorry for my mistake, I didn't describe the entire topology. We have now upgraded FOS on the Brocade 4900 that connects to the Brocade 48K, and the issue is fixed.
But I have another question: the defective port 18 is connected to a storage port, not to the Brocade 4900, so why did it show this issue?
thank you again.
01-19-2011 09:19 AM
On port 18 you have many link resets.
This can be related to different things:
1.) A physical connection issue between the switch and the storage port (this includes both SFPs and cables).
2.) The storage device receives bad frames or incomplete data from a device. Some storage arrays try to clear the situation via an LR.
Check whether you have followed all my suggestions from above.