03-09-2018 10:55 PM - edited 03-09-2018 10:56 PM
Recently we had degraded ports on one of our storage pools (IBM SVC). We logged a call with IBM and Brocade. Brocade confirmed that there were some stuck VCs and asked us to run the commands below to release them. The switch firmware is v7.3.1. The switches are DCX8510 models and use ICL ports for ISL connectivity.
Stuck VC condition list:
slot 2 internal port 48
slot 1 internal port 47
slot 5 internal ports 26, 128
slot 8 internal ports 16, 129
slot 9 internal port 43
To release the credits, the command is creditrecovmode --linkreset slot/blade_port. We have executed it for each of the ports listed above.
I have a few queries:
1) How can we check on a daily basis the total number of stuck VCs (back-end and front-end) on a switch, apart from the RAS logs?
2) How can we detect which front-end port caused a back-end stuck VC, or which front-end port is mapped to the back-end port with the stuck VC?
3) How can we check how many ports were affected by a stuck VC?
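For anyone with a longer list, here is a minimal sketch that turns the stuck VC list above into one command string per port. It assumes the slot/blade_port syntax exactly as Brocade support gave it to us; the names STUCK_VCS and linkreset_commands are made up for illustration. It only prints the commands and runs nothing on the switch.

```python
# Stuck-VC list as posted in this thread: slot -> internal (blade) ports.
# STUCK_VCS and linkreset_commands are hypothetical names for this sketch.
STUCK_VCS = {2: [48], 1: [47], 5: [26, 128], 8: [16, 129], 9: [43]}


def linkreset_commands(stuck):
    """Build one 'creditrecovmode --linkreset slot/blade_port' string per port.

    Assumption: the syntax is the one quoted by support in this thread.
    The strings are only generated here, never executed.
    """
    return [
        f"creditrecovmode --linkreset {slot}/{port}"
        for slot in sorted(stuck)
        for port in sorted(stuck[slot])
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them.
    for cmd in linkreset_commands(STUCK_VCS):
        print(cmd)
```

Review the printed commands against what support asked for before running any of them, since a link reset on the wrong internal port is disruptive.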
Thanks in advance!
03-12-2018 05:06 PM
This is rather complex and you don't want to go down that rabbit hole.
First, make sure that you have the latest firmware installed on that box; it already has measures built in to prevent this from happening. Second, make sure that you have credit recovery turned on via the "creditrecovmode" command. This monitors the flow and lack of credit on the back-end. If something is not right, you will get notified via a RAS event (Cx-1014/1015/1016/1017).
03-12-2018 09:16 PM
I see it's already enabled:
FC101_SW1:root> creditrecovmode --show
Internal port credit recovery is Enabled with LrOnly
LR threshold (not currently activated): 2
Fault Option (not currently activated): EDGEBLADE
C2 FE Complete Credit Loss Detection is Disabled
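In case it helps with a daily check across several switches, here is a rough sketch that parses the output above and reports whether back-end credit recovery and the C2 front-end detection are enabled. The line formats are taken only from the v7.3.1 output pasted here, so other firmware levels may print something different. The function name and dictionary keys are made up for illustration, and getting the text off the switch (for example from a saved session log) is left out.

```python
import re

# Output of "creditrecovmode --show" exactly as pasted in this thread.
SAMPLE = """\
Internal port credit recovery is Enabled with LrOnly
LR threshold (not currently activated): 2
Fault Option (not currently activated): EDGEBLADE
C2 FE Complete Credit Loss Detection is Disabled
"""


def parse_creditrecov_show(text):
    """Extract the enable states from the show output.

    Hypothetical helper: the patterns match only the wording seen above.
    Fields stay None when a line is missing or worded differently.
    """
    result = {
        "backend_enabled": None,
        "backend_mode": None,
        "fe_loss_detection_enabled": None,
    }
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Internal port credit recovery is (\w+)(?: with (\w+))?", line)
        if m:
            result["backend_enabled"] = m.group(1) == "Enabled"
            result["backend_mode"] = m.group(2)
            continue
        m = re.match(r"C2 FE Complete Credit Loss Detection is (\w+)", line)
        if m:
            result["fe_loss_detection_enabled"] = m.group(1) == "Enabled"
    return result


if __name__ == "__main__":
    print(parse_creditrecov_show(SAMPLE))
```

On the output above this reports back-end recovery as enabled in LrOnly mode and the C2 front-end detection as disabled.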
03-12-2018 10:18 PM
Depending on the platform, a back-end link may be a trunk. If that is the case, one trunk member may still pass traffic, so the link will not be flagged as a stuck VC and the back-end link reset may not trigger.
Only a few OEMs and Brocade are able to determine that. It is too complex to explain here, plus I might be treading on some Broadcom IP.
Condor 3 and Condor 4 ASICs have improved logic, so the chances of this happening there are lower.