04-03-2014 05:28 PM
I have an MLXe-4 that is about a month and a half old. The problem is very intermittent; I have had TAC look at it several times, including while it was happening. I never lose connectivity from my CPE to the MLX or vice versa, but at random the MLX stops forwarding traffic (including ICMP) to the destination IP, even to devices connected directly to the same MLX. It happens on several different Ethernet ports, subnets, and VLANs. The customers' SIP/UDP traffic is not affected while this is going on.
The other odd thing is that when these business customers have hardly any traffic on their circuits, it does not act up. So pretty much all night (when the MLX carries its heaviest load, due to residential use) things run great for all of these business customers, except the ones who operate at night.
I am getting desperate for a resolution. We did not have this issue with the CER we were using before.
04-03-2014 06:03 PM
What did TAC say, and what did they test? How does the CPU load look during the bad times compared with the good ones? Also, what version are you running? Is it happening on one or more line cards?
Also check 'Brocade# show lp-cpu packet statistics' and compare the good and bad times.
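One rough way to do that comparison is to snapshot the counters at the start and end of a good window and of a bad window, turn each pair into per-second rates, and rank which counters grew the most during the bad window. The Python sketch below assumes you have already copied the counters into name/value pairs by hand; the counter names and numbers are placeholders, not the actual fields or values the MLX prints.

```python
"""Rank packet counters by how much faster they grew in a bad window.

Hypothetical helper: counter names such as "tcp_syn" are placeholders,
not the real field names shown by `show lp-cpu packet statistics`.
"""


def window_rates(before: dict[str, int], after: dict[str, int], seconds: float) -> dict[str, float]:
    """Per-second rate for every counter seen in either snapshot."""
    if seconds <= 0:
        raise ValueError("window length must be positive")
    names = sorted(set(before) | set(after))
    return {n: (after.get(n, 0) - before.get(n, 0)) / seconds for n in names}


def biggest_increases(good: dict[str, float], bad: dict[str, float], top: int = 3) -> list[tuple[str, float]]:
    """Counters sorted by (bad rate - good rate), largest first; ties by name."""
    names = set(good) | set(bad)
    diffs = [(n, bad.get(n, 0.0) - good.get(n, 0.0)) for n in names]
    diffs.sort(key=lambda kv: (-kv[1], kv[0]))
    return diffs[:top]


if __name__ == "__main__":
    # Made-up snapshots taken 60 seconds apart in each window.
    good = window_rates({"arp": 1000, "icmp": 500, "tcp_syn": 200},
                        {"arp": 1600, "icmp": 800, "tcp_syn": 260}, 60)
    bad = window_rates({"arp": 5000, "icmp": 2000, "tcp_syn": 900},
                       {"arp": 5600, "icmp": 2300, "tcp_syn": 60900}, 60)
    print(biggest_increases(good, bad, top=1))  # [('tcp_syn', 999.0)]
```

Using rates rather than raw deltas keeps the comparison fair when the two capture windows are different lengths.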
04-03-2014 06:53 PM
They have not said much, despite the several hours I have spent on the phone with them. All the stats look good all the time. The problems come and go, and usually only one customer acts up at a time. CPU usage is always low, and the problem happens on all cards/ports.
04-04-2014 07:32 AM
Are you running a dynamic routing protocol or just static routing? If you are running BGP, do you also have a default route?
You may want to set up an sFlow analyzer to see the traffic patterns when this happens. If the traffic is not too heavy (200-300 Mbps), you could set up a PRTG sensor on a sniffer port to capture that day's traffic and look for patterns. I have three of these MLXs: two are running 5.2c and one is running 5.4d, all with no issues. Check your TCP burst settings with "sho run | inc tcp" and see whether you have any configuration for this. If you do, you may be experiencing the effects of a SYN attack on the device. Try setting
"ip tcp burst-normal 400 burst-max 1600 lockup 2". The 1600 is the maximum burst per second per IP; if it is exceeded, the MLX stops passing that traffic for 2 seconds. You may want to set this to a much higher value, say 2500 or even higher, depending on your traffic patterns. The burst-normal setting will effectively block the source IP once it hits the burst-normal value. You can check these counters with "show statistics dos-attack". You may already know all this, but it's good to spell it out, because it caught me by surprise when we had a SYN flood attack and the MLX just quit passing traffic repeatedly! After I tweaked this, it's working great now!
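To make the interaction between the three numbers concrete, here is a small Python model of the behavior described above. It is a simplified sketch, not Brocade's actual implementation: it assumes TCP SYN packets are counted per source IP in fixed one-second windows, that SYNs above burst-normal in a given second are dropped, and that exceeding burst-max locks that source out for `lockup` seconds. The class name `SynBurstLimiter` is hypothetical, and on a real device the counting scope may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SynBurstLimiter:
    """Hypothetical model of burst-normal / burst-max / lockup per source IP."""

    burst_normal: int  # SYNs per second forwarded before excess is dropped
    burst_max: int     # SYNs per second that trigger a lockup
    lockup: int        # lockup length in seconds
    # src -> (second the window started, SYNs counted in that second)
    _window: dict[str, tuple[int, int]] = field(default_factory=dict, init=False, repr=False)
    # src -> time (seconds) until which every SYN from it is dropped
    _locked_until: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def allow(self, src: str, t: float) -> bool:
        """Return True if a SYN from `src` arriving at time `t` is forwarded."""
        if t < self._locked_until.get(src, 0.0):
            return False  # source is still locked out
        second = int(t)
        start, count = self._window.get(src, (second, 0))
        if start != second:
            count = 0  # a new one-second window has begun
        count += 1
        self._window[src] = (second, count)
        if count > self.burst_max:
            self._locked_until[src] = t + self.lockup  # start the lockup
            return False
        return count <= self.burst_normal  # drop SYNs above burst-normal


if __name__ == "__main__":
    lim = SynBurstLimiter(burst_normal=400, burst_max=1600, lockup=2)
    # 2000 SYNs from one host spread across the first second.
    forwarded = sum(lim.allow("192.0.2.10", i / 2000) for i in range(2000))
    print(forwarded)                        # 400
    print(lim.allow("192.0.2.10", 1.0))     # False: still locked out
    print(lim.allow("192.0.2.10", 3.0))     # True: lockup has expired
```

In this model a busy but legitimate source that crosses burst-max goes completely dark for the lockup period, which matches the "quit passing traffic repeatedly" symptom; raising burst-max only moves the lockup threshold and does not let more than burst-normal SYNs per second through.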
If you are running BGP or another routing protocol, you can also set up a syslog server and debug BGP or OSPF events (depending on your setup) to see whether you get any unusual messages during an event. I had to do this with a BGP peer connecting to a much older router, and found a BGP compatibility issue on the other router that affected the MLX.
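Once the debug output is landing on the syslog server, the useful part is lining it up with the outage window a customer reports. A minimal Python sketch of that filtering step follows, assuming classic RFC 3164-style lines that start with "Mmm dd hh:mm:ss"; the function names, the two-minute margin, and the sample log lines are all hypothetical, not real NetIron messages.

```python
from datetime import datetime, timedelta


def parse_ts(line: str, year: int) -> datetime:
    """Parse a 'Mmm dd hh:mm:ss' syslog prefix.

    Classic syslog timestamps carry no year, so the caller supplies one.
    Raises ValueError if the line does not start with such a timestamp.
    """
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def messages_near_event(lines, start: datetime, end: datetime,
                        keywords=("BGP", "OSPF"),
                        margin=timedelta(minutes=2)) -> list[str]:
    """Lines mentioning a keyword within `margin` of the [start, end] outage.

    Assumes the log does not cross a year boundary (start.year is used).
    """
    lo, hi = start - margin, end + margin
    hits = []
    for line in lines:
        try:
            ts = parse_ts(line, start.year)
        except ValueError:
            continue  # skip lines without the expected timestamp prefix
        if lo <= ts <= hi and any(k in line for k in keywords):
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = [
        "Apr  4 07:10:00 mlx BGP: hypothetical message A",
        "Apr  4 07:29:30 mlx BGP: hypothetical message B",
        "Apr  4 07:31:00 mlx SNMP: hypothetical message C",
    ]
    outage_start = datetime(2014, 4, 4, 7, 30)
    outage_end = datetime(2014, 4, 4, 7, 35)
    for hit in messages_near_event(sample, outage_start, outage_end):
        print(hit)  # only "message B" is inside the window and mentions BGP
```

The margin is there because a routing event that causes a forwarding problem may be logged shortly before the customer notices anything.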