Hi,
I use a ServerIron ADX in router mode to NAT outgoing traffic from my network. Every TCP connection takes about 3 seconds to establish. Running tcpdump on the source host (internal network) and on the destination (on the Internet) shows the following:
Source host (internal network):
01:44:02.636285 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: S 2419229496:2419229496(0) win 5840 <mss 1460,sackOK,timestamp 71628984 0,nop,wscale 7>
01:44:05.634126 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: S 2419229496:2419229496(0) win 5840 <mss 1460,sackOK,timestamp 71629734 0,nop,wscale 7>
01:44:05.640589 IP 109.104.XX.YY.9103 > 192.168.129.158.41932: S 1576520605:1576520605(0) ack 2419229497 win 5792 <mss 1460,sackOK,timestamp 1020703064 71629734,nop,wscale 7>
01:44:05.640605 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: . ack 1 win 46 <nop,nop,timestamp 71629735 1020703064>
Target Host (internet):
01:44:05.638000 IP 92.48.NAT.NAT.45993 > 109.104.XX.YY.9103: S 2419229496:2419229496(0) win 5840 <mss 1460,sackOK,timestamp 71629734 0,nop,wscale 7>
01:44:05.638079 IP 109.104.XX.YY.9103 > 92.48.NAT.NAT.45993: S 1576520605:1576520605(0) ack 2419229497 win 5792 <mss 1460,sackOK,timestamp 1020703064 71629734,nop,wscale 7>
01:44:05.644468 IP 92.48.NAT.NAT.45993 > 109.104.XX.YY.9103: . ack 1 win 46 <nop,nop,timestamp 71629735 1020703064>
This shows that the first TCP SYN packet never reaches the target host, so after 3 seconds (see the timestamps) the source host sends another SYN, and only that one gets through. That is the obvious reason for the delay in TCP connection establishment. (Only the first SYN is lost; the rest of the session then runs smoothly, as expected.)
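To put a number on the gap, the client-side trace above can be scanned for repeated SYNs. This is a minimal sketch and not part of the original troubleshooting: the function name `syn_retransmit_gaps` and the regular expression are my own, and the pattern assumes the older tcpdump line format shown above.

```python
import re
from datetime import datetime

# Matches plain SYN lines in the tcpdump format shown above. SYN-ACK lines
# carry "ack <n>" before "win", so they do not match this pattern.
SYN_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"S (?P<seq>\d+):\d+\(0\) win"
)

def syn_retransmit_gaps(lines):
    """Return {(src, dst, seq): [gap_seconds, ...]} for SYNs seen more than once.

    Timestamps carry no date, so a capture that crosses midnight is not handled.
    """
    last_seen = {}
    gaps = {}
    for line in lines:
        m = SYN_RE.match(line.strip())
        if not m:
            continue
        key = (m["src"], m["dst"], int(m["seq"]))
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        if key in last_seen:
            gaps.setdefault(key, []).append((ts - last_seen[key]).total_seconds())
        last_seen[key] = ts
    return gaps

# The four client-side lines from the capture above.
CLIENT_TRACE = [
    "01:44:02.636285 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: S 2419229496:2419229496(0) win 5840 <mss 1460,sackOK,timestamp 71628984 0,nop,wscale 7>",
    "01:44:05.634126 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: S 2419229496:2419229496(0) win 5840 <mss 1460,sackOK,timestamp 71629734 0,nop,wscale 7>",
    "01:44:05.640589 IP 109.104.XX.YY.9103 > 192.168.129.158.41932: S 1576520605:1576520605(0) ack 2419229497 win 5792 <mss 1460,sackOK,timestamp 1020703064 71629734,nop,wscale 7>",
    "01:44:05.640605 IP 192.168.129.158.41932 > 109.104.XX.YY.9103: . ack 1 win 46 <nop,nop,timestamp 71629735 1020703064>",
]

if __name__ == "__main__":
    for (src, dst, seq), values in syn_retransmit_gaps(CLIENT_TRACE).items():
        print(f"{src} -> {dst} seq {seq}: SYN repeated after {values} s")
```

A gap close to 3 seconds matches the delay described above: the client resends the SYN when no SYN-ACK arrives for the first one.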
When I run the same test without the ServerIron in the path, this doesn't happen: the first SYN is delivered and there is no connection setup delay. That makes me believe the ServerIron is causing the problem.
Is there any way to get around this? Is my NAT configuration wrong, or is this something the load balancer is doing deliberately?
thanks,
Steve.
Hi Steve,
It would help if you could paste the parts of your config that show whether you are using dynamic NAT or static NAT and which other features you are using along with NAT. Which software version are you running?
You may also want to capture the ARP requests the ADX is sending out. You can use the debug filter for that.
Thanks
Mohit
Hi Steve,
When you send a packet to an outside network for the very first time, this can happen if the BP does not yet know the next-hop MAC address. The next hop is typically the default gateway. The BP is the ADX's L4-7 CPU. If you issue "show arp" or "show mac", that displays only the MP's entries. To look at the BP's entries, do "rconsole 1 1" and then "show arp" or "show mac"; use "rcon-exit" to exit. The MP and BP synchronize ARP/MAC addresses periodically, but for the very first packet something like your issue can happen.
Anyway, the workaround is to configure a static MAC address for the default route's next hop.
ServerIronADX 1000(config)#
ServerIronADX 1000(config)# vlan 1
ServerIronADX 1000(config-vlan-1)# static-mac-address 001b.ed96.8a80 ethernet 3
Alternatively, configure a periodic health check for the default gateway, as below.
server remote gw-health <default-gateway-ip-address>
Note that you don't need to bind this remote server to a VIP. Even without a VIP binding, the ADX will send ARP requests for the default gateway.
If you configure it as a real server instead, that is also okay. In that case, the ADX does both ARP and ICMP health checks.
I prefer server remote in this specific case. As far as I'm aware, this issue happens only when IP NAT is in use.
Thanks.
//Kono
Thanks, guys, for your suggestions.
Let me provide some more details, as I still cannot get to the bottom of this problem.
I am not sure this is related to ARP caching, for the following reason.
Here are the ARP cache records for two servers. The first has no problems at all, and the second always loses the first SYN:
SSH@lb2#show arp mac-address 0016.366f.90fc
IP Address MAC Address Type Age Port
1 192.168.129.153 0016.366f.90fc Dynamic 2 3/1-3/4
SSH@lb2#show arp mac-address 0016.3e62.bd70
IP Address MAC Address Type Age Port
1 192.168.129.158 0016.3e62.bd70 Dynamic 0 3/1-3/4
Interestingly, the servers affected by the loss are always shown with Age 0, while those working fine have a non-zero Age in the cache. (So does this suggest it really has something to do with ARP caching?)
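To check that observation across many hosts rather than two, the `show arp` output can be parsed and the Age 0 entries listed. This is only a sketch: the names `ArpEntry` and `parse_show_arp` are made up for illustration, and the pattern assumes the column layout shown above (index, IP, MAC, type, age, port). Whether Age 0 actually correlates with the SYN loss is still just my observation.

```python
import re
from typing import List, NamedTuple

class ArpEntry(NamedTuple):
    ip: str
    mac: str
    kind: str   # "Dynamic" in the output above
    age: int
    port: str

# One data row: index, IP address, dotted MAC, type, age, port.
ARP_RE = re.compile(
    r"^\s*\d+\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<kind>\S+)\s+(?P<age>\d+)\s+(?P<port>\S+)\s*$",
    re.IGNORECASE,
)

def parse_show_arp(text: str) -> List[ArpEntry]:
    """Extract ARP rows; prompt and header lines are skipped because they do not match."""
    entries = []
    for line in text.splitlines():
        m = ARP_RE.match(line)
        if m:
            entries.append(
                ArpEntry(m["ip"], m["mac"], m["kind"], int(m["age"]), m["port"])
            )
    return entries

# The two lookups pasted above.
SAMPLE = """\
SSH@lb2#show arp mac-address 0016.366f.90fc
IP Address MAC Address Type Age Port
1 192.168.129.153 0016.366f.90fc Dynamic 2 3/1-3/4
SSH@lb2#show arp mac-address 0016.3e62.bd70
IP Address MAC Address Type Age Port
1 192.168.129.158 0016.3e62.bd70 Dynamic 0 3/1-3/4
"""

if __name__ == "__main__":
    for entry in parse_show_arp(SAMPLE):
        marker = "  <-- Age 0" if entry.age == 0 else ""
        print(f"{entry.ip} {entry.mac} age={entry.age}{marker}")
```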
Some details about my configuration that might be relevant:
Please find attached lb1.txt and lb2.txt, showing most of my configuration (I removed all server virtual/real entries, as they are probably irrelevant). As shown, I am running software version 10.2.00eTD4.
I will be very grateful for any suggestions, as I am running out of ideas and this problem is driving me crazy.
thanks,
Steve.
Also, when I just ping the same target for which the first TCP SYN always gets lost, there is never any ICMP loss.
Hi Steve,
I noticed that your VRRP-e VIP and NAT IP overlap, i.e. both use IP address XXX.YYY.ZZZ.7.
We don't support a setup where the VRRP-e VIP and the NAT IP are exactly the same, so you must change either the VRRP-e VIP or the NAT IP.
The ADX has several CPUs, called MP and BP: the MP processes VRRP-e and the BP handles NAT/SLB. The MP should not receive NAT-IP traffic, but in your case it does because of VRRP-e CAM matching.
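A quick way to check a configuration for this kind of overlap is to compare the NAT pool range against the VRRP-e virtual addresses. The sketch below is only an illustration: the function name `nat_pool_conflicts` is made up, it does not read an ADX config, and the addresses are placeholders from the documentation range 203.0.113.0/24, not the real ones from this setup.

```python
import ipaddress

def nat_pool_conflicts(pool_start, pool_end, vrrp_vips):
    """Return the VRRP-e virtual IPs that fall inside the NAT pool range (inclusive)."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [vip for vip in vrrp_vips if start <= ipaddress.ip_address(vip) <= end]

if __name__ == "__main__":
    vips = ["203.0.113.7"]  # placeholder VRRP-e VIP
    # Pool that reuses the VIP address: reported as a conflict.
    print(nat_pool_conflicts("203.0.113.7", "203.0.113.7", vips))
    # Pool moved to a different address: no conflict.
    print(nat_pool_conflicts("203.0.113.254", "203.0.113.254", vips))
```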
If you want to troubleshoot this yourself, use the debug filter. If you want to continue this discussion, please post the debug filter output. The debug filter utility will show the reason we dropped your packet.
In the example below, 210.210.210.210 is the IP address of www.example.com.
LB2# debug filt
LB2(debug-filter)# spec 1
LB2(debug-filter-spec-1)# reset
LB2(debug-filter-spec-1)# ip src 210.210.210.210
LB2(debug-filter-spec-1)# exit
LB2(debug-filter)# spec 2
LB2(debug-filter-spec-2)# reset
LB2(debug-filter-spec-2)# ip dest 210.210.210.210
LB2(debug-filter-spec-2)# exit
LB2(debug-filter)# apply 1or2
LB2(debug-filter)# packet whole
LB2(debug-filter)# buff 1024
LB2(debug-filter)# int
LB2(debug-filter)# start
LB2(debug-filter)#
Then, from the client, do telnet www.example.com 80.
When finished, issue the stop command.
LB2(debug-filter)# stop
Below, I show only ascii 1 and ascii 2, but repeat for as many packets as the summ command lists.
v m
v b 1 1
v b 1 2
v b 1 3
view mp
view bp 1 1
view bp 1 2
view bp 1 3
telnet@LB2(debug-filter-1-3)#
telnet@LB2(debug-filter-1-3)#
telnet@LB2(debug-filter-1-3)#v m
telnet@LB2(debug-filter-MP)#summ
1> 90 TCP :45676->80 Seq:3192243562 Ack:0 SYN
2> 90 TCP :80 ->15416 Seq:2304501568 Ack:3192243563 SYN ACK
telnet@LB2(debug-filter-MP)#ascii 1
Packet 1 captured at 11 minutes 24 seconds ; Packet size is 90(0x005a) bytes
In port: 4/5
fpga optimized: No
Ethernet Version II
Address: 0030.4885.385c ---> 02e0.5265.cd81
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x00
Total length: 60 (Octets)
Fragment ID: 30178
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0xddbe
IP address: 192.168.129.158 ---> 210.210.210.1
No option
Transmission Control Protocol
Port 45676 ---> 80
Sequence Number: 3192243562
Acknowledgement Number: 0
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x02
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...0 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5840
Checksum: 0x62cf
Urgent Pointer: 0x0000
Data:
0000: bd 26 03 06 bd 26 00 3c 75 e2 40 00 40 06 dd be | .&...&.<u.@.@...
telnet@LB2(debug-filter-MP)#
telnet@LB2(debug-filter-MP)#
telnet@LB2(debug-filter-MP)#
telnet@LB2(debug-filter-MP)#
telnet@LB2(debug-filter-MP)#v b 1 1
telnet@LB2(debug-filter-1-1)#summ
telnet@LB2(debug-filter-1-1)#
1> 74 TCP :45676->80 Seq:3192243562 Ack:0 SYN
2> 74 TCP :15416->80 Seq:3192243562 Ack:0 SYN
3> 74 TCP :80 ->15416 Seq:2304501568 Ack:3192243563 SYN ACK
4> 74 TCP :80 ->45676 Seq:2304501568 Ack:3192243563 SYN ACK
5> 74 TCP :80 ->15416 Seq:2304501568 Ack:3192243563 SYN ACK
6> 74 TCP :80 ->45676 Seq:2304501568 Ack:3192243563 SYN ACK
7> 66 TCP :45676->80 Seq:3192243563 Ack:2304501569 ACK
8> 66 TCP :15416->80 Seq:3192243563 Ack:2304501569 ACK
9> 218 TCP :45676->80 Seq:3192243563 Ack:2304501569 ACK PSH
10> 218 TCP :15416->80 Seq:3192243563 Ack:2304501569 ACK PSH
11> 66 TCP :80 ->15416 Seq:2304501569 Ack:3192243715 ACK
12> 66 TCP :80 ->45676 Seq:2304501569 Ack:3192243715 ACK
13> 284 TCP :80 ->15416 Seq:2304501569 Ack:3192243715 ACK PSH
14> 284 TCP :80 ->45676 Seq:2304501569 Ack:3192243715 ACK PSH
15> 66 TCP :45676->80 Seq:3192243715 Ack:2304501787 ACK
16> 66 TCP :15416->80 Seq:3192243715 Ack:2304501787 ACK
17> 66 TCP :45676->80 Seq:3192243715 Ack:2304501787 ACK FIN
18> 66 TCP :15416->80 Seq:3192243715 Ack:2304501787 ACK FIN
19> 66 TCP :80 ->15416 Seq:2304501787 Ack:3192243716 ACK FIN
20> 66 TCP :80 ->45676 Seq:2304501787 Ack:3192243716 ACK FIN
21> 66 TCP :45676->80 Seq:3192243716 Ack:2304501788 ACK
22> 66 TCP :15416->80 Seq:3192243716 Ack:2304501788 ACK
telnet@LB2(debug-filter-1-1)#ascii 1
telnet@LB2(debug-filter-1-1)#
Packet 2 captured at 10 minutes 43 seconds ; Packet size is 74(0x004a) bytes
Out port: 4/1
fpga optimized: No
Ethernet Version II
Address: 0030.4885.385c ---> 0030.4883.5f2c
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x00
Total length: 60 (Octets)
Fragment ID: 30179
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0x7b2a
IP address: 210.210.210.7 ---> 210.210.210.1
No option
Transmission Control Protocol
Port 15416 ---> 80
Sequence Number: 3192243562
Acknowledgement Number: 0
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x02
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...0 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5840
Checksum: 0x7382
Urgent Pointer: 0x0000
telnet@LB2(debug-filter-1-1)#v b 1 2
telnet@LB2(debug-filter-1-2)#summ
telnet@LB2(debug-filter-1-2)#v b 1 3
telnet@LB2(debug-filter-1-3)#summ
telnet@LB2(debug-filter-1-3)#
1> 74 TCP :80 ->15416 Seq:2304501568 Ack:3192243563 SYN ACK
2> 74 TCP :80 ->15416 Seq:2304501568 Ack:3192243563 SYN ACK
telnet@LB2(debug-filter-1-3)#
telnet@LB2(debug-filter-1-3)#
Thanks.
//Kono
Hi Kono,
This looks very, very promising. I am scheduling the change for the end of this week and will post an update when done.
thank you very much,
Steve.
OK, I spent some time on this and tried changing the default NAT pool IP to XXX.YYY.ZZZ.254:
no ip nat pool default
ip nat pool default XXX.YYY.ZZZ.254 XXX.YYY.ZZZ.254 prefix-len 24
ip nat pool default port-pool-range 1
server vip-group 10
ip-nat-pool default
So it no longer collides with the VRRP address. I still have the 3-second delay caused by the loss of the first SYN on some servers. Here is the debug filter output:
debug filter:
SSH@lb2#debug filt
SSH@lb2(debug-filter-MP)#spec 1
SSH@lb2(debug-filter-spec-1)#reset
SSH@lb2(debug-filter-spec-1)#ip src AAA.BBB.CCC.18
SSH@lb2(debug-filter-spec-1)#exit
SSH@lb2(debug-filter-MP)#spec 2
SSH@lb2(debug-filter-spec-2)#reset
SSH@lb2(debug-filter-spec-2)#ip dest AAA.BBB.CCC.18
SSH@lb2(debug-filter-spec-2)#exit
SSH@lb2(debug-filter-MP)#apply 1or2
SSH@lb2(debug-filter-MP)#packet whole
SSH@lb2(debug-filter-MP)#buff 1024
SSH@lb2(debug-filter-MP)#int
SSH@lb2(debug-filter-MP)#start
SSH@lb2(debug-filter-MP)#stop
SSH@lb2(debug-filter-MP)#summ
Number of packets captured: 0
SSH@lb2(debug-filter-MP)#v b 1 1
SSH@lb2(debug-filter-1-1)#summ
SSH@lb2(debug-filter-1-1)#
1> 74 TCP :41274->80 Seq:4225151686 Ack:0 SYN
2> 74 TCP :41274->80 Seq:4225151686 Ack:0 SYN
3> 74 TCP :63156->80 Seq:4225151686 Ack:0 SYN
4> 74 TCP :80 ->63156 Seq:3161461361 Ack:4225151687 SYN ACK
5> 74 TCP :80 ->41274 Seq:3161461361 Ack:4225151687 SYN ACK
6> 66 TCP :41274->80 Seq:4225151687 Ack:3161461362 ACK
7> 66 TCP :63156->80 Seq:4225151687 Ack:3161461362 ACK
SSH@lb2(debug-filter-1-1)#ascii 1
SSH@lb2(debug-filter-1-1)#
Packet 1 captured at Mar 1 06:27:45 ; Packet size is 74(0x004a) bytes
In port: 3/1
fpga optimized: No
System Header Fields
Flags : dav, sav, sv, tagged
Src port : 64
VLAN Id : 129
Offset : 16
Fid : 55778
Protocol : 3
Ethernet Version II
Address: 0016.3e62.bd70 ---> 02e0.5265.cd81
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x10
Total length: 60 (Octets)
Fragment ID: 62526
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0x8ae7
IP address: 192.168.129.158 ---> AAA.BBB.CCC.18
No option
Transmission Control Protocol
Port 41274 ---> 80
Sequence Number: 4225151686
Acknowledgement Number: 0
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x02
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...0 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5840
Checksum: 0xba4a
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 2
SSH@lb2(debug-filter-1-1)#
Packet 2 captured at Mar 1 06:27:48 ; Packet size is 74(0x004a) bytes
In port: 3/1
fpga optimized: No
System Header Fields
Flags : dav, sav, sv, tagged
Src port : 64
VLAN Id : 129
Offset : 16
Fid : 55778
Protocol : 3
Ethernet Version II
Address: 0016.3e62.bd70 ---> 02e0.5265.cd81
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x10
Total length: 60 (Octets)
Fragment ID: 62527
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0x8ae6
IP address: 192.168.129.158 ---> AAA.BBB.CCC.18
No option
Transmission Control Protocol
Port 41274 ---> 80
Sequence Number: 4225151686
Acknowledgement Number: 0
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x02
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...0 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5840
Checksum: 0xb75c
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 3
SSH@lb2(debug-filter-1-1)#
Packet 3 captured at Mar 1 06:27:48 ; Packet size is 74(0x004a) bytes
Out port: 3/4
fpga optimized: No
System Header Fields
Flags : us, dav, sav, sv, txa, sas, tagged
Src port : 31
VLAN Id : 10
Offset : 16
Fid : 131
Protocol : 0
Ethernet Version II
Address: 0016.3e62.bd70 ---> 6c9c.ed1a.a163
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x10
Total length: 60 (Octets)
Fragment ID: 62527
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0xf4fe
IP address: XXX.YYY.ZZZ.254 ---> AAA.BBB.CCC.18
No option
Transmission Control Protocol
Port 63156 ---> 80
Sequence Number: 4225151686
Acknowledgement Number: 0
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x02
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...0 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5840
Checksum: 0xcbfa
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 4
SSH@lb2(debug-filter-1-1)#
Packet 4 captured at Mar 1 06:27:48 ; Packet size is 74(0x004a) bytes
In port: 3/3
fpga optimized: No
System Header Fields
Flags : dav, sav, dpv, sv, tagged
Src port : 66
VLAN Id : 10
Offset : 16
Fid : 55017
Protocol : 3
Ethernet Version II
Address: 6c9c.ed1a.a163 ---> 020c.db30.7bfe
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x00
Total length: 60 (Octets)
Fragment ID: 0
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 55 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0xf24e
IP address: AAA.BBB.CCC.18 ---> XXX.YYY.ZZZ.254
No option
Transmission Control Protocol
Port 80 ---> 63156
Sequence Number: 3161461361
Acknowledgement Number: 4225151687
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x12
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...1 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5792
Checksum: 0x21d8
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 5
SSH@lb2(debug-filter-1-1)#
Packet 5 captured at Mar 1 06:27:48 ; Packet size is 74(0x004a) bytes
Out port: 3/2
fpga optimized: No
System Header Fields
Flags : us, dav, sav, dpv, sv, txa, sas, tagged
Src port : 31
VLAN Id : 129
Offset : 16
Fid : 129
Protocol : 0
Ethernet Version II
Address: 6c9c.ed1a.a163 ---> 0016.3e62.bd70
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x00
Total length: 60 (Octets)
Fragment ID: 0
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 55 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0x8836
IP address: AAA.BBB.CCC.18 ---> 192.168.129.158
No option
Transmission Control Protocol
Port 80 ---> 41274
Sequence Number: 3161461361
Acknowledgement Number: 4225151687
Header Length(MSB 4 bits): 10 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x12
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...1 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..1.
FIN: .... ...0
Window: 5792
Checksum: 0x0d3a
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 6
SSH@lb2(debug-filter-1-1)#
Packet 6 captured at Mar 1 06:27:48 ; Packet size is 66(0x0042) bytes
In port: 3/1
fpga optimized: No
System Header Fields
Flags : dav, sav, sv, tagged
Src port : 64
VLAN Id : 129
Offset : 16
Fid : 55778
Protocol : 3
Ethernet Version II
Address: 0016.3e62.bd70 ---> 02e0.5265.cd81
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x10
Total length: 52 (Octets)
Fragment ID: 62528
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0x8aed
IP address: 192.168.129.158 ---> AAA.BBB.CCC.18
No option
Transmission Control Protocol
Port 41274 ---> 80
Sequence Number: 4225151687
Acknowledgement Number: 3161461362
Header Length(MSB 4 bits): 8 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x10
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...1 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..0.
FIN: .... ...0
Window: 46
Checksum: 0x5278
Urgent Pointer: 0x0000
SSH@lb2(debug-filter-1-1)#ascii 7
SSH@lb2(debug-filter-1-1)#
Packet 7 captured at Mar 1 06:27:48 ; Packet size is 66(0x0042) bytes
Out port: 3/4
fpga optimized: No
System Header Fields
Flags : us, dav, sav, sv, txa, sas, tagged
Src port : 31
VLAN Id : 10
Offset : 16
Fid : 131
Protocol : 0
Ethernet Version II
Address: 0016.3e62.bd70 ---> 6c9c.ed1a.a163
Ethernet II Protocol Type: IP
Internet Protocol
Version(MSB 4 bits): 4
Header length(LSB 4 bits): 5 (32-bit word)
Service Type: 0x10
Total length: 52 (Octets)
Fragment ID: 62528
Flags summary: 0x40
0... .... = Reserved
.1.. .... = Do not fragment
..0. .... = Last fragment
Fragment offset(LSB 13 bits): 0 (0x00)
Time to live: 64 seconds/hops
IP protocol type: TCP (0x06)
Checksum: 0xf505
IP address: XXX.YYY.ZZZ.254 ---> AAA.BBB.CCC.18
No option
Transmission Control Protocol
Port 63156 ---> 80
Sequence Number: 4225151687
Acknowledgement Number: 3161461362
Header Length(MSB 4 bits): 8 (32-bit word)
Reserved(LSB 4 bits): 0
Code: 0x10
RES: 0... ....
CON: .0.. ....
URG: ..0. ....
ACK: ...1 ....
PSH: .... 0...
RST: .... .0..
SYN: .... ..0.
FIN: .... ...0
Window: 46
Checksum: 0x6716
Urgent Pointer: 0x0000
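In the BP summary above, the client-side SYN (source port 41274) appears twice, but only one translated copy (source port 63156) follows. A short script can make that kind of mismatch stand out in longer captures. This is only a sketch: the name `untranslated_outbound` is made up, and it assumes the summ line layout shown above and that a translated copy keeps the same sequence number, acknowledgement number and flags as the original, which is what this capture shows.

```python
import re
from collections import Counter

# One line of "summ" output, e.g. "4> 74 TCP :80 ->63156 Seq:... Ack:... SYN ACK"
SUMM_RE = re.compile(
    r"^\s*\d+>\s+\d+\s+TCP\s+:(?P<sport>\d+)\s*->\s*(?P<dport>\d+)\s+"
    r"Seq:(?P<seq>\d+)\s+Ack:(?P<ack>\d+)\s+(?P<flags>.+?)\s*$"
)

def untranslated_outbound(summary_lines, client_port, server_port):
    """Count client-to-server packets that have no translated (post-NAT) copy.

    Returns {(seq, ack, flags): missing_copies} for packets seen with the
    client's source port more often than with any other source port.
    """
    original = Counter()
    translated = Counter()
    for line in summary_lines:
        m = SUMM_RE.match(line)
        if not m or int(m["dport"]) != server_port:
            continue  # skip non-summary lines and server-to-client packets
        key = (int(m["seq"]), int(m["ack"]), m["flags"])
        if int(m["sport"]) == client_port:
            original[key] += 1
        else:
            translated[key] += 1
    return {k: n - translated[k] for k, n in original.items() if n > translated[k]}

# The seven summary lines from the BP 1/1 capture above.
SUMMARY = [
    "1> 74 TCP :41274->80 Seq:4225151686 Ack:0 SYN",
    "2> 74 TCP :41274->80 Seq:4225151686 Ack:0 SYN",
    "3> 74 TCP :63156->80 Seq:4225151686 Ack:0 SYN",
    "4> 74 TCP :80 ->63156 Seq:3161461361 Ack:4225151687 SYN ACK",
    "5> 74 TCP :80 ->41274 Seq:3161461361 Ack:4225151687 SYN ACK",
    "6> 66 TCP :41274->80 Seq:4225151687 Ack:3161461362 ACK",
    "7> 66 TCP :63156->80 Seq:4225151687 Ack:3161461362 ACK",
]

if __name__ == "__main__":
    print(untranslated_outbound(SUMMARY, client_port=41274, server_port=80))
```

For this capture the only mismatch is the SYN: the BP saw it arrive twice (packets 1 and 2, three seconds apart) but sent out only one translated copy (packet 3).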
And here is the symbolic capture on the client and server sides:
client side:
05:24:54.435058 IP 192.168.129.158.41274 > AAA.BBB.CCC.18.80: S 4225151686:4225151686(0) win 5840 <mss 1460,sackOK,timestamp 608126712 0,nop,wscale 7>
05:24:57.431021 IP 192.168.129.158.41274 > AAA.BBB.CCC.18.80: S 4225151686:4225151686(0) win 5840 <mss 1460,sackOK,timestamp 608127462 0,nop,wscale 7>
05:24:57.433377 IP AAA.BBB.CCC.18.80 > 192.168.129.158.41274: S 3161461361:3161461361(0) ack 4225151687 win 5792 <mss 1460,sackOK,timestamp 2424392414 608127462,nop,wscale 7>
05:24:57.433396 IP 192.168.129.158.41274 > AAA.BBB.CCC.18.80: . ack 1 win 46 <nop,nop,timestamp 608127462 2424392414>
server side:
05:24:57.898305 IP XXX.YYY.ZZZ.254.63156 > AAA.BBB.CCC.18.80: S 4225151686:4225151686(0) win 5840 <mss 1460,sackOK,timestamp 608127462 0,nop,wscale 7>
05:24:57.898328 IP AAA.BBB.CCC.18.80 > XXX.YYY.ZZZ.254.63156: S 3161461361:3161461361(0) ack 4225151687 win 5792 <mss 1460,sackOK,timestamp 2424392414 608127462,nop,wscale 7>
05:24:57.900657 IP XXX.YYY.ZZZ.254.63156 > AAA.BBB.CCC.18.80: . ack 1 win 46 <nop,nop,timestamp 608127462 2424392414>
Any more hints? Please note this is happening only for certain servers and not for others, although their configurations are the same. Could this be a problem with the link aggregation I am using, or with dynamic routing?
thanks, Steve.
Hi Steve,
Thanks for taking the debug filter capture. The output shows that you have no ARP/MAC issue.
I'm not sure whether this is related to LACP or dynamic routing.
Your lb2 is active and processing SLB/NAT. I'm wondering what happens if you make lb1 active.
If you want to dig further, please consider contacting me directly. I would like the full configuration so that I can set up exactly the same environment as yours.
I would like the following output from both lb1 and lb2:
show ip vrrp-e brie
show ip vrrp-e
show interface
show int brie
show span
show tech
rconsole 1 1 and then do "show running-config"
lb1# rconsole 1 1
lb11/1# show running-config
Thanks.
//Kono
Hi Kono,
The problem exists regardless of which unit is active: when I make lb1 active, the delay is still there. There is, however, one interesting thing which I haven't had a chance to fully explore: the problem seems to disappear when one of the units is offline. I noticed this while rebooting one of the load balancers.
I am sending all the reports and full configs to your email.
Thanks a lot for all your help,
Steve.
Just for reference: this problem has been solved thanks to Kono, who noticed, and also reproduced in the lab, that the issue was due to a software version mismatch on one of the units.
Thanks, Kono, for all your help!
Steve.