05-12-2009 02:46 PM
I moved one of my real servers from a layer 4 (TCP) health check to a layer 7 health check. With the layer 4 check the server was up and running without a problem. With the layer 7 check I now see the server toggling between failed and active every few seconds. What is happening here? The web server seems to be up and running, and the configuration of the real server is very simple:
server port 7080
tcp keepalive protocol http
server real balu 172.16.16.17
port 7080 url "GET /health.html"
server virtual vip100 192.168.4.100
bind 80 balu 7080
05-12-2009 02:53 PM
Do you have the command "server no-fast-bringup" in your configuration? I have seen this problem many times. The reason is pretty simple:
The ServerIron/ADX performs the health check step by step:
Step 1: ARP
Step 2: ICMP (Echo request to real server IP)
Step 3: Layer 4 (TCP connection to port 7080)
Step 4: Layer 7 (TCP connection to port 7080 with GET request inside the TCP connection)
By default, the ServerIron/ADX declares the real server up as soon as step 3 succeeds, before the layer 7 check has run. I assume the layer 7 health check is failing, which would lead to:
step 1: OK
step 2: OK
step 3: OK -> DECLARE SERVER AS ACTIVE
step 4: NOT OK -> DECLARE SERVER AS FAILED AGAIN and start from scratch
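To make the sequence above concrete, here is a small Python toy model of it. This is only a sketch of the behavior described in this post, not the device's actual logic; the names `STEPS` and `simulate` and the `fast_bringup` flag are made up for illustration.

```python
from typing import Dict, List

# The four health check stages, in the order described above.
STEPS = ["ARP", "ICMP", "L4 TCP", "L7 HTTP"]


def simulate(results: Dict[str, bool], fast_bringup: bool, cycles: int = 3) -> List[str]:
    """Return the state changes seen over a number of health check cycles.

    results maps each step name to True (check passes) or False (check fails).
    fast_bringup=True models the default: ACTIVE as soon as the L4 step passes.
    fast_bringup=False models waiting for all four steps (assumed behavior).
    """
    declare_at = "L4 TCP" if fast_bringup else "L7 HTTP"
    state = "FAILED"
    changes: List[str] = []
    for _ in range(cycles):
        for step in STEPS:
            if not results[step]:
                # A failed step marks the server failed and restarts the cycle.
                if state != "FAILED":
                    state = "FAILED"
                    changes.append(state)
                break
            if step == declare_at and state != "ACTIVE":
                state = "ACTIVE"
                changes.append(state)
    return changes


# Layer 7 check fails, everything else passes.
broken_l7 = {"ARP": True, "ICMP": True, "L4 TCP": True, "L7 HTTP": False}
print(simulate(broken_l7, fast_bringup=True))   # server flaps on every cycle
print(simulate(broken_l7, fast_bringup=False))  # server never becomes active
```

In this model a failing layer 7 check produces a repeating ACTIVE/FAILED pattern with fast bringup, and no state change at all without it: the server simply stays failed, which points directly at the layer 7 check as the real problem.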
The result is a real server that goes up and down all the time. Get a trace of the health check traffic at the real server or on the ServerIron/ADX itself and check whether the web server answers the GET request correctly (an HTTP 2xx response). A "page not found" or "forbidden" message, or any other error, would make the layer 7 health check fail. The "server no-fast-bringup" command tells the ServerIron/ADX to wait until ALL four steps have succeeded: the real server is no longer declared up after step 3, but only once step 4 has succeeded as well.
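If taking a trace is not convenient, a quick manual probe from a host that can reach the real server gives a first hint. The Python sketch below sends a GET and reports the status code. The function names `probe` and `status_is_healthy` are made up for illustration, and treating exactly 2xx as healthy is an assumption taken from the description above; the device may accept a different set of codes. Python also sends an HTTP/1.1 request with a Host header, which may differ from what the ServerIron/ADX sends, so the trace is still the authoritative answer.

```python
import http.client


def status_is_healthy(status: int) -> bool:
    """Assumption: only 2xx responses count as a passing layer 7 check."""
    return 200 <= status <= 299


def probe(host: str, port: int, path: str, timeout: float = 5.0) -> int:
    """Send a single GET request and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        # Always release the socket, even if the request raised.
        conn.close()
```

For the configuration in the question, the call would be `probe("172.16.16.17", 7080, "/health.html")`, and passing the result to `status_is_healthy` shows whether a 404 or 403 is coming back instead of a 2xx.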
Hope this helps.