07-18-2011 12:17 PM
I am having an issue where a server reboots, and I still see flows going to the server that failed its health check/keepalive. I have l4-check-only enabled for HTTP, and port http keepalive configured on the real server. A "sh server bind" shows the server failed its keepalive, but when I go into rcon 1 1 and issue "sh session all 1 | inc <my IP>", I still see a flow. If I refresh and browse around the site, I still get timeouts. I know the default flow timeout is 2 minutes, during which existing connections will still flow to a failed server while new connections go to another active server in rotation. I figured that reset-on-port-fail on the virtual would take care of this by sending a reset to the client, but it doesn't seem to have.
No session persistence is enabled on the load balancer; that is being kept in the DB.
To make matters worse, the real server is a VM, so restarts only take about 6 seconds.
Sorry if some of this is a little vague and the terminology is a little off; I'm new to this device, but familiar with a few other LB vendors.
I have configured the port-holddown-timeout value to match the keepalive timing configured globally on the ports (total time for a server to be declared failed is 16 seconds), but I need to wait to see if the customer still sees the issue. I haven't seen any problems yet except from the application.
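For reference, the relevant pieces fit together roughly like this. This is only a sketch: the server/VIP names and addresses are made up, the keepalive interval/retry values shown are illustrative (chosen so the fail-detect window works out to 16 seconds), and exact command placement and syntax vary by platform and software version.

```
! Keepalive timing (illustrative): 4s interval x 4 retries = 16s to declare a port down
server port http
 tcp keepalive 4 4

! Real server: L4-only health check on the HTTP port
server real rs1 10.1.1.10
 port http
 port http keepalive
 port http l4-check-only

! Virtual server: reset existing client flows when the real port fails,
! and hold the failed port down for the same 16s fail-detect window
server virtual vip1 10.1.1.100
 port http
 port http reset-on-port-fail
 port-holddown-timeout 16
 bind http rs1 http
```

The idea is that the holddown window matches the health-check failure window, so a fast-rebooting VM (up again in ~6 seconds) isn't put back into rotation while stale flows are still being reset.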
07-19-2011 10:52 AM
So far, configuring the port-holddown-timeout value to 16 seconds, matching the time it takes a real server to be taken out of rotation after failing its keepalive, seems to have helped alleviate the issue the customer was seeing. I think they are just running into application issues now, possibly with the persistence they are keeping in the database.