vADC Forum

New Contributor
Posts: 4
Registered: ‎06-20-2017

pool won't reconnect to failed node after it recovers

Two of three nodes failed in a pool overnight with messages like:

Node 123.123.123.123 has failed - Timeout while establishing connection (the machine may be down, or the network congested; increasing the 'max_connect_time' on the pool's 'Connection Management' page may help)

 

When I came in this morning, I verified that the IP/port was available and functioning on both nodes, but the pool still had them marked as failed. I stopped the virtual server and started it back up, after which the pool connected to all three nodes and has been happy for the last hour. It seems like it just didn't try to reconnect to those nodes.
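
For anyone wanting to repeat that kind of manual check, here is a minimal Python sketch of a plain TCP connect test. The function name `port_is_open` and the 4-second default timeout are made up for illustration. It only shows that the port accepts connections; whether the "Connect" monitor in the catalog does something equivalent is an assumption.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 4.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

Calling `port_is_open("123.123.123.123", 25)` would test the SMTP port on the node from the error message; port 25 is assumed here, since the thread does not state the port.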

 

I'm new to Stingray load balancers, having only worked with F5 in the past.  Is this normal?  Do I really need to manually intervene to fix this?  Am I missing some obvious config?

 

One attached screenshot is the health monitor config. It is listed in the catalog as Connect; I'm not sure whether that is a standard monitor or was created by my predecessor.

 

The pool also has passive monitoring turned on.

 

The second screenshot shows the connection management settings:

 

Is something in those configs causing the LB to give up on my nodes after some outage and require me to manually intervene?

 

Brocadian
Posts: 17
Registered: ‎05-22-2015

Re: pool won't reconnect to failed node after it recovers

Hi Jasbro,

 

It looks like the passive monitor did indeed fail your nodes. In that case the passive monitor also has to recover them: a working active monitor will not recover a node that was failed by the passive checks. The most common reasons for a node not recovering from a passive failure are:

 

 1. You don't have any traffic, or

 2. Your traffic is all non-idempotent (e.g. POSTs)

 

If your traffic is largely POSTs then I would suggest disabling passive monitoring, because without any idempotent requests the passive monitor will never be able to recover the failed nodes.
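
The behaviour described above can be illustrated with a toy model in Python. This is only a sketch of the rule as stated in this reply, not the traffic manager's actual algorithm; the `Node` class, the `handle_request` function, and the set of HTTP methods treated as idempotent are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Assumption for this sketch: these methods count as safe to retry
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


@dataclass
class Node:
    name: str
    healthy_in_reality: bool = True   # whether the backend actually works
    failed_by_passive: bool = False   # the load balancer's view of the node


def handle_request(node: Node, method: str) -> str:
    """Toy rule: a passively failed node is only retried with idempotent requests."""
    if node.failed_by_passive:
        if method not in IDEMPOTENT_METHODS:
            return "skipped"          # non-idempotent traffic never probes the node
        if node.healthy_in_reality:
            node.failed_by_passive = False
            return "recovered"        # a real request succeeded, so the node comes back
        return "still failed"
    if not node.healthy_in_reality:
        node.failed_by_passive = True
        return "failed"               # a real request failed, so the node is marked failed
    return "ok"
```

In this model, a node that comes back up stays marked as failed for as long as it sees only POSTs or no requests at all, which matches the two reasons listed above.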

 

Cheers,

Mark

New Contributor
Posts: 4
Registered: ‎06-20-2017

Re: pool won't reconnect to failed node after it recovers

That makes sense. In this case it is a pool of SMTP servers in our warm-standby DR environment, so there is very little traffic there (really just a few messages from a daily cron job). Sounds like I need to disable the passive monitoring.
