vADC Forum


Impossible to rejoin an existing cluster

EDIT: The problem turned out to be caused by my /etc/ldap.conf; see the PS at the end of this post.


I have 4 SteelApp servers, A, B, C and D, running in a cluster (recently switched to the unicast heartbeat method).

To make a major network change, I took B out of the cluster, changed its network settings (basically IP/VLAN changes) and then tried to add it back using zxtm/configure.

Replicating the configuration from member A failed, yet B still appeared in the Traffic Managers list (in the UI running on A, C or D).

I had to remove B from the cluster again (otherwise, when I tried to rejoin, it complained that the cluster was already full with 4 nodes), so I decided to reinstall SteelApp on B from scratch (zinstall).

This time things got even worse: A deleted all of its configuration (including the users).

So now only C and D are running correctly; A and B have no settings anymore.

I then tried to have A join the cluster from scratch; at least A had not gone through an IP address change.

No luck: "Config Replication Error: replicate-config exited with code 1 and stderr: There was a problem communicating with machine A:9080. This may be because the software is not running. The error was: Write to A failed: Broken pipe."

Could zxtm/configure be trying to replicate locally before the software is fully running?
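One way to test that theory is to wait until the local control port accepts TCP connections before running zxtm/configure. Below is a minimal Python sketch of such a readiness check. It assumes that 9080 (the port named in the error message) is the right port to poll and that a successful TCP connect is a good enough signal; that is a simplification, since an open port does not guarantee that replication itself will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError,
            # socket.gaierror, socket.timeout) if nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("node_a", 9080, timeout=30)` would block for up to 30 seconds waiting for A's control port; `node_a` is a hypothetical host name standing in for traffic manager A.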

I also tried to join the cluster with a new name for A, in case there were stale entries on C and D; still no luck.

Either way, I can no longer join the cluster. Any ideas on the safest way to fix this mess?


PS Edit: Here is the output from a freshly started cluster. As you can see, a daemon crashes on startup (an assertion failure that aborts with a core dump, rather than a segfault). The crash was probably hidden by the zinstall program:

Initializing Riverbed Application Framework.
Copyright (C) 2014, Riverbed Technology, Inc. All rights reserved
Stingray Admin Server - (C) 2014, Riverbed Technology, Inc. All rights reserved.
Version 9.6r1, Build date: Apr 10 2014 10:22:20
INFO Stingray Admin Server started
INFO Version 9.6r1, Build date: Apr 10 2014 10:22:20
INFO https://node_A:9090
INFO Stingray Admin Server running
Stingray Traffic Manager - Copyright (C) 1995 - 2014, Riverbed Technology, Inc. All rights reserved.
Version 9.6r1, Build date: Apr 10 2014 10:15:55
INFO Control connection privileges unrestricted
zeus.zxtm: sbind.c:71: ldap_simple_bind: Assertion `( (ld)->ldc->ldc_options.ldo_valid == 0x2 )' failed.
Aborted (core dumped)
REST API disabled
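Because the crash was easy to miss in the installer output, it can help to scan captured startup output for fatal lines explicitly. The Python sketch below assumes the output has already been saved to a string or file. The patterns come from the log above (glibc's "Assertion ... failed." message and the shell's "Aborted (core dumped)" notice for SIGABRT), plus the shell's segfault notice for comparison; the list is illustrative, not exhaustive.

```python
import re

# Patterns for fatal lines, based on the startup log above; extend as needed.
FATAL_PATTERNS = [
    re.compile(r"Assertion .* failed\."),    # glibc assert() failure message
    re.compile(r"Aborted \(core dumped\)"),  # shell notice after SIGABRT
    re.compile(r"Segmentation fault"),       # shell notice after SIGSEGV
]


def find_fatal_lines(log_text: str) -> list[str]:
    """Return the stripped lines of log_text that match any fatal pattern."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in FATAL_PATTERNS):
            hits.append(line.strip())
    return hits


sample = """INFO Stingray Admin Server running
INFO Control connection privileges unrestricted
zeus.zxtm: sbind.c:71: ldap_simple_bind: Assertion `( (ld)->ldc->ldc_options.ldo_valid == 0x2 )' failed.
Aborted (core dumped)
REST API disabled
"""

for fatal in find_fatal_lines(sample):
    print(fatal)
```

On the sample, only the assertion line and the "Aborted (core dumped)" line are reported; the INFO lines are ignored.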

After adding the UNIX user that runs SteelApp to the nss_initgroups_ignoreusers list in /etc/ldap.conf, SteelApp started fine and I was able to put A and B back in the cluster!
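For reference, the change is a single line in /etc/ldap.conf of the form `nss_initgroups_ignoreusers root,steelapp`, where nss_ldap reads the value as a comma-separated list of users. The Python sketch below adds a user to that list, or appends the line if it is missing. The user name `steelapp` is a hypothetical placeholder for whichever UNIX account runs SteelApp on your system. The sketch works on the file's text so the result can be reviewed before it is written back; keep a backup of the original file.

```python
import re

OPTION = "nss_initgroups_ignoreusers"


def add_ignored_user(conf_text: str, user: str) -> str:
    """Return conf_text with `user` added to the nss_initgroups_ignoreusers list.

    Assumes the option is written as "nss_initgroups_ignoreusers u1,u2,...".
    Commented-out lines are left alone; if no active line exists, one is appended.
    """
    lines = conf_text.splitlines()
    pattern = re.compile(rf"^\s*{OPTION}\s+(.*\S)\s*$")
    for i, line in enumerate(lines):
        m = pattern.match(line)
        if m:
            # Split the existing comma-separated list, tolerating spaces
            users = [u.strip() for u in m.group(1).split(",") if u.strip()]
            if user not in users:
                users.append(user)
            lines[i] = f"{OPTION} {','.join(users)}"
            break
    else:
        lines.append(f"{OPTION} {user}")
    return "\n".join(lines) + "\n"
```

Running it twice with the same user leaves the file unchanged, so it is safe to apply on every node.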
