07-09-2014 10:53 AM
EDIT: The problem was related to my /etc/ldap.conf; see the PS at the end of this post.
I have 4 SteelApp servers A, B, C and D running in a cluster (recently switched to Unicast Heartbeat Method).
To make a major network change, I took B out of the cluster, changed its network settings (basically IP/VLAN changes) and then tried to put it back using zxtm/configure.
It failed to replicate the configuration from member A, but B was appearing in the Traffic Managers list (using UI running on A, C or D).
I had to remove B from the cluster again (otherwise, when I tried to join, it complained that the cluster was already full with 4 nodes), and I decided to reinstall SteelApp from scratch on B (zinstall).
This time it went worse: A deleted its entire configuration (including the users).
So now only C and D are running correctly; A and B no longer have any settings.
I tried to have A join the cluster from scratch, since at least that one had not undergone an IP address change.
No luck there either: "Config Replication Error: replicate-config exited with code 1 and stderr: There was a problem communicating with machine A:9080. This may be because the software is not running. The error was: Write to A failed: Broken pipe."
It looks as if zxtm/configure were trying to replicate locally before the software was fully running?
I also tried joining the cluster with a new name for A, in case there were stale entries on C and D, but that did not help either.
Anyway, I can't join the cluster anymore. Any idea what could be the safest way to fix this mess?
PS Edit: Here is the startup output after a fresh start. As you can see, a daemon crashes on startup (a failed assertion followed by an abort and core dump). It was probably hidden by the zinstall program:
Initializing Riverbed Application Framework.
Copyright (C) 2014, Riverbed Technology, Inc. All rights reserved
Stingray Admin Server - (C) 2014, Riverbed Technology, Inc. All rights reserved.
Version 9.6r1, Build date: Apr 10 2014 10:22:20
INFO Stingray Admin Server started
INFO Version 9.6r1, Build date: Apr 10 2014 10:22:20
INFO Stingray Admin Server running
Stingray Traffic Manager - Copyright (C) 1995 - 2014, Riverbed Technology, Inc. All rights reserved.
Version 9.6r1, Build date: Apr 10 2014 10:15:55
INFO Control connection privileges unrestricted
zeus.zxtm: sbind.c:71: ldap_simple_bind: Assertion `( (ld)->ldc->ldc_options.ldo_valid == 0x2 )' failed.
Aborted (core dumped)
REST API disabled
After adding the UNIX user that runs SteelApp to the nss_initgroups_ignoreusers list in /etc/ldap.conf, SteelApp started fine and I was able to put A and B back in the cluster!
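For anyone who wants to check their own machine for the same condition, here is a rough sketch in Python that reports whether a given user appears on an nss_initgroups_ignoreusers line. It assumes the usual nss_ldap layout (the option name followed by a comma-separated list of users, with comment lines starting with "#"). The user name steelapp_user and the sample file content are hypothetical placeholders, not taken from my setup.

```python
def ignored_users(conf_text):
    """Collect users named on nss_initgroups_ignoreusers lines of an ldap.conf."""
    users = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0].lower() == "nss_initgroups_ignoreusers" and len(parts) == 2:
            # Assumed format: comma-separated user names after the option name.
            users.update(u.strip() for u in parts[1].split(",") if u.strip())
    return users


def is_ignored(conf_text, user):
    """True if `user` is listed in nss_initgroups_ignoreusers."""
    return user in ignored_users(conf_text)


# Hypothetical excerpt; on a real host, read /etc/ldap.conf instead.
SAMPLE = """\
# example only
nss_initgroups_ignoreusers root,ldap,steelapp_user
"""

if __name__ == "__main__":
    print(is_ignored(SAMPLE, "steelapp_user"))
```

To run it against a real file, replace SAMPLE with the text read from /etc/ldap.conf and pass the actual account name that runs the traffic manager.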