vADC Docs

Ten Administration good practices

by crispin on 02-25-2013 03:51 AM

Consider the following ten practices:

Configure an 'error_file' for each virtual server
Drain nodes before removing them from the configuration
Configure the Administration Server certificate
Firewall off internal ports
Use different user names for different people
Integrate with your existing authentication systems
Take regular backups
Configure Event Handling to send notifications of problems
Ensure that you are ready to cope with failures and traffic bursts
Ensure that your software is up-to-date

Configure an error_file for all HTTP Virtual Servers

When a request can't be served by a pool, the traffic manager can respond in several ways. First, it will try the failpool; failing that, it will use the error_file setting from the virtual server. If you haven't configured an error file, a default "Service Unavailable" message is sent to the client. While this works, it does not reflect well on your site, so we recommend configuring an error_file.

The error_file is configured on the Virtual Servers > Edit > Connection Management page; see also the article Sending custom error pages.
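To make the order of fallbacks concrete, here is a minimal Python sketch that models it. It is not Stingray code: the `VirtualServer` class, its fields and the `respond` function are hypothetical, and a "pool" is modelled as any function that returns a response, or None when it has no working nodes.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Default body sent when neither a failpool nor an error_file can answer.
DEFAULT_ERROR = "Service Unavailable"

# Hypothetical model of a pool: returns a response, or None if it cannot serve.
Pool = Callable[[str], Optional[str]]


@dataclass
class VirtualServer:
    pool: Pool
    failpool: Optional[Pool] = None
    error_file: Optional[str] = None  # contents of the custom error page


def respond(vs: VirtualServer, request: str) -> str:
    """Try the pool, then the failpool, then the error_file, then the default."""
    response = vs.pool(request)
    if response is not None:
        return response
    if vs.failpool is not None:
        response = vs.failpool(request)
        if response is not None:
            return response
    if vs.error_file is not None:
        return vs.error_file
    return DEFAULT_ERROR


def dead_pool(request: str) -> Optional[str]:
    return None  # simulates a pool whose nodes have all failed


print(respond(VirtualServer(pool=dead_pool), "/"))  # prints: Service Unavailable
print(respond(VirtualServer(pool=dead_pool, error_file="<h1>Back soon</h1>"), "/"))
```

The second call shows the point of this recommendation: with an error_file configured, the client sees your own page instead of the generic message.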

Drain nodes before removing them from the configuration

When you are performing infrastructure maintenance which requires you to remove nodes from a pool, you should drain the node before removing it. This allows existing connections to complete, and if you are using session persistence it allows existing sessions to complete.

If you don't use session persistence, you may only have to wait a minute or so for existing connections to complete; with session persistence turned on, you may have to wait an hour or so for clients to finish using their sessions. In both cases, the Activity > Draining Nodes page shows whether there are any existing connections and when the node was last used.
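If you script your maintenance, the same check can be automated: wait until the draining node has no connections and has been idle for a while, then remove it. The Python sketch below is hypothetical; `get_node_stats` stands in for however you read the figures shown on the Activity > Draining Nodes page, and the default thresholds are placeholders you would tune (longer when session persistence is on).

```python
import time
from typing import Callable, Tuple

# Hypothetical: returns (active_connections, seconds_since_last_used) for a node.
StatsFn = Callable[[str], Tuple[int, float]]


def wait_for_drain(node: str, get_node_stats: StatsFn,
                   idle_seconds: float = 60.0,
                   poll_interval: float = 10.0,
                   timeout: float = 3600.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the node has no connections and has been idle long enough.

    Returns True when the node looks safe to remove, False on timeout.
    The timeout is approximate: it counts poll intervals, not wall-clock time.
    """
    waited = 0.0
    while waited <= timeout:
        connections, idle_for = get_node_stats(node)
        if connections == 0 and idle_for >= idle_seconds:
            return True
        sleep(poll_interval)
        waited += poll_interval
    return False
```

Requiring an idle period as well as zero connections avoids removing a node in the brief gap between two requests from the same persistent session.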

Configure administration server certificate

By default, the administration server is configured with a self-signed SSL certificate. This leaves administrative sessions vulnerable to man-in-the-middle attacks by an attacker who can intercept and modify the network traffic between the administrator and the admin server. If you anticipate accessing the admin server over an insecure network, you should replace the self-signed certificate with one signed by a known Certificate Authority; this could be an external authority or an internal corporate authority. Alternatively, you could configure your browser to trust the self-signed certificate, and be wary of any situation where you are unexpectedly asked to confirm that the certificate is valid.

Firewall off Internal Ports

Stingray uses several ports for administration, discovery and intra-cluster communication. Although all of this traffic is encrypted or signed, it is advisable to firewall these ports off from untrusted networks.

By default, the administration server is accessible from all IP addresses, but you can restrict which addresses may access it. For example, you could limit access to your 10.100.0.0/16 corporate network so that users outside your network cannot reach the administration server.

The administration server security settings can be changed from the System > Security page.
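To see what such a restriction means in practice, the Python sketch below checks client addresses against an allow-list using the standard `ipaddress` module. It only illustrates the matching rule; the traffic manager applies the restriction itself once it is set on the System > Security page. The function name is hypothetical, and 10.100.0.0/16 is the example network from the text.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Example allow-list from the text: the 10.100.0.0/16 corporate network.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.100.0.0/16")]


def admin_access_allowed(client_ip: str,
                         allowed: Iterable[Network] = ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network are different IP versions.
    return any(address in network for network in allowed)


print(admin_access_allowed("10.100.42.7"))  # inside 10.100.0.0/16
print(admin_access_allowed("203.0.113.5"))  # outside the corporate network
```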

The ports in question are:

HTTPS: 9090 and 9070 are used for administration traffic (web, SOAP, REST)
HTTPS: 9080 is used for internal communications
Multicast and UDP: 9090 is used for discovery and cluster health checks

For details, refer to the System > Security page in the user interface and the 'Security' chapter in the Stingray Product Documentation.
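As one way to apply this on a Linux host, the Python sketch below generates iptables rules that accept the ports listed above only from a trusted network and drop them from everywhere else. The choice of iptables, the function name and the trusted network (10.100.0.0/16, the example from the text) are assumptions; adapt them to your own firewall, and make sure every cluster member and administrator sits inside the trusted range.

```python
import ipaddress
from typing import List, Tuple

# Ports from the list above as (protocol, port, purpose).
STINGRAY_PORTS: List[Tuple[str, int, str]] = [
    ("tcp", 9090, "administration (HTTPS)"),
    ("tcp", 9070, "administration (HTTPS)"),
    ("tcp", 9080, "internal communications (HTTPS)"),
    ("udp", 9090, "discovery and cluster health checks (multicast/UDP)"),
]


def iptables_rules(trusted_cidr: str) -> List[str]:
    """Return iptables commands that accept each port from trusted_cidr and drop it otherwise."""
    # Validate and normalize the network; raises ValueError on bad input.
    network = str(ipaddress.ip_network(trusted_cidr))
    rules: List[str] = []
    for proto, port, purpose in STINGRAY_PORTS:
        rules.append(f"# {purpose}")
        # ACCEPT must come before DROP because iptables stops at the first match.
        rules.append(f"iptables -A INPUT -p {proto} -s {network} --dport {port} -j ACCEPT")
        rules.append(f"iptables -A INPUT -p {proto} --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    for rule in iptables_rules("10.100.0.0/16"):
        print(rule)
```

The rules use `-A` (append), so if your INPUT chain already contains a broad ACCEPT earlier on, these rules would never be reached; iptables also covers IPv4 only, and IPv6 would need equivalent ip6tables rules.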

Use different usernames for different people

While it is convenient to have a shared "admin" username for administering the traffic manager, it is not good practice. If an administrator leaves, you may have to change the password, which affects everyone who shares the login. A shared login also means that the audit log cannot track the activities of individual administrators.

It is recommended that different people have different usernames. Additional users can be created on the System > Users > Local Users page.

Integrate with your existing authentication system

Even better than creating separate local usernames for different people is to integrate the administration server with your existing authentication infrastructure. This lets people use the same password they use elsewhere, and reduces the chance that a system is forgotten about when an employee leaves your company.

You can delegate authentication to RADIUS, LDAP and TACACS+ systems. Authenticators are configured on the System > Users > Authenticators page.

Once you have integrated, it is possible to remove all local users, with the exception that at least one user must remain in the "admin" group (this need not be the user named "admin").
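Before removing local users, it can be worth checking that this rule will still hold. The Python sketch below is hypothetical: it assumes you have listed the local users and their groups in a dictionary (the example users are invented), and it reports whether at least one remaining user would still be in the "admin" group.

```python
from typing import Dict, Iterable, Set


def safe_to_remove(local_users: Dict[str, Set[str]], to_remove: Iterable[str]) -> bool:
    """Return True if at least one remaining local user is in the 'admin' group.

    local_users maps each username to the set of groups it belongs to
    (a hypothetical hand-made listing of the local users).
    """
    removing = set(to_remove)
    return any("admin" in groups
               for user, groups in local_users.items()
               if user not in removing)


users = {"admin": {"admin"}, "alice": {"admin"}, "bob": {"guest"}}
print(safe_to_remove(users, ["admin", "bob"]))    # alice remains in "admin"
print(safe_to_remove(users, ["admin", "alice"]))  # nobody left in "admin"
```

The first call also shows that the remaining admin-group member need not be the user named "admin".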

Take regular backups

The traffic manager configuration is a vital component in maintaining the operation of your site. You should ensure that backups are created regularly. You can take a backup through the administration server, or automatically using the CLI or SOAP functions.

You should also export backups and store them on another machine in case of catastrophic hardware failure.
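For the off-machine copies, a scheduled script on the other machine can archive a directory of exported backups and keep only the most recent archives. The Python sketch below does not use the traffic manager's CLI or SOAP backup functions; the source and destination paths, the `stingray-config-` file prefix and the retention count are all assumptions you would set yourself.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def make_backup(source_dir: str, dest_dir: str, keep: int = 14) -> Path:
    """Archive source_dir into a timestamped .tar.gz in dest_dir, keeping the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Fixed-width timestamp, so names sort in chronological order.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    archive = dest / f"stingray-config-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths under the directory's own name rather than the absolute path.
        tar.add(src, arcname=src.name)

    # Oldest archives sort first; delete everything except the newest `keep`.
    archives = sorted(dest.glob("stingray-config-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Run it from a scheduler such as cron; keeping several generations means a corrupted or mistaken configuration does not immediately overwrite your only good copy.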

Configure Event Handling to send notifications of problems

Stingray Traffic Manager includes a customizable alerting infrastructure, which you can use to notify your system administrators of problems that are relevant to them.

We recommend that, at the very least, the "Default Events" event type be used to send an email to your administrators. This event type contains all the events that are emitted when a critical failure occurs and when the affected components recover. If it doesn't suit your needs, you can easily copy the event type and customize it to contain just the events relevant to you.

Alerting is configured from the System > Alerting page.

Ensure your setup can cope with failures and traffic bursts

While traffic manager performance scales well with the available CPU, take care to ensure that your setup can cope with failures and traffic bursts (such as the Slashdot effect; see Detecting and Managing Abusive Referers).

In particular, it is not good practice to run an active-active cluster with both machines close to 100% CPU usage. If one machine fails, the other cannot take over all of the remaining traffic, and you end up with dropped connections and an overloaded infrastructure. In a two-machine active-active cluster, each machine needs to stay below roughly 50% CPU for the survivor to absorb the full load.
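The arithmetic behind this is simple: if N machines share the load evenly at utilization u and k of them fail, each survivor must run at u × N / (N − k). The Python sketch below applies that formula under two simplifying assumptions, that traffic redistributes evenly and that CPU usage scales linearly with traffic, which real workloads only approximate.

```python
def utilization_after_failure(nodes: int, utilization: float, failed: int = 1) -> float:
    """Per-node CPU utilization (1.0 = 100%) after `failed` nodes drop out.

    Assumes traffic is spread evenly and CPU usage scales linearly with load.
    """
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return utilization * nodes / (nodes - failed)


for u in (0.45, 0.60, 0.95):
    after = utilization_after_failure(2, u)
    status = "OK" if after <= 1.0 else "OVERLOADED"
    print(f"2 nodes at {u:.0%} -> survivor at {after:.0%} ({status})")
```

For example, a two-node cluster at 60% each would leave the survivor needing 120%, while a three-node cluster at the same 60% would leave each of the two survivors at 90%; adding machines raises the safe per-machine utilization.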

Traffic bursts are harder to handle, but one option is selective short-term caching, which ensures that a sudden burst doesn't overwhelm your web server layer. An example is described in the article Cache your website - just for one second?
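The idea behind one-second caching is that, however many requests arrive for a page within a second, only one of them reaches the web server. The Python sketch below is a generic illustration of that idea with a hypothetical `fetch` function and an injectable clock; it is not how Stingray's content cache is implemented or configured.

```python
import time
from typing import Callable, Dict, Tuple


class MicroCache:
    """Cache responses per URL for a short TTL (default one second)."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch  # hypothetical call to the web server layer
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: served without touching the backend
        body = self.fetch(url)
        self._entries[url] = (now, body)
        return body


backend_calls = []
fake_now = [0.0]
cache = MicroCache(lambda url: backend_calls.append(url) or f"page for {url}",
                   clock=lambda: fake_now[0])
for i in range(1000):
    fake_now[0] = i / 1000  # 1000 requests spread over one second
    cache.get("/")
print(len(backend_calls))  # prints: 1
```

Even a one-second TTL turns a burst of a thousand requests per second for a page into one backend request per second, while content is never more than about a second stale. This single-threaded sketch does not stop several concurrent misses from all reaching the backend at once.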

Ensure your software is up to date

Last but by no means least, it is important to ensure that your software is up to date. Newer versions include security fixes and fixes to existing functionality, and so we recommend you use the latest version.

Notifications of released versions are sent to all supported customers and shared on the blog feed for the Stingray section of this site.