This subject is near and dear to my heart, and I have given presentations on it at Computer Measurement Group (CMG) conferences, with more of a focus on proactive performance management. The article I wrote for the current issue of Enterprise Executive magazine expands on this topic in more detail and gives some recommendations for best practices in your storage network management efforts.
I work for a networking company and have worked with storage networks for nearly 15 years. I will be the first to admit that, at least in the realm of storage network management, management software has historically been far more reactive than proactive. Recently there have been some significant efforts to change this, based in large part on feedback from end users.
Reactive vs Proactive?
When I say reactive and proactive storage network management, what do I mean?
Reactive management means that you do not know about a problem with your storage network (or any network, for that matter) until after it has happened. In the most extreme cases, you are not aware of it until someone else, such as an application owner or DBA, tells you that they have a problem and that they think its source lies in the storage network. You then have to switch into troubleshooting and problem determination/resolution mode, often under a great deal of stress. This is never a good situation to be in. Hopefully you are able to determine the root cause of the problem and take corrective action to make certain it does not recur.
Proactive management means that you have the capabilities in your storage network management toolbox to prevent the above scenario from happening. In its simplest form, you have taken advantage of the threshold monitoring and alerting capabilities in your storage network management software to make certain, at a minimum, that you know about a problem before that DBA has to come to you and complain. Even better, you have drawn on your own experience, done some research, or perhaps attended an educational conference such as SHARE or CMG and learned some best practices, so that you can set anticipatory thresholds with accompanying alerts.
For example, the component most likely to fail in a storage network is the small form-factor pluggable (SFP) transceiver on a port. If that port is an interswitch link (ISL) in a cascaded FICON architecture, the SFP failure would likely impact your synchronous DASD replication and therefore your application response times. There are good indicators that help predict when an SFP is nearing the end of its life; one example is the SFP running at a higher temperature than normal. But because you read Dr. Steve's column in an effort to be more proactive, you knew about this. You set a threshold, monitored it, and were alerted that the ISL SFP was running hot before it failed. You scheduled a replacement for your next planned maintenance window, and the SFP never failed.
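To make the threshold idea concrete, here is a minimal Python sketch of two-tier temperature alerting. It assumes that SFP temperature readings in degrees Celsius are already being collected by your management software. The `SfpReading` type, the port labels, and the 60 °C warning and 70 °C alarm values are hypothetical and chosen only for illustration; real thresholds should come from your optics vendor's specifications or your management tool. The lower "warning" tier plays the role of the anticipatory threshold described above: it fires while the SFP is still working, which leaves time to schedule a replacement.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

# Hypothetical thresholds for illustration only; take real values from the
# optics vendor's specifications or your management software.
WARN_C = 60.0   # anticipatory threshold: SFP still works but is running hot
ALARM_C = 70.0  # critical threshold: act now


@dataclass
class SfpReading:
    port: str             # hypothetical port label, e.g. "ISL-1"
    temperature_c: float  # SFP temperature in degrees Celsius


def classify(reading: SfpReading, warn_c: float = WARN_C, alarm_c: float = ALARM_C) -> str:
    """Return 'ok', 'warning', or 'alarm' for a single temperature reading."""
    if reading.temperature_c >= alarm_c:
        return "alarm"
    if reading.temperature_c >= warn_c:
        return "warning"
    return "ok"


def alerts(
    readings: Iterable[SfpReading], warn_c: float = WARN_C, alarm_c: float = ALARM_C
) -> Iterator[Tuple[str, str, float]]:
    """Yield (port, level, temperature) for every reading at or above a threshold."""
    for reading in readings:
        level = classify(reading, warn_c, alarm_c)
        if level != "ok":
            yield (reading.port, level, reading.temperature_c)


if __name__ == "__main__":
    samples = [
        SfpReading("ISL-1", 45.0),
        SfpReading("ISL-2", 63.5),
        SfpReading("ISL-3", 72.0),
    ]
    for port, level, temp in alerts(samples):
        print(f"{port}: {level} ({temp:.1f} C)")
```

Running the script prints a warning for ISL-2 and an alarm for ISL-3. In practice you would likely also require a reading to stay above the warning level for several polling intervals before alerting, so that a single noisy sample does not page anyone.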
Things are only getting better
Based on the above scenarios, you can probably see why it is good to have both the proactive mindset and the proactive tools. But to quote the 1980s Howard Jones song, "Things Can Only Get Better."
Things do continue to get better with storage network management. Over the past 18 months, we have seen dashboard functionality added to storage network management tools such as Brocade Network Advisor. We have also seen the start of policy-based management, introduced with the Fabric Operating System (FOS) Fabric Vision suite. Where is this leading? Could it lead to intelligent analytics and self-healing storage networks? Your guess is as good as mine at this point, but that sounds like a pretty interesting idea to me.